General psychoacoustic <-> coding interaction principles

March 5th, 2009

OK, let’s suppose we have some abstract subband coder. What does it do? It performs some transform on an input block of data (like MDCT or a QMF filterbank), then the obtained frequency coefficients are grouped into subbands, quantized and coded.

There could be many approaches but usually there are two general principles employed:

  • Some frequencies matter more than others.
  • Energy carried by subbands matters too.
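
To make the second principle a bit more concrete, here is a minimal Python sketch (not taken from any real encoder) that groups transform coefficients into subbands and computes the energy carried by each one. The function name `band_energies` and the `band_edges` layout are assumptions made up for this illustration.

```python
def band_energies(coeffs, band_edges):
    """Return the energy (sum of squares) of each subband.

    band_edges is a hypothetical layout: band i covers
    coeffs[band_edges[i]:band_edges[i + 1]].
    """
    return [
        sum(c * c for c in coeffs[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]


# Toy data: 8 coefficients split into two narrow bands and one wider band.
coeffs = [4.0, -3.0, 1.0, 0.5, 0.5, -0.25, 0.25, 0.0]
band_edges = [0, 2, 4, 8]
energies = band_energies(coeffs, band_edges)
```

A real coder would use band layouts tuned to hearing (narrow at low frequencies, wide at high ones); the edges above are arbitrary.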

The psychoacoustic model gives us a list of subband weights expressing their importance. Now what can the encoder do with them? Quantize the input data and code it. There are three approaches:

  1. Perform optimal coding using psychoacoustic data (good but slow)
  2. Use some heuristics to get a quick and dirty approximation (most popular approach)
  3. Ignore psychoacoustics completely (seems to be popular too)

Optimal coding may be done by employing the Viterbi algorithm in one form or another. Heuristics usually work like this: give some initial prediction for the quantizer, then refine it a bit until the result is close enough to the desired one.
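
The heuristic approach can be sketched roughly as follows. This is only an illustration under my own assumptions, not code from any real encoder: the names `quantize`, `distortion` and `pick_scalefactor` are hypothetical, the step size is taken as 2^(sf/4) (similar in spirit to AAC scalefactors), and `allowed` stands for the noise budget that a psychoacoustic model would supply for the band.

```python
def quantize(coeffs, step):
    """Uniform quantizer: index of the nearest multiple of step."""
    return [round(c / step) for c in coeffs]


def distortion(coeffs, sf):
    """Squared error after quantizing with step 2^(sf/4) (assumed mapping)."""
    step = 2.0 ** (sf / 4.0)
    return sum((c - q * step) ** 2
               for c, q in zip(coeffs, quantize(coeffs, step)))


def pick_scalefactor(coeffs, allowed, guess, sf_min=-60, sf_max=60):
    """Start from an initial guess and refine it against the noise budget."""
    sf = max(sf_min, min(sf_max, guess))
    # Too much noise: make the quantizer finer.
    while sf > sf_min and distortion(coeffs, sf) > allowed:
        sf -= 1
    # Budget left over: make the quantizer coarser to save bits.
    while sf < sf_max and distortion(coeffs, sf + 1) <= allowed:
        sf += 1
    return sf


band = [10.0, -3.5, 0.7, 4.2]          # toy coefficients of one band
sf = pick_scalefactor(band, allowed=1.0, guess=0)
```

Since distortion is not strictly monotonic in the step size, the result depends on the initial guess; that is exactly the "quick and dirty" part.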

More on AAC-specific coding later.

AAC encoder and psy model

March 5th, 2009

As you may know, I am working (mostly NOT working though :(, but I still remember about it) on an AAC encoder. This morning I made a simpler psychoacoustic model inspired by FAAC (yes, Dark Shikari, FAAC has some sort of hardcoded psy model) work with my encoder.

I’ll try to use this blog for its original purpose: to formalize my thoughts on the subject at hand. I think many posts on different aspects of psychoacoustics will follow before a more or less suitable encoder appears. “More or less suitable” means it should be at least a good audio encoding counterpart for x264 (while “fully suitable” means total world domination).

Too bad there’s not enough time (always).

A bit of news

March 3rd, 2009

Looks like I’ve neglected my blog for some time. To improve the situation a bit, here are some assorted pieces of news:

  • FFmpeg release: we will probably have one Really Soon Now. The previous release was before I started developing for FFmpeg.
  • RV3/4 is improving bit by bit. For now most of the trouble lies in incorrect motion vector prediction for B-frames. I hope to fix it one day (or preferably that someone else will fix it, but that’s even more unlikely).
  • SwScaler is slowly moving towards being usable under LGPL. Probably only the x86 SIMD code will be left under GPL.
  • PB-frames support was added. So the only one who cares about Intel codecs (Benjamin, son of Lars) can watch i263 with the lavc decoder now.
  • I took some time to understand the ELBG code in lavc and wrote a simple 15-bit MS Video1 encoder. Patch pending.
  • I’ve tried to RE BMV (the video format employed in Discworld II and Discworld Noir). The Discworld II decoder is in the ScummVM sources, so I gave DW Noir a shot, since it is unlikely to be supported by any open-source engine. While figuring out the header and container format was a piece of cake and finding out the audio compression scheme was easy (boy, they do like SWAR!), I have trouble determining which function is used for video decoding.

More news will follow eventually.

The end-of-year summary

December 29th, 2008

Ok, let’s see what I’ve done this year:

  • simple IMA ADPCM encoder – Apple variant
  • worked a bit on different codecs – BMP, Fraps, Monkey’s Audio, TIFF, VC-1
  • got RV40 and RV30 finally working more or less as intended (some garbage still occurs on some B-frames, but mostly both decoders produce watchable video)

So, what are the plans for the next year:

  • Find more time for FFmpeg development
  • Take part in GSoC (it gives a good reason to work on FFmpeg and also makes a good source of T-shirts)
  • Go abroad
  • And, of course, bring FFmpeg closer to world domination

For the last step I need to:

  • Add support for more formats to FFmpeg (WavPack lossy, LucasArts game formats, Bink, etc.)
  • Convince Mike to finish his Xan4 decoder
  • Convince Robert to finish his AMR-NB decoder (unless somebody beats him to it) and HE-AAC decoding support (those messages about SBR not being supported are really nagging me)
  • Convince Kostya to finish the AAC encoder (hey, that’s me!)

So, let’s see what we get in the upcoming year.

The list of game codecs I want to have in FFmpeg

December 2nd, 2008

One of FFmpeg’s advantages is that it supports decoding of many fringe formats, especially game formats. My favourite is Sierra VMD but there are several other formats I’d like to be able to play:

  • LucasArts SMUSH (there was a patch for playing some variants, the rest could be made from ScummVM code)
  • Discworld II and III video format (if only ScummVM developers got the code for DW2 at least)
  • Indeo 4 and 5
  • Bink (if only a certain person worked on REing it instead of formats used in EA games)

Code donations are welcome 😉

Update: ScummVM has DW2 BMV decoder now.

RV40 is in

December 2nd, 2008

As you may know from other places, FFmpeg got an RV40 decoder. There’s still some hope that FFmpeg will get an RV30 decoder as well (it needs a loop filter and some bugs squashed).

Some notes about performance:

  • PPC G4 1.42GHz — on par
  • Celeron 600MHz (inside ASUS Eee) — significantly slower (2 minutes of the same source decoded in 64 and 82 seconds by binary and native decoders respectively)

When I switch the motion compensation functions from C implementations to their optimised H.264 counterparts (they are slightly different, so the picture quality gets worse), the native decoder becomes faster than the binary one by several percent on x86 and even faster on PPC. Conclusion: if you want fast decoding then submit SIMD versions of the motion compensation functions.

RV: a small update

November 23rd, 2008

Hereby I declare that my RV40 decoder changed its status from “Well, it’s better than nothing” to “Good enough”. While there are still problems with chroma and jitter in B-frames due to wrong motion vector prediction, luma decoding is bitexact on I- and P-frames.
I hope to weed those problems out and have decoding enabled in FFmpeg before next year. Maybe RV30 too.

For those who ask for specs on RealVideo:

???????RealVideo

I hope the message is clear enough.

RV3/4 decoders present state: stalled again

November 4th, 2008

I’ve been very busy with things outside FFmpeg, yet I’ve managed to do something on the RV3/4 decoders too:

  • Found and fixed an old bug with quantisation for DC coefficients.
  • Cleaned up the RV4 loop filter a bit.
  • Fixed chroma MC bug in RV3 decoder.
  • B-frame motion vectors are now closer to reality in RV3.

What is missing:

  • RV3 loop filter
  • correct RV3 motion vector calculation
  • RV4 motion compensation incompatibilities

The main problem is that I don’t quite understand why it works the way it does and (in some parts) how it works. Hopefully it will be clearer next time I look at it.

A RE Puzzle

September 21st, 2008

There is a codec; little is known about it.

Here are some of its features:

  • it codes a frame as three planes
  • it employs motion compensation
  • it employs old-school vector quantization — fill block, fill block with mask, …
  • each value in the stream is coded as a run of values plus the actual value, which is coded as several Huffman-coded nibbles (yes, each 4 bits are coded with their own Huffman tree) plus a sign bit if applicable
  • sometimes it performs DCT to restore block content

Try to guess its name.

Hint: MultimediaWiki contains a description of a codec with similar bitstream reading techniques, which is a relative of this codec.

And, no, since I’m busy with the AAC encoder, I will neither RE it nor write a decoder for it (at least in the near future).

A bit of audio news

August 31st, 2008

Looks like I will soon run out of titles; I’ll have to switch to something else then.

I’m still working on improving the AAC encoder. For now I’m trying to fit in trellis-based quantizer selection, then optimal quantizing will follow. In the process FFmpeg may get a generic psychoacoustic model interface (there’s a new IIR filter interface in SVN, though with an implementation for the lowpass Butterworth filter only).
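
For readers unfamiliar with the idea, trellis-based selection boils down to a Viterbi-style search: every band has several candidate quantizers, each with its own cost, and switching quantizers between neighbouring bands costs extra bits. Below is a toy Python sketch of that search. It is an assumption-laden illustration, not the encoder’s code: `trellis_select`, `band_costs` and `transition_cost` are hypothetical names, and the costs are arbitrary numbers rather than real rate-distortion values.

```python
def trellis_select(band_costs, transition_cost):
    """Pick one candidate per band minimizing total cost (Viterbi search).

    band_costs[b][k]      -- cost of using candidate k in band b
    transition_cost(j, k) -- extra cost of going from candidate j to k
    Returns (total_cost, path) where path[b] is the chosen candidate.
    """
    num_cands = len(band_costs[0])
    best = list(band_costs[0])      # best cost of any path ending in k
    back = []                       # back[b-1][k] = best predecessor of k
    for costs in band_costs[1:]:
        new_best, pointers = [], []
        for k in range(num_cands):
            j = min(range(num_cands),
                    key=lambda j: best[j] + transition_cost(j, k))
            new_best.append(best[j] + transition_cost(j, k) + costs[k])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards from the last band.
    k = min(range(num_cands), key=lambda k: best[k])
    total, path = best[k], [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return total, path


# Toy example: 3 bands, 2 candidates each, switching costs 3 per step.
toy_costs = [[0, 5], [5, 0], [0, 5]]
total, path = trellis_select(toy_costs, lambda j, k: 3 * abs(j - k))
```

In the toy example each band on its own would prefer a different candidate, but the switching cost makes it cheaper to stay with one candidate throughout; a greedy per-band choice would miss that.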

In other news – there was a rise of interest in the DCA decoder. Of course FFmpeg does not have one (wink wink), but there were some patches to correct the tables used by it and improve its speed (for a nonexistent decoder, that is). Also there’s a person who hasn’t written a DCA encoder; he now reads the development mailing list and submits patches to our nonexistent decoder. Welcome!

Disclaimer: sorry for the political language, but I remember the troubles caused to VideoLAN because they had developed and hosted a DTS decoder.

Also there’s a rise of interest in TrueSpeech, since it is used in some messenger (I never thought it would be popular). I have to update the information about it in MultimediaWiki a bit.