Archive for March, 2009

A bit of new hardware

Saturday, March 21st, 2009

I’ve wanted to write another useless rant about idiocy as a governing policy in our lives (for example, 1st class railroad cars being worse than 2nd class but more expensive, or how “express” is translated into Ukrainian as “пришвидшений” or “прискорений”, both meaning “accelerated” or “sped-up”), but I have a bit of more pleasant news.

I’ve spent the rest of my GSoC money on a BeagleBoard, and it took about 15 days to deliver (which is rather impressive by local standards). So I hope to start hacking on it too (I’m pretty sure it would be good for both FFmpeg and me if I learn ARM assembly and about the NEON unit). In my opinion the board would really benefit from a built-in network adapter though (there’s even a place for it on the PCB); since this is not a Mac, saying that USB should be enough for everything is rather lame.

Notes on AAC quantisation

Thursday, March 19th, 2009

I should have written this earlier, if not for the non-FFmpeg work I have to do here. BTW, are there any linguists around who can explain the relation between bureaucracy and textiles? (“Bureaucracy” comes from a sort of cloth used to cover tables, “red tape” is rather obvious, and Russian “волокита”, “канитель” and “проволочка” are also related to the process of obtaining thin threads.) Ahem.

AAC coding has two computationally costly operations: MDCT and coefficient quantisation. While the former takes more cycles per call, the latter is usually called several times for each frame, so those calls tend to add up and outweigh MDCT in bad encoders (like mine). From rate-distortion theory we know how to determine proper quantisers for AAC: for a given value of lambda, the distortion caused by quantising a band, multiplied by lambda, plus the number of bits needed to code that band with this quantiser should be minimal.
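In symbols, the criterion reads as follows. The notation is mine and only illustrative: q is a candidate quantiser for one band, D(q) is the distortion it causes, R(q) is the number of bits needed to code the band with it, and λ is lambda.

```latex
J(q) = \lambda \, D(q) + R(q), \qquad q^{*} = \arg\min_{q} J(q)
```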

How could we achieve this? Well, use one of three approaches:

  1. Assign some fixed quantisers
  2. Use some ad hoc rule to determine the quantiser and then refine its value a bit (aka heuristic; it is widely used since it gives good speed)
  3. Try all possible quantisers by brute force or Viterbi method (optimal but very slow)
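For a single band, the brute-force option could be sketched roughly like this. It is only an illustration: the function name `best_quantiser` is made up, and the arrays `dist` and `bits` are assumed to be measured beforehand for every candidate quantiser (how they are obtained is outside this sketch).

```c
#include <stddef.h>

/*
 * Pick the candidate with the smallest cost lambda * distortion + bits.
 * dist[i] and bits[i] are assumed to be precomputed for candidate i.
 * Returns the index of the best candidate, or -1 if there are none.
 */
static int best_quantiser(const double *dist, const int *bits,
                          size_t n, double lambda)
{
    int best = -1;
    double best_cost = 0.0;

    for (size_t i = 0; i < n; i++) {
        double cost = lambda * dist[i] + bits[i];
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

A larger lambda makes distortion more expensive, so the choice moves towards the finer (more bit-hungry) candidates.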

With a heuristic there is one catch: if your initial guess at the quantiser is not good, then refining it either takes a lot of time or gives a far from optimal result. Trellis-based search is implemented in my encoder and runs around 20x slower than realtime (i.e. encoding one second of audio takes 20 seconds of CPU time) on modern CPUs. I’m playing with something heuristic and fast.
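To make the trellis idea concrete, here is a minimal Viterbi sketch over bands. It is not the code from my encoder: `trellis_search`, `transition_bits` and the size limits are invented for illustration, the per-band costs (lambda * distortion + bits) are assumed to be precomputed, and the price of changing the quantiser between neighbouring bands is a crude stand-in for real side-information costs.

```c
#include <stdlib.h>

#define MAX_BANDS 64
#define MAX_CANDS 32

/* Hypothetical price of changing the quantiser index between bands. */
static double transition_bits(int prev, int cur)
{
    return 1.0 + 2.0 * abs(cur - prev);
}

/*
 * node_cost[b * ncands + q] is the cost of coding band b with candidate q.
 * Stores the chosen candidate for every band in path[] and returns the
 * total cost of the cheapest path, or -1.0 if the sizes are out of range.
 */
static double trellis_search(const double *node_cost, int nbands,
                             int ncands, int *path)
{
    double cur[MAX_CANDS], next[MAX_CANDS];
    int from[MAX_BANDS][MAX_CANDS];

    if (nbands < 1 || nbands > MAX_BANDS || ncands < 1 || ncands > MAX_CANDS)
        return -1.0;

    /* The first band has no predecessor, so only its own cost counts. */
    for (int q = 0; q < ncands; q++)
        cur[q] = node_cost[q];

    for (int b = 1; b < nbands; b++) {
        for (int q = 0; q < ncands; q++) {
            int arg = 0;
            double min = cur[0] + transition_bits(0, q);
            for (int p = 1; p < ncands; p++) {
                double c = cur[p] + transition_bits(p, q);
                if (c < min) {
                    min = c;
                    arg = p;
                }
            }
            next[q] = min + node_cost[b * ncands + q];
            from[b][q] = arg;   /* remember the best predecessor */
        }
        for (int q = 0; q < ncands; q++)
            cur[q] = next[q];
    }

    /* Find the cheapest final state and walk the back-pointers. */
    int end = 0;
    for (int q = 1; q < ncands; q++)
        if (cur[q] < cur[end])
            end = q;
    double total = cur[end];
    for (int b = nbands - 1; b >= 0; b--) {
        path[b] = end;
        if (b > 0)
            end = from[b][end];
    }
    return total;
}
```

The work grows as bands × candidates², which gives a feeling for why this approach is so much slower than a heuristic.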

Now to quantising itself.

Each coefficient is quantised as out = (int)pow(in / quantiser, 0.75);. Division of floating-point numbers is slow, and raising a number to a power is even slower. You can convert the MDCT coefficients to the power of three fourths once (the quantisers are converted too, in a precomputed table), thus getting rid of the power. FAAC also pre-multiplies the coefficients, so all that is left of quantisation is taking the integer part. My encoder just multiplies the possible codebook vectors by the quantiser and compares them with the input coefficients, leaving those intact. I also had an idea to represent MDCT coefficients in base pow(2, 0.25), which would make them easy to manipulate, but someone still has to test whether base conversions would eat all of the gain. I have also tried several optimisations, like matching coefficients only against close enough codebook vectors instead of all of them. More approaches to try.

(I hope these notes will form “How I Wrote the Best Opensource AAC Encoder Around (to Accompany x264)” memoirs :-S )

My proposal on roadmap for FFmpeg

Tuesday, March 17th, 2009

Here’s the thing that either Compn, known for his passion for documenting codecs, or Mike, known for his passion for diagrams, charts and codecs, should have done a loooooong time ago.

While the same information may be obtained from the Multimedia Wiki, a graphical layout should be handier for claims like “… include reverse-engineering of all Real video formats” here. I am also aware of the list of supported codecs in the MPlayer documentation, but it’s boring and not very useful as a reference.

Here’s how I like it: green status for supported codecs, red for unsupported. At a glance you can see what’s missing and what should be added to my beloved video conversion tool.

[Image: scheme]

Note: I know that we have to enhance FFmpeg in areas other than support for different formats (the filter system, for example). Patches welcome.

General psychoacoustic <-> coding interaction principles

Thursday, March 5th, 2009

OK, let’s suppose we have some abstract subband coder. What does it do? It performs some transform on an input block of data (like MDCT or a QMF filterbank), then the obtained frequencies are grouped, quantized and coded.

There could be many approaches but usually there are two general principles employed:

  • Some frequencies matter more than others.
  • Energy carried by subbands matters too.

A psychoacoustic model gives us a list of subband weights expressing their importance. Now what can the encoder do with them? Quantize the input data and code it. There are three approaches:

  1. Perform optimal coding using psychoacoustic data (good but slow)
  2. Do some heuristics to get some quick and dirty approximation (most popular approach)
  3. Ignore psychoacoustics completely (seems to be popular too)
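Whichever approach is chosen, the weights have to enter the error measure somehow. One simple reading, which is my assumption and not a rule from any spec, is to scale the squared error of each subband by its weight. The name `weighted_distortion` and the `band_start` layout (nbands + 1 offsets, band b covering indices band_start[b] up to band_start[b + 1]) are made up for this sketch.

```c
#include <stddef.h>

/*
 * Sum of per-band squared errors, each scaled by the band's weight.
 * orig[] holds the input coefficients, recon[] the values after
 * quantisation and dequantisation.
 */
static double weighted_distortion(const double *orig, const double *recon,
                                  const size_t *band_start,
                                  const double *weight, size_t nbands)
{
    double total = 0.0;

    for (size_t b = 0; b < nbands; b++) {
        double err = 0.0;
        for (size_t i = band_start[b]; i < band_start[b + 1]; i++) {
            double d = orig[i] - recon[i];
            err += d * d;
        }
        total += weight[b] * err;
    }
    return total;
}
```

Ignoring psychoacoustics completely corresponds to setting every weight to 1.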

Optimal coding may be done by employing the Viterbi method in one form or another. Heuristics are usually done this way: pick some initial prediction for the quantizer, then refine it a bit until the result is close enough to the desired one.

More on AAC-specific coding later.

AAC encoder and psy model

Thursday, March 5th, 2009

As you may know, I am working (mostly NOT working though :(, but I still remember about it) on an AAC encoder. This morning I’ve made a simpler psychoacoustic model inspired by FAAC (yes, Dark Shikari, FAAC has some sort of hardcoded psy model) work with my encoder.

I’ll try to use this blog for its original purpose: to formalize my thoughts on the subject at hand. I think many posts on different aspects of psychoacoustics will follow before a more or less suitable encoder appears. “More or less suitable” means it should be at least a good audio encoding counterpart for x264 (while “fully suitable” means total world domination).

Too bad there’s not enough time (always).

A bit of news

Tuesday, March 3rd, 2009

Looks like I’ve neglected my blog for some time. In order to improve the situation a bit, here are some assorted pieces of news:

  • FFmpeg release: we will probably have one Really Soon Now. The previous release was before I started developing for FFmpeg.
  • RV3/4 is improving bit by bit. For now most troubles lie in incorrect motion vector predictions for B-frames. I hope to fix it one day (or preferably that someone else will fix it but that’s even more unlikely).
  • SwScaler is slowly moving to be usable under LGPL. Probably it will be only x86 SIMD code that will be left under GPL.
  • PB-frames support was added. So the only one who cares about Intel codecs (Benjamin, son of Lars) can watch i263 with lavc decoder now.
  • I took some time to understand the ELBG code in lavc and wrote a simple 15-bit MS Video1 encoder. Patch pending.
  • I’ve tried to RE BMV (the video format employed in Discworld II and Discworld Noir). The Discworld II decoder is in the ScummVM sources, so I gave DW Noir a shot, since it is unlikely to be supported by any opensource engine. While figuring out the header and container format was a piece of cake and finding out the audio compression scheme was easy (boy, they do like SWAR!), I have trouble determining which function is used for video decoding.

More news will follow eventually.