I should have written this earlier if not for non-FFmpeg work I have to do here. BTW, are some linguists around there that can explain a relation between bureaucratic and textile (“bureaucracy” comes from a sort of cloth used to cover tables, “red tape” is rather obvious, Russian “????????”, “????????” and “??????????” are also related to a process of obtaining thin threads). Ahem.
AAC coding has two computationally costly operations — MDCT and coefficient quantisation. While the former takes more cycles per one call, the latter is usually called several times for each frame, so those times tend to sum up and outweigh MDCT in bad encoders (like mine). From rate distortion theory we know how to determine proper quantizers for AAC – distortion caused by that quantisation multipled by lambda plus number of bits needed to code that band with this quantiser should be minimal for given value of lambda.
How could we achieve this? Well, use one of three approaches:
- Assign some fixed quantizers
- Use some ad hoc rule to determine quantiser and then refine its value a bit (aka heuristic, since it gives good speed, it is widely used)
- Try all possible quantisers by brute force or Viterbi method (optimal but very slow)
With heuristic you have one catch: if your primary guess on quantiser is not good then refining either takes a lot of time or gives you far from optimal result. Trellis-based search is implemented in my decoder and results in around 20x slower than realtime encoding speed (i.e. encoding one second of audio takes 20 seconds of CPU time) on modern CPUs. I’m playing with something heuristical and fast.
Now to quantising itself.
Each coefficient is quantised as
out = (int)pow(in / quantiser, 0.75);. Division of floating-point numbers is slow, taking power of a number is even slower. You can convert MDCT coefficients to the power of three fourths (and quantisers are also converted in precomputed table), thus getting rid of power. FAAC also multiplies coefficients so they are always quantised except for taking integer part. My decoder just multiplies possible codebook vectors by that quantiser and compares it with input coefficients leaving them intact. I also had an idea to present MDCT coefficients in base
pow(2, 0.25) making it easy to manipulate but someone still has to test it where base conversions won’t eat all of the gain. I have also tried several optimisations like not trying to match coefficients against all codebook vectors using only close enough vectors. More approaches to try.
(I hope these notes will form “How I Wrote the Best Opensource AAC Encoder Around (to Accompany x264)” memoirs :-S )