I should have written this earlier had it not been for the non-FFmpeg work I have to do here. BTW, are there any linguists around who can explain the relation between bureaucracy and textiles? (“Bureaucracy” comes from a sort of cloth used to cover tables, “red tape” is rather obvious, and the Russian “????????”, “????????” and “??????????” are also related to the process of obtaining thin threads.) Ahem.
AAC coding has two computationally costly operations: MDCT and coefficient quantisation. While the former takes more cycles per call, the latter is usually called several times for each frame, so those times add up and outweigh MDCT in bad encoders (like mine). From rate-distortion theory we know how to determine proper quantisers for AAC: for a given value of lambda, the distortion caused by quantisation multiplied by lambda, plus the number of bits needed to code the band with that quantiser, should be minimal.
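Written out for one band b, the criterion looks like the formula below. Here D_b(q) is the distortion of the band quantised with q, R_b(q) is the number of bits it then needs, and x̂_i(q) is coefficient x_i reconstructed after quantisation. The distortion measure is not specified above, so squared error is an assumption here. A larger λ makes distortion more expensive and so pushes the choice towards finer quantisers.

$$
q_b^{*} = \arg\min_{q}\bigl(\lambda\, D_b(q) + R_b(q)\bigr), \qquad D_b(q) = \sum_{i \in b} \bigl(x_i - \hat{x}_i(q)\bigr)^2
$$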
How could we achieve this? Well, use one of three approaches:
- Assign some fixed quantisers
- Use some ad hoc rule to determine the quantiser and then refine its value a bit (a heuristic; since it is fast, it is widely used)
- Try all possible quantisers by brute force or with the Viterbi algorithm (optimal but very slow)
With the heuristic approach there is one catch: if your initial guess for the quantiser is poor, refining it either takes a lot of time or gives a result far from optimal. Trellis-based search is implemented in my encoder and runs about 20 times slower than realtime on modern CPUs (i.e. encoding one second of audio takes 20 seconds of CPU time). I’m playing with something heuristic and fast.
Now to quantising itself.
Each coefficient magnitude is quantised as out = (int)pow(fabs(in) / quantiser, 0.75), with the sign handled separately. Floating-point division is slow, and raising a number to a power is even slower. You can raise the MDCT coefficients to the power of three fourths once (and convert the quantisers the same way in a precomputed table), thus getting rid of the power. FAAC also multiplies the coefficients in advance, so that quantising them only requires taking the integer part. My encoder just multiplies the possible codebook vectors by the quantiser and compares them with the input coefficients, leaving those intact. I also had an idea to represent MDCT coefficients in base pow(2, 0.25), making them easy to manipulate, but someone still has to test whether the base conversions eat up all of the gain. I have also tried several optimisations, such as matching coefficients only against codebook vectors that are close enough instead of against all of them. There are more approaches to try.
(I hope these notes will form “How I Wrote the Best Opensource AAC Encoder Around (to Accompany x264)” memoirs :-S )
If you speak to Loren, Jason and Michael, I’m sure they’ll have some good suggestions for how to approach this. Having the best AAC encoder as well as the best H.264 encoder would certainly be welcome.
And Gabriel Bouvigne.
What are you focused on at the moment? The point of having the RD-optimal approach was to have something to target; it shouldn’t be hideously slow though, it should still be usable, just slow.
The vast majority of the time I expect you would want to use a variety of heuristics that are switchable to give a range of speed/quality trade-offs, analogous to motion estimation (ME) search patterns.
I think the _most_ important thing is to work towards beating faac in some way – either being close in speed and better in quality or close in quality and better in speed. Or better, both.
Once you get to that stage I see no reason why it should not be merged, assuming the code is good, which IIRC it is, but no doubt Michael will have comments. I think this would be a good base point from which we can work to achieve world domination for AAC as well.