Archive for June, 2008

AAC: going to psychoacoustics

Tuesday, June 24th, 2008

Looks like Gabriel Bouvigne of mp3-tech.org (how much information I got from there!) and LAME fame has taken an interest in the AAC encoder. For now I’m following his advice and trying to implement a psychoacoustic model after the 3GPP TS 26.403 document. It should be simple yet effective enough.

In other news: the AAC decoder is mutating to become fit for inclusion in FFmpeg SVN. I hope that will happen soon. Keep going, Robert, and keep reviewing, Michael!

Update on AAC progress

Monday, June 16th, 2008

If you are interested in what is happening with my encoder, here’s a short report.

Simple encoding works. That means you can encode files with it now, they can be played back, and you’ll be able to recognize the sound. I’ve also separated the psychoacoustic model from the encoder itself, so the encoder calls the model to ask what windowing to use and what scaling and coefficients to encode.
Can I say this concludes the task for this Summer of Code? Technically yes, but there are a few points I ought to finish.

Encoder side:

  • MDCT for the cases other than the simple 1024-point window (the sequence of 8 short windows and the two transition windows)
  • correct bitstream writing for the 8SS case
  • probably multichannel encoding (it’s useless until we have a defined multichannel audio API, though)
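
To make the first item concrete, here is a minimal sketch of what the eight-short-window case involves. It assumes the usual AAC layout (a 2048-sample block giving 1024 coefficients, or eight 256-sample windows giving 128 coefficients each, placed in the middle of the block), uses a plain sine window, and applies no scaling. The function names are made up for illustration, the transform is the slow direct form rather than an FFT-based one, and the two transition windows are left out.

```python
import math

def sine_window(length):
    """Sine window (AAC also allows a Kaiser-Bessel-derived shape, not shown)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block):
    """Direct O(N^2) MDCT: N input samples -> N/2 coefficients, unscaled."""
    n = len(block)
    half = n // 2
    n0 = (half + 1) / 2.0  # phase offset that lets aliasing cancel on overlap-add
    return [
        sum(block[i] * math.cos(2.0 * math.pi / n * (i + n0) * (k + 0.5))
            for i in range(n))
        for k in range(half)
    ]

def long_transform(samples):
    """Window one whole block and transform it (2048 samples -> 1024 coefficients)."""
    window = sine_window(len(samples))
    return mdct([s * w for s, w in zip(samples, window)])

def eight_short_transform(samples, short_len=256):
    """Transform eight 50%-overlapping short windows centred in the block."""
    hop = short_len // 2
    start = (len(samples) - 9 * hop) // 2  # 448 for a 2048-sample block
    window = sine_window(short_len)
    result = []
    for w in range(8):
        offset = start + w * hop
        segment = samples[offset:offset + short_len]
        result.append(mdct([s * wi for s, wi in zip(segment, window)]))
    return result
```

For a 2048-sample block, `eight_short_transform` returns eight lists of 128 coefficients, so the total stays at 1024 per block; only the time/frequency resolution trade-off changes.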

Psychoacoustic model(s) side:

  • good psychoacoustic model 🙂
  • quantizer which allows rate control
  • something else?

I can also add more models after the work is complete, and probably tune them for my ears and the music I like to listen to. Reading the papers I’ve got on psychoacoustic models should help.

Back to work then.

Some progress in AAC encoder

Monday, June 9th, 2008

OK, now I have a simple and not very correct AAC encoder. Because the quantization step is missing (spectral coefficients should be compressed by a power law first, since the decoder expands each value by a factor of its cube root), the resulting AAC becomes louder and FAAD complains about the quantized value being too large. FFmpeg’s future AAC decoder just silently clips it. In any case, it produces sound close to the original.
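
A rough sketch of the missing step, for the curious. It follows the power-law quantizer as I understand it from the AAC specs: the decoder raises each quantized value to the power 4/3, so the encoder has to apply the power 3/4 first. The function names are mine, and the rounding constant 0.4054 is the one used in reference-style encoders, so treat the details as assumptions rather than as what my encoder does.

```python
import math

SF_OFFSET = 100    # scalefactor offset in the AAC dequantization formula
ROUNDING = 0.4054  # rounding constant used by reference-style encoders (assumption)

def quantize(coef, scalefactor):
    """Encoder side: scale by the scalefactor gain, then compress with power 3/4."""
    gain = 2.0 ** (0.25 * (scalefactor - SF_OFFSET))
    q = int((abs(coef) / gain) ** 0.75 + ROUNDING)
    return q if coef >= 0 else -q

def dequantize(q, scalefactor):
    """Decoder side: expand with power 4/3, then apply the scalefactor gain."""
    gain = 2.0 ** (0.25 * (scalefactor - SF_OFFSET))
    return math.copysign(abs(q) ** (4.0 / 3.0) * gain, q)
```

With `scalefactor=100` (unit gain), a coefficient of 1000 quantizes to 178 and decodes back to roughly 1001. Writing 1000 into the bitstream without the power-law step makes the decoder produce about 10000 instead, which is the "louder" effect and the oversized values described above.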

Since no psychoacoustics is employed for now, the bitrate is too high (~400 kbps per channel, with no joint stereo savings either).
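
To put that number in perspective, here is some back-of-the-envelope arithmetic. The 44.1 kHz sample rate is an assumption (the figure above does not depend on any particular file), and the helper names are only for illustration. Since the MDCT produces one coefficient per input sample, the bitrate divided by the sample rate gives the average bits spent per coefficient.

```python
COEFS_PER_FRAME = 1024  # coefficients per channel in one long-window frame

def bits_per_coefficient(bitrate_bps, sample_rate_hz):
    """Average bits per spectral coefficient (one coefficient per input sample)."""
    return bitrate_bps / sample_rate_hz

def bits_per_frame(bitrate_bps, sample_rate_hz):
    """Average bits available for one frame of one channel."""
    return bits_per_coefficient(bitrate_bps, sample_rate_hz) * COEFS_PER_FRAME

def pcm_bitrate(sample_rate_hz, bits_per_sample=16):
    """Bitrate of one uncompressed PCM channel, for comparison."""
    return sample_rate_hz * bits_per_sample
```

Under that assumption, 400 kbps is about 9 bits per coefficient, or more than half of the 705.6 kbps that one channel of plain 16-bit PCM takes.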

So, the plan is to:

  • Fix and optimize bitstream writing (yes, bitstream packing is far from optimal too)
  • Psychoacoustic model (I hope it will be easier than the multichannel audio API in FFmpeg)
  • Bitrate control

Back to work…

Still in Memphis…

Wednesday, June 4th, 2008

Although it’s no good to kick a dead horse (that is, to discuss the work of governmental establishments), I still want to cry about the total ineffectiveness of our customs.

Today I received the stored value card from Google, sent on May 27. They also sent a book, on May 15th. And it’s still not here.
Here is the log – see for yourselves: 1.png.
And something stops me from believing that customs officers are taking their time reading that book.

Last year a package sent by Mike took a whole month’s rest, not to mention the $10 fee for those 12 recorded DVD+Rs.

Surprisingly, the x86 box I recently got was delivered in a week by Express Mail or something similar.

Good system architects optimize system performance by removing the biggest and most probable delay. If delivering a package takes a week and clearing customs takes at least two, then which is the weakest, slowest and ugliest link? Where can I submit a patch to this process?

P.S. For those who don’t know the title origin – look here.