Archive for the ‘Audio’ Category

Why Lossless Audio Codecs generally suck

Saturday, November 27th, 2010

Why there are so many lossless audio codecs? Mike, obviously, had his thoughts on that subject and I agree with my another friend who said: “it’s just too easy to create lossless audio codec, that’s why everybody creates his own”.

Well, theory is simple: you remove redundancy from samples by predicting their values and code the residue. Coding is usually done with Rice codes or some combination of Rice codes and an additional coder — for zero runs or for finer coding of Rice codes. Prediction may be done in two major ways: FIR filters (some fixed prediction filters or LPC) or IIR filters (personally I call those “CPU eaters” for certain property of codecs using it). And of course they always invent their own container (I think in most cases that’s because they are too stupid to implement even minimal support for some existing container or even to think how to fit it into one).

Let’s iterate through the list of better-known lossless audio codecs.

  1. ALAC (by Apple) — nothing remarkable, they just needed to fit something like FLAC into MOV so their players can handle it
  2. Bonk— one of the first lossless/lossy codecs, nobody cares about it anymore. Some FFmpeg developers had intent to enhance it but nothing substantial has been done. You can still find that “effort” as Sonic codec in libavcodec.
  3. DTS-HD MA — it may employ both FIR and IIR prediction and uses Rice codes but they totally screwed bitstream format. Not to mention there’s no openly available documentation for it.
  4. FLAC — the codec itself is good: it’s extremely fast and features good compression ratios. The only bad thing about it is that it’s too hard to seek properly in it since there’s no proper frame header and you can just hope that that combination of bits and CRC are not false positive.
  5. G.711.0 — have you ever heard about it? That’s its problem: nobody cares and nobody even tries to use it.
  6. MLP/Dolby True-HD — it seems to be rather simple and it exists solely because there was no standardised lossless audio codec for DVD.
  7. Monkey’s Audio — well, the only good thing about is that it does not seem to be actively developed anymore.
  8. MPEG-4 ALS — the same problem: it may be standardised but nobody cares about it.
  9. MPEG-4 SLS — even worse since you need bitexact AAC decoder to make it work.
  10. OggSquish — luckily, it’s buried for good but it also spawned one of the worst container formats possible which still lives. And looking at original source of it one should not wonder why.
  11. RealAudio Lossless Format — I always say it was named after its main developer Ralph Wiggum. This codec is very special — they had to modify RM container format specially for it. A quick look inside showed that they use more than 800 (yes, more than eighty hundred) Huffman tables, most of them with several hundreds of codes (about 400 in average). That reminds me of RealVideo 4 with its above-the-average number of tables for context-dependant coding.
  12. Shorten — one of the first lossless audio codecs. Hardly anyone remembers it nowadays.
  13. TAK — it was originally called YALAC (yet another lossless audio codec) for a reason. Since it’s closed-source and fortunately not widespread (though some idiots use it for CD rip releases), it just annoys me time from time but I don’t think someone will work on adding support for it in FFmpeg.
  14. TrueAudio (TTA) — I can say anything about it except it seems to be quite widespread and it works. Looks like they’re still alive and work on TTA2 but who cares?
  15. WavPack — that’s rather good codec with sane bitstream format too. Looks like its author invested some time in its design. Also he sent patches to implement some missing features in our decoder (thank you for that!).
  16. WMA Lossless — from what I know, it uses IIR filter based on least minimum squares method for finding its coefficients. It has two peculiarities: that filter is also used for inter-channel decorrelation and bitstream format follows WMA9 format, i.e. it has something like interframes and frame data starting at arbitrary point (hello, MP3!).

P.S. I still hope this post won’t encourage anybody to write yet another useless lossless audio decoder.

Notes on AAC quantisation

Thursday, March 19th, 2009

I should have written this earlier if not for non-FFmpeg work I have to do here. BTW, are some linguists around there that can explain a relation between bureaucratic and textile (“bureaucracy” comes from a sort of cloth used to cover tables, “red tape” is rather obvious, Russian “????????”, “????????” and “??????????” are also related to a process of obtaining thin threads). Ahem.

AAC coding has two computationally costly operations — MDCT and coefficient quantisation. While the former takes more cycles per one call, the latter is usually called several times for each frame, so those times tend to sum up and outweigh MDCT in bad encoders (like mine). From rate distortion theory we know how to determine proper quantizers for AAC – distortion caused by that quantisation multipled by lambda plus number of bits needed to code that band with this quantiser should be minimal for given value of lambda.

How could we achieve this? Well, use one of three approaches:

  1. Assign some fixed quantizers
  2. Use some ad hoc rule to determine quantiser and then refine its value a bit (aka heuristic, since it gives good speed, it is widely used)
  3. Try all possible quantisers by brute force or Viterbi method (optimal but very slow)

With heuristic you have one catch: if your primary guess on quantiser is not good then refining either takes a lot of time or gives you far from optimal result. Trellis-based search is implemented in my decoder and results in around 20x slower than realtime encoding speed (i.e. encoding one second of audio takes 20 seconds of CPU time) on modern CPUs. I’m playing with something heuristical and fast.

Now to quantising itself.

Each coefficient is quantised as out = (int)pow(in / quantiser, 0.75);. Division of floating-point numbers is slow, taking power of a number is even slower. You can convert MDCT coefficients to the power of three fourths (and quantisers are also converted in precomputed table), thus getting rid of power. FAAC also multiplies coefficients so they are always quantised except for taking integer part. My decoder just multiplies possible codebook vectors by that quantiser and compares it with input coefficients leaving them intact. I also had an idea to present MDCT coefficients in base pow(2, 0.25) making it easy to manipulate but someone still has to test it where base conversions won’t eat all of the gain. I have also tried several optimisations like not trying to match coefficients against all codebook vectors using only close enough vectors. More approaches to try.

(I hope these notes will form “How I Wrote the Best Opensource AAC Encoder Around (to Accompany x264)” memoirs :-S )

General psychoacoustic <-> coding interaction principles

Thursday, March 5th, 2009

OK, let’s suppose we have some abstract subband coder. What it does? It performs some transform on input block of data (like MDCT or QMF filterbank) then obtained frequencies are grouped, quantized and coded.

There could be many approaches but usually there are two general principles employed:

  • Some frequencies matter more than another.
  • Energy carried by subbands matters too.

Psychoacoustic model gives us a list of subband weights meaning their importance. Now what encoder could do with them? Quantize input data and code it. There are three approaches:

  1. Perform optimal coding using psychoacoustic data (good but slow)
  2. Do some heuristics to get some quick and dirty approximation (most popular approach)
  3. Ignore psychoacoustics completely (seems to be popular too)

Optimal coding may be done by employing Vitterbi method in one form or another. Heuristics are usually done in that way: give some initial prediction value for quantizer then refine it a bit until result is close enough to desired one.

More on AAC-specific coding later.

AAC encoder and psy model

Thursday, March 5th, 2009

As you may know, I am working (mostly NOT working though :(, but still remember about it) on AAC encoder. This morning I’ve made simpler psychoacoustic model inspired by FAAC (yes, Dark Shikari, FAAC has some sort of hardcoded psy model) work with my encoder.

I’ll try to use this blog with its original purpose — to formalize my thoughts on subject at hand. I thinks many posts on different aspects of psychoacoustics will follow before more or less suitable encoder will appear. “More or less suitable” means it should be at least a good audio encoding counterpart for x264 (while “fully suitable” means total world domination).

Too bad there’s not enough time (always).

A bit of audio news

Sunday, August 31st, 2008

Looks like I soon will run out of titles, have to switch to something else then.

I’m still working on improving AAC encoder, for now I’m trying to fit trellis-based quantizer selection, then optimal quantizing will follow. In a process FFmpeg may get generic psychoacoustic model interface (there’s new IIR filters interface in SVN, with implementation for lowpass Butterworth filter only though).

In other news – there was a raise of interest to DCA decoder. Of course FFmpeg does not have one (wink wink), but there were some patches to correct tables used by it and improve speed (to nonexistent decoder, that is). Also there’s a person who hasn’t written a DCA encoder, he now reads development mailinglist and submits patches to out nonexistent decoder. Welcome!

Disclaimer. Sorry for political language, but I remember the troubles caused to Videolan because they had developed and hosted DTS decoder.

Also there’s a rise of interest to TrueSpeech, since it is used in some messenger (I’ve never thought it would be popular). I have to update information about it in MultimediaWiki a bit.

AAC: Life goes on

Monday, August 25th, 2008

Well, Summer of Code 2008 goes to end. I think I’ve completed my task – creating simple AAC encoder with psychoacoustics
(here are my application link and a rough task description). I’ve written AAC encoder with psychoacoustic model made after 3GPP TS26.403 and it seems to work fine (on multichannel sources too).

Now there’s a small catch – to be included into FFmpeg it should be optimal in the terms of coding. That means minding rate distortion, choosing optimal quantizers and codebooks, etc. Now it’s a bit harder task, so I think final encoder will be in SVN this autumn or later. Any pointers on information how to optimally encode some part of stream are welcome (for miscellaneous stuff like pulses, temporary noise shaping, etc.).

Stay tuned (and use Flac instead 😉 ).

A bit more

Friday, August 8th, 2008

With low-pass filter my AAC encoder is more or less feature-complete. Of course there’s still more room for improvements but it’s pretty fine now. I’d like to submit it for review but it depends on some parts of AAC decoder and it’s still under review :-(. So I don’t have much to do until then.

So I switched to last GSoC task and hacked again at RV40 loop filter. Well, filter invoking pattern is almost there and I’ve fixed several bugs in actual filtering code. Bit it’s not there yet. Maybe in a month it will be so if AAC encoder won’t take all my time again.

News + Extra

Sunday, August 3rd, 2008

AAC front: to compete with other encoders I have to implement low-pass filter. Benjamin suggested Butterworth filter, so I will try it next week. Hopefully that will be the last big feature to do.

RV front: looks like deblocking pattern is generated from comparing motion vectors, if the difference for subblocks is greater than 3, then edge between them is scheduled for loop filtering. Don’t expect working loop filter implementation too soon though, I still have to deal with AAC encoder and it’s more important.

Extra: I’ve finally decided to buy ASUS Eee, it was easy thing to do – there’s only one model (Eee 701 4G with Win XP installed) for about the same price of four hundred bucks (maybe $450 in greedy shops). So the first thing I did with it was installing Linux and tearing down that stupid “Designed for Windows” label (which was surprisingly easy thing to do and left no marks on laptop surface).

Now here are complains about Ubuntu Eee (I don’t have USB DVD drive and Xandros hasn’t worked from USB flash drive for me): it requires some hacking of system configuration to make it work (like shutdown properly) but that I can live with, but the braindead thing is that gcc is installed (why?) without any development header or library, so you can’t compile even “Hello, world!” program. Both of those issues are resolved, so I just need to make this toy more useful to me 🙂

AAC: encoder progress

Friday, August 1st, 2008

If there are still people interested in my work on AAC encoder (I doubt though), here is a status report for you.

Tasks completed:

  • Make encoder provide lookahead samples
  • Make psy model use these samples to produce block switching decision
  • M/S case handling (at least I hope so)
  • Multichannel coding (from mono to 5.1, seems to work fine)

So there are only two bits left: tune model to produce better sound and commit that all to FFmpeg SVN.

The problem with current implementation is that it follows given bitrate too closely, so some frames may be coded too badly resulting in audible artifacts. I will add ABR mode and quality-based mode to make it better. Another issue is how to use codec options to turn them on, so Robert Swain can add profiles for them ;).

In related news: AAC decoder is slowly migrating to FFmpeg SVN, so I should be able to submit my encoder for review and inclusion as well.

A bit of AAC news

Sunday, July 20th, 2008

I’ve not abandoned work on AAC encoder completely.
For example, I think now my psy model calculates and handles bitrate better. Now the only goals I set to myself are:

  • Make encoder provide psy model lookahead samples for block switching
  • Finetune psy model, or more specifically:
    • block switching decision
    • finish M/S case handling (for now it does not update all psy model information and result in artifacts on several test samples)
    • adjust scalefactors to reduce quantisation distortion
    • some other tricks from 3GPP TS26.403?

After that there are several ways to go: improve quality until it beats other encoders (at least libfaac), implement SBR/PS encoding (the latter is easier to do), introduce multichannel coding.

I suspect that at least one person will suggest to do it all.