The biggest curse in codec design « Kostya's Boring Codec World

The biggest curse in codec design

This post is an answer to the comment by Alex Converse on my previous post:

It’s interesting how quickly you dismiss SLS for being a hybrid coder with AAC. From a pure lossless standpoint that is a weakness but from a broader perspective it allows for a lossy layer that is widely compatible with existing hardware.

Let’s see why scalable coding is a weakness from a lossless coding standpoint.

There are a few hybrid lossy+lossless codecs out there that use the lossy part in lossless reconstruction, e.g. MPEG-4 SLS, DTS-HD MA and WavPack. The first two use two different coding techniques: MDCT or QMF for the core and the usual lossless coding for the difference. In WavPack both parts are coded in the same way and the correction data is stored in a separate block. For DCT-based codecs there are many ways of performing the DCT (from trivial matrix multiplication to an FFT-based implementation to decomposing the DCT into smaller DCTs), which may lead to slightly different output depending on the method chosen. Thus you need a reference way (i.e. not the fastest one) of doing the lossy part, or you can’t guarantee truly lossless reconstruction. Also, the residue (the difference between the original and the lossy-coded signal) tends to be more chaotic in this case and thus less compressible.
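
To make the reference-transform point concrete, here is a minimal sketch in Python. It is not how SLS or DTS-HD MA actually work: the block size `N`, the quantiser step `STEP` and the two inverse transforms `idct_reference` and `idct_coarse` are all made up for illustration. The encoder computes the residue against one inverse DCT, so the decoder gets the original back only if its inverse DCT produces exactly the same integers.

```python
import math

N = 8      # block size, chosen only for illustration
STEP = 64  # quantiser step of the hypothetical lossy core


def dct(samples):
    """Forward DCT-II by direct summation."""
    return [sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(samples))
            for k in range(N)]


def idct_reference(coefs):
    """Inverse DCT in double precision, rounded to integer samples."""
    out = []
    for n in range(N):
        acc = coefs[0] / 2.0
        for k in range(1, N):
            acc += coefs[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
        out.append(round(acc * 2.0 / N))
    return out


def idct_coarse(coefs, bits=6):
    """Same transform with a cosine table rounded to `bits` fractional bits,
    standing in for a decoder that trades accuracy for speed."""
    scale = 1 << bits
    out = []
    for n in range(N):
        acc = coefs[0] * scale // 2
        for k in range(1, N):
            c = round(scale * math.cos(math.pi * k * (2 * n + 1) / (2 * N)))
            acc += coefs[k] * c
        out.append(round(acc * 2.0 / (N * scale)))
    return out


def encode(samples, idct):
    """Lossy core = quantised DCT coefficients; residue = what the core misses."""
    core = [round(c / STEP) for c in dct(samples)]
    lossy = idct([q * STEP for q in core])
    residual = [s - l for s, l in zip(samples, lossy)]
    return core, residual


def decode(core, residual, idct):
    """Rebuild the lossy signal, then add the correction data."""
    lossy = idct([q * STEP for q in core])
    return [l + r for l, r in zip(lossy, residual)]


samples = [1200, -3400, 15000, 22000, -18000, 7000, -250, 31000]
core, residual = encode(samples, idct_reference)

# Same inverse transform on both sides: reconstruction is exact.
print(decode(core, residual, idct_reference) == samples)
# Different inverse transform: exact only if it happens to round identically.
print(decode(core, residual, idct_coarse) == samples)
```

The first check always holds because the decoder repeats the encoder’s arithmetic step for step. The second holds only when the coarse transform happens to round to the same integers for every sample, which is exactly what a format cannot promise unless it pins down the transform.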

Another issue is what to do with the correction data. If you put it into a separate file, you get more trouble since you have to manage two files; if you put it all into a single file, that file will be bigger than a purely lossless coded one (unless you have a very stupid method of lossless coding).

And now comes an argument that I really hate: “but it allows legacy players to handle those files”. That, in my opinion, is the curse from this post’s title. Making a codec backward compatible just cripples it. You have to implement new (and sometimes completely different) features within the old limits while relying on new technology. So in some cases it just degrades quality and/or forces you to encode something twice: once for the old feature set and once for its replacement. Another problem is that it delays adoption of the codec: the old player can play it, so why should I bother supporting this new codec? I suspect this is the reason why we have MLP but no DTS-HD support.

The worst offender here is MP3. This codec sucks by design. It uses 36-point (or three 12-point) MDCTs, which are not trivial to speed up, unlike power-of-two transforms, and in the decoder the output of the inverse MDCTs is fed to the QMF used in MPEG Audio Layers I and II, supposedly “to be compatible with them”. As claimed here, MP3 would perform better without that, and since the claim comes from one of the leading LAME developers, I believe it. And of course there is MP3Pro: most players in existence just ignore the extension part and play a crippled version of the sound. Someone may argue that’s because it’s proprietary. Okay, look at HE-AAC, where SBR is at least documented: it may still cause some confusion since it may be detected only when decoding an audio frame.

In my opinion, supporting a new codec and supporting a special codec extension are, in the general case, both one-time actions of comparable effort (hacking existing code to detect and support a new extension may be not that easy). Thus adding a new codec should be preferred. MPEG-2 introduced both AAC (please look up what it was called back then) and multichannel extensions to Layers I-III. Guess which one works better?

This entry was posted on Sunday, November 28th, 2010 at 11:22 am and is filed under Useless Rants.

7 Responses to “The biggest curse in codec design”

Steve Lhomme says:

November 28, 2010 at 3:05 pm

The 2 parts in WavPack can be stored in the same file in Matroska. That’s why BlockAddition was created. It’s the complement part of the lossy part stored in regular Blocks.

Robert Swain says:

November 28, 2010 at 3:21 pm

I don’t think there’s anything necessarily wrong with layered codecs and in many cases the layers are coded in independent ways such that the layers themselves make sense.

The mess in AAC with the HE extension ‘layers’ is in the signalling. Plus, yes, the complexity of the extensions, particularly SBR. There is a specification for a lower-complexity SBR implementation approach. However, this is just one specification and one could certainly specify a layered codec that doesn’t have the signalling mess that AAC does.

About the DCT lossy + difference = lossless type codecs – I haven’t checked any of them but I would assume that their MDCTs/IMDCTs are bit exact otherwise the losslessness isn’t really valid, as you point out. I somehow doubt the authors of the specifications and implementations would be so neglectful as to overlook this point.

I’m not sure of the usefulness of lossy + diff = lossless formats like the way wavpack does it. Maybe someone wants to download a torrent and wants good quality lossy rather than complete lossless. But why wouldn’t they just get a different torrent that used some lossy codec instead? Perhaps they really liked what they got and wanted it in lossless so now only have to download the diff files instead of a complete lossless set. Maybe it’s nice to have full lossless locally and lossy on some smaller capacity portable device but I do wonder how many portable devices will decode wavpack at all, let alone lossy wavpack (outside of Rockbox as perhaps that does).

In any case, in the current real consumer world of deployed, constrained hardware that will only handle certain media, backwards compatibility is interesting. Better to have some audio with new media that has some lossless extension layer than none at all. But then it depends if the new media has a shortage of additional space that would inhibit putting both lossless and lossy streams on the same media.

Yes, backwards compatibility causes specification and implementation mess. That kind of thing is manageable across a specification or so but perhaps there should be a point at which one is allowed to break things to write a clean new spec for a new generation.

Reimar says:

November 29, 2010 at 6:20 am

I find it curious to see people argue in favour of the lossless+lossy approach for audio codecs. I had the impression that most developers are of the opinion that floating point is “better” (easier to implement and faster) than fixed point (otherwise why does FFmpeg have a floating-point AAC decoder?). If I combine that with the fact that floating-point+bit-exactness = MAJOR FAIL (see Lagarith) I’d say that makes that concept already at least a minor FAIL from the start…

Ed says:

February 8, 2011 at 11:26 am

@Kostya – Well first, thanks for implementing RV40. I know it was a long time ago, but you truly made the lives of billions (it is widely used in China) a lot easier.

Now, I am just wondering: what are your thoughts on VP8 / WebM as a codec, and how does it stand against H.264? Which one would you choose?

Why haven’t the FFmpeg devs created their own audio and video codec? (Patent issues aside.)

Ed says:

February 8, 2011 at 11:28 am

Oh, and what happened to the wavelet Snow codec?

Kostya says:

February 8, 2011 at 11:43 am

@Ed

Personally I believe VP8 is not that different in design from H.264 (and most of the differences they have are of dubious usefulness).

As for audio/video: yes, we have Snow, but it’s impossible or at least extremely hard to improve it (mostly because of the nature of wavelets), and Michael lost interest in it a long time ago.

And creating a decent audio codec is not that easy either. It requires a lot of testing and fine-tuning, and time to do that, which developers often lack. Also, there are lots of codecs around even without ours. We have Sonic (a lossless codec with an optional lossy mode), which was intended as an extension of the existing Bonk codec (barely anyone remembers either of them though), but I have not seen any further development of it. And we have a lossless video codec because it was easy to create one.

But (I repeat) a proper lossy encoder needs a lot of work on testing approaches and tuning, and developers usually have neither the time nor lasting interest. Yes, sometimes when there are pressing needs people create something like Xvid, x264 or LAME. But in that case you have standards and support for that codec elsewhere. I doubt we can introduce something like that ourselves, and without being widespread a codec withers and people forget about it (example: Snow).

Dmitry says:

March 9, 2012 at 9:34 pm

Hi,

Just wanted to thank you from the bottom of my heart for writing the great AAC codec for FFmpeg! I went nuts trying to solve the audio/video out-of-sync problem that I was experiencing when using libfaac with certain videos. None of the solutions worked (-async, -vsync), and when I tried aac it gave me perfect output on the first attempt! Great work!