How to Design A Perfectly Awful Codec « Kostya's Boring Codec World


A quick glance at some codec disassembly inspired me to write this post.

So today I will talk about how to design a perfectly awful codec (from an FFmpeg decoder implementer’s point of view). Since audio and video codecs usually have their own specific design methods and approaches, this will be presented in two parts.

Video Codec Design (and why we don’t have a decoder for this codec in FFmpeg)

Don’t care about portability. The “best” example is Lagarith, a lossless video codec that uses a floating-point variable for the arithmetic coder state. Thus, decoding it on anything but x86 requires an 8087 emulator.
Tie it to a specific API or OS. The codec mentioned at the beginning provides the best example: it actually stores a sequence of GDI commands as frame data. While storing, say, a VNC protocol recording may provide good lossless compression, the stream should be self-sufficient (i.e. it should not require external data). M$ Camcorder Video, however, has (and uses!) such wonderful commands as “draw text with provided font parameters (including font name)”. Thanks, I’m not going to work on a decoder for that, ask those guys instead.
Use lots of data. It really pisses off a decoder developer to have to deal with lots of tables, especially ones with non-obvious structure. Special thanks to RealVideo 3 and 4, which stored variable-length code data in three different ways and in about a hundred codebooks.
Use your own format. That one annoys users as well. Isn’t it nice when your video is stored in videofile.wtf that can be played only with the provided player (and who knows if it can be converted at all)? Sometimes this has its reasons (for game formats, for example), though it makes the life of a decoder developer a bit harder.

Audio Codec Design (and why nobody cares about this codec)

Let’s repeat the last two items:

Use lots of data. Yes, there are codecs that use lots of tables during decoding. The best supporters of this policy are DTS (they even decided to skip tables with more than ten thousand elements in the ETSI specification, and the extensions require a few more tables) and TwinVQ/VQF, which has even more tables.
Use your own format. Audio codec authors like to invent new formats that can be used only with their codecs. There is one example where such a container format was extended to store other codecs as well. That’s the infamous Ogg. If you think it’s nice, then try implementing a demuxer for it from scratch.

But wait, there are more tricks!

Containers are overrated. The best example is Musepack SV7 and earlier. That codec is known to store frames continuously, and when I say “continuously”, I mean it: if one frame ends inside a byte, the new frame starts from the next bit. The only way to know a frame’s size is to decode it. And if your file is corrupted in the middle, the rest of it will be undecodable. A milder version of this is MPEG Audio Layer III, which stores audio data disregarding the actual frame boundaries.
Really tie codec to container. That would be Musepack SV8 now. This time they’ve designed an almost sane container with only one small catch: the last frame actually encodes fewer samples, and the only way to know that is to make the demuxer somehow signal to the decoder the number of samples to decode for each frame. If you don’t do that, you may unexpectedly get some nasty decoding errors.
Change bitstream format often. If you throw out backward compatibility, you may end up needing a separate decoder for each version. An example is CELT: it’s still experimental and changes bitstream format often, thus storing files in that format would be just silly since the next version of the decoder won’t be able to read them.
Hack extensions into bitstream. Some codecs are known to contain extension data inside frame data for “backwards compatibility”, so decoders usually have a hard time finding it and verifying that it’s really the expected extension data instead of some garbage. Well-known examples are MP3Pro and DTS (which took it to the extreme: there are extensions for both frequency and additional channels that can be present simultaneously; luckily, DTS-HD has it more structured inside the extension frame data).
Make it unsuitable for general uses. For example, make codec take unbounded or potentially too large amounts of memory (Ogg Vorbis does that) or
Make codec like a synonym for reference implementation. It’s good when you make only one implementation and keep changing it in many subtle ways, so that later one has to reverse engineer the source to get a specification. That was the case with binary M$ Office formats and it seems to be the case with Speex (at least I heard so).
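
To make the Musepack SV7 complaint above a bit more concrete, here is a minimal Python sketch of a bit-continuous stream. It is a toy format made up for illustration (four unary-coded samples per frame), not the actual Musepack bitstream, and `SAMPLES_PER_FRAME`, `encode_frames`, `read_bit` and `decode_frames` are hypothetical names. The only point is that a frame boundary can be found solely by decoding everything before it.

```python
SAMPLES_PER_FRAME = 4  # assumption: fixed sample count per toy frame


def encode_frames(frames):
    """Pack frames back to back with no byte alignment between them."""
    bits = []
    for frame in frames:
        assert len(frame) == SAMPLES_PER_FRAME
        for value in frame:
            bits.extend([1] * value + [0])  # unary code: v ones, then a zero
    bits.extend([0] * (-len(bits) % 8))  # pad only the very end of the stream
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def read_bit(data, pos):
    """Return bit number `pos` of `data`, counting MSB first."""
    return (data[pos >> 3] >> (7 - (pos & 7))) & 1


def decode_frames(data, n_frames):
    """Decode n_frames; return a list of (start_bit, samples) pairs."""
    pos = 0
    result = []
    for _ in range(n_frames):
        start = pos  # known only because all earlier frames were decoded
        samples = []
        for _ in range(SAMPLES_PER_FRAME):
            value = 0
            while read_bit(data, pos):
                value += 1
                pos += 1
            pos += 1  # skip the terminating zero bit
            samples.append(value)
        result.append((start, samples))
    return result


frames = [[1, 0, 3, 2], [0, 0, 1, 5], [2, 2, 2, 2]]
stream = encode_frames(frames)
for start, samples in decode_frames(stream, len(frames)):
    print(start, samples)
```

In this toy stream the second frame starts at bit 10, i.e. in the middle of the second byte. Flip a single bit in the first frame and the following frame is read from the wrong bit position, which is exactly why a corrupted file stays undecodable to the end.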

And finally, The Lossless Audio Codec to serve as an example to them all. As Måns put it, “wherever you talk about bad design of codecs, there’s always Monkey’s Audio”. Let’s see its advantages:

Container: it has a custom container, of course. And there’s one minor detail: it packs data into 32-bit little-endian words and a frame may start at any byte of that word. This makes it somehow combine both approaches to containers.
Bitstream format changes: check. It is known to have a lot of small tweaks that make the bitstream format incompatible between versions. Some of them are actually container-related though.
Unusable for general uses: well, it’s famous for requiring more CPU power to decode than most low-resolution (up to SD) H.264 video streams.
One codec, one implementation: for a long time it was so, until some Rockbox developer REd the source code and wrote his own decoder (the FFmpeg decoder is derived from it). Also, for quite a long time it was supported only on Windows. And it doesn’t support older versions; nobody bothered to extend support for them.
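
The container detail above (32-bit little-endian words, with a frame starting at any byte within a word) can be sketched roughly like this. It is a simplified illustration and not code from any real Monkey’s Audio decoder: `words_to_bytes` and its `skip` parameter are hypothetical, and it assumes the decoder wants to consume each word’s bytes most significant first.

```python
import struct


def words_to_bytes(payload: bytes, skip: int) -> bytes:
    """Turn 32-bit little-endian words into a byte stream for one frame.

    `payload` must cover whole words; `skip` (0..3) is the number of
    leading bytes in the first word that belong to the previous frame.
    """
    if len(payload) % 4 != 0:
        raise ValueError("payload must be a whole number of 32-bit words")
    if not 0 <= skip <= 3:
        raise ValueError("skip must be between 0 and 3")
    count = len(payload) // 4
    words = struct.unpack("<%dI" % count, payload)   # read as little-endian
    swapped = struct.pack(">%dI" % count, *words)    # emit MSB first
    return swapped[skip:]                            # drop the previous frame's tail


# Two words whose bytes are stored in reversed order within each word.
packed = bytes([3, 2, 1, 0, 7, 6, 5, 4])
print(list(words_to_bytes(packed, 2)))
```

So the demuxer has to hand over whole words plus a byte offset, and the decoder has to undo both before it can read a single bit of the frame.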

This entry was posted on Saturday, November 13th, 2010 at 12:57 pm and is filed under Useless Rants.

3 Responses to “How to Design A Perfectly Awful Codec”

mike says:

November 13, 2010 at 5:26 pm

One more audio related one: Support concatenating files without updating the headers. Ogg says you can concatenate multiple streams into one file, even ones with different table sizes for constants. So you’re supposed to be able to reallocate all your buffer sizes (which of course as you mentioned above have no fixed bounds on size to begin with!) mid way through decoding a file. That works real well on portable audio players that don’t have an MMU and so can’t actually realloc things . . .
Multimedia Mike says:

November 17, 2010 at 9:11 pm

Excellent rant– I think that covers most of the big complaints.

The Lagarith/x87 issue reminds me that it’s not entirely uncommon to have to emulate bits of hardware to make certain multimedia play. Examples: 6-bit VGA RGB palette components; non-interleaved stereo PCM; sign-magnitude PCM; basing playback timing (for both audio and video formats) on specific computers’ display timing. But I suppose those are relatively easy to emulate in software.

I remember Lagarith having another issue: The primary implementation (which was graciously open source) had large swaths of code that were written in x86 SIMD with no C-only counterpart implementations.
Jean-Marc Valin says:

February 25, 2011 at 5:57 pm

“An example is CELT — it’s still experimental and changes bitstream format often, thus storing files in that format would be just silly since next version of decoder won’t be able to read them.”

I’m sorry, but this is the case of every single codec out there. No codec I know of was written in one day, hence the bitstream format kept changing for a while. The only difference with CELT is that being open-source, we’ve actually let people try it and provide feedback instead of keeping it to ourselves until we decided it was done. The web page makes it clear:

“Since CELT is still in development, most new releases (even minor ones) change the bit-stream, so compatibility is not preserved.”

Some people decided to use it in production because they didn’t need a stable bitstream, but they were certainly warned.