Archive for July, 2010

Standards: Video versus Audio

Wednesday, July 21st, 2010

Since I work on multimedia stuff, I have had some chances to look at specifications for audio codecs as well as video ones. Comparing those specifications led me to a rather surprising conclusion:

Video codec specifications are better than audio ones

I admit that I might be missing some details or interpreting something wrong, yet here are my impressions.

Video codec specifications tend to be complete, and while they are not always easy to comprehend, you can write codecs from them (well, maybe in the VP8 case you need to add some glue code to the reference decoder disguised as a specification). Those specs usually cover everything, including extensions, and are rather freely obtainable (drafts are usually good enough for all practical purposes). I mostly know the ITU H.26x, MPEG video and SMPTE VC-1 codecs, but that seems to apply to all of them.

Audio codec specifications often lack those features. It looks like they offer you a more or less complete version of the core decoder, and good luck finding a description of the extensions. And even then they manage to lie or omit some things.

MPEG Audio Layers I-III: no objections (except for the Layer III bitstream design, of course).

AAC: nothing wrong with core bitstream parsing. Heck, I even managed to write something that produces a valid AAC stream (sometimes) that can be decoded into recognisable sound (with a bit of luck). Now look at the extensions: they are very poorly documented. That was the reason why FFmpeg got AAC SBR and AAC PS support so late.

ATSC A/52 (some call it AC3): again, nothing wrong with the core decoder. FFmpeg even got an encoder for it long before a native decoder (you can blame liba52 for that too). But E-AC-3 is a different beast. It’s mostly okay unless you try implementing enhanced coupling or dependent streams. The former has some bugs, so implementing it exactly by the specification will give you a decoder that sometimes fails to parse the bitstream. The latter is almost fine, but the version available in the wild does not mention that some of the extension channels are actually channel pairs. Good luck figuring it out by yourself.

DCA: it was fun when I discovered that the actual CRC calculation is performed with a polynomial different from the one given in the specification. Luckily nobody bothers about it. Some of the tables are not given in the specification: DCA is the codec with the largest number of tables if you count its extensions (TwinVQ is a close competitor though), so they decided not to include tables with more than 1024 elements. You need the reference code for those. And good luck finding specifications for the extensions. And I assure you that, as with E-AC-3, the reference decoder sometimes does different things from what is written in the spec. The most wonderful part? Your decoder has to be bit-exact if you want lossless mode to operate properly, and it looks like the only way to achieve that is to copy a lot of stuff from the reference decoder.
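To show why a wrong polynomial in a spec matters, here is a minimal sketch of a bitwise CRC-16. The two polynomials are common textbook ones (0x1021 and 0x8005) used purely as stand-ins for "the one printed in the spec" and "the one the reference decoder uses"; they are assumptions for illustration, not the actual DCA values, and the `crc16` helper is hypothetical.

```python
def crc16(data: bytes, poly: int, init: int = 0x0000) -> int:
    """Bitwise MSB-first CRC-16, no bit reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8              # feed the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:          # top bit set: shift and subtract the polynomial
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


payload = b"123456789"                # conventional CRC test string

# Stand-in polynomials only, NOT taken from the DCA specification.
crc_spec = crc16(payload, 0x1021)     # "polynomial written in the spec"
crc_ref = crc16(payload, 0x8005)      # "polynomial the reference decoder uses"

print(hex(crc_spec), hex(crc_ref), crc_spec == crc_ref)
```

The same payload yields two unrelated checksums, so a decoder that follows the printed polynomial would reject every frame the reference encoder considers valid (if anybody bothered to check the CRC at all).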

Speech codecs (AMR-NB, AMR-WB, ITU G.72x, RFC xxxx): some of them are nice, but I still have the impression that most of them had specifications written by the following formula: “Here’s how it should operate in principle, I don’t remember the exact details anyway but we need to write something here, you have your reference decoder source so bugger off”. I remember looking at those G.72x specs (some are quite nice though), I remember the troubles Robert Swain had with the AMR-NB decoder (I tried to help him a bit with it too), and there’s some speech codec (iLBC?) that simply dumped all the source code files into the RFC body without much explanation.

That’s why I claim that audio codec specifications are generally worse than video codec specs. I think that if somebody ran a simple test on specs, assigning penalty points for things like:

  • containing non-obvious pseudocode constructs like QTMODE.ppq[nQSelect]->inverseQ(InputFrame, TMODE[ch][n])
  • containing five-line pseudocode for what can be expressed in one clear sentence
  • containing source code listings (bonus points for spanning more than one page)
  • omitting details essential for implementation
  • lying about details (i.e. when reference decoder and specification disagree)
  • requiring the decoder to perform irrelevant tasks (like upsampling, postfiltering and such)

virtually no audio codec would get a zero score, unlike video codecs.
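The penalty test above could be sketched roughly like this. The category names and the weights are my own assumptions made up for illustration (the list only says that multi-page listings deserve bonus points), and the findings are for an imaginary spec, not for any real codec.

```python
from collections import Counter

# Hypothetical weights: one entry per item in the list above.
PENALTIES = {
    "obscure_pseudocode": 1,          # non-obvious pseudocode constructs
    "verbose_pseudocode": 1,          # five lines where one sentence would do
    "source_listing": 1,              # source code listing inside the spec
    "source_listing_multipage": 2,    # listing longer than a page (bonus points)
    "missing_detail": 3,              # detail essential for implementation omitted
    "contradicts_reference": 3,       # spec and reference decoder disagree
    "irrelevant_decoder_task": 1,     # upsampling, postfiltering and such
}


def spec_score(findings):
    """Sum penalty points over a list of finding names (repeats count again)."""
    counts = Counter(findings)
    return sum(PENALTIES[name] * n for name, n in counts.items())


# Findings for an imaginary specification, for demonstration only.
example_findings = [
    "missing_detail",
    "contradicts_reference",
    "source_listing_multipage",
]
score = spec_score(example_findings)
print(score)
```

A spec with no findings scores zero, which is the result I would expect from most video codec specifications.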

German Transport

Thursday, July 15th, 2010

Looks like people expect rants about transport from me. OK, here’s what should be the last in that series: regional and urban rail transport.

There are several types of rail transport:

  • Trams (Strassenbahn)
  • Commuter trains (S-Bahn)
  • Underground (U-Bahn)

In theory, trams run through cities at ground level, the U-Bahn runs underground and the S-Bahn goes to the suburbs, and S-Bahn trains look like this.

But during my travels I’ve seen that this is not completely true.


This city has a proper system with all three components, nothing peculiar at all.


Proper trams and S-Bahn, nothing peculiar. But Neckar valley views are impressive.


Proper S-Bahn, but their U-Bahn reminds me of trams for some reason. They have underground trains with a maximum of two carriages (or one articulated one) and a third rail between the usual two. I heard they’re better at cars though.


This is a rather small city, so they have only one proper S-Bahn route, to Heidelberg and Mannheim; the rest of the S-Bahn routes are served by trams, and the same trams serve the internal routes. Yet this network is quite extensive. I’d never have believed that I could visit a famous Russian resort (Baden-Baden) by tram, and that’s 30 kilometres from Karlsruhe!


I visited Berlin to attend a live IRC chat (aka LinuxTag), yet I also tried to look at the local transport system.

The U-Bahn is curious: they have two kinds of trains, narrow and not so narrow. They seem to be the same types of trains in two different sizes, though. A pleasant surprise is that it actually runs even at night, though not on all of the lines.

The S-Bahn can actually be described as “U-Bahn that shares some tracks with the railroad”. Honestly, it’s the same third-rail system as any underground, and if not for the line naming (S1, S2, … versus U1, U2, …) you could not distinguish them; even the trains are similar. And I have the impression that it does not serve much of the suburbs either.

I heard they also have trams, but I never saw those.