Standards: Video versus Audio

July 21st, 2010

Since I work on multimedia stuff, I have had some chances to look at different specifications for audio codecs as well as video ones. Comparing those specifications led me to a rather surprising conclusion:

Video codec specifications are better than audio ones

I admit that I might have missed some details or interpreted something wrongly, yet here are my impressions.

Video codec specifications tend to be complete, and while they are not always easy to comprehend, you can write codecs from them (well, maybe in the VP8 case you need to add some glue code to the reference decoder disguised as a specification). Those specs usually cover everything, including extensions, and are rather freely obtainable (usually drafts are good enough for all practical purposes). I know mostly the ITU H.26x, MPEG video and SMPTE VC-1 codecs, but that seems to apply to all of them.

Audio codec specifications often lack those features. It looks like they offer you a more or less complete version of the core decoder, and good luck finding a description of the extensions. And even then they manage to lie or omit some things.

MPEG Audio Layers I-III — no objections (except for Layer 3 bitstream design, of course).

AAC — nothing wrong with core bitstream parsing. Heck, I even managed to write something that produces a valid AAC stream (sometimes) that can be decoded into recognisable sound (with a bit of luck). Now look at the extensions: they are very poorly documented. That was the reason why FFmpeg got AAC SBR and AAC PS support that late.

ATSC A/52 (some call it AC3) — again, nothing wrong with the core decoder. FFmpeg even got an encoder for it long before a native decoder (you can blame liba52 for that too). But E-AC-3 is a different beast. It’s mostly okay unless you try implementing enhanced coupling or dependent streams. The former has some bugs, so implementing it exactly by the specification will result in a decoder that sometimes fails to parse the bitstream. The latter is almost fine, but in the version available in the wild there’s no mention that some of the extension channels are actually channel pairs. Good luck figuring it out by yourself.

DCA — it was fun when I discovered that the actual CRC calculation is performed with a polynomial different from the one given in the specification. Luckily nobody bothers about it. Some of the tables are not given in the specification: DCA is the codec with the largest number of tables if you count its extensions (TwinVQ is a close competitor though), so they decided not to include tables with more than 1024 elements. You need the reference code for those. And good luck finding specifications for the extensions. And I assure you that, as in the case of E-AC-3, the reference decoder sometimes does different things from what is written in the spec. The most wonderful part? Your decoder should be bit-exact if you want lossless mode to operate properly, and it looks like the only way to achieve that is to copy a lot of stuff from the reference decoder.

Speech codecs (AMR-NB, AMR-WB, ITU G.72x, RFC xxxx) — some of them are nice but I still have the impression that most of them had specifications written by the following formula: “Here’s how it should operate in principle, I don’t remember exact detail anyway but we need to write something here, you have your reference decoder source so bugger off”. I remember looking at those G.72x specs (some are quite nice though), I remember the troubles Robert Swain had with the AMR-NB decoder (I tried to help him a bit with it too) and there’s some speech codec (iLBC?) that simply dumped all source code files into the RFC body without much explanation.

That’s why I claim that audio codec specifications are generally worse than video codec specs. I think if somebody ran a simple test on specs, assigning penalty points for things like:

  • containing non-obvious pseudocode constructs like QTMODE.ppq[nQSelect]->inverseQ(InputFrame, TMODE[ch][n])
  • containing five-line pseudocode for what can be expressed in one clear sentence
  • containing source code listings (bonus points for spanning more than one page)
  • omitting details essential for implementation
  • lying about details (i.e. when reference decoder and specification disagree)
  • assigning decoder to do irrelevant tasks (like upsampling, postfiltering and such)

virtually no audio codec would get a zero score, unlike video codecs.

German Transport

July 15th, 2010

Looks like people expect rants about transport from me. OK, here’s what should be the last in that series — regional and urban rail transport.

There are several types of rail transport:

  • Trams (Strassenbahn)
  • Commuter trains (S-Bahn)
  • Underground (U-Bahn)

In theory, trams go in cities on the ground, U-Bahn goes under the ground and S-Bahn goes to suburbs and S-Bahn trains look like this.

But during my travels I’ve seen it’s not completely true.

Munich

This city has a proper system with all three components, nothing peculiar at all.

Heidelberg/Mannheim

Proper trams and S-Bahn, nothing peculiar. But Neckar valley views are impressive.

Stuttgart

Proper S-Bahn but their U-Bahn reminds me of trams for some reason. They have underground trains with a maximum of two carriages (or one articulated one) and a third rail between the usual two. I heard they’re better at cars though.

Karlsruhe

This is a rather small city, so they have only one proper S-Bahn route, to Heidelberg and Mannheim; the rest of the S-Bahn routes are served by trams, and the same trams serve internal routes. Yet this network is quite extensive: I’d never have believed that I could visit a famous Russian resort (Baden-Baden) by tram, and that’s 30 kilometres from Karlsruhe!

Berlin

I visited Berlin to attend a live IRC chat (aka LinuxTag), yet I’ve tried to look at the local transport system.

U-Bahn is curious, they have two kinds of trains: narrow and not so narrow. Both seem to have the same types of trains in two different sizes though. A pleasant surprise is that it actually works even at night, not all of the lines though.

S-Bahn can actually be described as “U-Bahn that shares some tracks with railroad”. Honestly, it’s the same third rail system as any underground and if not for the line naming (S1, S2, … versus U1, U2, …) you could not distinguish them; even the trains are similar. And I have an impression that it does not serve much of the suburbs either.

I heard they also have trams but I have never seen those.

About my new work

May 30th, 2010

While I’m extremely lazy maybe it’s time to write about my new work.

Yes, I’ve finally got a job in a civilised country (Germany). I admit, I’ve chosen the offer based mostly on the environment, and here is what I got:

  • my work — working on multimedia technologies implementation (codecs for now).
  • my colleagues — nice and friendly people with good knowledge so I don’t feel alone and can talk about technical stuff too.
  • my living place from consumer position — maybe not ideal but with very good infrastructure and mostly I can easily get what I want.
  • geographical position — not far from Rhine and near Schwarzwald, so I can enjoy scenic landscapes as well.

Overall conclusion: I was lucky to pick this job. I still miss Sweden, but life is not ideal.

Some RE anecdotes

April 1st, 2010

I think it’s a good day to tell some stories about some peculiarities in codecs that may be a bit funny.

Intel Audio Coder (and maybe IMC) unpacks the bitstream into an integer array. Yes, one bit into 32 bits. It probably makes sense for codecs operating on single bits, but watching how it tries to reconstruct a variable in a loop from that bit array is hilarious. And do you know how that codec encodes sound? It loads “iacenc.dll” or something and calls the encode function from it. Was it that hard to make the encoder totally separate instead of pretending the codec can encode and then failing when the encoder library is missing?
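
To show why that looks funny, here is a minimal sketch of the one-bit-per-integer approach next to the usual shift-and-mask way. It is not the actual Intel code: the function names and the MSB-first bit order are my assumptions for illustration only.

```python
def unpack_bits(data):
    """Expand every bit of `data` into its own integer.

    MSB-first order is an assumption made for this sketch.
    """
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def read_value(bits, pos, width):
    """Rebuild a `width`-bit value from the one-bit-per-element array."""
    value = 0
    for i in range(width):
        value = (value << 1) | bits[pos + i]
    return value


def read_value_direct(data, pos, width):
    """Conventional alternative: shift and mask the packed bytes directly."""
    total = int.from_bytes(data, "big")
    shift = len(data) * 8 - pos - width
    return (total >> shift) & ((1 << width) - 1)
```

Both readers return the same values; the first one just spends a whole integer per bit and a loop iteration per bit read to get there.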

VoxWare series of codecs (sorry, metacodecs — MetaVoice, MetaSound) features its own codec subsystem. There is a main codec library which can’t really do coding or decoding itself. It loads actual codecs (with “.vwp” extension) and uses its own system of calls to do the work. Entry point in those overlays is appropriately named VoodooQuery.

Now to MetaSound decoder (I call it double messy because it’s named “msms01.vwp”). First, it features quite a lot of arrays of floats. I think it has the biggest relative non-zero data segment I’ve ever seen in binaries. Also it features an additional code segment named “CODE” that contains a single function for performing FFT. And it has functions for zeroing or copying arrays of floats that are written quite poorly and are nowhere near as good as plain memcpy/memset.

Interplay video player uses self-modifying code for pattern output. It does so by loading two colours into registers and modifying the output commands to use either one of those two registers as the source. For the record, another popular method is to duplicate each colour value 4 times in its own register, apply a mask to one register and the inverse of that mask to the other, bitwise OR them and output the result.
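
The mask method can be sketched roughly like this, with a Python integer standing in for a 32-bit register holding four 8-bit pixels. The function names and the convention that pattern bit i selects the second colour for pixel i are my assumptions, not taken from any particular player.

```python
def replicate(colour):
    """Copy one 8-bit colour into all four bytes of a 32-bit word."""
    return (colour & 0xFF) * 0x01010101


def expand_mask(pattern):
    """Turn 4 pattern bits into a byte mask: bit i set -> byte i is 0xFF."""
    mask = 0
    for i in range(4):
        if (pattern >> i) & 1:
            mask |= 0xFF << (8 * i)
    return mask


def blend4(colour0, colour1, pattern):
    """Output four pixels at once, choosing between two colours per pixel."""
    mask = expand_mask(pattern)
    word0 = replicate(colour0) & ~mask & 0xFFFFFFFF  # inverse mask keeps colour0
    word1 = replicate(colour1) & mask                # mask keeps colour1
    return word0 | word1
```

For example, `blend4(0x11, 0x22, 0b0101).to_bytes(4, "little")` yields the pixels 0x22, 0x11, 0x22, 0x11 in that order.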

And the most mind-boggling code I’ve ever seen is the Discworld III game engine (which also performs FMV decoding). While REing the ADPCM variants used there is relatively easy, the video decoding is maybe the most obfuscated code I’ve ever seen. Let’s just say that the decoding function avoids using the stack but does a lot of indirect calls to modify register values.

325th Anniversary

March 31st, 2010

Today (new style, otherwise it’s March 21st) is 325 years since The Composer was born.

While there are many composers — gifted, talented, geniuses — there was only one Composer who wrote Music. This Music deserves a capital letter because while it was originally written for certain instruments it can actually be played on virtually everything (and it’s a de facto test for new instruments). While it was a common practice for Baroque music to re-arrange music for whatever instruments were available, this Music can be translated to be played on almost any other instrument — usually exercising all possible sounds that instrument (and player) can provide.

Here is one example (violinist test is even more obvious):

Organist proficiency test

I’m not into music (I don’t play any musical instruments nor even know the musical notation beyond “it’s notes”) but I can value music by certain parameters. Two main factors for me are complexity (here it’s explained formally) and melody. The stuff you can hear on radio here usually lacks both. Contemporary instrumental music usually lacks one of these (yes, it’s mostly either total cacophony to my ears or melodic and quite primitive). And I don’t think it’s a coincidence that I both like to program and reverse-engineer and to listen to how melodies get intertwined together, how quite simple tunes form one big piece of music, how the whole theme changes both logically and unpredictably at the same time.

Without The Composer’s Music this world would be much crappier and there would be significantly less happiness. And while 2010 is Chopin Year, every March is The Composer’s month.

A private history of prank that failed

March 27th, 2010

What would you call the following situation: a man who has never travelled alone and has never been abroad suddenly appears in a foreign country. He knows neither the local language nor the customs. And all that happens on the 1st of April.

Well, it sounds like a prank indeed, but those were the best days of my life. I don’t remember whose idea it was but I quite suddenly decided to visit Sweden (okay, blame those subtitles in Monty Python and the Holy Grail) and the first half of April was the time when I could do that.

What’s the first thing any tourist should do in Stockholm? In my case it was wandering in Sundbyberg. I did a lot of things every tourist should do: visiting a dozen museums (the first two of them were the Post Museum and the Royal Coin Cabinet) and no art galleries, going to Skärholmen and not visiting any big shops there, not buying any elk toys or T-shirts with silly inscriptions, going to churches to listen to organ music, being in the city centre and shopping at only two places there (Teknik Magasinet and Hemköp); as an extra I’ve been to Copenhagen for a day and have not seen any mermaids or castle interiors.

Also I’ve attended the FFmpeg developers conference (Swedish branch). It was very nice to see all those people in person. One of them was kind enough to give me a short tour of Stockholm (Odenplan – KTH – Stureplan – Norrmalmstorg).

I’ve lost my heart to Sweden. This country’s style is “simple, beautiful and effective” applied everywhere. And if you have heard the saying “the way to a man’s heart is through his stomach”, that took place too (a sure sign is that when other developers’ commit messages start with “10l cola: …” mine start with “10l trocadero: …”, and that I think of Emmentaler as the “Swiss version of Grevé”).

And I can’t fully express my gratitude to Benjamin Larsson, without whom all of this could not have happened.


Enough with reminiscences, I’ll try to produce some codec material to write about.

Some Observations on Transport Infrastructure

March 25th, 2010

Today I’d like to rant about the ways transport is organised in different places I’ve visited so far.


Bink samples needed

March 10th, 2010

I’m searching for old Bink samples. There are plenty of “BIKi”, “BIKf” and some “BIKh” samples available but next to nothing of the older ones. By pure luck we were able to find some “BIKb” but that’s all.

We are still in need of old Bink versions, anything from “BIKa” to “BIKe”. Can you help us?

Here’s that list of stuff using Bink. It looks like games released since 2000 use “BIKf” and later, so we are hunting for earlier games (and that’s because some Mike has not bothered to retrieve all that information from MobyGames).

Probably “Might & Magic VIII: Day of the Destroyer” demo may use it (release uses “BIKf” and “BIKh”), for a start. Maybe some other Heroes of Might and Magic III games have them (“Shadow of Death” addon I have features “BIKb”).

Any help will be appreciated.

Notes:

The Bink version is determined by the first four bytes of the file; any hex viewer can help you.
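
If you would rather not open a hex viewer, the same check can be sketched in a few lines. The function name is made up for this example, and it only looks at the four-byte tag, nothing else in the header.

```python
import io


def bink_version(stream):
    """Return the Bink version letter ('b', 'f', ...) or None.

    `stream` is any binary file-like object positioned at the file start.
    """
    tag = stream.read(4)
    if len(tag) == 4 and tag[:3] == b"BIK":
        return chr(tag[3])
    return None


# Example with in-memory data instead of a real file:
sample = io.BytesIO(b"BIKb" + bytes(16))
version = bink_version(sample)
```

For a real file you would pass in the result of `open(path, "rb")` instead of the `BytesIO` object.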

Some games (like the ones by New World Computing) may have all video files in a single archive named like “videosomething.smt” or “something.vid”, sometimes along with Smacker files. But those archives usually feature file names at the beginning, which again can be easily viewed with any hex viewer. And if a file resides in a directory named “Video” or “Movie” it’s a good hint too.

Update: looks like there are no such files (except maybe in some archives of RAD developer(s)).

A bit about old Bink

March 7th, 2010

I don’t think you’ll ever encounter Bink video version ‘b’; the known samples were dug out from the game data of certain New World Computing games. And it looks like they are not supported by the official software anymore. But why should that stop us from looking at it?

This information is based on findings by certain FFmpeag deaveloper and me looking at disassembly for similarities with newer Bink version.

The main difference is that this version does not employ Huffman coding at all. All bundle data is stored in raw form, 4-11 bits per entry depending on bundle type. The number of elements in a bundle is stored as a 13-bit number, while the newer version uses a different number of bits depending on plane width.
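
A rough sketch of what reading such a raw bundle could look like is below. This is not decoder code: the class and function names are invented, and both the LSB-first bit order and the layout (a 13-bit count immediately followed by the entries) are assumptions made for illustration.

```python
class BitReader:
    """Reads bits LSB-first from a byte string (bit order is an assumption)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, width):
        value = 0
        for i in range(width):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


def read_raw_bundle(reader, bits_per_entry):
    """Read a 13-bit element count, then that many raw fixed-width entries."""
    count = reader.read(13)
    return [reader.read(bits_per_entry) for _ in range(count)]
```

The only per-bundle parameter here is `bits_per_entry`, which would be somewhere between 4 and 11 depending on the bundle type.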

Also this Bink used a floating-point version of DCT (but the constants are the same as in the integer version employed by later codec versions).

Coding methods (block types) are in a totally different order as well. And the 8×8 -> 16×16 scaling block type had not been devised at that time either. Bundles contain slightly different data as well: for example, quantisers and the number of residue masks are there, but the pattern run block uses a diminishing number of bits to code runs instead of reading them from a bundle (indeed, if you have to decode 48 more block elements you need to read 6 bits but when there are a mere 7 block elements left 3 bits are enough).
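
The diminishing run width can be illustrated with a tiny sketch. The rule used here, a field just wide enough to hold the number of remaining elements, is my assumption chosen to agree with the two cases above; the real decoder may well use a lookup table or store the run minus one.

```python
def run_bits(remaining):
    """Width of a run-length field when `remaining` block elements are left.

    Assumption: the field is just wide enough to hold the value `remaining`.
    """
    return remaining.bit_length()


def field_widths(block_size=64):
    """Map each possible count of remaining elements to its field width."""
    return {left: run_bits(left) for left in range(block_size, 0, -1)}
```

With this rule `run_bits(48)` gives 6 and `run_bits(7)` gives 3, which matches the two cases mentioned above.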

I’ve not looked too closely at the DCT coefficient/residue coding methods but they seem to resemble those used in the newer version of the codec.

Short conclusion: while the Bink video codec steadily improved, most concepts remained the same (but there’s a bigger leap between versions ‘b’ and ‘f’ than between ‘f’ and ‘i’; the latter are almost backwards compatible). And maybe we’ll see a decoder for it in FFmpeg for completeness’ sake.

It was not the codec you’re looking for

February 12th, 2010

There is only one thing that may taint the joy of REing yet another codec. It’s when you realize that most of the samples you want to decode are coded with other codecs.

While the recent Indeo 5 decoder addition allows playing many files, I found out that I have more samples encoded with Indeo 4. Even though I have a Bink Video decoder it looks like I don’t have many samples for it. But there are many other games with custom codecs worth REing.

Yet it’s not as bad as it sounds. M$ Video 1, Cinepak, Smacker and Sierra VMD seem to cover most of the samples I have interest in. Luckily for me there are many codecs left to RE for which I have some interest. Another guy has fulfilled his dream of being able to watch movie trailers in QuickTime format, so there’s almost no new work from him.

P.S. After I’d published that “looking for a job” post, I got many proposals, but for some reason they are mostly for the USA, and some people asked if I’d consider Australia too (BTW, the answer is no, it’s too warm a place for me and I plainly can’t work in such conditions). Either I want something unrealistic (i.e. a job in Europe) or it’s Murphy’s Law in action.