Kostya's Boring Codec World

Again about my favourite country

January 9th, 2011

There’s a Russian saying, “You’ll live the year the way you meet it”, so just in case I tried to get to my favourite country, mostly to check what it’s like in winter.

If somebody still thinks I’m normal, here’s a picture of what I mostly drank during my stay there:

(those are different bottles; I’ve tried a few more kinds too). In addition to that I’ve tried a few more drinks from a certain brewery i Sundsvall. And I’ve finally tried Norrlands national drink in solid form. Somehow these trips always get full support from my stomach.

P.S. Cheese, pickled herring, gravlax, meatballs and flatbread with Christmas ham are just right. And I have moose sausage to try.

P.P.S. Trains there seem to be as punctual as in Germany.

Posted in Useless Rants | 5 Comments »

The biggest curse in codec design

November 28th, 2010

This post is an answer to the comment by Alex Converse on my previous post:

It’s interesting how quickly you dismiss SLS for being a hybrid coder with AAC. From a pure lossless standpoint that is a weakness but from a broader perspective it allows for a lossy layer that is widely compatible with existing hardware.

Let’s see why scalable coding is a weakness from lossless coding standpoint.

There are a few hybrid lossy+lossless codecs out there which use the lossy part in lossless reconstruction, e.g. MPEG-4 SLS, DTS-HD MA and WavPack. The first two use two different coding techniques: MDCT or QMF for core coding and the usual lossless coding for the difference. In WavPack both parts are coded in the same way and the correction data is stored in a separate block. For DCT-based codecs there are many ways of performing the DCT (from trivial matrix multiplication to an FFT-based implementation to decomposing the DCT into smaller DCTs), and they may produce slightly different output depending on the method chosen. Thus, you need a reference way (i.e. not the fastest one) of doing the lossy part or you can’t guarantee truly lossless reconstruction. Also, the residue (the difference between the original and the lossy coded signal) tends to be more chaotic in this case and thus less compressible.
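Here is a minimal sketch of why the lossy layer has to be bit-exact. Everything in it is a toy assumption: coarse quantisation stands in for the real transform codec, and `decode_fast` is a made-up "optimised" decoder that is off by one on some values. It is not how SLS, DTS-HD MA or WavPack actually work; it only shows that a residue computed against one lossy decoder does not fit another.

```python
def lossy_encode(samples, step):
    # Toy lossy layer: coarse quantisation standing in for a real transform codec.
    return [s // step for s in samples]


def decode_reference(codes, step):
    # The "reference" lossy decoder that the encoder used to compute the residue.
    return [c * step for c in codes]


def decode_fast(codes, step):
    # Hypothetical "optimised" decoder whose rounding is off by one on odd codes.
    return [c * step + (c & 1) for c in codes]


def reconstruct(lossy_output, residue):
    # Lossless output = lossy output + stored correction data.
    return [l + r for l, r in zip(lossy_output, residue)]


original = [12, -7, 33, 100, -58, 5]
step = 8
codes = lossy_encode(original, step)
residue = [s - l for s, l in zip(original, decode_reference(codes, step))]

exact = reconstruct(decode_reference(codes, step), residue)
drifted = reconstruct(decode_fast(codes, step), residue)
mismatches = sum(a != b for a, b in zip(original, drifted))
print(exact == original, mismatches)
```

With the reference decoder the original comes back intact; with the other one some samples are wrong even though the lossy output would sound the same.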

Another issue is what to do with the correction data. If you put it into a separate file, you will have more trouble since you have to manage two files; if you put it all into a single file, it will be bigger than a purely lossless coded file (unless you have a very stupid method of lossless coding).

And now comes an argument that I really hate: “but it allows legacy players to handle those files”. That, in my opinion, is the curse from this post’s title. Making a codec backward compatible just cripples it. In that case you need to implement new (and sometimes completely different) features within the old limits instead of relying on new technology. So in some cases it just degrades quality and/or forces you to encode something twice: once for the old feature set and once for its replacement. Another reason is that it just delays adoption of that codec: an old player can play it, so why should I bother supporting this new codec? I suspect this is the reason why we have MLP but no DTS-HD support.

The worst offender here is MP3. This codec sucks by design. It uses 36-point (or three 12-point) MDCTs, which are not trivial to speed up unlike power-of-two transforms, and the output of the MDCTs is used as input to the QMF used in MPEG Audio Layers I and II, as it is claimed, “to be compatible with them”. As claimed here, MP3 would perform better without that, and since it comes from one of the leading LAME developers, I believe it. And of course there is MP3Pro. Most players in existence just ignore the extension part and play a crippled version of the sound. Someone may argue that’s because it’s proprietary. Okay, look at HE-AAC where SBR is at least documented: it may still cause some confusion since it may be detected only when decoding an audio frame.

In my opinion, implementing support for a new codec and implementing a special codec extension are, in the general case, both one-time actions with comparable effort (hacking existing code for new extension support and detection may be not that easy). And thus, adding a new codec should be preferred. MPEG-2 introduced both AAC (please look up what it was called back then) and multichannel extensions to Layers I-III. Guess which one works better?

Posted in Useless Rants | 7 Comments »

Why Lossless Audio Codecs generally suck

November 27th, 2010

Why are there so many lossless audio codecs? Mike, obviously, had his thoughts on that subject and I agree with another friend of mine who said: “it’s just too easy to create a lossless audio codec, that’s why everybody creates his own”.

Well, the theory is simple: you remove redundancy from samples by predicting their values and code the residue. Coding is usually done with Rice codes or some combination of Rice codes and an additional coder, either for zero runs or for finer coding of Rice codes. Prediction may be done in two major ways: FIR filters (some fixed prediction filters or LPC) or IIR filters (personally I call those “CPU eaters” for a certain property of the codecs using them). And of course they always invent their own container (I think in most cases that’s because they are too stupid to implement even minimal support for some existing container or even to think how to fit it into one).
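To show how little is needed, here is a rough sketch of that theory: a fixed second-order FIR predictor plus Rice codes for the residue. The function names, the zigzag mapping of signed residues and the choice of writing bits as a text string are all assumptions made for illustration; no real codec is reproduced here.

```python
def predict(history):
    # Fixed second-order predictor: continue the straight line through the last two samples.
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]


def zigzag(n):
    # Map signed residues to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ...
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def rice_encode(u, k):
    # Quotient in unary (ones closed by a zero), remainder in k plain bits.
    bits = "1" * (u >> k) + "0"
    if k:
        bits += format(u & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, pos, k):
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the closing zero
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k


def encode(samples, k):
    out, history = [], []
    for s in samples:
        out.append(rice_encode(zigzag(s - predict(history)), k))
        history.append(s)
    return "".join(out)


def decode(bits, count, k):
    samples, pos = [], 0
    for _ in range(count):
        u, pos = rice_decode(bits, pos, k)
        samples.append(predict(samples) + unzigzag(u))
    return samples


samples = [0, 3, 7, 10, 12, 13, 13, 11]
bits = encode(samples, k=2)
print(len(bits), decode(bits, len(samples), k=2) == samples)
```

The decoder runs the same predictor over the samples it has already reconstructed, so the round trip is exact, and a smooth signal takes far fewer bits than raw 16-bit PCM.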

Let’s iterate through the list of better-known lossless audio codecs.

ALAC (by Apple) — nothing remarkable, they just needed to fit something like FLAC into MOV so their players can handle it
Bonk — one of the first lossless/lossy codecs, nobody cares about it anymore. Some FFmpeg developers intended to enhance it but nothing substantial has been done. You can still find that “effort” as the Sonic codec in libavcodec.
DTS-HD MA — it may employ both FIR and IIR prediction and uses Rice codes but they totally screwed up the bitstream format. Not to mention there’s no openly available documentation for it.
FLAC — the codec itself is good: it’s extremely fast and features good compression ratios. The only bad thing about it is that it’s too hard to seek properly in it since there’s no proper frame header and you can just hope that the combination of bits and CRC you found is not a false positive.
G.711.0 — have you ever heard about it? That’s its problem: nobody cares and nobody even tries to use it.
MLP/Dolby True-HD — it seems to be rather simple and it exists solely because there was no standardised lossless audio codec for DVD.
Monkey’s Audio — well, the only good thing about it is that it does not seem to be actively developed anymore.
MPEG-4 ALS — the same problem: it may be standardised but nobody cares about it.
MPEG-4 SLS — even worse since you need bitexact AAC decoder to make it work.
OggSquish — luckily, it’s buried for good but it also spawned one of the worst container formats possible which still lives. And looking at original source of it one should not wonder why.
RealAudio Lossless Format — I always say it was named after its main developer Ralph Wiggum. This codec is very special — they had to modify the RM container format specially for it. A quick look inside showed that they use more than 800 (yes, more than eight hundred) Huffman tables, most of them with several hundred codes (about 400 on average). That reminds me of RealVideo 4 with its above-average number of tables for context-dependent coding.
Shorten — one of the first lossless audio codecs. Hardly anyone remembers it nowadays.
TAK — it was originally called YALAC (yet another lossless audio codec) for a reason. Since it’s closed-source and fortunately not widespread (though some idiots use it for CD rip releases), it just annoys me from time to time but I don’t think anyone will work on adding support for it to FFmpeg.
TrueAudio (TTA) — I can’t say anything about it except that it seems to be quite widespread and it works. Looks like they’re still alive and working on TTA2 but who cares?
WavPack — that’s a rather good codec with a sane bitstream format too. Looks like its author invested some time in its design. Also he sent patches to implement some missing features in our decoder (thank you for that!).
WMA Lossless — from what I know, it uses an IIR filter whose coefficients are found with the least mean squares method. It has two peculiarities: that filter is also used for inter-channel decorrelation, and the bitstream format follows the WMA9 format, i.e. it has something like interframes and frame data starting at an arbitrary point (hello, MP3!).
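For those who have not met least-mean-squares prediction, here is a simplified sketch using the sign-sign LMS variant on integers. This is not the WMA Lossless filter: the class name, filter order, shift and step size are arbitrary assumptions, and it adapts plain feed-forward weights. The point is only that the weights are never transmitted, because the decoder repeats exactly the same updates as the encoder.

```python
def sign(x):
    return (x > 0) - (x < 0)


class SignLMS:
    """Adaptive predictor: weights follow the sign of the error and of past samples."""

    def __init__(self, order=4, shift=8, step=1):
        self.weights = [0] * order
        self.history = [0] * order
        self.shift = shift
        self.step = step

    def predict(self):
        acc = sum(w * h for w, h in zip(self.weights, self.history))
        return acc >> self.shift  # integer arithmetic only, so it is repeatable

    def update(self, sample, error):
        e = sign(error)
        self.weights = [w + self.step * e * sign(h)
                        for w, h in zip(self.weights, self.history)]
        self.history = [sample] + self.history[:-1]


def encode(samples):
    lms, residues = SignLMS(), []
    for s in samples:
        error = s - lms.predict()
        residues.append(error)
        lms.update(s, error)
    return residues


def decode(residues):
    lms, samples = SignLMS(), []
    for error in residues:
        s = lms.predict() + error
        samples.append(s)
        lms.update(s, error)
    return samples


samples = [100, 120, 135, 140, 138, 130, 118, 104, 92, 85]
print(decode(encode(samples)) == samples)
```

Since every sample costs a filter update on both sides, it is easy to see where the CPU time goes once the filters get long.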

P.S. I still hope this post won’t encourage anybody to write yet another useless lossless audio decoder.

Posted in Audio, Lossless Audio, Useless Rants | 8 Comments »

Maybe the last word about Bink version b

November 20th, 2010

This codec is a collaborative effort — both me and some bogan from Melbo have been slacking on it for quite a long time.

Strewth, it’s almost the same as its successor, the real Bink known everywhere (Bink-b or 0.5 is not even mentioned in the official Bink history). The main differences are the lack of Huffman coding (all bundle elements are just stored in a predefined number of bits), different scan-run coding (instead of storing it in a separate bundle, run values are stored in the bitstream with the minimum number of bits needed to code the biggest run possible from that point, i.e. 6 bits at the beginning but five bits or fewer for runs in the second half of the block) and a DCT that uses floating-point coefficients (though the same ones for all practical purposes).
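The run width rule can be sketched as below. It rests on two assumptions that the text above does not spell out: a block has 64 coefficients, and a run is stored as run minus one, so the longest possible run from a given position is simply the number of positions left. The function name `run_bits` is made up for this example.

```python
def run_bits(position, block_size=64):
    # Bits needed to store (run - 1) when `position` coefficients are already placed.
    longest_run = block_size - position
    return (longest_run - 1).bit_length()


widths = [run_bits(p) for p in range(64)]
print(widths[0], widths[32], widths[48], widths[63])
```

Under these assumptions the width starts at 6 bits and drops by one each time the remaining part of the block halves, down to zero bits for the very last position.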

The only thing that differs significantly is motion compensation. Bink version b seems to use the same frame for all subsequent decoding, and motion compensation in the first frame actually copies an already decoded block from the same frame. After discovering that fact I was able to obtain a perfect first frame in many cases and sometimes the rest of the video was also decoded fine except for a few glitches. The only puzzling thing is that the vertical motion vector offset in the first frame seems to have a slightly different value, so <-8,15> actually means “copy data from the previous left block” and <0,7> means “copy data from the top block”, while such translation is not needed for all following frames.

Since all known samples are from the Heroes of Might and Magic III game and they are duplicated with Smacker samples, there’s not much interest in finishing that decoder and integrating it into the current FFmpeg Bink decoder (I’ve done it as a dirty hack). So no prawn cracker for you, mate.

Update: a proof-of-concept hack (it produces minor artifacts too) can be downloaded here.

Posted in Bink, Game Video | 3 Comments »

How to Design A Perfectly Awful Codec

November 13th, 2010

A quick glance at some codec disassembly inspired me to write this post.

So today I’ll talk about how to design a perfectly awful codec (from an FFmpeg decoder implementer’s point of view). Since audio and video codecs usually have some specific methods and approaches to design, it will be presented in two parts.

Video Codec Design (and why we don’t have a decoder for this codec in FFmpeg)

Don’t care about portability. The “best” example is Lagarith — lossless video codec that uses floating point variable for arithmetic coder state. Thus, decoding it on anything but x86 requires an 8087 emulator.
Tie it to a specific API or OS. The codec mentioned at the beginning provides the best example: it actually stores a sequence of GDI commands as frame data. While storing, say, a VNC protocol record may provide good lossless compression, it should be self-sufficient (i.e. it should not require external data). M$ Camcorder Video however has (and uses!) such wonderful commands as “draw text with provided font parameters (including font name)”. Thanks, I’m not going to work on a decoder for that, ask those guys instead.
Use lots of data. It really pisses off a decoder developer when you have to deal with lots of tables, especially ones with non-obvious structure. Special thanks to RealVideo 3 and 4 which stored variable-length code data in three ways and used about a hundred codebooks.
Use your own format. That one annoys users as well. Isn’t it nice when your video is stored in videofile.wtf that can be played only with provided player (and who knows if it can be converted at all). Sometimes this has its reasons — for game formats, for example — though this makes life of decoder developer a bit harder.

Audio Codec Design (and why nobody cares about this codec)

Let’s repeat last two items:

Use lots of data. Yes, there are codecs that use lots of tables during decoding. The best supporters of this policy are DTS (they even decided to skip tables with more than ten thousand elements in ETSI specification, extensions require few more tables) and TwinVQ/VQF that has even more tables.
Use your own format. Audio codec authors like to invent new formats that can be used only with their codecs. There is one example where such a container format was extended to store other codecs as well. That’s the infamous Ogg. If you think it’s nice then try implementing a demuxer for it from scratch.

But wait, there are more tricks!

Containers are overrated. The best example is Musepack SV7 and earlier. That codec is known to store frames continuously and when I say “continuously”, I mean it — if one frame ends inside byte, new frame starts from the next bit. And the only way to know frame size is to decode it. And if your file is corrupted in the middle, the rest of it would be undecodable. A mild version of this is MPEG audio layer-III which stores audio data disregarding actual frame boundaries.
Really tie codec to container. That would be Musepack SV8 now. This time they’ve designed an almost sane container with only one small catch — the last frame actually encodes fewer samples and the only way to know that would be to make the demuxer somehow signal to the decoder the number of samples to decode for each frame. If you don’t do that, you may unexpectedly get some nasty decoding errors.
Change bitstream format often. If you throw out backward compatibility you may end with many decoders needed for each case. An example is CELT — it’s still experimental and changes bitstream format often, thus storing files in that format would be just silly since next version of decoder won’t be able to read them.
Hack extensions into bitstream. Some codecs are known to contain extension data inside frame data for “backwards compatibility” so decoders usually have hard time finding it and verifying it’s really expected extension data instead of some garbage. Well-known examples are MP3Pro and DTS (which took it to extreme — there are extensions for both frequency and additional channels that can be present simultaneously; luckily, DTS-HD has it more structured inside an extension frame data).
Make it unsuitable for general uses. For example, make codec take unbounded or potentially too large amounts of memory (Ogg Vorbis does that) or too much CPU power.
Make codec like a synonym for reference implementation. It’s good when you just make only one implementation and just change it in many subtle ways so later you need to reverse engineer the source to get specification. That was the case with binary M$ Office formats and it seems to be the case with Speex (at least I heard so).
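The “containers are overrated” trick is easy to show with a toy format. The sketch below is not Musepack: the layout (four values per frame, each written in unary, frames packed back to back with no alignment or size field) and all the names are invented for illustration. It shows that a frame start can only be found by decoding everything before it, and what one flipped bit does to the rest of the file.

```python
VALUES_PER_FRAME = 4  # assumed fixed for this toy format


def pack_frames(frames):
    # Each value is `v` ones followed by a zero; frames are joined without padding.
    bits = "".join("1" * v + "0" for frame in frames for v in frame)
    bits += "0" * (-len(bits) % 8)  # pad only the very end of the file
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def read_bit(data, pos):
    return (data[pos >> 3] >> (7 - (pos & 7))) & 1


def decode_frame(data, pos):
    values = []
    for _ in range(VALUES_PER_FRAME):
        v = 0
        while read_bit(data, pos):
            v += 1
            pos += 1
        pos += 1  # skip the closing zero
        values.append(v)
    return values, pos


def locate_frames(data, frame_count):
    # The only way to find frame N is to decode frames 0..N-1 first.
    starts, frames, pos = [], [], 0
    for _ in range(frame_count):
        starts.append(pos)
        frame, pos = decode_frame(data, pos)
        frames.append(frame)
    return starts, frames


frames = [[1, 0, 2, 0], [0, 0, 0, 0], [3, 1, 0, 1]]
data = pack_frames(frames)
starts, decoded = locate_frames(data, len(frames))

damaged = bytes([data[0] ^ 0x80]) + data[1:]  # flip the very first bit
damaged_starts, damaged_frames = locate_frames(damaged, len(frames))
print(starts, damaged_starts)
```

Frames start in the middle of bytes, and after the flipped bit every later frame is read from the wrong position, so the damage is not limited to the frame that was hit.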

And finally, The Lossless Audio Codec to serve as an example to them all. As Måns put it, “wherever you talk about bad design of codecs, there’s always Monkey’s Audio”. Let’s see its advantages:

Container — it has a custom container of course. And there’s one minor detail: it packs data into 32-bit little-endian words and a frame may start at any byte of that word. This makes it somehow combine both approaches to containers.
Bitstream format changes — check. It is known to have a lot of small tweaks making bitstream format incompatible. Some of them are actually container-related though.
Unusable for general uses — well, it’s famous for requiring more CPU power to decode than most low-resolution (up to SD) H.264 video streams.
One codec, one implementation — for a long time it was so until some Rockbox developer REd the source code and wrote his own decoder (FFmpeg decoder is derived from it). Also for quite a long time it was supported only on Windows. And it doesn’t support older versions — nobody bothered to extend support for them.

Posted in Useless Rants | 3 Comments »

Why I love Sweden

September 18th, 2010

I was lucky to have a short visit to my homeland (look at this blog title if you didn’t guess it) and since some people ask why I love it, I decided to write this blog post.

Disclaimer: this is my highly subjective opinion on why Sweden is the best country (for me).

There are several points which I present below.
Read the rest of this entry »

Posted in Useless Rants | 2 Comments »

On Names, Sacred Animals and Germany

August 22nd, 2010

This post is inspired by the fact that in these two days I’ve passed by two towns named Eberbach. For people without German knowledge: this name translates as “Wild Boar’s Spring”. For people with French knowledge — feel free to laugh that the second town is actually called Eberbach(Fils).

It may be an insignificant name for you but not for me. There is no official animal for Ukraine, but if you ask people, most likely it will be the pig (like the bear for Russia). Thus, one may ask “do they name any towns or villages after it in Ukraine?”. Guess what? I’m aware of only one village named “??????”, which can be translated as “Little Boar (village)” or this (hopefully). At the same time (if German Wikipedia doesn’t lie) there are maybe half a dozen Eber… towns in Germany (mostly in Baden-Wuerttemberg and Bavaria) and one Schweinfurt (i.e. “Swine Ford”, also in Bavaria).

Pig is the only animal that Tatars and Turks didn’t take during their raid on Ukrainian territory, that’s why Ukrainian population should have very reverent position to pigs (and if you ask, say, Russians you’ll hear that Ukrainians are obsessed with “salo” which is obtained from pigs). Despite that there’re no towns named after it and only two or three monuments to pigs in whole Ukraine (IIRC all of them were installed in last ten years or so). It’s definitely a shame and may partially explain why Ukraine is in such deep you-know-what.

Posted in Useless Rants | 1 Comment »

Short Tourist Guide to Germany

August 17th, 2010

Since I’m stuck in Germany for a while, I like to explore the nearest places (though I’d better go see Malmö, Sundsvall, Uppsala and, of course, Stockholm). Here are my impressions after visiting a dozen different towns.

Due to my location I mostly visited places situated near Rhine, so my sampling may be biased.

My first observation is that smaller towns are usually better (look more beautiful, nicer and more comfortable) than larger ones. For example, in Baden-Württemberg I’ve seen: Baden-Baden, Bruchsal, Donaueschingen, Eberbach, Freiburg, Heidelberg, Karlsruhe, Mannheim, Offenburg, Pforzheim, Stuttgart. Among those I liked Mannheim and Stuttgart the least. Guess what the two biggest cities in this region are? Stuttgart also serves as the capital. It’s famous for one guy who invented the bicycle and was one of the first to invent the typewriter (what do you know, Germans still love bikes) and several other guys who invented the car (there are still Daimler-Benz and Porsche works in Stuttgart along with their own museums).

Köln (aka Cologne) is the fourth biggest city in Germany. While it has very impressive architecture and interesting museums, it’s queer. Seeing slogan “From Cologne to Cleveland. Gay games 2014” does not improve my impressions. Seriously, Cleveland? It’s rumoured to be one of the worst cities in USA. I’m pretty sure that, for example, Bonn is better despite being former governmental place. Also I’ve heard that they have special dialect, even less understandable than Schwabish (not a problem for me, I can’t understand any German dialect anyway).

Rhineland-Pfalz (aka Rhineland-Palatinate) has no big cities, its capital Mainz has around two hundred thousand people but those towns are beautiful! Well, except Ludwigshafen unless you like concrete panel buildings and chemical industry. Also it’s worth reminding that Mainz was a cradle of one of the most important inventions ever – printing.

I’ve been in Hamburg for about an hour but this (the second largest German city) is also not impressive.

Well, guess what German city sucks most in my opinion (and not only because it’s built on a swamp)? Of course, it’s Berlin. The only good place I’ve seen there is musical instruments museum. The rest looks a lot like Kiev – skyscrapers in the city centre and mostly neglected buildings in other areas. Not to mention that even S- and U-Bahn stations look too spartan and underdesigned. Makes you think that West Berlin was a myth.

The only major exception is Bavaria (even some Germans consider it to be a separate country and not a real part of Germany). They make good cars, they have wonderful tourist attractions, they have very good music (though Wagner was born in Leipzig), they have wonderful nature too. They even had special Monty Python ~~Flying Circus~~Fliegender Zirkus series filmed there, it’s hard to beat that.

I still have to visit Central and East Germany but I don’t think it will change my opinion. And maybe I’ll have a chance to compare Strasbourg to Paris. I suspect result will be quite similar.

Posted in Useless Rants | 2 Comments »

VC-1 interlaced: sine-wave expectations

August 8th, 2010

Indeed, my expectations on completing at least field-interlaced VC-1 support can be represented as (possibly modulated) sine wave.

So here’s a short log of my mood during my attempts on implementing it:

initial state — no feelings at all
discovered old unfinished patch — mood goes up
tried to make it decode more than one field — fail, mood goes down
found out that first frame is actually composed of I and P field — mood goes up
looked at decoded picture — mood goes down
“that P-frame structure differs a bit but that’s all” — mood goes up
read about actual motion compensation routine for interframes and related bitreading —can you guess the consequences?

Some may argue this may be better represented with triangle or sawtooth wave though.

Seriously, now I understand why some people consider interlaced video to be evil. First of all, it’s an artefact of bandlimited era. Thus it adds unnecessary complexity to modern progressive systems.

I’m pretty sure there are people who will cry when they hear about interlaced coding and coded field order. There may be people who wince at word “telecine”. There may be H.264 interlaced modes (yes, several of them, MBAFF seems to be most popular) decoder implementers. Probably I’ll join one of those groups too.

Seriously, I consider adding interlaced mode (at least to some codecs) an offence against humanity.

I don’t see why interlaced decoding must differ from progressive one that much. Okay, we have two fields and we may need to select better reference for one of them. No problem. Select two references for motion vector prediction (which is described as several pages of blah-blah-code, yes, that comprehensible)? Gentlemen, include me out!

To make things worse they decided to complicate motion vector encoding along with prediction. Honestly, one would expect field MVs to be smaller since fields have half of the original picture height; in reality there is an additional group of bits read to augment the motion vector. Why?

And a bit of icing. FFmpeg seems not to be adapted well for interlaced decoding. For instance, who knew that you should use picture->linesize[0] instead of mpegenccontext->linesize because the former will be used in calculating offsets for current block data and if you set mpegenccontext->picture_structure to say PIC_TOP_FIELD it will modify something for you? Not to mention how to deal with motion block data needed for reference (I honestly have no idea how well it will work).
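For readers wondering why the line size matters so much here, this is the general idea of addressing one field inside an interleaved frame: start one line later for the bottom field and step by twice the line size. The sketch below is not FFmpeg code and uses none of its structures; `field_rows` and its parameters are names invented for this example.

```python
def field_rows(buffer, width, height, linesize, bottom_field):
    # A field is every second line of the frame, so it can be read in place:
    # offset by one line for the bottom field, then step by two lines.
    offset = linesize if bottom_field else 0
    stride = 2 * linesize
    rows_in_field = (height + (0 if bottom_field else 1)) // 2
    return [bytes(buffer[offset + y * stride:offset + y * stride + width])
            for y in range(rows_in_field)]


# Toy frame: 4 visible pixels per line, 2 bytes of padding, each line filled with its row number.
width, height, linesize = 4, 4, 6
frame = bytearray()
for row in range(height):
    frame += bytes([row] * width) + b"\xff" * (linesize - width)

top = field_rows(frame, width, height, linesize, bottom_field=False)
bottom = field_rows(frame, width, height, linesize, bottom_field=True)
print(top, bottom)
```

Mixing up the frame line size with the doubled field line size in any offset calculation makes a decoder read or write the wrong lines, which is the kind of trap described above.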

Thus, I invite somebody able and fearless to finish this task. I don’t have any samples to interest me (for the reference, in the best times my DVD collection was around two or three discs, guess the number of Blu-Rays I own) and I found better ways to spend my time (and probably more interesting codecs as well).

P.S. Moving VDPAU chunks to dedicated AVHWAccel is also needed and is trivial even for somebody without deep FFmpeg knowledge.

Posted in Useless Rants, VC-1 | 6 Comments »

Standards: Video versus Audio

July 21st, 2010

Since I work on multimedia stuff, I had some chances to look at different specifications for audio codecs as well as video ones. And comparing those specifications led me to rather surprising conclusion:

Video codec specifications are better than audio ones

I admit that I might miss some details or interpret something wrong yet here’re my impressions.

Video codec specifications tend to be complete and, while they are not always easy to comprehend, you can write codecs after them (well, maybe in the VP8 case you need to add some glue code to the reference decoder disguised as a specification). And those specs usually cover everything, including extensions, and are rather freely obtainable (usually drafts are good enough for all practical purposes). I know mostly ITU H.26x, MPEG video and SMPTE VC-1 codecs but that seems to apply to all of them.

Audio codec specifications often lack those features. Looks like they offer you more or less complete version of core decoder and good luck finding description of extensions. And even then they manage to lie or omit some things.

MPEG Audio Layers I-III — no objections (except for Layer 3 bitstream design, of course).

AAC — nothing wrong with core bitstream parsing. Heck, I even managed to write something that produces valid AAC stream (sometimes) that can be decoded into recognisable sound (with a bit of luck). Now look at the extensions — they are very poorly documented. That was the reason why FFmpeg got AAC SBR and AAC PS support that late.

ATSC A/52 (some call it AC3) — again, nothing wrong with core decoder. FFmpeg even got encoder for it long before native decoder (you can blame liba52 for that too). But E-AC-3 is a different beast. It’s mostly okay unless you try implementing enhanced coupling or dependent streams. The former has some bugs so implementing it right by the specification will result in decoder failing to parse bitstream sometimes and the latter is almost fine but in version available in the wild there’s no mention that some of extension channels are actually channel pairs. Good luck figuring it out by yourself.

DCA — it was fun when I discovered that actual CRC calculation is performed with polynomial different from the one given in specification. Luckily nobody bothers about it. Some of the tables are not given in specification — DCA is the codec with the largest amount of tables if you count its extensions (TwinVQ is a close competitor though), so they decided not to give tables with more than 1024 elements in specification. You need reference code for that. And good luck finding specifications for extensions. And I assure you that like in case with E-AC-3 reference decoder sometimes does different things than written in spec. The most wonderful part? Your decoder should be bitexact if you want lossless mode operating properly and looks like the only way to do that is to copy a lot of stuff from reference decoder.
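A wrong polynomial in a specification is not a cosmetic problem, as this small sketch shows. It is a generic bitwise CRC-16 with the polynomial passed as a parameter; the two polynomials used (0x1021 and 0x8005) are simply well-known ones picked for illustration and are not claimed to be the values from the DCA specification or its reference decoder.

```python
def crc16(data, polynomial, initial=0):
    # Plain MSB-first CRC-16, one bit at a time, no reflection and no final XOR.
    crc = initial
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ polynomial) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


payload = b"123456789"
as_documented = crc16(payload, 0x1021)
as_implemented = crc16(payload, 0x8005)
print(hex(as_documented), hex(as_implemented))
```

The same payload gives two unrelated checksums, so a decoder that trusts the documented polynomial would reject every valid frame if it bothered to check.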

Speech codecs (AMR-NB, AMR-WB, ITU G.72x, RFC xxxx) — some of them are nice but I still have the impression that most of them had specifications written by the following formula: “Here’s how it should operate in principle, I don’t remember exact detail anyway but we need to write something here, you have your reference decoder source so bugger off”. I remember looking at those G.72x specs (some are quite nice though), I remember the troubles Robert Swain had with the AMR-NB decoder (I tried to help him a bit with it too) and there’s some speech codec (iLBC?) that simply dumped all source code files into the RFC body without much explanation.

That’s why I claim that audio codec specifications are generally worse than video codec specs. I think if somebody ran simple test on specs assigning penalty points for things like:

containing non-obvious pseudocode constructs like QTMODE.ppq[nQSelect]->inverseQ(InputFrame, TMODE[ch][n])
containing five-line pseudocode for what can be expressed in one clear sentence
containing source code listings (bonus points for spanning more than one page)
omitting details essential for implementation
lying about details (i.e. when reference decoder and specification disagree)
assigning decoder to do irrelevant tasks (like upsampling, postfiltering and such)

virtually no audio codec would have zero score unlike video codecs.

Posted in Useless Rants | 3 Comments »
