Some Information on Micronas SC4 and VoxWare MetaVoice

April 24th, 2016

So I’ve looked at them.

Micronas SC4 seems to be rather unusual as it seems to bring elements of LPC to ADPCM. So it’s not just the old conventional “get nibble, multiply by step, output prediction, update index and step values”: it keeps a history of the last 6 decoded samples and predictions and uses them to calculate a new prediction value. Details might appear in the Wiki one day.
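For contrast, here is roughly what the old conventional scheme looks like. This is only a sketch: the step table and index adjustments are toy values invented for illustration (they are not from IMA ADPCM or from SC4), and the history-based SC4 predictor is not shown at all.

```python
# Toy tables, invented for illustration only (not taken from any real codec).
STEPS = [16, 22, 30, 41, 56, 76, 104, 142]
INDEX_ADJUST = [-1, -1, 0, 0, 1, 1, 2, 2]  # indexed by nibble magnitude (0..7)


def clamp(value, low, high):
    return max(low, min(high, value))


def decode_adpcm(nibbles):
    """Decode 4-bit codes (sign bit + 3-bit magnitude) into 16-bit samples."""
    predictor = 0
    index = 0
    out = []
    for nibble in nibbles:
        magnitude = nibble & 7
        delta = magnitude * STEPS[index]       # multiply by step
        if nibble & 8:                         # top bit is the sign
            delta = -delta
        predictor = clamp(predictor + delta, -32768, 32767)
        out.append(predictor)                  # output prediction
        # update index (and thus the step used for the next nibble)
        index = clamp(index + INDEX_ADJUST[magnitude], 0, len(STEPS) - 1)
    return out
```

The only state here is one predictor and one step index, which is exactly what a codec with a six-sample history would go beyond.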

VoxWare MetaVoice is three families of 2-3 codecs bundled under the same brand. I’ve not looked at technical details but they seem to have lots and lots of tables with floating point numbers (or just a bit of tables if you’ve looked at MetaSound first).
Here are the codecs:

  • RT24 2400bps “Real-Time” codec (ID is VOXa)
  • RT28 2844bps “Real-Time” codec (ID is VOXh)
  • RT29 2978bps “High Quality” codec (ID is VOXg)
  • VR12 1260bps Variable Rate codec (ID is VOXb)
  • VR15 1537bps Variable Rate codec (ID is VOXc)
  • SC3 3200bps “Embedded” codec (no ID)
  • SC6 6400bps “Embedded” codec (no ID)

Ask for support by grabbing j-b and demanding it to be supported. I know there are other players besides VLC but that’s the only project advertising that it “plays it all” even on T-shirts. It’s time to be responsible for your own words. And ask for Bink2 too while at it.

General Thoughts about Reverse Engineering Speech Codecs

April 23rd, 2016

Spoiler: they are not nice.

Speech codecs are probably the worst from my REing point of view. Why? Not because they are particularly hard to RE but rather because they are unpleasant. Here’s my list of reasons:

  • They are math-heavy. Of course you need to know mathematics to understand most of the codecs but I had no DSP courses at the university yet I can understand how video codecs work even in fine details, the same with many audio codecs. With speech codecs I have only general ideas about how they work.
  • Even worse, there are hardly any conventions on how to do things and as a result codecs are built in a process that puts designing ARM SoCs to shame.
  • Even worse, because codecs are math-heavy they have to be implemented with efficiency in mind, which results in horrible fixed-point math, usually in 16-bit variables.
  • And because all of this wasn’t enough, if a codec supports several bitrates it might have additional postprocessing functions for lower bitrates in the best case. In the usual case it’s a different codec.

And that all is the source of REing problems. Bitstream format is usually easy to find and parse, the functions that do something with it are not easy to understand at all. I often end up not understanding what the function does let alone what concept it implements. I might recognize only some of them like LPC filter.

So why are speech codecs so badly designed? In my opinion it comes from the design decision. The initial idea was to use as few bits as possible by having a synthetic model and transmitting only its parameters. It worked great (at least compared to MPEG-4 with its synthetic scene and audio description aka the key parts of the standard that people pretend do not exist at all). So you have the human throat which is basically a variable tube and vowels are modulated tone, consonants are modulated noise. Transmit filter coefficients (original LPC, LSF, parcor form or something else), noise flag and pitch frequency and you’re done, right? It works fine for some sounds but the quality is not that great and it fails with some sounds completely (the sounds often used by French for example, so there’s still a need for French Speech Codec or j-bc for short). How to improve it? By adding impulses to “excite” the model (i.e. tell when to start/stop the sounds). Not good enough? Add pitch tilting! Still not good enough? Add a postprocessing filter (and it was mandatory there long before video codecs). And what if we want to code not just voice but higher frequency range audio too? Well, add…
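The basic “tube plus tone or noise” model from the paragraph above can be sketched in a few lines. This is a toy under stated assumptions, not any real codec: the function names, the sign convention of the filter and the excitation shapes (unit pulse train for voiced frames, uniform noise for unvoiced ones) are all my choices for illustration.

```python
import random


def make_excitation(length, voiced, pitch_period, seed=0):
    """Pulse train for voiced frames, white noise for unvoiced ones."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def lpc_synthesis(excitation, coeffs):
    """All-pole filter: y[n] = e[n] + sum(coeffs[k] * y[n - 1 - k])."""
    history = [0.0] * len(coeffs)   # previous outputs, most recent first
    out = []
    for e in excitation:
        y = e + sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        history = [y] + history[:-1]
    return out
```

Everything a decoder of this kind receives per frame is `coeffs`, the voiced flag and the pitch period, e.g. `lpc_synthesis(make_excitation(160, True, 40), [0.5])`. All the later hacks are bolted on top of this loop.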

And thus it became a pile of hacks over a pile of hacks with a side dish of hacks. And each stage can be done in several different ways which only adds to the confusion. That’s not even starting to talk about smart ways to save bits by splitting a frame into several subframes and omitting coding some information for selected subframes (it can be interpolated from other subframes’ information after all). Or using codebooks and vector quantisation. Or how to generate noise for silent frames. Or using better coding than just writing a fixed number of bits for every element. Or…


I’ve finished looking at Lernout&Hauspie CELP+SBC codecs (as usual I don’t understand most of the things they do there but maybe I’ll still document them) and this plus my past experience made me write this post. Next is VoxWare MetaVoice and maybe Micronas SC4. And something saner afterwards. Or maybe it will be the usual nothing.

On Opensource Projects Support

April 22nd, 2016

Today I’d like to talk about how opensource projects are supported by “community” on example of FFmpeg/Libav.

Obviously I’ve chosen this example because I know some facts about internal politics and because libavcodec is the de facto multimedia decoding standard used on every platform and by most multimedia processing tools out there.

What good did it bring to the project(s)? Some fame but that’s probably it: the largest users don’t even bother to acknowledge in public that they use it (cough, BaidUTube, cough). There’s an enormous amount of code (that serves as a good compiler test suite too) but it’s maintained mostly by volunteers and people who have to use it at work (as they were hired because they worked on it in the first place). That’s it: the best material gain is employment because you’ve shown off your skills, or you can take occasional consulting work with varying quality of tasks and pay. Some people are paid to improve or write a new decoder. Some are hired to work on improving protocol support and hardly paid at all (true story). Some simply sigh at yet another “please implement this for my app” mail. That’s good but where is the money to pay for tasks for the project itself (e.g. refactoring old horrible code, adding tests, implementing a new feature or fixing some old problem)?

There was an attempt to set up a foundation to gather money and use it for the project but it didn’t work well even before the split and got completely derailed after; and of course it was not a good idea to set it up in the USA since the IRS refused to recognize it as a non-profit organisation, as other open-source projects have experienced (including X.org). The best part is how much money it could raise: I don’t remember the actual sum but it was relatively low, like less than $20,000 (please correct me if I’m wrong), and it came mostly from caught (L)GPL violators IIRC.

Let’s take a successful opensource project that uses libavcodec, that would be VLC. For example, the last VideoLAN Developer Days definitely cost tens and tens of thousands ${proper_currency} for accommodating about a hundred people, reimbursing (at least some of) their travel costs, food etc etc; the event was sponsored by the largest Internet advertising company, the largest French advertising company and the largest French VideoLAN advertising company. I remember talks that they could even employ one developer full-time. And would VLC be any useful without libavcodec? And while they probably are the biggest opensource supporter of FFmpeg (they host their Git repository after all) and their developers write some code from time to time, they hardly do anything else: there is a bounty program but it’s a complete joke since it lists mostly done tasks nobody will claim a reward for (and despite me pointing them to that fact nothing has been done). And obviously it doesn’t have tasks that would be beneficial for FFmpeg/Libav but not for VLC directly.

Let’s take a look at some commercial user. All those video hosting sites are available mostly because there’s enough bandwidth to stream video and because there is free code to support most of the formats uploaded by users so they don’t have to bother about details. And that free code is… (no points for guessing). And the biggest video hosting site was mentioned above. They don’t provide any support nor even acknowledge that they use certain opensource projects. Of course one can point to the Baidu Summer of Code program but it’s for involving students in opensource (and probably finding fresh meat for themselves) and it doesn’t work that well for projects: you mostly have students willing to get money and/or credit for a résumé. They tend to do the task and disappear completely. I’m sure that projects would prefer to have the financial freedom to sponsor specific tasks that could be undertaken by anyone (not just students) and last whatever time is needed (not just summer). If the project gets some support from large companies it’s usually because a developer from that project working in that company convinced the management to do so.

Even worse is the situation with distributions because they tend to demand free technical support from you: fixing your own bugs, fixing other projects that use your code and such. What do you get in return? Nothing. The recent Debian and Xscreensaver messup is not some special example, it’s too typical.

It’s funny how the best sponsor for Libav might be Luminem.it, a small Italian company whose founder is a Libav developer and thus knows what the project needs.

How do I fit into all of this? I was an FFmpeg and then a Libav developer, and later multimedia-related work found me because of my work on decoders (I’d been trying to find a good job myself but failed and had to accept the best offer I got). I was around core developers of the projects and thus could learn the facts written above. What have I got besides developing experience, acquaintances and a pessimistic outlook on life? BSoC 2006-2009 participation (where I really managed to finish only one project in time despite formally passing) plus some smaller sum of money for rewriting a component of swscale to make it LGPL. And some free dinners from the VideoLAN foundation. So I’m fine but seeing that the project cannot afford paying a developer for some internal project (like writing a new RealMedia demuxer) is still very sad.

On multimedia player names

April 17th, 2016

Warning: if you do not recognize names mentioned here you might be too young.

I’ve been using computers for about nineteen years and during that time I tried various players for various formats. My curiosity about internal format design made me search for information about compression methods, source code for decoders and such. So it led me to the current state (doing nothing). Yet for the many multimedia players I know there are some naming issues and that’s what I want to talk about.

My first versatile player was PLAYSND.COM by Yuri Tumarin. This 13kB DOS program could play a lot of various sound formats like WAV, VOC, MIDI variants and Adlib tracker music (RAD, HSC and lots of other variants). The best part is that it played some compressed WAV files too (various ADPCM variants and more). Excellent tool but the name is too bland and hard to search for.

Speaking of Adlib tracker formats, there’s an opensource player with Adlib emulator supporting lots and lots of them. The problem with it (beside being outdated now)? It’s called adplug. A good name to be blocked by a generic rule!

Let’s move to video players.

My first Linux player for VideoCDs was MpegTV. It was a commercial program but again, it was a country where nobody bothered about piracy and it was the only player on Linux I knew that could decode VideoCDs without stuttering. The player was doing its task fine but its name is rather cringeworthy.

Then I found out about DVD-oriented players like Ogle and Xine. Good names. And I still use Xine sometimes when I need to play DVDs.

And the gold standard of multimedia on Unix systems: XAnim. The only bad thing is that the last time I checked it didn’t work correctly in 32-bit X11 mode. But it did its job well and I’m still grateful it exists (and also its binary plugins were a good binary specification for missing codecs).

And there’s MPlayer. It was fast, it had many useful features and codecs supported (I still use it as a testbed for running some 32-bit VfW/DMO codecs when I cannot write a decoder without debugging) but its codebase was horrible (in some cases legendarily horrible) and the name is both bland and reminiscent of mplayer32.exe (which crashed and hung a lot too).

And one of its forks is named after one of the horrible chunks in libavcodec and its author operates under a pseudonym. So was it really worth it to name the player MPV?

And I conclude this review with a well-known multimedia player that I won’t use. When I think about VLC the first meaning coming to my mind is variable length codes. And when the only good thing about your player name is the number of puns you can make, I’d use something with a more decent name, thank you very much.

TwilightMotion Saga: The End

April 17th, 2016

I’ve finally documented what I know about VP4 in the wiki and I should unload it from my memory. Implementing decoders and such is left as an exercise for the TrueMotion-loving reader.

Probably I’ll look at ClearVideo (for the N-th time) or some speech codec suite. Funny thing is that even if they market it as a single speech codec you have a good chance to find several codecs for different bitrates (like for Lernout & Hauspie you have CELP for 4.8 kbps and SBC with different parameters for 8, 12 and 16 kbps) and don’t get me started on VoxWare MetaVoice (don’t confuse it with MetaSound, which is not a speech codec, or with MetaAudio, which doesn’t exist), that’s the rant for another day.

TwilightMotion Saga: Random pre-VP3 Bits

April 16th, 2016

TrueMotion 1 was licensed and has several variants outside the usual TM1. There’s allegedly Horizons PowerEZ but only j-b would know anything about it—because it’s vintage and used to code content he’s interested in of course. The other version was used for intro and victory cutscenes in Star Control II: Ur-Quan Masters 3DO version, the source code is available so any Mike Melanson out there can have a look at it. To me it looked as the same coding algorithm but with custom delta tables and codebooks provided. Oh, and data is split between several files (global header, codebook, frame data and offsets to individual frames).

TrueMotion 2 Realtime seems to be really Truemotion 1.2 Realtime Edition. It has quite similar header format to TrueMotion 1 (same obfuscation even) but with some values that would make TM1 decoder bail out on error and it was released before actual TrueMotion 2.

TrueMotion 2X seems to return to the coding method from TM1 as well since there’s a suspicious similarity between its inverse Huffman coding method (they call it “string encoder” which sounds even more confusing) and the codebook used in TM1, except that in TM2X they use 0x80 as the end of data flag instead of 0x01.

P.S. I should really move to VP4 and then away from this codec family altogether.

A Quick Look on IMM4

April 10th, 2016

So I’ve spent an hour or so to look at IMM4.

What do you know, it’s a very simple IDCT codec with interframes. Intraframes have only DCT with the usual run-level VLC coding, interframes have a skip flag to tell whether the macroblock should be skipped, coded as a difference to the previous frame, or coded as an intra block. See, no motion vectors, quantisation is a single value per block (except for DC in intra blocks), there seems to be no zigzagging either. You cannot get much simpler than that.
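To show how little is going on, here is a sketch of filling one block from run-level pairs with a single quantiser and no zigzag. It is not the actual IMM4 bitstream: the function name, the already-parsed (run, level) input and the way DC bypasses the quantiser are my assumptions for illustration.

```python
def decode_block(run_levels, quant, dc=None):
    """Fill an 8x8 block from (run, level) pairs in plain raster order."""
    coeffs = [0] * 64
    pos = 0
    if dc is not None:          # assumed: intra DC is stored without the block quantiser
        coeffs[0] = dc
        pos = 1
    for run, level in run_levels:
        pos += run              # skip 'run' zero coefficients
        if pos >= 64:
            raise ValueError("run-level data overflows the block")
        coeffs[pos] = level * quant   # one quantiser for the whole block
        pos += 1
    # no zigzag: coefficient i simply goes to row i // 8, column i % 8
    return [coeffs[i:i + 8] for i in range(0, 64, 8)]
```

The resulting 8×8 array would then go through the inverse DCT and either be stored (intra) or added to the co-located block of the previous frame (inter).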

TwilightMotion Saga 2X

April 9th, 2016

Okay, now it should be the last post about TM2X.

It’s hard to believe but it looks like there were at least five versions of this codec that can be distinguished by the chunk ID where frame information is stored (I have a decoder for versions 1-5 and all known samples are version 4). So in version 5 they’ve added coding of motion vectors for 8×8 blocks in various forms including quadtree (and that’s what confused me). It looks like there are tile dimensions stored in the configuration chunk (0xA0000109) and the codec operates on those.

Again, looks like decoder first calls a function to determine what to do with a row of blocks and then corresponding functions decoding (sub)block data. And I was confused by those too—some of the functions read luma and chroma, some functions read only chroma and some read luma, chroma and two other unidentified values of different types (so it’s not a motion vector). They always have 2 luma samples (if present) and 1/2/4/8 chroma samples. Or is it the other way round with two chroma samples and 1-8 luma samples?

What the Duck, On2, couldn’t you opensource TR20 and TM2X/TM2A along with TM1, TM2 and TM VP3 (and they were all in the same package, mind you)?

In any case I’ll try to forget it again, there’s still VP4 (aka AOM codec -5).

How the codecs should emerge (hint: without .ebuilds)

April 6th, 2016

So it has come to this, some events and discussions made me write this post.

How do I imagine the perfect process for new codecs? It’s a rather simple model: you have some places where ideas and enthusiasts swarm, and from their work and from selecting the best ideas a new candidate codec is born.

There are such places for all codec types: audio enthusiasts can find testers at Hydrogenaudio, video enthusiasts can talk at Doom9, general and image compression people seem to be present at encode.ru. In first approximation it works as expected—people propose ideas, test new compression programs and report benchmarks, suggest improvements. What can be wrong there? Just one thing: people making software incompatible with anything else (custom containers/archive formats) and trying to push it on everybody. After you invent some format make sure it works in some standard environment (for compressors it’s usually single file compression mode, .tar.xz seems to be more popular than .7z even if they use the same LZMA algorithm; for codecs it should be the standard container—even Matroska would do). And document the format too—properly instead of usual “bug off” level.

There are standardised codecs that undergo similar process: various companies or researchers submit their work, a base for a new standard is chosen, new proposals try to improve it. And then companies start to push their patented shit there and that’s where the system goes wrong (QMF in MPEG Audio Layer III anyone?). It’s not better when some company tries to push its product as a standard without any evaluation (and thus we get wonderful line of SMPTE VC-x codecs for instance).

And there’s OggXiph. This is again a community that designs codecs mostly because they can and pushes them mostly because they’re Free™ and OpenSource™ and they mostly suck otherwise: the Ogg format is for streaming and not good for anything else, most people still don’t know that it’s Ogg/FLAC because it was developed outside (and has a horrible raw stream format), Speex has no readable specification and is easier to understand by disassembling the library than by reading the source code, Theora is an outdated enterprise grade code, Opus has its issues (but it’s rather good, one cannot deny that), Daala will probably never happen.

And what do I see in recent news? Alliance for Open Media plans to release the first draft of their codec soon and it is:

  • hosted on baidusource.com;
  • for now just libvpx with some name changes;
  • everything else about it screams Baidu too.

If it looks like Duck, produces codecs like Duck and has the same source code as Duck, then it probably is DuckOn2Baidu.

At least in the old times there was some competition of ideas in codecs so one could choose between different codecs giving good results—and in some cases they were available for various ecosystems too (e.g. Indeo was present in AVI and MOV, ClearVideo managed to get into AVI, MOV and RM). Now it’s just foam of lossless codecs that even their authors forget about next year and one or two companies pushing their stuff on everybody. And that makes me sad.

TM2X Woes

April 3rd, 2016

I don’t know what I should write about this codec.

TM2X (or TM2A, they are really identical) differs in design from TM2 Vanilla. The main principle seems to stay the same for TM2, TM2X and TM2RT: they all operate on delta coding from the previous delta and top neighbour. But while for TM2 it’s always 4×4 blocks and for TM2RT it’s the whole plane, for TM2X it seems to be variable block size (i.e. it can be an 8×8 block or even larger). TM2 uses classical Huffman coded data (with tree description and such), one per each block type, TM2RT uses fixed size deltas (2-, 3- or 4-bit), TM2X uses inverse Huffman lists (i.e. each byte codes a list of values which you’re supposed to read sequentially). And for TM2 there was source code (horrible C but source code nevertheless), TM2RT had a compact and rather sane binary specification, TM2X has only an insane binary specification. How insane? For starters, it uses obfuscation for some chunks that’s tedious to undo by hand (unlike TM2RT), and its internal design relies on calling an array of virtual functions that seem to treat esp as “Eh, Structure Pointer”, which will confuse any decompiler.
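The common principle (delta from the previous delta plus the top neighbour) can be read in more than one way; the sketch below is one plausible interpretation, not the verified TM2 logic. It assumes already-decoded deltas, a running per-row accumulator and an all-zero row above the first one, all of which are my simplifications.

```python
def decode_plane(deltas, width, height):
    """Rebuild a plane where each sample = sample above + accumulated row delta."""
    plane = []
    top = [0] * width           # assumed: the row above the first row is zero
    pos = 0
    for _ in range(height):
        acc = 0                 # running sum of the deltas seen so far in this row
        row = []
        for x in range(width):
            acc += deltas[pos]  # each delta adjusts the previous accumulated delta
            pos += 1
            row.append(top[x] + acc)
        plane.append(row)
        top = row               # this row becomes the top neighbour of the next
    return plane
```

In this reading TM2 would run the loop per 4×4 block, TM2RT over the whole plane and TM2X over blocks of varying size; only the source of `deltas` differs.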

Thanks to that I was unable to reconstruct all the decoding logic but at least some facts seem to be more or less clear:

  • decoding seems to vary greatly depending on decoder configuration provided in corresponding chunks (since those values are used to build function pointer arrays);
  • there’s lots and lots of block decoding functions that read different amount of deltas per 8 or 16 pixels, e.g. there can be 3 or 5 deltas per 8 pixels;
  • all decoding functions use the same inverse Huffman list but there are different ways to remap its output: there are delta value mapping tables for luma and chroma, generic value decoding uses special escape value to signal that its decoding is not done yet etc;
  • motion compensation indeed uses halfpel precision.
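The inverse Huffman list idea mentioned above (one input byte expands to a whole list of values that callers then read one by one) could look like this. The codebook contents are entirely made up, and the real decoder also has end-of-data flags, escape values and remapping tables that are left out here.

```python
# Hypothetical codebook: every input byte stands for a whole list of values.
CODEBOOK = {
    0x00: [0],
    0x01: [0, 0, 0, 0],
    0x02: [1, -1],
    0x03: [2, 0, -2],
}


def expand(data, codebook):
    """Yield values one by one, as if reading them sequentially from the lists."""
    for byte in data:
        for value in codebook[byte]:
            yield value
```

A block decoding function that needs, say, 3 or 5 deltas per 8 pixels would simply pull that many values from the generator with `next()`, regardless of how many bytes they happened to come from.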

So I’ll probably just forget about this codec and move to VP4 and then forget about all these turkeyduck codecs. I fear that ClearVideo will be abandoned at a similar level too. Well, at least there are a lot of speech codecs to talk about.