Archive for the ‘TrueMotion’ Category

NihAV: the Last Quack

Thursday, October 31st, 2019

Finally NihAV got full-feature VP7 decoding support (well, except one very exotic case for a very exotic mode) so now I can move to other things like actually making various decoders bit-exact, fixing other bugs in them, adding missing pieces of code for player and even documenting stuff. I hope to give a presentation of my work on VDD 2020 or FOSDEM 2021 (whichever accepts it) and I want to have something decent to present by then.

Anyway, here’s a review of VP7.
(more…)

NihAV: still ducking

Saturday, July 27th, 2019

While it’s summer and I’d rather travel around (or suffer from heat when I can’t), there has been some progress on NihAV. Now I can decode VP5 and VP6 files. Reconstruction still sucks because it takes a lot of effort to make perfect reconstruction and I’m too lazy to do that when simple demonstration that the decoder works would suffice.

Anyway, now I can decode both VP5 and VP6 files including interlaced ones. Interlacing in VP5/6 is done in very simple way like many other codecs: there’s a bit for each macroblock telling whether macroblock should be output in interlaced form or not.

Of course this being VPx family, they had to do it with some creativity. First you decode base interlaced bit probability, which is stored as 8-bit value while all other bit probabilities are stored in 7 bits. Then you derive actual probability for interlaced bit and decode it before any other macroblock information (including macroblock type—it’s that important). Probability is derived by companding base probability depending on whether last macroblock was interlaced (then probability is halved) or not (then it’s remapped to fit 128-255 range)—except for the first macroblock in a row which would use the base probability without modifications. And for VP6 you also have to use different starting scan order (band assignment for each coefficient, now it’s shuffled). This is so trivial that one would wonder why this has not been done in libavcodec decoder yet.

There are three possible things to do next: polish current implementation, move to AVC (On2 AVC that is) or move to AVC (Duck VP7 which is AVC ripoff). But probably I’ll simply keep doing nothing instead.

VP3-VP6: the Golden (Frame) Age of Duck Codecs

Friday, May 24th, 2019

Dedicated to Peter Ross, who wrote an opensource VP4 decoder (that is not committed to CEmpeg yet at the time of the writing).

The codecs from VP3 to VP6 form a single codec family that is united not merely by the design but even by the header—every frame in this codec (sub)family has the same header format. And the leaked VP6 format specification still calls the version field there Vp3VersionNo (versions 0-2 used by VP3, 3 is used by VP4, 5 is for VP5 and 6-8 is for VP6). VP7 changed the both the coding principles to mimic H.264 and the header format too. And you can call it the golden age for Duck because it’s when it gained popularity with VP3 donated to open-source community (and xiphed to Theora which was the only patent-free(ish) opensource video codec with decent performance back then) to its greatest success found in VP6, employed both in games and in Flash video (remember when BaidUTube still used .flv with VP6 and N*llyMos*r ADPCM or Speex?). And today, having gathered enough material, I want to give an overview of these codecs. Oh, and NihAV can decode VP30 and VP31 now.
(more…)

NihAV: now with TM2X support!

Thursday, April 11th, 2019

I’m proud to say that NihAV got TrueMotion 2X support. For now only intra frames are supported but 75% of the samples I have (i.e. three samples) have just intra frames. At least I could check that it works as supposed.

First, here’s codec description after I managed to write a working decoder for it. TrueMotion 2X is another of those codecs that’s closer to TrueMotion 1 in design. It still uses the same variable-length codebook instead of Huffman coding (actually only version 5 of this codec uses bit reading for anything). It also uses “apply variable amount of deltas per block” approach but instead of old fixed scheme it now defines twenty-something coding approaches and tells decoder which ones to use in current frame. That is done because block size now can be variable too (but it’s always 8 in all files I’ve seen). And blocks are grouped in tiles (usually equivalent to one row of blocks but again, it may vary). The frame data obfuscation that XORs chunks inside the frame with a 32-bit key derived in a special way is not worth mentioning.

Second, the reference is quite peculiar too. It decodes frame data by filling an array of pointers to the functions that decode each line segment with proper mode, move to the next line and repeat. And those functions are in handwritten assembly—they use stack pointer register for decoder context pointer (that has original ESP saved somewhere inside), which also means they do not use stack space for anything and instead of returning they simply jump to the next routine until the final one restores the stack and returns properly. Thankfully Ghidra allows to assign context argument to ESP and while decompile still looks useless, assembly has proper references in the form mov EDX, dword ptr [ctx->luma_pred + ESP].

And finally, I could not check what binary specification really does because MPlayer could not run it. At first I tried running working combination of WMP+Win98 under OllyDbg in QEMU but it was painfully slow and even more painful to look at the memory state. In result I’ve managed to run TM2X decoder in MPlayer which then served as a good reference. The trick is that you should not try to run tm2X.dll (it’s really hopeless) but rather to take tm2Xdec.ax (or deceptively named tm20dec.ax from the same distribution that can handle TM2X unlike its earlier versions), patch one byte for check in DLL init and it works surprisingly well after that.

So what’s next? Probably I’ll just add missing features for the second TM2X sample (the other two samples are TM2A), maybe add Bink2 deblocking feature—since I’d rather have that decoder complete—and move to improving overall NihAV design. Frame management needs proper rework before I add more codecs—I want to change into a thread-safe version before I add more decoders. Plus I’ll need to add some missing bits for a player. There’s a lot of work still to do but I’m pleased that I still managed to do something.

NihAV: first quacks

Sunday, February 10th, 2019

As you can guess from the title NihAV got some support for Duck formats, namely TrueMotion 1 and TrueMotion RT. The implementation was rather straightforward except that it took some additional work to support 16-bit video buffers.

Of course I made sure my new TM1 decoder supports decoding sprites. Here’s an example of such sprite picture:

The hardest part was finding a sample.

I can’t sanely support transparency though since it uses 6-bit alpha with RGB555 image and while I can support such format quite easily I’d rather not.

If you wonder about the details of sprite support, it’s almost the same as ordinary inter-coded 16-bit TM1 with some nuances:

  1. frame header has additional 16-bit fields for sprite position and size (and actual sprite size is used in the decoding—the result is supposed to be put over the destination picture);
  2. sprite has twice as much mask bits as inter frame—two per 4×4 block (LSB first as usual). Bits 00 mean the next four pixels should be skipped (and predictor reset to zero), bits 01 mean it’s opaque sprite data and bits 10 mean it’s sprite data with transparency info present;
  3. sprite data is decoded as standard 4×4 TM1 block data (i.e. on C delta per 4×4 block) except that in transparency mode it also reads transparency data after each pixel pair.

That information comes from our old trusty source of information called VPVision source code dump (which was used to understand TrueMotion 1, 2 and probably DK3/DK4 ADPCM (and maybe VP3 but I’m not sure at all). Also it turns out to contain TrueMotion RT encoder source code as well (which could be used to reconstruct the decoder but I forgot about it at the time and used the binary specification instead).

And now I’d like to talk about Duck codecs in general.

The codecs from this family can be divided into three groups:

  1. The Age of Darkness: the original TrueMotion codec and its evolution plus related ADPCM codecs;
  2. The Age of Enlightenment: game codecs evolving into more generic video codecs and using more mainstream codec design (DCT-based, many ideas borrowed from H.263 and H.264) plus AVC (that’s audio codec if you don’t remember);
  3. The Age of EA Guardian: the codecs produced after Duck was bought by certain company.

The Age of Darkness codecs

Those codecs were used mostly in video games but TM1 was also licensed to Horizons Technology.

The idea behind TM1 is very simple: you split video into 4×4 blocks, predict each pixel from top and pack using quantised deltas and fixed codebook looking more like Tunstall codes (i.e. output code is always a fixed length of one byte but it may correspond to a variable length sequence of input codes). Also depending on quality frame blocks have different number of colour difference deltas per block (1, 2 or 4).

TrueMotion RT is an adaptation of TM1 for real-time video capturing (hence the name). In this case video is coded as planar YUV410 using fixed set of deltas with index taking 2, 3 or 4 bits. But the general coding idea (top and left prediction, delta quantisation and coding its index) remains the same. It uses the same frame header obfuscation so it’s probably an elder sibling of TrueMotion 2 (and its name is more like TrueMotion RT version 2.0 and not TrueMotion 2 RT but the details are unclear). There are different versions of the codec, for example Star Control II: The Ur-Quan Masters on 3DO used a special TM1 format split into several files: .hdr for global information (including quantised delta sets), .tbl with codebook definition, .duk with actual frame data and .frm with the frame offsets for .duk file. It’s a pity I can’t support it without very special handling.

TrueMotion 2 gets rid of single static codebook and packs appropriate data (deltas for different-resolution blocks, motion vector data, actual block types etc etc) in separate segments with their own Huffman codes. There are many improvements but the codec still operates on 4×4 blocks with horizontal and vertical prediction of each symbol.

There is not much known about TrueMotion 2X but so far it looks like maybe slightly improved TM2. Hopefully it will be clearer if I manage to implement a decoder for it.

And finally there were two simple ADPCM codecs accompanying video (usually TM2), there’s nothing much to say about those.

The Age of Enlightenment codecs

This was the age when Duck codecs became widely known and accepted, when various companies licensed them for their own needs and when it was really the golden age for them.

It all starts with TrueMotion VP3 that set the standard for the following codecs. It employed the a bit non-standard 8×8 DCT, referencing last intra frame as an alternative to referencing just the previous frame (later knows as golden frame), with various types of information grouped together instead of interleaving it all, and with coefficients coded as tokens (EOB, zero run, plus-minus one, plus-minus two, large coefficient token and such). The same approach would be used for subsequent codecs as well. Of course it briefly enjoyed the renaissance when Duck decided to put it into open-source and Xiph Theora was created on its base (and since there were no other free and open-source video codec alternatives it was destined to have popularity and success before something better comes).

TrueMotion VP4 was mostly the same but with different coding method for some data types. Maybe it was the first codec to move edge loop filtering from being performed on the frame to being performed on temporary block used in motion compensation but I’m not entirely sure.

TrueCast VP5 was the first in the series to employ their own version of static binary arithmetic coder mostly known as bool coder. That means that instead of updating bit probability after each decoding using that context as CABAC does, frame header encodes fixed probabilities (or just updates from the probabilities in the previous frame) and uses them for decoding.

VP6. Probably the most famous of them all since it was used in Flash videos. From technical point of view it’s just small improvement in some details over VP5. I suspect this was the first codec in the series that introduced selecting random frame as the next golden frame (previously it was just last intra frame, now any inter frame can signal that it should become golden).

VP7. This is the first installation in the series that was based on H.264 ideas like 4×4 transform and spatial prediction.

And of course there’s AVS, an audio codec inspired by AAC LC that accompanied some VP5-VP7 videos.

The Age of Guardian codecs

While the design direction has not changed much, the codecs themselves mostly belong to the niche provided by their current owner and hardly used anywhere else. For now we have VP8, VP9 and VP10 (aka AV1).


I hope there will be more to write about those after I write decoders for the rest of them and learn the shameful details of their design in the process.

TwilightMotion Saga: The End

Sunday, April 17th, 2016

I’ve finally documented what I know about VP4 in the wiki and I should unload it from my memory. Implementing decoders and such is left as an exercise for TrueMotion-loving reader.

Probably I’ll look at ClearVideo (for the N-th time) or some speech codec suite. Funny thing is that even if they market it as a single speech codec you have a good chance to find several codecs for different bitrates (like for Lernout & Hauspie you have CELP for 4.8 kbps and SBC with different parameters for 8, 12 and 16 kbps) and don’t get me started on VoxWare MetaSpeech (don’t confuse it with MetaSound—that one is not a speech codec or with MetaAudio—that one doesn’t exist), that’s the rant for another day.

TwilightMotion Saga: Random pre-VP3 Bits

Saturday, April 16th, 2016

TrueMotion 1 was licensed and has several variants outside the usual TM1. There’s allegedly Horizons PowerEZ but only j-b would know anything about it—because it’s vintage and used to code content he’s interested in of course. The other version was used for intro and victory cutscenes in Star Control II: Ur-Quan Masters 3DO version, the source code is available so any Mike Melanson out there can have a look at it. To me it looked as the same coding algorithm but with custom delta tables and codebooks provided. Oh, and data is split between several files (global header, codebook, frame data and offsets to individual frames).

TrueMotion 2 Realtime seems to be really Truemotion 1.2 Realtime Edition. It has quite similar header format to TrueMotion 1 (same obfuscation even) but with some values that would make TM1 decoder bail out on error and it was released before actual TrueMotion 2.

TrueMotion 2X seem to return to coding method from TM1 as well since there’s a suspicious similarity between its inverse Huffman coding method (they call it “string encoder” which sounds somewhat even more confusing) and the codebook used in TM1 except that in TM2X they use 0x80 as the end of data flag instead of 0x01.

P.S. I should really move to VP4 and then away from this codec family altogether.

TwilightMotion Saga 2X

Saturday, April 9th, 2016

Okay, now it should be the last post about TM2X.

It’s hard to believe but looks like there were at least five versions of this codec that can be distinguished by the chunk ID where frame information is stored (I have decoder for versions 1-5 and all known samples are version 4). So in version 5 they’ve added coding of motion vectors for 8×8 blocks in various forms including quadtree (and that’s what confused me). Looks like there are tile dimensions stored in configuration chunk (0xA0000109) and codec operates on those.

Again, looks like decoder first calls a function to determine what to do with a row of blocks and then corresponding functions decoding (sub)block data. And I was confused by those too—some of the functions read luma and chroma, some functions read only chroma and some read luma, chroma and two other unidentified values of different types (so it’s not a motion vector). They always have 2 luma samples (if present) and 1/2/4/8 chroma samples. Or is it the other way round with two chroma samples and 1-8 luma samples?

What the Duck, On2, couldn’t you opensource TR20 and TM2X/TM2A along with TM1, TM2 and TM VP3 (and they were all in the same package, mind you)?

In any case I’ll try to forget it again, there’s still VP4 (aka AOM codec -5).

TM2X Woes

Sunday, April 3rd, 2016

I don’t know what I should write about this codec.

TM2X (or TM2A, they are really identical) differs in design from TM2 Vanilla. The main principle seems to stay the same for TM2, TM2X and TM2RT — they all operate on delta coding from the previous delta and top neighbour. But while for TM2 it’s always 4×4 blocks, for TM2RT it’s the whole plane, for TM2X it seems to be variable block size (i.e. it can be 8×8 block or even larger). TM2 uses classical Huffman coded data (with tree description and such) one per each block type, TM2RT uses fixed size deltas (2-, 3- or 4-bit), TM2X uses inverse Huffman lists (i.e. each byte codes a list of values which you’re supposed to read sequentially). And for TM2 there was source code (horrible C but source code nevertheless), TM2RT had compact and rather sane binary specification, TM2X has only an insane binary specification. How insane? For starters, it uses obfuscation for some chunks that’s tedious to undo by hand (unlike TM2RT), it has internal design relying on calling on array of virtual functions and those seem to treat esp as “Eh, Structure Pointer” which will confuse any decompiler.

Thanks to that I was unable to reconstruct all the decoding logic but at least some facts seem to be more or less clear:

  • decoding seems to vary greatly depending on decoder configuration provided in corresponding chunks (since those values are used to build function pointer arrays);
  • there’s lots and lots of block decoding functions that read different amount of deltas per 8 or 16 pixels, e.g. there can be 3 or 5 deltas per 8 pixels;
  • all decoding functions use the same inverse Huffman list but there are different ways to remap its output: there are delta value mapping tables for luma and chroma, generic value decoding uses special escape value to signal that its decoding is not done yet etc;
  • motion compensation is indeed uses halfpel precision.

So I’ll probably just forget about this codec and move to VP4 and then forget about all these turkeyduck codecs. I fear that ClearVideo will be abandoned on the similar level too. Well, at least there’s a lot of speech codecs to talk about.

TrueMotion 2 RealTime

Wednesday, March 30th, 2016

I’ve been reminded that this variant of TrueMotion exists too. What do you know, it’s actually somewhat like TrueMotion 2 NoModifiers.

Essentially it’s just another fixed packing scheme like Creative YUV, Cirrus Logic CLJR or Aura. You have left prediction, deltas coded with nibbles, the usual stuff (at least blocks in TM2 were coded similarly). The only peculiar thing is that it codes data by planes with chroma planes being coded first.

I hope to add detailed description of this codec to Multimedia Wiki by the end of this week and then forget about it again.