TrueMotion « Kostya's Boring Codec World

Archive for the ‘TrueMotion’ Category

About upcoming AV2…

Friday, August 6th, 2021

So today I’ve seen an article titled AV2 Video Codec — Early Performance Evaluation of the Research which of course has drawn my attention.

The fun things are that it is a sponsored article and that it’s written by three engineers from ViCueSoft. This is strange, but so far it still looks more promising than the original AV1 feature review article with over 20 authors and too much marketing in it (my review of it is here; and to be fair it was followed by a more serious paper with fewer authors, but this one exists as well). Anyway, let’s see what is presented here.

I don’t care much about the performance so I’ll just quote the phrase from the conclusion: “…rough approximation shows only 1.2x times encoding complexity increase and 1.4x time decoding”. I find it a bit strange that decoding complexity increases more than encoding complexity: normally you’d expect encoding difficulty to rise faster because of the nature of the coding approach in modern codecs (an encoder needs to search for the best combination of encoding tools and their parameters and then apply the same steps as the decoder does in order to have the coded frame in the same state as the decoder would have it). Let’s look at the features then, it’s the most interesting part to me anyway.

distant weighted compound mode and dual interpolation filter are removed;
semi-decoupled partitioning is introduced: this feature allows splitting luma and chroma blocks and coding their contents independently below a certain level. The paper also says there’s a Dual Tree feature in VVC that does the same;
quantiser step overhaul: instead of six tables in AV1 now you have just one simple formula for all quantiser steps;
extending motion sample selection to work with compound blocks as well;
more partitioning modes to be more like HEVC;
multiple reference line selection for intra prediction—allows you to select not just neighbouring row/column for directional intra prediction. The same tool exists in VVC. And it also reminds me of X8 frames in WMV2/WMV9, that is the first case of intra prediction using more than one line known to me;
offset-based intra prediction refinement—adding some offset to the top/left intra predicted edge of the block to make it even smoother (the offset is calculated from the neighbouring blocks as well);
intra secondary transform—this tool tries to improve compression by applying a special secondary transform to the low-frequency coefficients. VVC has low-frequency non separable transform doing the same;
simplifications in intra mode signalling;
some improvements in motion prediction coding;
cross-component sample offset—another chroma-from-luma tool: for the whole CTU between deblocking and CDEF stages a DC offset is calculated from the luma values and applied to chroma values.

Essentially there are three kinds of improvements: simplification or generalisation of the existing feature (including complete removal of it—I approve either), picking the tool used by VVC/H.266 (that approach works but lacks originality) and an occasional improvement of an existing tool (too few and not too original). Of course nobody knows when AV2 will be declared finished and some things will surely have changed by then, but I don’t expect radical changes.

Once I said that I’ll review H.266 when AV2 is released but these guys have essentially done my work for me. Thanks!

Posted in TrueMotion, Useless Rants, Various Video Codecs | No Comments »

A Modest Proposal for AV2

Wednesday, September 16th, 2020

Occasionally I look at the experiments in AV1 repository that should be the base for AV2 (unless Baidu rolls out VP11 from its private repository to replace it entirely). A year ago they added an intra mode predictor based on a neural network and in August they added a neural-network-based loop filter experiment as well. So, to make AV2 both simpler to implement in hardware and better at compression, I propose to switch all possible coding tools to use misapplied statistics. This way it can also attract more people from the corresponding field to compensate for the lack of video compression experts. Considering the number of pixels (let alone the ways to encode them) in a modern video it is BigData™ indeed.

Anyway, here is what I propose specifically:

expand intra mode prediction neural networks to predict block subdivision mode and coding mode for each part (including transform selection);
replace plane intra prediction with a trained neural network to reconstruct block from neighbours;
switch motion vector prediction to use neural network for prediction from neighbouring blocks in current and reference frames (the schemes in modern video codecs become too convoluted anyway);
come to think about it, neural network can simply output some weights for mixing several references in one block;
maybe even make a leap and ditch all the transforms for reconstructing block from coefficients directly by the model as well.

As a result we’ll have a rather simple codec with most blocks being neural networks doing specific tasks, an arithmetic coder to provide input values, some logic to connect those blocks together, and some leftover DSP routines (though I’m not sure we’ll need them at this stage). This will also greatly simplify the encoder, as it will be more a matter of producing fitting model weights than of trying some limited encoding combinations. And it may also be the first true next-generation video codec after H.261, paving the road to radically different video codecs.

From the hardware implementation point of view this will be a win too: you just need some ROM and RAM for models plus a generic tensor accelerator (which are becoming common these days), with no need to design those custom DSP blocks.

P.S. Of course it may initially be slow and work in a range of thousands FPS (frames per season) but I’m not going to use AV1 let alone AV2 so why should I care?

Posted in TrueMotion, Useless Rants, Various Video Codecs | 3 Comments »

Reviewing AV1 Features

Saturday, March 21st, 2020

Since we have this wonderful situation in Europe and I need to stay at home, why not do something useless and comment on the features of AV1, especially since a nice paper from (some of?) the original authors is here. In this post I’ll try to review it and give my comments on various details presented there.

First of all I’d like to note that the paper has 21 authors for a review that can be done by a single person. I guess this was done to give academic credit to the people involved and I have no problems with that (also I should note that even if two of fourteen pages are short authors’ biographies they were probably the most interesting part of the paper to me).

Posted in TrueMotion, Useless Rants, Various Video Codecs | 7 Comments »

General overview of Duck codecs and their design

Saturday, February 15th, 2020

I’ve finally finished polishing decoders for all Duck codecs (from before it was bought by Baidu) and now they all seem to work fine (except AVC, that one can wait for later, much much later). And while I have moved on to even hairier and more painful tasks (reorganising nihav-core and even documenting it), now that I have a full understanding of how those codecs work I can give an overview of their design (not a bit-by-bit description of the format, we have The Wiki for that, but rather the most notable features and similarities to other codecs) and form my opinion on them.

TrueMotion 1

Somehow this might be their most original codec. While it’s a simple codec with delta prediction, I can’t remember any other codec that used a variable-length codebook with byte indices. Also this is the only codec in the family that works with RGB (16- and 24-bit modes even; the rest of the codecs use YUV).

TrueMotion RT

This one is a trivial codec for real-time video capturing (hence the name) that codes deltas with fixed quantisation scheme (2, 3 or 4 bits deltas with predefined step sizes).

TrueMotion 2

This codec is still based on delta coding but now instead of working with individual pixels it works with 4×4 blocks that can have a different number of deltas and even employ motion compensation (instead of coding deltas). Also the data is separated into different streams and each of them is Huffman coded.

The approach with coding different kinds of information in separate chunks will be used in later codecs as well.

TrueMotion 2X

TrueMotion 2X is some weird amalgamation of TrueMotion 1 and TrueMotion 2. It works with 8×8 blocks that may have a different number of deltas like TM2, and information is grouped into chunks like TM2, but it uses the variable codebook approach from TM1.

The main distinguishing features of this codec though are having multiple chunk variants for holding the same data and obfuscating data by XORing it with a 32-bit key derived from a key stored in the frame by passing it through an LFSR a couple of times. IIRC frame data also contains the name of the person owning the copy of the program so it might be some kind of protection scheme, but it looks dubious at best.

3- and 4-bit ADPCM

As you can guess these codecs are based on DVI ADPCM (the 4-bit variant is essentially IMA ADPCM with a different block header); the 3-bit variant simply expands three deltas into four samples by interpolating coded differences (which has been done by other formats as well but I don’t remember which ones).

VP3-VP4

Starting with this format Duck moved to the codec design approach which I can describe as “make an equivalent of some existing codec but with some crazy thing replacing some less important stage”. It’s not like they are the only company doing this but it’s probably the only one leaving you with “how did they manage to come up with that idea?” question and VP3 is a very good example of that.

First of all, VP3 has an unusual block clustering: 8×8 blocks are grouped into 16×16 macroblocks and into 32×32 superblocks; blocks in superblocks are walked in Hilbert pattern but macroblocks in superblocks use zigzag pattern. Except that when you have four motion vectors in a macroblock they are stored also in zigzag pattern. Oh, and superblocks are walked in raster format plane after plane. Macroblock having data for both luma and chroma? Leave that to other codecs.
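
To make the Hilbert walk a bit more tangible, here is a minimal sketch that generates a generic Hilbert-curve order for the 4×4 grid of 8×8 blocks inside a 32×32 superblock. It uses the textbook index-to-coordinate conversion; the start corner and orientation of the curve actually used by VP3 come from a fixed table and may well differ from this one, so treat `hilbert_order` as an illustration only.

```python
def hilbert_order(order):
    """Return the (x, y) cells of a 2**order x 2**order grid in Hilbert-curve order."""
    n = 1 << order
    cells = []
    for d in range(n * n):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:
                # Rotate/flip the quadrant so consecutive cells stay adjacent.
                if rx == 1:
                    x = s - 1 - x
                    y = s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        cells.append((x, y))
    return cells


# Block coordinates (in units of 8x8 blocks) inside one superblock.
superblock_walk = hilbert_order(2)
```

The useful property is that every step moves to a directly neighbouring block, unlike raster or zigzag order where the walk jumps around.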

Then we have another feature familiar from TM2 times: data is grouped by type. First you have superblock information (intra/skip/inter), then macroblock information (which kind of motion it uses), then motion vectors and finally block coefficients.

Speaking of motion vectors, there are four features related to them that make these codecs different. First, motion vector prediction uses last/second last motion vector (in the order of decoding) as the base instead of median prediction in other codecs (this scheme will live up until VP9 with some modifications; I guess it’s done so because of the scan order but who knows). Second, motion interpolation is done as averaging two pixels—and for (½,½) case you average pixels on diagonal, which one of two depends on motion vector direction (averaging all four pixels? who would do that?!). Third, the introduction of golden frame as an alternative reference frame (don’t confuse it with altref frame introduced in VP8). This one is probably done to avoid B-frames that were patented at the time (at least that’s what people think). Fun fact: in VP31-VP5 golden frame is selected as last intra frame, in later codecs it can be selected with a special bit or even partially updated but in VP30 any frame with low enough quantiser automatically becomes new golden frame. And fourth, VP4 moved the loop filtering to motion compensation process so the reference picture does not have its edges filtered but when you perform motion compensation you apply it on source block edges using the current strength. This scheme remained until VP7 where they moved to the usual in-loop deblocking again (also it’s fun to encounter blocky intra frame image that gets smoothed with the following frames).
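
The two-pixel averaging can be sketched like this. It is a rough illustration assuming motion vectors in half-pel units, a reference plane stored as a list of rows, no edge handling and a truncating average; the function names and the rounding are my assumptions, not something taken from the bitstream description.

```python
def trunc_half(v):
    """Halve a half-pel vector component, truncating toward zero."""
    return v // 2 if v >= 0 else -((-v) // 2)


def sign(v):
    return (v > 0) - (v < 0)


def sample_halfpel(ref, x, y, mvx, mvy):
    """Predict one pixel from ref (a list of rows) using at most two taps."""
    ax, ay = x + trunc_half(mvx), y + trunc_half(mvy)
    # The second tap lies one full pixel further along every component
    # that has a half-pel part, so the diagonal follows the vector direction.
    bx = ax + (sign(mvx) if mvx & 1 else 0)
    by = ay + (sign(mvy) if mvy & 1 else 0)
    return (ref[ay][ax] + ref[by][bx]) >> 1  # rounding mode is an assumption


ref = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
down_right = sample_halfpel(ref, 1, 1, 1, 1)   # averages 40 and 80
up_right = sample_halfpel(ref, 1, 1, 1, -1)    # averages 40 and 20
```

So for the (½,½) case the vector signs pick one of the two diagonals and the other two neighbouring pixels are never touched.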

Now the block coefficients coding. VP3-VP9 used essentially the same scheme: decode special token that tells you what you have—a run of end-of-block flags, a run of zeroes, some small non-zero value or a larger value falling into certain range. Then you decode trailing bits if needed and expand token to form coefficient block. For some (error resiliency?) reasons VP3 had those tokens stored by coefficient number for all blocks (with some skips if zero run was coded) while VP4 had them grouped by block.

I should also mention DC prediction here. For obvious reasons it’s not median predicted either but rather calculated as weighted sum of neighbour block DCs in VP3 or “if you have two neighbour values available take their average, otherwise use the last predicted value” in VP4.

And the final pet peeve is the DCT they used in VP3-VP6. While it’s good to have a clearly defined integer DCT instead of the mess with different DCT implementations in the H.263 / MPEG-4 ASP era, they decided to use transform coefficients in the range 12785-64277, so essentially you have to multiply a signed 16-bit input coefficient by an unsigned 16-bit transform coefficient (and discard the low 16 bits immediately). Now realise you have SIMD instructions for either signed*signed->take high or unsigned*unsigned->take high operations but not for this case. Sigh.
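
For the curious, the usual workaround for the missing signed*unsigned multiply can be modelled in a few lines. This is a sketch of the arithmetic only: `mulhi_s16` stands in for one lane of a signed "multiply, keep high half" instruction (like x86 `pmulhw`), and all names here are mine.

```python
def to_i16(v):
    """Reinterpret the low 16 bits of v as a signed 16-bit value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def mulhi_s16(a, b):
    """Model of one signed*signed 'take high 16 bits' SIMD lane."""
    return to_i16((to_i16(a) * to_i16(b)) >> 16)


def mul_coef(x, c):
    """floor(x * c / 65536) for signed 16-bit x and unsigned 16-bit c."""
    if c < 0x8000:
        return mulhi_s16(x, c)  # the constant fits a signed lane as-is
    # A large constant is seen by the signed multiply as c - 65536,
    # so the result is short by exactly x; add it back.
    return to_i16(mulhi_s16(x, c) + x)


# Both ends of the coefficient range mentioned above behave the same way.
small = mul_coef(-1234, 12785)
large = mul_coef(-1234, 64277)
```

It works, but it costs an extra addition for every constant that does not fit into 15 bits, which is most of them.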

VP5

The main difference of VP5 from VP4 is the support for interlaced coding mode. And maybe also the new binary range coder (named bool coder) that remained in use up to VP9.

So now all non-binary data in the frame is coded using trees with fixed probabilities (i.e. you read a bit with the probability stored in the node and if it’s zero you take the left branch, otherwise you take the right branch). Those probabilities might be constant or set to some new values at the beginning of the frame.
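
The tree walk itself is tiny. In the sketch below the bool coder is replaced by a hypothetical `read_bool(prob)` callback returning 0 or 1, the tree layout (nested tuples) and the token names are made up for the example, and `scripted_reader` is just a stand-in that replays prepared bits instead of decoding a real stream.

```python
def read_tree(read_bool, tree, probs):
    """Walk a binary tree; inner nodes are (prob_index, left, right), anything else is a leaf."""
    node = tree
    while isinstance(node, tuple):
        prob_idx, left, right = node
        bit = read_bool(probs[prob_idx])
        node = right if bit else left  # zero: left branch, one: right branch
    return node


def scripted_reader(bits):
    """Stand-in for a bool decoder: replays bits and records the probabilities used."""
    it = iter(bits)
    seen = []

    def read_bool(prob):
        seen.append(prob)
        return next(it)

    return read_bool, seen


# Hypothetical four-symbol tree with one probability per inner node.
TOKEN_TREE = (0, "zero", (1, "one", (2, "two", "large")))
reader, seen = scripted_reader([1, 1, 0])
token = read_tree(reader, TOKEN_TREE, [200, 128, 64])
```

Changing the probabilities at the start of a frame only means swapping the `probs` list; the tree shape stays the same.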

Frame data still contains macroblock information first and coefficient data last.

Motion vectors are predicted using nearest and second nearest (called simply near) motion vectors from already decoded macroblocks scanned in certain order. Also the information about found prediction candidates is used as one of the context variables used to select some probability set in decoding process.

DC prediction is a bit weird and it’s easier to describe it in the form “you have a special cache for top/left DC values and you use them for prediction” except that you have an additional special case for chroma in the first macroblock.

VP6

There are several things that got changed from VP5, mainly coefficient data location and coding method and motion compensation. Also now you can signal that you want this particular inter frame to become new golden frame. And you can enjoy new alpha mode which is coded essentially as a separate frame after the first one but with just one plane.

First, now there are two coefficient ordering modes: the old “MB info first, coefficients later” and the mode where macroblock information interleaves coefficient data.

Second, now you have Huffman coding for coefficient data. You take the original tree with probabilities, calculate weights for each leaf and construct new Huffman tree that might be completely different from the original. And then you decode data by reading macroblock information with bool coder from one place and variable-length codes for DCT tokens from another.
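
Here is a sketch of that conversion: first propagate the node probabilities down to the leaves to get weights, then build ordinary Huffman code lengths from those weights. The tree layout, the 1/256 probability scale and the tie-breaking are assumptions for illustration; the real VP6 procedure has its own exact rules that a decoder has to follow bit-exactly.

```python
import heapq
from itertools import count


def leaf_weights(tree, probs, weight=1 << 16, out=None):
    """Split the weight at every inner node (prob_index, left, right) down to the leaves."""
    out = {} if out is None else out
    if not isinstance(tree, tuple):
        out[tree] = out.get(tree, 0) + weight
        return out
    idx, left, right = tree
    w0 = weight * probs[idx] // 256  # probs[idx]: chance of a zero bit out of 256 (assumed)
    leaf_weights(left, probs, w0, out)
    leaf_weights(right, probs, weight - w0, out)
    return out


def huffman_lengths(weights):
    """Return the Huffman code length for every symbol."""
    tie = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tie), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths


tree = (0, "a", (1, "b", (2, "c", "d")))  # original depths: a=1, b=2, c=3, d=3
weights = leaf_weights(tree, [26, 26, 26])
lengths = huffman_lengths(weights)
```

With these probabilities the deepest leaf of the original tree turns out to be by far the most likely one, so it ends up with the shortest Huffman code and the resulting tree looks nothing like the original.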

Third, motion interpolation now uses either a special set of bicubic filter coefficients or simple bilinear interpolation. Also there’s a special mode for switching between interpolation methods depending on source block variance (i.e. if it’s greater than certain threshold then use bicubic interpolation, otherwise use bilinear interpolation). I don’t think this feature has been used after VP6 though.

Also it’s worth noting that now VP6 can change block scan per frame (probably it improves compression a bit by eliminating or shortening some zero runs).

Another fun fact is that depending on container (AVI or FLV) VP6 picture might be coded upside-down or downside-up.

AVC

My favourite audio codec. Essentially it’s simplified AAC-LC rip-off (just bands and coefficients, no noise codebooks or pulses or TNS) except for the special frame mode where you can have half of the frame or the whole frame coded with special mode which is essentially some arbitrarily selected subbands that should be merged together in certain order to reconstruct audio. I have the idea how it all works but I don’t want to debug my decoder yet.

VP7

The codec is not like H.264 at all: H.264 has plane prediction mode and VP7 has TrueMotion prediction mode. There is one thing though introduced in VP7 and dropped in VP8 (and resurrected in some form in VP9) called features (there’s also special frame fading mode but hardly anybody cares about that). Features is an alternative mode that may be present for some macroblocks: different quantiser, different deblocking strength, a flag to signal this macroblock should be used to update golden frame and special block drawing mode (related to interlacing but not quite). There are up to four possible feature values where it makes sense (i.e. not for golden frame update flag).

Last feature (called pitch) defines how block coefficients should be put and how motion compensation should be performed. So you can put decoded coefficients in interlaced mode or even doubly interlaced mode (i.e. using every fourth line instead of every second). Motion compensation has these modes too and more: you can get 4×4 block from 16×1 line or from a slanted block (i.e. every next line starts one pixel earlier/later than the previous one).
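
A small sketch may help to picture those block shapes. It assumes a plane stored as a list of rows and made-up parameter names (`line_step`, `slant`); how exactly a 16×1 line is folded into a 4×4 block is also my guess, so this shows the idea rather than the VP7 rules.

```python
def fetch_block(plane, x, y, size=4, line_step=1, slant=0):
    """Row r comes from line y + r * line_step and starts r * slant pixels further right."""
    return [
        plane[y + r * line_step][x + r * slant : x + r * slant + size]
        for r in range(size)
    ]


def fetch_from_line(plane, x, y, size=4):
    """Fold one run of size * size pixels from a single line into a square block."""
    row = plane[y][x : x + size * size]
    return [row[r * size : (r + 1) * size] for r in range(size)]


# Test plane where every value encodes its own position as 100 * row + column.
plane = [[100 * r + c for c in range(20)] for r in range(20)]
normal = fetch_block(plane, 2, 1)                    # lines 1, 2, 3, 4
interlaced = fetch_block(plane, 2, 1, line_step=2)   # lines 1, 3, 5, 7
doubly = fetch_block(plane, 2, 1, line_step=4)       # lines 1, 5, 9, 13
slanted = fetch_block(plane, 4, 0, slant=-1)         # each line starts one pixel earlier
from_line = fetch_from_line(plane, 0, 3)             # 16 pixels of line 3
```

Seen this way all the modes are the same gather operation with a different line step and a different per-line shift.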

Another characteristic of VP7 is being evolved rather than designed. There are several places in the codec where you can safely claim they simply have written code (maybe with some bugs) and relied on its behaviour instead of making the code follow some principle. Below are some examples.

Motion vector candidates search may get wrong macroblock coordinates. Here are the words of Peter Ross from his VP7 decoder:

The vp7 reference decoder uses a padding macroblock column (added to right edge of the frame) to guard against illegal macroblock offsets. The algorithm has bugs that permit offsets to straddle the padding column.

Inter DC prediction for DC superblock that says “if three previously decoded DCs were the same then you should use it for prediction” is fine but why should you keep the history from the last frame? I understand it might improve compression if you have the same value for the whole previous frame but it still looks a bit strange.

Spatial (intra) prediction also behaves counter-intuitively. In 4×4 prediction mode when top right block is not available the bottom of macroblock right above should be used instead. And when it’s the last block in row then top right prediction is the replicated pixel from the top macroblock as well. This is hard to explain from codec design perspective but easy from implementer’s point of view: you have top pixels line cached and you update it after you decode the block (so if the data is unavailable you use last decoded data here instead of replicating last available pixel like in H.264).

Conclusion and final thoughts

I hope I was able to demonstrate in this post that Duck codecs have an element of originality but quite often they go so far in originality that you start wondering why they were doing it like that. While some of it might be because of the patent workarounds some things are showing that in some cases they were fiddling with the code instead of trying proper ideas first and implementing codec after the idea (no, idea “let’s use codec X as the base” does not count).

Also while I’m not going to deal with VP8 and VP9 unless I really have to, I can say that the people behind Duck codecs developing AV1 is both a good and a terrible thing. Good because they know how to propose stuff that looks different but still works similarly to some conventional codec. Terrible because they still don’t know how to design a codec properly: not writing some ad hoc code that does something, but rather gathering ideas, evaluating them and only after that implementing them. I heard the story that shortly before releasing VP8 to the public Baidu actually showed it to some opensource multimedia people and asked for their opinions and input; somebody (from Xiph IIRC) found a design flaw but it was left unfixed because the encoder relied on it as well and they were reluctant to change it.

AV1.0 Errata 1 shows similar design problems partly for the same reasons and I don’t expect AV2 to be conceptually better. Especially after hearing rumours that Baidu is working on it already probably to force mostly complete work on AOM so the codec is ready by the same time as H.266 (or MPEG/VVC as they say it in Italy). And since most opensource multimedia people are working on AV1 nowadays, the chances of some competitor appearing are slim. So don’t ask questions, just consume AV1 and then get excited for AV2.

Posted in NihAV, TrueMotion | 3 Comments »

NihAV: the Last Quack

Thursday, October 31st, 2019

Finally NihAV got full-feature VP7 decoding support (well, except one very exotic case for a very exotic mode) so now I can move to other things like actually making various decoders bit-exact, fixing other bugs in them, adding missing pieces of code for the player and even documenting stuff. I hope to give a presentation of my work at VDD 2020 or FOSDEM 2021 (whichever accepts it) and I want to have something decent to present by then.

Anyway, here’s a review of VP7.

Posted in NihAV, TrueMotion | 4 Comments »

NihAV: still ducking

Saturday, July 27th, 2019

While it’s summer and I’d rather travel around (or suffer from heat when I can’t), there has been some progress on NihAV. Now I can decode VP5 and VP6 files. Reconstruction still sucks because it takes a lot of effort to make perfect reconstruction and I’m too lazy to do that when simple demonstration that the decoder works would suffice.

Anyway, now I can decode both VP5 and VP6 files including interlaced ones. Interlacing in VP5/6 is done in a very simple way, like in many other codecs: there’s a bit for each macroblock telling whether the macroblock should be output in interlaced form or not.

Of course this being VPx family, they had to do it with some creativity. First you decode base interlaced bit probability, which is stored as 8-bit value while all other bit probabilities are stored in 7 bits. Then you derive actual probability for interlaced bit and decode it before any other macroblock information (including macroblock type—it’s that important). Probability is derived by companding base probability depending on whether last macroblock was interlaced (then probability is halved) or not (then it’s remapped to fit 128-255 range)—except for the first macroblock in a row which would use the base probability without modifications. And for VP6 you also have to use different starting scan order (band assignment for each coefficient, now it’s shuffled). This is so trivial that one would wonder why this has not been done in libavcodec decoder yet.
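
The probability derivation described above can be written down as a short sketch. The halving and the special case for the first macroblock follow the description directly; the exact remapping formula for the 128-255 range is my assumption (the text only says the value is remapped to fit that range), and a real decoder may also need to keep the halved value from reaching zero.

```python
def interlaced_bit_prob(base_prob, mb_x, last_mb_interlaced):
    """Derive the probability used to decode the interlaced flag of one macroblock."""
    if mb_x == 0:
        return base_prob            # first macroblock in a row: unmodified base value
    if last_mb_interlaced:
        return base_prob >> 1       # previous macroblock was interlaced: halve
    return 128 + (base_prob >> 1)   # assumed mapping of 0..255 onto 128..255


first_in_row = interlaced_bit_prob(200, 0, True)
after_interlaced = interlaced_bit_prob(200, 3, True)
after_progressive = interlaced_bit_prob(200, 3, False)
```

Either way the flag of the previous macroblock pushes the probability towards one half of the range, which is a cheap form of context modelling.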

There are three possible things to do next: polish current implementation, move to AVC (On2 AVC that is) or move to AVC (Duck VP7 which is AVC ripoff). But probably I’ll simply keep doing nothing instead.

Posted in NihAV, TrueMotion | 8 Comments »

VP3-VP6: the Golden (Frame) Age of Duck Codecs

Friday, May 24th, 2019

Dedicated to Peter Ross, who wrote an opensource VP4 decoder (that is not committed to CEmpeg yet at the time of the writing).

The codecs from VP3 to VP6 form a single codec family that is united not merely by the design but even by the header: every frame in this codec (sub)family has the same header format. And the leaked VP6 format specification still calls the version field there Vp3VersionNo (versions 0-2 are used by VP3, 3 is used by VP4, 5 is for VP5 and 6-8 are for VP6). VP7 changed both the coding principles (to mimic H.264) and the header format. And you can call it the golden age for Duck because it spans from the time it gained popularity with VP3 being donated to the open-source community (and xiphed to Theora, which was the only patent-free(ish) opensource video codec with decent performance back then) to its greatest success found in VP6, employed both in games and in Flash video (remember when BaidUTube still used .flv with VP6 and N*llyMos*r ADPCM or Speex?). And today, having gathered enough material, I want to give an overview of these codecs. Oh, and NihAV can decode VP30 and VP31 now.
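
The version ranges listed above translate into a trivial lookup. This is only a sketch with a made-up function name; the value 4 is not mentioned above so it is treated as unknown here.

```python
def codec_for_version(version):
    """Map the Vp3VersionNo header field to the codec generation."""
    if 0 <= version <= 2:
        return "VP3"
    if version == 3:
        return "VP4"
    if version == 5:
        return "VP5"
    if 6 <= version <= 8:
        return "VP6"
    return None  # anything else (including 4) is not covered by the list above


names = [codec_for_version(v) for v in range(9)]
```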

Posted in TrueMotion | 1 Comment »

NihAV: now with TM2X support!

Thursday, April 11th, 2019

I’m proud to say that NihAV got TrueMotion 2X support. For now only intra frames are supported but 75% of the samples I have (i.e. three samples) have just intra frames. At least I could check that it works as it is supposed to.

First, here’s the codec description now that I’ve managed to write a working decoder for it. TrueMotion 2X is another of those codecs that’s closer to TrueMotion 1 in design. It still uses the same variable-length codebook instead of Huffman coding (actually only version 5 of this codec uses bit reading for anything). It also uses the “apply a variable number of deltas per block” approach, but instead of the old fixed scheme it now defines twenty-something coding approaches and tells the decoder which ones to use in the current frame. That is done because the block size can now be variable too (but it’s always 8 in all files I’ve seen). And blocks are grouped in tiles (usually equivalent to one row of blocks but again, it may vary). The frame data obfuscation that XORs chunks inside the frame with a 32-bit key derived in a special way is not worth mentioning.

Second, the reference is quite peculiar too. It decodes frame data by filling an array of pointers to the functions that decode each line segment with the proper mode, then moves to the next line and repeats. And those functions are in handwritten assembly: they use the stack pointer register for the decoder context pointer (the context has the original ESP saved somewhere inside), which also means they do not use stack space for anything, and instead of returning they simply jump to the next routine until the final one restores the stack and returns properly. Thankfully Ghidra allows assigning the context argument to ESP and while the decompiler output still looks useless, the assembly has proper references in the form mov EDX, dword ptr [ctx->luma_pred + ESP].

And finally, for a while I could not check what the binary specification really does because MPlayer could not run it. I first tried running the working combination of WMP+Win98 under OllyDbg in QEMU but it was painfully slow and even more painful to look at the memory state. In the end I managed to run the TM2X decoder in MPlayer, which then served as a good reference. The trick is that you should not try to run tm2X.dll (it’s really hopeless) but rather take tm2Xdec.ax (or the deceptively named tm20dec.ax from the same distribution that can handle TM2X unlike its earlier versions), patch one byte for a check in DLL init and it works surprisingly well after that.

So what’s next? Probably I’ll just add missing features for the second TM2X sample (the other two samples are TM2A), maybe add Bink2 deblocking feature—since I’d rather have that decoder complete—and move to improving overall NihAV design. Frame management needs proper rework before I add more codecs—I want to change into a thread-safe version before I add more decoders. Plus I’ll need to add some missing bits for a player. There’s a lot of work still to do but I’m pleased that I still managed to do something.

Posted in NihAV, TrueMotion | 3 Comments »

NihAV: first quacks

Sunday, February 10th, 2019

As you can guess from the title NihAV got some support for Duck formats, namely TrueMotion 1 and TrueMotion RT. The implementation was rather straightforward except that it took some additional work to support 16-bit video buffers.

Of course I made sure my new TM1 decoder supports decoding sprites. Here’s an example of such sprite picture:

The hardest part was finding a sample.

I can’t sanely support transparency though since it uses 6-bit alpha with RGB555 image and while I can support such format quite easily I’d rather not.

If you wonder about the details of sprite support, it’s almost the same as ordinary inter-coded 16-bit TM1 with some nuances:

frame header has additional 16-bit fields for sprite position and size (and actual sprite size is used in the decoding—the result is supposed to be put over the destination picture);
sprite has twice as many mask bits as inter frame: two per 4×4 block (LSB first as usual). Bits 00 mean the next four pixels should be skipped (and predictor reset to zero), bits 01 mean it’s opaque sprite data and bits 10 mean it’s sprite data with transparency info present;
sprite data is decoded as standard 4×4 TM1 block data (i.e. one C delta per 4×4 block) except that in transparency mode it also reads transparency data after each pixel pair.
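
As an illustration of the mask layout, here is a minimal sketch that unpacks the two-bit modes. It assumes the mask is available as plain bytes and that each two-bit field is read as a small number starting from the lowest bits of every byte; the actual storage unit and bit order inside a pair may differ, and the constant names are made up.

```python
SKIP, OPAQUE, TRANSPARENT = 0b00, 0b01, 0b10


def sprite_mask_modes(mask_bytes, num_blocks):
    """Unpack num_blocks two-bit modes, lowest bits of each byte first."""
    modes = []
    for i in range(num_blocks):
        byte = mask_bytes[i // 4]              # four two-bit fields per byte
        modes.append((byte >> (2 * (i % 4))) & 0b11)
    return modes


# 0x91 = 0b10_01_00_01: opaque, skip, opaque, transparent (lowest field first).
modes = sprite_mask_modes(bytes([0x91]), 4)
```

A decoder would then skip four pixels and reset the predictor for `SKIP`, decode plain block data for `OPAQUE`, and additionally read transparency data for `TRANSPARENT`.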

That information comes from our old trusty source of information called the VPVision source code dump (which was used to understand TrueMotion 1, 2 and probably DK3/DK4 ADPCM, and maybe VP3 but I’m not sure at all). Also it turns out to contain TrueMotion RT encoder source code as well (which could be used to reconstruct the decoder but I forgot about it at the time and used the binary specification instead).

And now I’d like to talk about Duck codecs in general.

The codecs from this family can be divided into three groups:

The Age of Darkness: the original TrueMotion codec and its evolution plus related ADPCM codecs;
The Age of Enlightenment: game codecs evolving into more generic video codecs and using more mainstream codec design (DCT-based, many ideas borrowed from H.263 and H.264) plus AVC (that’s audio codec if you don’t remember);
The Age of EA Guardian: the codecs produced after Duck was bought by certain company.

The Age of Darkness codecs

Those codecs were used mostly in video games but TM1 was also licensed to Horizons Technology.

The idea behind TM1 is very simple: you split video into 4×4 blocks, predict each pixel from top and pack using quantised deltas and a fixed codebook looking more like Tunstall codes (i.e. the output code is always a fixed length of one byte but it may correspond to a variable-length sequence of input codes). Also, depending on quality, frame blocks have a different number of colour difference deltas per block (1, 2 or 4).
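
The Tunstall-like part is easy to show in code: every input byte selects a codebook entry and each entry expands to one or more deltas. The codebook contents below are invented for the example, and the prediction is reduced to "pixel above plus delta" without any of the real details (delta quantisation tables, wrap-around, colour handling).

```python
def expand_codebook(data, codebook):
    """Each input byte indexes an entry holding one or more quantised deltas."""
    deltas = []
    for index in data:
        deltas.extend(codebook[index])
    return deltas


def predict_from_top(top_row, deltas):
    """Reconstruct a row as the pixel above plus the corresponding delta."""
    return [t + d for t, d in zip(top_row, deltas)]


# Made-up codebook: fixed-size codes, variable-length expansions.
codebook = {0: [0], 1: [2, -2], 2: [0, 0, 0, 4]}
deltas = expand_codebook(bytes([1, 2, 0]), codebook)
row = predict_from_top([10] * len(deltas), deltas)
```

Three input bytes produce seven deltas here, which is the whole point: frequent delta sequences cost one byte no matter how long they are.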

TrueMotion RT is an adaptation of TM1 for real-time video capturing (hence the name). In this case video is coded as planar YUV410 using fixed set of deltas with index taking 2, 3 or 4 bits. But the general coding idea (top and left prediction, delta quantisation and coding its index) remains the same. It uses the same frame header obfuscation so it’s probably an elder sibling of TrueMotion 2 (and its name is more like TrueMotion RT version 2.0 and not TrueMotion 2 RT but the details are unclear). There are different versions of the codec, for example Star Control II: The Ur-Quan Masters on 3DO used a special TM1 format split into several files: .hdr for global information (including quantised delta sets), .tbl with codebook definition, .duk with actual frame data and .frm with the frame offsets for .duk file. It’s a pity I can’t support it without very special handling.

TrueMotion 2 gets rid of single static codebook and packs appropriate data (deltas for different-resolution blocks, motion vector data, actual block types etc etc) in separate segments with their own Huffman codes. There are many improvements but the codec still operates on 4×4 blocks with horizontal and vertical prediction of each symbol.

There is not much known about TrueMotion 2X but so far it looks like maybe slightly improved TM2. Hopefully it will be clearer if I manage to implement a decoder for it.

And finally there were two simple ADPCM codecs accompanying video (usually TM2); there’s nothing much to say about those.

The Age of Enlightenment codecs

This was the age when Duck codecs became widely known and accepted, when various companies licensed them for their own needs and when it was really the golden age for them.

It all starts with TrueMotion VP3 that set the standard for the following codecs. It employed a slightly non-standard 8×8 DCT, could reference the last intra frame as an alternative to referencing just the previous frame (later known as the golden frame), grouped various types of information together instead of interleaving it all, and coded coefficients as tokens (EOB, zero run, plus-minus one, plus-minus two, large coefficient token and such). The same approach would be used for subsequent codecs as well. Of course it briefly enjoyed a renaissance when Duck decided to open-source it and Xiph Theora was created on its base (and since there were no other free and open-source video codec alternatives it was destined to enjoy popularity and success until something better came along).
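As a rough illustration of coefficient tokens, here is a sketch that expands a token list back into the 64 coefficients of one block. The token names and the tuple representation are made up for the example, coefficients are simply placed in coded order (no zigzag, no dequantisation), and things a real VP3 decoder has to handle, like end-of-block runs covering several blocks, are left out.

```python
def expand_tokens(tokens):
    """Expand (kind, value) tokens into 64 coefficients of a single block.

    Hypothetical token kinds:
      ("eob", 0)        - no more non-zero coefficients in this block
      ("zero_run", n)   - skip n zero coefficients
      ("coef", v)       - one coefficient with value v (+-1, +-2 or larger)
    """
    coeffs = [0] * 64
    pos = 0
    for kind, value in tokens:
        if kind == "eob":
            break
        if kind == "zero_run":
            pos += value
        elif kind == "coef":
            if pos >= 64:
                raise ValueError("coefficient index out of block")
            coeffs[pos] = value
            pos += 1
        else:
            raise ValueError(f"unknown token {kind!r}")
    return coeffs

block = expand_tokens([("coef", 12), ("zero_run", 2), ("coef", -1), ("eob", 0)])
print(block[:6])
```

Since most blocks have just a few non-zero coefficients followed by zeroes, a handful of such tokens is usually enough to describe a whole block.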

TrueMotion VP4 was mostly the same but with a different coding method for some data types. Maybe it was the first codec to move edge loop filtering from being performed on the frame to being performed on the temporary block used in motion compensation, but I’m not entirely sure.

TrueCast VP5 was the first in the series to employ their own version of a static binary arithmetic coder, mostly known as the bool coder. That means that instead of updating the bit probability in a context after each bit decoded with it, as CABAC does, the frame header carries fixed probabilities (or just updates to the probabilities from the previous frame) and the decoder uses them for the whole frame.
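For reference, here is a minimal sketch of such a static binary arithmetic coder. It is a Python port of the bool coder as it was later published for VP8 in RFC 6386; I'm assuming the VP5 variant works along the same lines, but its details may differ. The class and method names are mine. The important bit is that `prob` (the probability of a zero bit, scaled to 1..255) is supplied by the caller and never adapted by the coder itself.

```python
class BoolEncoder:
    def __init__(self):
        self.out = bytearray()
        self.range = 255      # kept in 128..255 between calls
        self.bottom = 0       # low end of the current interval (32 bits)
        self.bit_count = 24   # shifts left before the next output byte

    def _carry(self):
        # Propagate a carry into the bytes already written.
        i = len(self.out) - 1
        while self.out[i] == 255:
            self.out[i] = 0
            i -= 1
        self.out[i] += 1

    def put(self, bit, prob):
        split = 1 + (((self.range - 1) * prob) >> 8)
        if bit:
            self.bottom += split
            self.range -= split
        else:
            self.range = split
        while self.range < 128:           # renormalise
            self.range <<= 1
            if self.bottom & (1 << 31):
                self._carry()
            self.bottom = (self.bottom << 1) & 0xFFFFFFFF
            self.bit_count -= 1
            if self.bit_count == 0:
                self.out.append(self.bottom >> 24)
                self.bottom &= (1 << 24) - 1
                self.bit_count = 8

    def finish(self):
        c, v = self.bit_count, self.bottom
        if v & (1 << (32 - c)):
            self._carry()
        v = (v << (c & 7)) & 0xFFFFFFFF
        v = (v << (8 * (c >> 3))) & 0xFFFFFFFF
        for _ in range(4):
            self.out.append(v >> 24)
            v = (v << 8) & 0xFFFFFFFF
        return bytes(self.out)


class BoolDecoder:
    def __init__(self, data):
        self.data = data
        self.pos = 2
        self.value = (data[0] << 8) | data[1]
        self.range = 255
        self.bit_count = 0

    def get(self, prob):
        split = 1 + (((self.range - 1) * prob) >> 8)
        big_split = split << 8
        if self.value >= big_split:
            bit = 1
            self.range -= split
            self.value -= big_split
        else:
            bit = 0
            self.range = split
        while self.range < 128:           # renormalise, pulling in new bytes
            self.value <<= 1
            self.range <<= 1
            self.bit_count += 1
            if self.bit_count == 8:
                self.bit_count = 0
                if self.pos < len(self.data):
                    self.value |= self.data[self.pos]
                self.pos += 1
        return bit


bits = [1, 0, 1]
enc = BoolEncoder()
for b in bits:
    enc.put(b, 128)           # fixed probability, e.g. taken from a frame header
dec = BoolDecoder(enc.finish())
print([dec.get(128) for _ in bits])
```

In a real codec the probabilities would come from the frame header (one per context) instead of the constant 128 used here, and a well-chosen probability is what makes long runs of the likely bit value cost almost nothing.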

VP6. Probably the most famous of them all since it was used in Flash videos. From a technical point of view it’s just a small improvement in some details over VP5. I suspect this was the first codec in the series that introduced selecting an arbitrary frame as the next golden frame (previously it was just the last intra frame, now any inter frame can signal that it should become golden).
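The difference in golden frame handling can be shown with a tiny sketch of the reference bookkeeping. The class, the `refresh_golden` flag name and the rule that every decoded frame becomes the new "last" frame are assumptions made for the illustration, not details from the actual bitstream.

```python
class RefFrames:
    """Keeps the two reference frames: previous ("last") and golden."""
    def __init__(self):
        self.last = None
        self.golden = None

    def update(self, frame, is_intra, refresh_golden=False):
        # Older behaviour: only an intra frame replaces the golden frame.
        # Newer behaviour: an inter frame may also ask for it via a flag
        # (hypothetical name `refresh_golden`).
        if is_intra or refresh_golden:
            self.golden = frame
        self.last = frame

refs = RefFrames()
refs.update("frame0", is_intra=True)
refs.update("frame1", is_intra=False)
refs.update("frame2", is_intra=False, refresh_golden=True)
print(refs.last, refs.golden)
```

This lets an encoder pick, say, a clean frame after a scene settles as the long-term reference without spending bits on a full intra frame.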

VP7. This is the first instalment in the series that was based on H.264 ideas like 4×4 transform and spatial prediction.

And of course there’s AVC, an audio codec inspired by AAC LC that accompanied some VP5-VP7 videos.

The Age of Guardian codecs

While the design direction has not changed much, the codecs themselves mostly belong to the niche provided by their current owner and are hardly used anywhere else. For now we have VP8, VP9 and VP10 (aka AV1).

I hope there will be more to write about those after I write decoders for the rest of them and learn the shameful details of their design in the process.

Posted in NihAV, TrueMotion | 4 Comments »

TwilightMotion Saga: The End

Sunday, April 17th, 2016

I’ve finally documented what I know about VP4 in the wiki and I should unload it from my memory. Implementing decoders and such is left as an exercise for the TrueMotion-loving reader.

Probably I’ll look at ClearVideo (for the N-th time) or some speech codec suite. The funny thing is that even if they market it as a single speech codec, you have a good chance of finding several codecs for different bitrates (like for Lernout & Hauspie you have CELP for 4.8 kbps and SBC with different parameters for 8, 12 and 16 kbps). And don’t get me started on VoxWare MetaSpeech (don’t confuse it with MetaSound, which is not a speech codec, or with MetaAudio, which doesn’t exist); that’s a rant for another day.

Posted in TrueMotion | Comments Closed
