Vector Quantisation Codecs are Still not (semi-kinda) Dead!

April 16th, 2015

While the golden days of vector quantisation codecs seem to be over (Cinepak, Smacker and such), there's still one quite widespread use of vector quantisation in video — texture compression. And, surprisingly, there are a couple of codecs that employ texture compression methods (good for GPU acceleration, less stuff to invent etc.) like Vidvox Hap or Resolume DXV (which looks suspiciously similar in many aspects but with some features like LZ4, LZF or YCoCg10 compression added). I have not looked that closely at either of them but it looks like they still operate on small blocks — e.g. compressing each plane's 8×8 block with BC4 and combining them later.
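For reference, BC4 itself packs a 4×4 block of single-component samples into 8 bytes: two endpoint values plus sixteen 3-bit indices into a small interpolated palette. A minimal decoding sketch (mine, not code from either codec):

    #include <stdint.h>

    /* Decode one BC4 block: 2 endpoint bytes + 48 bits of 3-bit indices. */
    static void bc4_decode_block(const uint8_t block[8], uint8_t out[16])
    {
        uint8_t pal[8];
        uint8_t r0 = block[0], r1 = block[1];

        pal[0] = r0;
        pal[1] = r1;
        if (r0 > r1) {          /* 6 interpolated values */
            for (int i = 1; i < 7; i++)
                pal[i + 1] = ((7 - i) * r0 + i * r1) / 7;
        } else {                /* 4 interpolated values plus the extremes */
            for (int i = 1; i < 5; i++)
                pal[i + 1] = ((5 - i) * r0 + i * r1) / 5;
            pal[6] = 0;
            pal[7] = 255;
        }

        /* 16 three-bit indices packed little-endian into the last 6 bytes */
        uint64_t bits = 0;
        for (int i = 0; i < 6; i++)
            bits |= (uint64_t)block[2 + i] << (8 * i);
        for (int i = 0; i < 16; i++)
            out[i] = pal[(bits >> (3 * i)) & 7];
    }

An 8×8 plane block would then simply be four such 4×4 blocks stitched together.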

This does not seem that interesting to me but I'm sure Vittorio will dig deeper. Good luck to him!

P.S. I forgot — which version of Firefox comes with ORBX.js support?

A Bit about Actimagine Nerve Agent

April 12th, 2015

Codecs are sometimes named after really ridiculous things. Actimagine named its codec VX, after the nerve agent. Or maybe after a Panasonic VCR tape format that only Wickedpedia has heard about. But I bet on the nerve agent (if you didn't have to study chemical warfare agents at school then you weren't born in the USSR, and be thankful for that).

First of all, I don't know much about VX except that it was used on game consoles. Also, judging by the code, it was intended for really low resolutions because the stride is hardcoded to 256 from what I've seen.

It reminds me of Mobiclip HD somewhat. I’m too lazy to investigate all details (because I only have an ARM disassembly of some binary with some helpful comments) but here’s what I could find after spending an hour or two on it.

The video codec employs exp-Golomb codes for most things — both signed and unsigned ones; the bitreader limits them to 16 bits at most. Again, there's not much I can say about the overall structure except that it looks like simplified H.264/H.265 (though it obviously predates H.265) — there is spatial prediction (four modes only: horizontal, vertical, DC and plane) and there's macroblock subdivision too (into square blocks only, possible sizes being 4×4, 8×8 or 16×16). It still looks like there's one motion vector per macroblock with motion vector deltas for subblocks.
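Here's roughly what such a reader looks like, as a sketch with a toy bitreader; the interface in the actual binary and the exact sign mapping are my assumptions:

    #include <stdint.h>
    #include <stddef.h>

    /* Toy MSB-first bitreader, standing in for whatever the binary uses. */
    typedef struct {
        const uint8_t *buf;
        size_t pos; /* position in bits */
    } BitReader;

    static uint32_t get_bit(BitReader *br)
    {
        uint32_t bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        return bit;
    }

    static uint32_t get_bits(BitReader *br, int n)
    {
        uint32_t v = 0;
        while (n-- > 0)
            v = (v << 1) | get_bit(br);
        return v;
    }

    /* Unsigned exp-Golomb code, capped like the 16-bit limit mentioned above. */
    static uint32_t get_ue_golomb(BitReader *br)
    {
        int zeros = 0;

        while (zeros < 16 && !get_bit(br))
            zeros++;
        return ((1u << zeros) | get_bits(br, zeros)) - 1;
    }

    /* Signed variant with the usual 0, 1, -1, 2, -2, ... mapping. */
    static int32_t get_se_golomb(BitReader *br)
    {
        uint32_t v = get_ue_golomb(br);

        return (v & 1) ? (int32_t)((v + 1) >> 1) : -(int32_t)(v >> 1);
    }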

Again, no one cares.

A Short Guide to Julmust/Påskmust

April 11th, 2015

Unfortunately I was not able to visit Sweden properly this Easter season — it was merely 6 days in Stockholm. Yet I've managed to try one of the reasons I come to Sweden — påskmust. For those who don't know what it is — shame on you! For the rest, here's my incomplete and biased guide.

Some old julmust photo. Left to right: Nygårda, Eldorado (Hemköp), ICA, Coop, Wasa, Apotekarnes. Lying is the Lidl julmust.

Some old påskmust photo (probably from 2011). Left to right: Mora, Nygårda, Apotekarnes, ICA, probably Lidl, Eldorado (Hemköp), Coop. Lying are ordinary and special Wasa påskmust. Front bottle is from Guttsta Källa.

This year's catch. Back row: Wasa special, ICA, Apotekarnes. Front row: Nyckelbryggeri, Zeunerts, Grebbestads bryggeri, Mora, Nygårda, Danish abomination.

So one can divide julmust/påskmust into four categories:

  1. Widespread must from large producers or supermarket chains. That includes Apotekarnes, Nygårda and must made for Coop, Hemköp, ICA and Lidl. But not for Netto, see category four for that.
  2. Must from Norrland breweries. Nyckelbryggeri, Wasa and Zeunerts are the best known. And maybe Mora.
  3. Must from non-Norrland breweries. Guttsta Källa, Grebbestads, Hammars (I have yet to try that one).
  4. Abominations from people who don't know how to make proper must. That includes Bjäre must from C*ca-cola, Harboe must from Netto (made in Denmark) and whatever Danish stuff I tried this year. Concentrate for making must at home probably belongs here too.

The taste is hard to describe but it's really nice and makes me think of liquid bread for some reason. The main difference is between Norrland and non-Norrland must. Julmust and påskmust in the Norrland style are less sweet and usually have a hint of coffee. Must from large producers is usually sweeter than the rest. Wasa bryggeri produces two kinds of must — special, available only in Norrland and made after Norrland traditions, and ordinary, available in Svealand and with a taste closer to the more widespread varieties.

Danish must is either bland or plainly wrong. The one I tried this year is not actually bad, it's just completely wrong — it contains e.g. raspberry juice and cola extract. If I drink påskmust I want it to be påskmust, not a weird mix of Pommac and Trocadero that probably has only water and sugar in common with the other påskmust recipes.

And now for the actual guide. If you want to try it then start with the widespread påskmust you can find in any Swedish supermarket; it should be fine. If you like it that way then be happy; if you want something less sweet then try smaller breweries, especially Norrland ones (their must is hard to find outside Norrland though). And if you are not in season then you can still try something similar — bordsdricka from Wasa or sommarmust from Nyckelbryggeri should be available (in Norrland).

P.S. You can extrapolate it to Trocadero as well except there’s less variation in taste and there’s no supermarket or Danish version.

Some Notes on Lossless Video Codecs

March 21st, 2015

While reading a nice PhD thesis from 2014 (in Russian) about a new lossless video compression method I laughed hard at its lossless video codec descriptions (here's the link for unbelievers – http://gorkoff.ru/wp-content/uploads/articles/dissertation.pdf; the translation below is mine):

To date various lossless videostream compression methods and algorithms have been developed. They are used in the following widespread codecs:

* CorePNG employs deflate algorithm for independent compression of every frame. Theoretically the codec supports delta frames but this option is not used.

* FFV1 employs prediction coding with following entropy coding of the prediction error.

* Huffyuv, like FFV1 algorithm, employs predictive coding but prediction error is effectively coded with Huffman algorithm.

* MSU Lossless Video Codec has been developed for many years at Moscow State University labs.

And yet some real world tasks demand more effective compression and thus a challenge of developing new more effective lossless video compression methods still remains actual.

Readers are welcome to find inaccurate, erroneous and outright bullshit statements in this quote themselves. I’d rather talk about lossless video codecs as I know them.
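For instance, the predictive coding both Huffyuv and FFV1 actually use is the classic median predictor: predict each pixel from its left, top and top-left neighbours and entropy-code only the difference. A sketch (mine, not code from either project):

    #include <stdint.h>

    static int median3(int a, int b, int c)
    {
        if (a > b) { int t = a; a = b; b = t; }
        return c < a ? a : (c > b ? b : c);
    }

    /* MED predictor: median of left, top and the gradient left+top-topleft.
     * Huffyuv codes the residual with Huffman, FFV1 with a range coder. */
    static uint8_t predict_med(uint8_t left, uint8_t top, uint8_t topleft)
    {
        return (uint8_t)median3(left, top, left + top - topleft);
    }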

Some notes on VP4

March 1st, 2015

Well, this information should’ve been posted by someone else but those people seem to be lazier than me. In return I’m not going to use XViD or FLIC for encoding my content.

So, REing VP4 is rather easy – you just download the original VP3.2 decoder source (still available on Xiph SVN servers) and compare it to the structure of vp4vfw.dll. There are differences in structures and a bit in code layout but mostly it's the same code with new additions.

So, VP4 is based on VP3 (surprise!) and introduces a new bitstream version (which is 3 for some reason). Here’s an incomplete list of differences I’ve spotted:

  • Base frame header has some additional fields (I didn’t care enough to decipher their meaning though);
  • Superblock coding uses a slightly different scheme with new universal codes resembling exp-Golomb but with a VP4 quirk;
  • Frame data decoding differs depending on frame type;
  • Motion vector component extraction uses Huffman tables and sign from the previous block.

And yet it uses the same coding principles and even token coding seems to be left untouched. It was suspected for a long time that even-numbered On2 codecs were simply improvements over the previous version while odd-numbered ones were more innovative, but not much was known about VP4 to prove it:

  1. Duck TrueMotion 1 — a new codec;
  2. Duck TrueMotion 2 — mostly like TrueMotion 1 but with Huffman encoding;
  3. Duck/On2 TrueMotion VP3 — DCT + static Huffman coding;
  4. On2 TrueMotion VP4 — VP3 with some bitstream coding changes;
  5. On2 TrueCast VP5 — DCT + arithmetic coder;
  6. On2 VP6 — VP5 with some bitstream changes;
  7. On2 VP7 — H.264 ripoff with their own arithmetic coder;
  8. On2 VP8 — VP7 with some small changes;
  9. Baidu VP9 — H.265 ripoff with their own arithmetic coder;
  10. rumoured Baidu VP10 — since there’s no H.266 in the works for now…

It’s all kinda Intel CPUs but without confusing codenames (and Xiph hasn’t produced too many codecs to confuse whether Daalawell came before Theorabridge or after).

P.S. Many thanks to big G for releasing no information on that codec or any other codecs from On2. Oh, and is VP9 “specification” still under NDA?

P.P.S. I should really work on a game codec named after chemical warfare instead.

A Call for a Modern Audio Codec

February 11th, 2015

We need a proper audio codec to accompany state of the art video codecs, so here’s an outline of codec features that should be present:

  • the audio codec should make more use of its context: it should have a system of forward and backward reference frames like the B-pyramid in H.264 or H.265;
  • with that it should employ tonal compensation — track the frequency changes from the references (e.g. it may be the same note continued or changing pitch);
  • time domain prediction via FIR or IIR filters (a sketch of the FIR case follows this list);
  • flexible subdivision into subframes, e.g. as a binary tree;
  • raw (or at least non-transformed) coding mode for transients or noise;
  • integer-only bit-exact transform that passes for an MDCT in bad light;
  • high-bitdepth sound support (up to 64 bits per sample).
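Here's what the FIR flavour of that prediction could look like, as a minimal sketch; the filter order and coefficients are made up for illustration:

    #include <stddef.h>

    #define ORDER 4

    /* Made-up predictor coefficients; a real codec would transmit them. */
    static const float coeffs[ORDER] = { 1.8f, -1.2f, 0.5f, -0.1f };

    /* Predict each sample from the previous ones, keep only the residual
     * (which is what the entropy coder would then compress). */
    static void fir_predict_residual(const float *x, float *res, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            float pred = 0.0f;
            for (int j = 0; j < ORDER; j++)
                if (i > (size_t)j)
                    pred += coeffs[j] * x[i - 1 - j];
            res[i] = x[i] - pred;
        }
    }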

The project name is transGhost (hopefully no Monty will be hurt by this).

And if you point out this is stupid — well, audio codecs should have the same rights as video codecs, including PTS/DTS differences and employing similar coding methods.

Why one should not be overexcited about new formats

January 10th, 2015

Today I’ll talk about Opus and BPG and argue why they are not the silver bullets everyone was expecting.

Opus

I cannot say this is a bad codec — it has a modern design (hybrid speech+music coder) and impressive performance. What's wrong with it? Usage.

The codec is ideal for streaming, broadcasting and such. It does not have a special multichannel mode: you can combine mono and stereo Opus streams in whatever way you like, and you don't have to care about passing a special configuration for it in a special way.

What’s bad about that? When you try to apply it to stored media all those advantages turn into drawbacks. There was no standard way to store it (IIRC Opus-in-TS and Opus-in-MP4 specifications were developed by people that had little in common with Opus developers although some of the latter were present too). There is still one big problem with an ugly hack as “solution” — the lack of keyframes in Opus and the “solution” in form of preroll (i.e. “decode certain number of audio frames before the needed one and discard them”). And not all containers support that feature.

That reminds me of MoosePack SV1–SV7. That was a project intended to improve on MPEG Audio Layer II compression and make it into a new codec (yes, there's Layer III, but that was one of the reasons MoosePack, Vorbis and other audio codecs were born). It had enjoyed some limited popularity (I've implemented MPC decoding support for a reason) but it had two major drawbacks:

  • very bare file format — IIRC it's just a header and audio blocks prefixed by a 20-bit size, with no padding to byte boundaries either (if you've ever worked with raw FLAC streams you should have no problem imagining how good the MPC format was); see the sketch after this list;
  • no intra frames — again, IIRC the solution was to simply decode and discard 12 frames before the given one in the hope that the sound would converge.
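To see why the first point hurts, here's a sketch of walking such a stream (whether the 20-bit size counts bits or bytes I don't remember, so treat this purely as an illustration):

    #include <stdint.h>

    /* Bare-bones MSB-first bitreader over a raw MPC-like stream. */
    typedef struct {
        const uint8_t *buf;
        uint64_t bitpos;
    } MPCReader;

    static uint32_t read_bits(MPCReader *r, int n)
    {
        uint32_t v = 0;
        while (n-- > 0) {
            v = (v << 1) | ((r->buf[r->bitpos >> 3] >> (7 - (r->bitpos & 7))) & 1);
            r->bitpos++;
        }
        return v;
    }

    /* With no byte alignment and no index, reaching block N means reading
     * the size prefix of every single block before it. */
    static void seek_to_block(MPCReader *r, int n)
    {
        for (int i = 0; i < n; i++)
            r->bitpos += read_bits(r, 20);
    }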

MusePack SV8 tried to address all those issues with a new chunked format that could be easily embedded into other containers and whose audio blocks could be decoded independently because the first frame in each block was a keyframe. But it was too late and I don't know who uses this format at all.

Opus is more advanced and performs better by offloading those problems to the container, but I still don't think Opus is an ideal codec for all cases. If you play it continuously it's fine; when you try to seek, problems start to occur.

BPG

This is a quite recent example of the idea “let's stick intra-frame coding from some video codec into an image format”.

Of course such an approach saves time, especially if you piggyback on a state of the art codec, but it's not the optimal solution. Why? Because still image coding and video sequence coding have different goals and working conditions.

In video coding you have a large amount of data that you have to (de)compress efficiently, mostly under specific constraints like framerate. While coding an individual frame well is important, it's much more convenient to spend effort on evening out the decoding load across all frames. After all, hardly anyone would like the first frame to be decoded in 0.8s and the other 24 frames in 0.1s each. That reminds me of ClearVideo, which had the inverse problem – intra frames were coded very simply (just IDCT plus static Huffman) while inter frames employed something fractal and took much more time.

Another difference is content. For video you usually have common frame sizes (like 1920×1080 or 1280×768) and modern video codecs are actually targeted at handling bigger and bigger resolutions. Images, on the other hand, come in various sizes, even ridiculous ones like 173×69, and they contain stuff you usually don't expect in video form — pixel art, synthetic images, line art etc. (Yes, some people care about monochrome FMV but it's a very rare case.)

Another problem is efficient coding of palettised and monochrome images, lossily or losslessly. For lossless compression it's much better to operate on whole lines, while video coding standards nowadays are block-based, and specialised compression schemes beat generic ones. For instance, the same test page compresses to an 80kB PNG, a 56kB Group 4 TIFF or a 35kB JBIG image. JPEG-LS beats PNG too, and both are very simple compression standards compared to even H.261.

There’s also alpha plane coding, not so many video codecs support it because of its limited use in video. You have it mostly in intermediate codecs or game ones (hello Indeo 4!). So if selected video codec doesn’t support alpha natively you have to glue it somehow (that’s what BPG does).

Thus, we come to the following points:

  • images are individually coded while a video codec has to care about the whole sequence;
  • images come in different sizes while video sizes are usually a few standard ones;
  • images have different content that's not always compressed well by a video coder, and a specialised compression scheme is always better and maybe faster;
  • images might need some additional features not required by video.

This should also explain why I have some respect for WebPLL but none for WebP.

I’ve omitted obvious problems with adoption, small-power hardware and such because hardly anything beats (M)JPEG there. So next time you choose format for images choose wisely.

H.265: An Alternative History

December 6th, 2014

As you might remember, the alternative history genre models events on real history but with something going differently. Here's what could've happened with H.265 but didn't.

So, finally there’s a new standard released — ITU H.265. It promises twice as low bitrate for the same picture quality in H.264. Yet people do not care much about it since industry leaders offer their solutions:

China introduces its new standard for video coding — Hybrid Enhanced Video Standard, or HEVS for short. It features a quadtree representation of coding blocks, more than thirty spatial prediction modes, block transforms from 4×4 to 32×32, and one unique feature — motion vectors that implicitly take mirrored references from reference picture lists. This standard is nominated as the main video coding standard for CUVRD (China ultraviolet ray disc) but gains little popularity outside China.

On2 makes a new codec named VP9 that has no open specification. After tedious reverse engineering it turns out to employ the coding scheme from VP5 times, spatial and motion prediction from H.265 with slightly altered coefficients, and the overall coding scheme from H.265 drafts.

Re… buffering… alNetworks releases NGV at last (fourcc RV50). Again, after long studies of the binary specification, it turns out to be based on H.265 drafts with some in-house improvements: context-specific codebooks for element coding and ⅓-pel motion compensation (implemented as motion vectors that pretend to be ⅓-pel while in reality several different positions are handled by the same function). The codec becomes very popular in China for some reason.

Sorenson releases SVQ7. It is based on an old H.265 draft and employs ½- and ⅓-pel motion compensation. It has some additional features like watermarking and quickly becomes the codec of choice for QuickTime.

P.S. Good thing nothing like this has really happened.

Blåtand-Passande-X

November 23rd, 2014

So, finally there’s a post about some codec.

It is a specialised codec from Oxford Germanium Television (all names are changed just in case) that has a 4:1 compression ratio and a very niche use. It's hard to find even a decoder for it, so this analysis was done on the ARM version of the encoder (maybe I'll be able to RE something more useful next time, like VX).

The codec itself is rather simple: you take 4 samples from one channel, compress them, output the 16-bit result and repeat the same for the second channel. Encoding is straightforward too:

  1. feed the input to a 4-band QMF (with a filter that looks a lot like the D4 wavelet to me);
  2. perform ADPCM on each band (this varies a bit per band but it's the same approach);
  3. generate the output word — 7 bits for band 0, 4 bits for band 1, 2 bits each for bands 2 and 3, plus a parity bit for them all (see the packing sketch after this list).
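A sketch of what step 3 amounts to; the field order and parity definition are my guesses, only the 7+4+2+2+1 bit budget comes from the disassembly:

    #include <stdint.h>

    static uint16_t pack_word(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
    {
        uint16_t w = ((b0 & 0x7F) << 9) |  /* band 0: 7 bits */
                     ((b1 & 0x0F) << 5) |  /* band 1: 4 bits */
                     ((b2 & 0x03) << 3) |  /* band 2: 2 bits */
                     ((b3 & 0x03) << 1);   /* band 3: 2 bits */
        uint16_t p = w;

        /* fold the payload down to a single even-parity bit */
        p ^= p >> 8; p ^= p >> 4; p ^= p >> 2; p ^= p >> 1;
        return w | (p & 1);
    }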

Since I have no samples of it, don't expect a decoder from me any time soon (and I don't have enough motivation to hook the Android encoder directly to make it produce data). Not that anyone cares about it either.

A Bit on Germany

November 21st, 2014


An excerpt from a book that I have to refer to sometimes (here's the source); it really tells a lot about proper relationships. Ask a nearby German for a translation if you need it.

P.S. Next post will be about a codec technical description, I promise.