Another reason for NihAV

July 4th, 2020

So instead of doing something productive like adding missing functionality bits and writing documentation, I wasted my time on adding some QuickTime decoders. And while wasting that time on the SVQ1, SVQ3, QDMC and QDM2 decoders it became apparent why it’s a good thing for NihAV to exist.

Implementing two of them was not a very big deal, but the SVQ3 and QDM2 decoders took more than a week each because there are only two specifications available for them and both are equally hard to comprehend: the first one is the official binary specification, the second one is the source code in libavcodec, which is derived from the former.

The problem arises when somebody wants to understand how the codec works and/or reimplement the code, and the SVQ3 and QDM2 decoders demonstrate two different aspects of that problem.

The SVQ3 decoder is based on some draft of H.264 (or ex-MPEG/AVC if you’re from Piedmont) with certain extensions mostly related to motion compensation. Documentation for it was scarce, and because of optimisations and integration with common H.264 decoder bits it’s hard to understand some of the things. One of those is intra prediction, with two modes having SVQ3-specific hacks hidden in libavcodec/h264pred.c (those are the 16×16 plane prediction mode, which gives a transposed result, and the 4×4 diagonal down prediction, which is simplified and does not rely on pixels that are not immediately to the top/left of the block), and another one is the block coefficient decoding function. It took me quite a while to realize that it actually decodes three different kinds of blocks: a single 4×4 block with zigzag scan, a 4×4 block divided into two parts with an interlaced scan, and a 2×2 block. I’ve documented most of that in The Wiki (before that nobody had touched that page for almost ten years; sometimes I feel like I’m the only person contributing there).
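
For illustration, here’s roughly how those three block kinds can be separated (a sketch only; the enum names and the half-block scan order are my own placeholders, not anything from the binary specification):

```rust
// Sketch: the three kinds of blocks hidden in one SVQ3 coefficient
// decoding function, each with its own scan order.
enum Svq3BlockKind {
    Luma4x4,      // single 4x4 block, zigzag scan
    Luma4x4Split, // 4x4 block coded as two halves with an interlaced scan
    Chroma2x2,    // 2x2 block
}

static ZIGZAG4X4: [usize; 16] = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15];
static HALF_SCAN: [usize; 8]  = [0, 4, 1, 8, 5, 2, 9, 12]; // placeholder order
static SCAN2X2:   [usize; 4]  = [0, 1, 2, 3];

fn scan_for(kind: Svq3BlockKind) -> &'static [usize] {
    match kind {
        Svq3BlockKind::Luma4x4      => &ZIGZAG4X4,
        Svq3BlockKind::Luma4x4Split => &HALF_SCAN,
        Svq3BlockKind::Chroma2x2    => &SCAN2X2,
    }
}
```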

QDM2 is horrible in a different way. It is a slightly improved translation of the original binary specification with hardly any idea of how it works (there are still names like local_int_8 in the code). Don’t get me wrong: back in 2003–2005, when the reverse engineering was done, the only tools you had were a debugger and a disassembler (you were lucky if it was not the one provided by the debugger) and no decompilers at all (IIRC rec appeared much later and was of limited usefulness, especially on the multi-megabyte QT monolith—and that’s assuming you were not doing it on a Mac, with even fewer tools available). I did some of that work back then as well, so I understand how hard it is and how happy you are that it works somehow so you can ship it and forget about it.

Another thing is that now it’s clear that QDMC and QDM2 are predecessors of DT$ LBR (aka Express) and use the same principles (QDMC simply coded noise and tones; QDM2 is almost like LBR but without some features like LPC or multichannel audio and with a different chunk structure), but back in the day there was no documentation on LBR (or LBR itself, for that matter).

But the main problem is that nobody has tried to understand the code since. It became a so-called category killer, i.e. its existence prevents others from doing something similar. At least until some idiot tried to do another implementation in NihAV.

And here we have the reason for NihAV to exist: it advances my understanding of codecs (and I document the results in The Wiki), resulting in different implementations that are (hopefully) easier to understand and that sometimes even fix long-standing bugs. I hope this convinces you that sometimes it’s good to have a reimplementation of a decoder even if an existing implementation is good enough (as far as I remember, the only time a decoder was rewritten in FFmpeg was when a reverse-engineered Indeo 3 decoder that crashed on damaged content almost every time was replaced with a reverse-engineered Indeo 3 decoder written by a guy who had an idea of how it works).

But back to QDM2: while my decoder is not finished yet and I probably won’t bother with inter-frames in it (I’ve never seen any samples with those), it still decodes sweeps much better. That’s mostly because of the various bugs I’ve uncovered (also while discovering that Ghidra effectively does not let you edit a decoder context about a megabyte large). Since I have no incentive to produce a patch and the people who created the decoder are long gone from the project, here are some of the spotted bugs: wrong coarse quantiser band selection (resulting in noise generated in the wrong frequency range), reading bits past the chunk end (because in some cases checks are missing), ignoring group 4 tones because of wrong conditions, and some initial variables being set in the wrong way too. Nevertheless it mostly works and it was very useful for mapping the functions in the binary specification (fun fact: the QDM2 decoder is located in QuickTime.qts while QDMC is located in QuickTimeInternetExtras.qtx).
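
To illustrate the second bug, here’s the kind of guard that is missing in some of those places (the bitreader is a minimal hypothetical one, not the actual code from either implementation):

```rust
// A minimal MSB-first bitreader with the chunk-end check that the
// decoder sometimes lacks before reading tone/noise data.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> BitReader<'a> {
    fn bits_left(&self) -> usize {
        self.data.len() * 8 - self.pos
    }
    fn read(&mut self, nbits: usize) -> Option<u32> {
        if self.bits_left() < nbits {
            return None; // the check that is missing in some cases
        }
        let mut val = 0u32;
        for _ in 0..nbits {
            let byte = self.data[self.pos >> 3];
            let bit = (byte >> (7 - (self.pos & 7))) & 1;
            val = (val << 1) | u32::from(bit);
            self.pos += 1;
        }
        Some(val)
    }
}
```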

NihAV: Conceptually Done!

June 7th, 2020

I’m happy to announce that NihAV has finally taken a more or less complete form. Sure, there are some concepts I wanted to play with (like raw stream handling) but I’ve had no need for them so far, so they can wait until much, much later. But all the major features required to build a transcoder are there, as well as a working transcoder itself.

As I wrote in the previous post, I wanted to play with vector quantisation, so first I implemented image palettisation, but since that was not enough I implemented two encoders using vector quantisation: 15-bit MS Video 1 and Cinepak. I have no doubt that Tomas Härdin has written a much better encoder, but why should that stop me from NIHing? Of course such an encoder is not very useful by itself (and it was useless to begin with), so I needed a muxer to represent the encoder output in some form. And then simply fiddling with parameters and recompiling became boring, so I finally introduced generic options, and in order to use those options without recompiling the binary every time I had to write a transcoder as well. But that means that now I can use NihAV to recode media into something else, even if it’s just two crappy video encoders plus MS ADPCM and PCM encoders with a large variety of supported output containers (AVI and WAV!). I call it conceptually done because all the essential concepts are there, not because there’s nothing left to do.

Now about the video encoders. I’ll describe the NihAV design and how it works on a separate page; for now I’ll just mention that while decoders work on a “frame in, picture/audio out” principle, encoders accept a single picture or audio buffer for encoding and then may output a series of encoded packets. Why such asymmetry in the design? Because decoders are expected to produce a single output for a single input (with frame reordering handled externally) while most encoders are expected to have at least a single audio frame or a couple of pictures of lookahead to make decisions about coding the current input. For modern video codecs it may be the decision of what frame type to assign or where to start a new scene; for audio codecs like AAC you may need to change the current frame type if the following frame has transients and the previous frame didn’t.
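
In a very rough sketch the asymmetry looks like this (these traits are simplified stand-ins to show the shape of the interface, not the actual NihAV definitions):

```rust
// Simplified stand-ins for the real traits: decoders map one packet to
// one frame, encoders queue frames and emit packets whenever ready.
struct Packet; // compressed data
struct Frame;  // decoded picture or audio buffer

trait Decoder {
    // one packet in, one frame out (reordering is handled externally)
    fn decode(&mut self, pkt: &Packet) -> Result<Frame, String>;
}

trait Encoder {
    // feed a single picture or audio buffer...
    fn encode(&mut self, frm: &Frame) -> Result<(), String>;
    // ...and drain zero or more packets once the lookahead decisions
    // have been made
    fn get_packet(&mut self) -> Option<Packet>;
}
```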

Anyway, back to the technical details about the encoders. MS Video 1 operates on 4×4 blocks that can be coded as skipped, filled with a single colour, filled with two colours in a pattern, or split into 2×2 sub-blocks each filled with its own two colours in a pattern. Sounds perfect for median cut. Cinepak is much more complex. It splits the frame into several strips, and each strip is also split into 4×4 blocks that may be coded as skipped, as a single 2×2 YUV codeword (a 2×2 Y block plus single U and V values) scaled twice, or as four YUV codewords from a different codebook. Essentially, for a good encoding you need to determine how to partition the frame into strips optimally, split blocks into single- and four-vector ones, and find optimal codebooks for them separately. Since I wanted to write a working encoder mostly to check whether vector quantisation is working, I simply use a fixed number of strips and add every block as a candidate for both coding schemes without any following refinement steps.
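
Since median cut does most of the heavy lifting, here’s a compact sketch of it operating on six-component vectors (a 2×2 Y block plus U and V, like a Cinepak codeword); this is a generic illustration, not the actual NihAV code:

```rust
// Compact median cut: recursively split the entry set along the
// component with the widest range, then average each bucket into one
// codebook entry (2^depth entries in total).
fn median_cut(mut entries: Vec<[u8; 6]>, depth: u32) -> Vec<[u8; 6]> {
    if depth == 0 || entries.len() <= 1 {
        // collapse the bucket into one codebook entry by averaging
        let n = entries.len().max(1);
        let mut avg = [0u8; 6];
        for i in 0..6 {
            let sum: usize = entries.iter().map(|e| e[i] as usize).sum();
            avg[i] = (sum / n) as u8;
        }
        return vec![avg];
    }
    // find the component with the largest range and split at its median
    let (mut comp, mut best_range) = (0, 0);
    for i in 0..6 {
        let min = entries.iter().map(|e| e[i]).min().unwrap();
        let max = entries.iter().map(|e| e[i]).max().unwrap();
        if max - min > best_range {
            best_range = max - min;
            comp = i;
        }
    }
    entries.sort_by_key(|e| e[comp]);
    let right = entries.split_off(entries.len() / 2);
    let mut codebook = median_cut(entries, depth - 1);
    codebook.extend(median_cut(right, depth - 1));
    codebook
}
```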

Here are some numbers if you really care about those. The input is laser05.avi (a 320×240 Indeo 2 file with 196 video frames from the standard samples place). Encoding with the MS Video 1 encoder takes about four seconds. Encoding Cinepak with median cut takes six seconds. Encoding Cinepak with ELBG and randomly generated codebooks takes 36 seconds and the result looks bad (but recognizable). Encoding Cinepak with ELBG that takes the codebooks produced by median cut as the initial ones takes 68 seconds, but the quality is higher than with median cut alone and the output file is slightly smaller too.

Now with all of this done I should probably fix the known-bad decoders (RV6 and Bink2), add whatever missing decoders and features I see fit, and start documenting it all. I have strong doubts about VDD this year but maybe I’ll be able to present my stuff at FOSDEM 2021.

NihAV: Now with Palette Support

May 31st, 2020

While NihAV had support for paletted formats before, now it covers more use cases. Previously I could only decode a paletted format and convert the picture into some other format. Now it can handle the palette in standard containers like AVI and MOV, and even palette changes in AVI (done via NASideData, which is essentially the same thing I NIHed more than nine years ago). In addition it can convert an image into a paletted format as well, and below I’d like to give a brief review of the methods employed.
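
The idea looks roughly like this (the type and field names are my guess at the shape of such an interface, not the exact NihAV definitions):

```rust
// Sketch: palette data rides along with the packet as side data, so a
// palette change in AVI is just another entry attached to the frame.
enum SideData {
    Palette(Box<[u8; 1024]>), // 256 RGBA entries
}

struct Packet {
    data: Vec<u8>,
    side_data: Vec<SideData>,
}
```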

Monty Python & Quest for Holy Grail documented

May 19th, 2020

I’ve finally had the time to document my findings, so if somebody is interested they have at least something to start with.

#chemicalexperiments — Bread

May 9th, 2020

It seems that as a programmer, especially during these days, you have an obligation to bake bread (the same way that if you belonged to the MPlayer community you had to watch anime). So here’s me doing it:

It’s made after a traditional recipe from Norrland: barley flour, wheat flour, milk, yeast, cinnamon, a bit of salt and molasses. IMO it goes fine with some gravad lax or proper cheese.

P.S. And if you think I should have made a sour-dough bread—I can always order some from Sweden instead.

NihAV: Toying with VivoActive

May 5th, 2020

Before moving on to improving parts of NihAV not related to decoding, I decided to implement some small family of formats, and I picked VivoActive since somebody complained that some of it was unsupported.

This family consists of one custom container format and three codecs based on ITU standards. The container format is simple, intended for just one video and one audio stream, with video frames most likely split into 128-byte chunks (probably for better streaming); the only interesting thing is that it stores the header in text form, which is too flexible compared to the rest of the format.
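
Something along these lines is enough to handle it (a sketch: the “Key:Value” line layout is the general idea, the actual field names vary between files):

```rust
// Sketch of parsing a text-form header of "Key:Value" lines into a map,
// which shows why it is more flexible than the binary rest of the format.
use std::collections::HashMap;

fn parse_text_header(hdr: &str) -> HashMap<String, String> {
    let mut fields = HashMap::new();
    for line in hdr.lines() {
        if let Some((key, value)) = line.split_once(':') {
            fields.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    fields
}
```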

The first audio codec is ITU G.723.1, and it was painful to implement. As a proper speech codec it has a lot of proper speech codec math like “multiply 32-bit value A by 16-bit value B and shift the result right by 15 bits”, which requires explicit casts in Rust. On the other hoof, Rust has saturating_add() and friends, which help in many other cases. There are places where functions take the same data as input and output, while in other places the same functions have different input and output arrays. Plus I wanted a slightly better design structure, so there are functions inherent to subframes, some functions that belong to the decoder instance, and some used by both. And then I had to debug it. To put it in perspective, the G.723.1 decoder takes 110 kB in source form of which the code part is 37 kB; for Siren the numbers are 45 kB and 15 kB respectively; the Vivo video decoder is merely 19 kB because most of the decoding is done by the base H.263 decoder in nihav-codec-support.
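
For a taste of that math, here’s what those primitives look like in Rust (a sketch of the general pattern, not code lifted from my decoder):

```rust
// "Multiply 32-bit value A by 16-bit value B and shift the result
// right by 15 bits" -- every cast has to be spelled out in Rust.
fn mul_32_16_shr15(a: i32, b: i16) -> i32 {
    (((a as i64) * i64::from(b)) >> 15) as i32
}

// And where the reference code relies on saturation, the built-in
// saturating operations do the job.
fn sat_add16(a: i16, b: i16) -> i16 {
    a.saturating_add(b)
}
```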

Siren (or, more officially, Polycom Siren 7) is a codec that served as the base for ITU G.722.1. Since RealAudio Cook is based on G.722.1 and I’ve written a decoder for it already, this one was quite easy to implement. Especially considering that some guy wrote an opensource decoder and encoder for it back in the early 2000s. Also this might be the case where having a 5*2^N FFT finally paid off: Siren frames are 320 samples long (and 320 = 5*2^6), so I can still use my standard IMDCT implementation here (it outputs samples in reverse order but that’s no problem).

And finally Vivo Video. It’s yet another codec based on H.263 (but with slightly different headers) and notable mostly for how it represents codebooks. The codebooks are stored as a single set (but not in order, e.g. codebook definition number two is used for codebook number fourteen), and each codebook can represent codes up to eight bits long (for longer codes there is an escape prefix, which means that e.g. codes starting with 0000 10 have their tails defined in another codebook set). Another interesting feature is that the codes are stored as text strings with ones, zeroes, and spaces (yes, the decoder parses them to get the actual codes). Additionally it has a weird decoding mode where you keep a state ID, a special table maps it to the actual codebook number, and the codebook tells you how to change the state ID when you have decoded a new code. This mode can be used to decode the whole stream or just macroblock coefficients.
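
Parsing those strings is trivial but still funny to see in a binary codec, so here’s a sketch (the example string is made up for illustration):

```rust
// Turn a textual code definition with ones, zeroes and spaces into a
// (code, length) pair, the way the decoder has to do at run time.
fn parse_code(s: &str) -> (u32, u8) {
    let mut code = 0u32;
    let mut len = 0u8;
    for ch in s.chars() {
        match ch {
            '0' => { code <<= 1;             len += 1; }
            '1' => { code = (code << 1) | 1; len += 1; }
            ' ' => {} // spaces are only formatting
            _   => panic!("invalid codebook string"),
        }
    }
    (code, len)
}

fn main() {
    assert_eq!(parse_code("0000 10"), (0b0000_10, 6));
}
```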

As for the codec itself, there are two flavours of it: Vivo/1.0 (or Vivo/0.90) and Vivo/2.0. The first version is plain H.263 that does not use any special features; the second version has PB-frames (i.e. frames where B-frame macroblock data is stored together with P-frame macroblock data) and employs AIC (advanced intra coding). It’s probably the only codec I’ve seen that actually uses AIC in P-frames and not just in I-frames. Reconstruction of P-frames is not perfect because of this AIC mode, but as with the G.723.1 decoder it’s good enough to demonstrate that it works and I don’t want to waste more time on it.

All in all it was a meh-y experiment with mediocre results and I should move on.

Weird Audio in VMD

April 19th, 2020

Spoilers: as it turns out, a French company needs some Belgian technology in their life.

In the last post I mentioned that NihAV can decode all VMD files except maybe those from the latest generation. Since I was provided with samples, I looked at what changed there.

Better VMD Support in NihAV

April 16th, 2020

As the certain doctor from ScummVMTrek reminded me, the VMD format (developed by Coktel Vision, which was later bought by Sierra) was used in their own games too (even more so: there VMDs were used for many animations where a simple sprite would have sufficed) and they kept making educational games well into the 2000s. So I decided to look at those as well.

The Last Dynasty (which I’ve never played and am unlikely to ever play) features VMDs that decode mostly fine, except that in a 320×161 video you sometimes get sudden 640×322 frames near the end.

Then there’s some educational game from the Adi 4.0 generation. This one has some videos in 15-bit RGB format plus an IMA ADPCM compressed audio track. But it can’t beat the weirdness of…

Urban Runner. Here we have a mix of 15-bit RGB VMDs, 24-bit RGB VMDs and VMDs with Indeo 3 video. And if you thought that was simple enough, here’s another fun trick for you. Despite the different depths, all non-Indeo3 VMDs use the same compression methods; it’s just that in some cases the buffer should be interpreted as bytes, sometimes as 16-bit little-endian words, and sometimes as triplets. So far so good. But they had the bright idea of sometimes storing the image dimensions in pixels and sometimes in bytes. As a result I look at the VMD header to see which flavour it is and whether I need to scale the frame dimensions by bytes per pixel before decoding. And this game also features some videos with 312×136 resolution except that the last frame is 624×272 (I had to allow my Indeo 3 decoder to change dimensions to handle that particular case).
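
The dimension fix-up itself boils down to something like this (a sketch with made-up names; the real header fields are less obvious):

```rust
// Normalise the stored frame width: some VMD flavours store it in
// bytes, others in pixels, so scale by bytes per pixel when needed.
fn width_in_pixels(stored: usize, bytes_per_pixel: usize, stored_in_bytes: bool) -> usize {
    if stored_in_bytes {
        stored / bytes_per_pixel
    } else {
        stored
    }
}
```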

At least it could be done mostly by guesswork (except for the audio: I had to look into ADI4.exe using Ghidra to find out that it uses IMA ADPCM now), and all files can be decoded fine now.

If somebody can provide me with some samples and a binary from their latest generation, I’d look at that as well.

NihAV: Progress Report

April 13th, 2020

Since all of us in Europe have to suffer through the lockdown until things get better, I can’t travel for now and as a result I have to spend my free time in different ways (mind you, I still work, even if from home, so it’s not that much added free time). Nevertheless I’ve spent a significant part of that time working on NihAV, and as a result I have some progress to report.

Reviewing AV1 Features

March 21st, 2020

Since we have this wonderful situation in Europe and I need to stay at home, why not do something useless and comment on the features of AV1, especially since there’s a nice paper from (some of?) the original authors available. In this post I’ll try to review it and give my comments on various details presented there.

First of all I’d like to note that the paper has 21 authors for a review that could have been done by a single person. I guess this was done to give academic credit to the people involved and I have no problem with that (also I should note that even though two of the fourteen pages are short authors’ biographies, they were probably the most interesting part of the paper to me).