MidiVid codec family

September 26th, 2019

VP7 is such a nice codec that I decided to distract myself a little with something else. And that something else turned out to be MidiVid codec family. It turned out to be quite peculiar and somehow reminiscent of Duck codecs.

The family consists of three codecs:

  1. MidiVid — the original codec based on LZSS and vector quantisation;
  2. MidiVid Lossless — exactly what is says on a tin, based on LZSS and bunch of other technologies;
  3. MidiVid 3 — a codec based on simplified integer DCT and single codebook for all values.

I’ve actually added MidiVid decoder to NihAV because it’s simple (two hundred lines including boilerplate and tests) and way more fun than working on VP7 decoder. Now I’ll describe them and hopefully you’ll understand why it reminds me of Duck codecs despite not being similar in design.

MidiVid

This is a simple hold-and-modify video codec that had been used in some games back in PS2/Xbox era. The frame data can be stored either unpacked or packed with LZSS and it contains the following kinds of data: change mask for 8×8 blocks (in case of interframe—if it’s zero then leave block as is, otherwise decode new data for it), 4×4 block codebook data (up to 512 entries), high bits for 9-bit indices (if we have 257-512 various blocks) and 8-bit indexes for codebook.

The interesting part is that LZSS scheme looked very familiar and indeed it looks almost exactly like lzss.c from LZARI author (remember that? I still do), the only differences is that it does not use pre-filled window and flags are grouped into 16-bit word instead of single byte.

MidiVid Lossless

This one is a special best as it combines two completely different compression methods: the same LZSS as before and something used by BWT-based compressor (to the point that frame header contains FTWB or ZTWB IDs).

I’m positively convinced it was copied from some BTW-based compressor not just because of those IDs but also because it seems to employ the same methods as some old BTW-based compressor except for the Burrows–Wheeler transform itself (that would be too much for the old codecs): various data preprocessing methods (signalled by flags in the frame header), move-to-front coding (in its classical 1-2 coding form that does not update first two positions that much) plus coding coefficients in two groups: first just zero/one/large using order-3 adaptive model and then values larger than one using single order-1 adaptive model. What made it suspicious? Preprocessing methods.

MVLZ has different kinds of preprocessing methods: something looking like distance coding, static n-gram replacement, table prediction (i.e. when data is treated as series of n-bit numbers and the actual numbers are replaced with the difference between previous ones) and x86 call preprocessing (i.e. that trick when you change function call address from relative into absolute for better compression ratio and then undo it during decompression; known also as E8-preprocessing because x86 call opcode is E8 <32-bit offset> and it’s easy to just replace them instead of adding full disassembler to the archiver). I had my suspicions as n-gram replacement (that one is quite stupid for video codecs and it only replaces some values with some binary values that look more related to machine code than video) but the last item was a dead give-away. I’m pretty sure that somebody who knows open-source BWT compressors of late 1990s will probably recognize it even from this description but sadly I’ve not been following it that closely being more attracted to multimedia.

MidiVid 3

This codec is based on some static codebook for packing all values: block types, motion vectors and actual coefficients. Each block in macroblock can be coded with one of four modes: empty (fill with 0x80 in case of intra), DC only, just few coefficients DCT, and full DCT. As usual various kinds of data are grouped and coded as single array.

Motion compensation is full-pixel and unlike its predecessor it operates in YUV420 format.


This was an interesting detour but I have to return back to failing to start writing VP7 decoder.

P.S. I’ll try to document them with more details in the wiki soon.
P.P.S. This should’ve been a post about railways instead but I guess it will have to wait.

AVC support in NihAV: semi-done

September 14th, 2019

I’ve wasted enough time on AVC decoder for On2 family so while it’s not working properly for those special cases I’m moving to VP7 regardless.

For those who don’t know (or forgot; or never had a reason to care) On2 AAC is AAC-LC rip-off with some creative reconstruction modes added to the usual long/short windows. I’ve failed to understand how it works before and I fail to understand how it works still. But at least some details are a bit clearer now that I’ve analysed the whole codec from scratch with less guesswork.

The codec has three IDs that it recognizes: 0x500, 0x501 and 0x1234. First two are different only in the aspect that one handles singular packets and another one handles several packets glued together prefixed with size. The last ID is simply recognized but it does not have any special handling.

The tricky part is some special modes that do some heavy processing of data. For most modes you invoke IMDCT and that’s all, here you do some QMF-like filtering (probably for transients extraction), then you perform RDFT (previously I thought it was plain FFT but after long investigation it turned out to be RDFT after all) on quarters, merge those quarters using filters that look like convolution filters for four sub-bands, perform RDFT again on the whole block and add some transients. And after that you still may need to reverse the data before using permuted window for overlap-add operation. In other words it’s not fun and I lack education for recognizing all those algorithms used, why they’re used and where it goes wrong.

So hopefully I’ll return to it some day to fix it for good but now VP7 awaits (so I can at least formally declare Duck codecs family done and move to implementing missing bits in the framework itself).

Dingo Pictures: A Book

August 24th, 2019

When I think I’ve learned enough about Dingo Pictures there’s something new appears. From Toys review I’ve learned that Roswitha Haas also published a book (or maybe several) that are directly related to the subject at hoof. And of course being curious I’ve decided to buy it.

Amazon mentions two other books—an adaptation of Land Before Time and Jimmy Button but I dared not to buy them (one has definitely nothing to do with Dingo Pictures and I fear to find out that the first one is not about Tio).

Anyway, let’s look at the alternative version of King of the Animals Part 2 (spoiler: it’s still much better than Disney remake).


Read the rest of this entry »

NihAV: still ducking

July 27th, 2019

While it’s summer and I’d rather travel around (or suffer from heat when I can’t), there has been some progress on NihAV. Now I can decode VP5 and VP6 files. Reconstruction still sucks because it takes a lot of effort to make perfect reconstruction and I’m too lazy to do that when simple demonstration that the decoder works would suffice.

Anyway, now I can decode both VP5 and VP6 files including interlaced ones. Interlacing in VP5/6 is done in very simple way like many other codecs: there’s a bit for each macroblock telling whether macroblock should be output in interlaced form or not.

Of course this being VPx family, they had to do it with some creativity. First you decode base interlaced bit probability, which is stored as 8-bit value while all other bit probabilities are stored in 7 bits. Then you derive actual probability for interlaced bit and decode it before any other macroblock information (including macroblock type—it’s that important). Probability is derived by companding base probability depending on whether last macroblock was interlaced (then probability is halved) or not (then it’s remapped to fit 128-255 range)—except for the first macroblock in a row which would use the base probability without modifications. And for VP6 you also have to use different starting scan order (band assignment for each coefficient, now it’s shuffled). This is so trivial that one would wonder why this has not been done in libavcodec decoder yet.

There are three possible things to do next: polish current implementation, move to AVC (On2 AVC that is) or move to AVC (Duck VP7 which is AVC ripoff). But probably I’ll simply keep doing nothing instead.

Auf der Pälzische Eisenbahne…

June 16th, 2019

… gibt’s gar viele Haltstatione,
Kowelenz, Bingen, Bad Kreuznach, Lautre und Waldfischbach.

Ehm, sorry, wrong federal land for that song.

I think I can call my hobby project of travelling on all rail lines in Rheinland-Pfalz that are still in service complete.
Read the rest of this entry »

Bink-b: Encoder

May 31st, 2019

Recently I’ve been contacted by some guy working on a mod campaign for Heroes of Might and Magic III. The question was about the encoder for videos there. And since the original one is not likely to exist, I just wrote a simple one that would take PGMYUV image sequence and encode it. Here’s the gzipped source.

It took a couple of evenings to do that mostly because I still have weak symptoms of creeping perfectionism (thankfully it’s treated with my laziness). BIKb does not have Huffman-coded bundles, so the simplest straightforward encoding would be: write block type bundle (13-bit size and 4-bit elements), write empty other bundles, write several bundles containing pixels and you’re done. There’s a proper approach: write a full-featured encoder that takes input in several formats and that encodes using all possible features selecting the best quality for the target bitrate. There’s a hacky approach—translate later versions of Bink into BIKb (and then you remember that it has different motion compensation scheme so this approach won’t work). I’ve chosen something simple yet with some effectiveness: write an encoder that employs only vector quantisation and motion compensation for non-overlapped blocks plus add a quality setting so users can play with output size/quality if they really need it.

So how does the block encoding work? Block truncation coding, the fast and good way to quantise block into two colours (many video codecs back in the day used it and only some dared to use vector quantisation for more than two different values per block). Essentially you just calculate average pixel value and select two values depending on how many pixels in the block are larger than average and by how much they deviate. And here’s where quality parameter comes into play: depending on it encoder sets the threshold above which block is coded as is (aka full mode) instead as two colours and pattern in which they occur (of course if it’s a solid-colour fill it’s always coded as such). As I said, it’s simple but quite effective. Motion compensation is currently lossless i.e. encoder will try to find only the block that matches exactly (again, it can be improved but that would only lead to longer implementation times and even longer debugging times). This makes me appreciate the work on Smacker and Bink 1 video codecs and encoders for them even more.

Overall, it was a nice diversion from implementing Duck decoders for NihAV but I should probably return to it. The sooner it’s done the sooner I can move to something more exciting like finally experimenting with vector quantisation, or trying to write a player, or something else entirely. I avoid making plans but there are many possibilities at hoof so I just need to pick one.

VP3-VP6: the Golden (Frame) Age of Duck Codecs

May 24th, 2019

Dedicated to Peter Ross, who wrote an opensource VP4 decoder (that is not committed to CEmpeg yet at the time of the writing).

The codecs from VP3 to VP6 form a single codec family that is united not merely by the design but even by the header—every frame in this codec (sub)family has the same header format. And the leaked VP6 format specification still calls the version field there Vp3VersionNo (versions 0-2 used by VP3, 3 is used by VP4, 5 is for VP5 and 6-8 is for VP6). VP7 changed the both the coding principles to mimic H.264 and the header format too. And you can call it the golden age for Duck because it’s when it gained popularity with VP3 donated to open-source community (and xiphed to Theora which was the only patent-free(ish) opensource video codec with decent performance back then) to its greatest success found in VP6, employed both in games and in Flash video (remember when BaidUTube still used .flv with VP6 and N*llyMos*r ADPCM or Speex?). And today, having gathered enough material, I want to give an overview of these codecs. Oh, and NihAV can decode VP30 and VP31 now.
Read the rest of this entry »

NihAV: rust-clippy experience

May 18th, 2019

As I’ve mentioned in the previous post, I’ve finally tried rust-clippy to see what issues and suggestions it will have on my code. The results are not disappointing if you take the tool name seriously.
Read the rest of this entry »

NihAV: after clean-up

May 17th, 2019

Since the clean-up work on NihAV is done and I progress with Truemotion VP3 decoder, it’s a good time to talk about what I’ve actually done—there’s even more material to write waiting in the queue.

The intent was to make all frame-related stuff thread-safe and improve efficiency a bit. In order to do the former I had to replace most of the references from Rc<RefCell<T>> to Arc<T> and while doing it I introduced aliases like type NAFrameRef = Arc<NAFrame> and .into_ref() methods to convert object into ref-counted version. This helped when I tried switching from one implementation of reference counter to another and will make it easy to switch again if I ever need that (hopefully not). Now about improved efficiency and how it’s related to the ref-counting.

There’s a straightforward way of dealing with frames: you allocate the picture, fill it, dispose, allocate a new one, etc etc. And there’s a more effective way: you allocate several pictures at once, select an unused one, fill, return to the pool when it’s not needed any more. That is where reference counting comes into play and where Rust default structures don’t help. Frame pool owns the reference and decoder gets a second copy. And Rust Arc is intended for single ownership: when you try to access the shared object it will simply clone it so you end up working with a copy (which defies the purpose). So I had to NIH my own NABufferRef<T> which keeps reference counts and still allows shared access even for writing (currently it does that in all cases but if I need to add some guards the API won’t have to be changed for that). The implementation is very simple: the structure contains a raw pointer to a structure that contains actual object and AtomicUsize counter. The whole implementation is ~2.2kB relying just on std crate.

And finally I’ve made a picture pool. The difference between picture and frame is all additional metadata picture should not care about (like timestamps, stream information and such). Because of the design decisions I have three different picture formats (implemented for 8-, 16- and 32-bit element sizes, Rust does not like aliasing after all), which means I need to provide decoder with all three picture pools because we can’t say in advance which one codec will use (if at all—the option to allocate new non-pooled pictures is still there). Also I want to keep those pools external in case the code around it wants to do keep more pictures in it (e.g. 2-3 pictures required by decoder and 25 pictures pre-buffered for the display). This resulted in a structure called NADecoderSupport that contains picture pools and may have something else added late. Of course people might argue that it’s much better to have AVCodecContext with a myriad of fields you can set directly or via utility functions but I’d rather not have one single structure. Though it might be a good place to put various decoder options there (so that decoder can ignore them at its leisure).

Since I said I did it to increase efficiency I should probably give some numbers too: RealVideo 3/4/6 decoders now use buffer pool (for three frames obviously) and reallocate it on format change. Decoding time got reduced by 4-5% from using the pool. Currently I don’t care about speed much but I may convert more decoders to it if the need arises.

In conclusion I want to say that even I did not enjoy doing that work much, it was needed and gave me some experience plus some improvements in code and design. So it was not a wasted effort.

P.S. I also installed rust-clippy since it’s in stable now and tried to fix errors and warnings it reported. But that is a story for another post.

Zähringerstädte

May 6th, 2019

Today I want to talk about local dynasty that was rather short-lived but left quite an impressive legacy.
Read the rest of this entry »