Archive for the ‘Game Video’ Category

A Quick Review of Actimagine Video Codecs

Sunday, August 23rd, 2020

Now that (as I believe) I’ve fixed remaining reconstruction bugs in VX decoder, why not do a quick comparison of various video codecs developed by Actimagine and see how they differ (if at all).

There seem to be the following codecs:

  • Actimagine (VX)
  • Mobiclip (Mods)
  • Mobiclip (Moflex for 3DS; there’s also a version of it for PC known as Mobiclip HD)

And while they all are based on H.264 with finer block partitioning, there are some differences as well.

Proper structure. The original VX codec used a quantiser derived from FPS and all frames were encoded in the same way, while the later codecs have I-frames and quantisers are transmitted for each frame (as delta for non-keyframes).

References and motion compensation. VX had three previous frames as reference ones, later codecs increased that number to five. VX had fullpel motion compensation, later codecs use halfpel MC.

Data coding. VX relied on Elias Gamma codes for all codes except coefficient coding, later codecs use codebooks for most coded values. Also while VX coded residue in 4×4 blocks in H.264 way (starting from the end and with tail of ones coded explicitly), newer codecs use separable transforms and the usual (zero run, coefficient level) coding. Additionally only nine coding modes out of twenty four have survived after VX (intra prediction, MC with motion vectors coded and splits).
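For reference, here is a minimal sketch of reading one textbook Elias gamma code. It only illustrates the code family; the exact bit order and layout used inside VX are not described here, so treat `read_elias_gamma` and its list-of-bits input as assumptions made for the example.

```python
def read_elias_gamma(bits, pos=0):
    """Decode one textbook Elias gamma code from a sequence of 0/1 ints.

    Returns (value, new_pos). Values start at 1.
    Hypothetical helper: the real VX bitstream layout may differ.
    """
    zeros = 0
    while bits[pos] == 0:          # unary prefix: count leading zero bits
        zeros += 1
        pos += 1
    value = 0
    for _ in range(zeros + 1):     # the leading 1 plus `zeros` payload bits
        value = (value << 1) | bits[pos]
        pos += 1
    return value, pos
```

With this layout the value 1 takes a single bit and 5 is coded as 00101, so small numbers stay cheap.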

Overall, while all those codecs are related, there are large differences between VX and later Mobiclip variants and the only differences between Mobiclip variants are colourspace (Mods uses YCoCg model, HD uses the proper YUV model), quantiser being clipped to 12-52 range, and block mode codebooks being different.

As I mentioned before, somebody has reverse-engineered decoders for Mobiclip (and a quick check on codebooks used tells me that Mobiclip HD and 3DS versions are the same) so if somebody needs them it should not be that hard to write a decoder.

A look at some old game

Wednesday, August 19th, 2020

Sometimes I like to play old strategy games from my youth: Civilization II, Settlers II, WarCraft II and Reunion. You probably have never heard about it since it’s not from some famous studio but from some Hungarians and published by rather obscure publisher too.

The idea is about the same as in Settlers II but IN SPACE! In some near future an experimental spaceship somehow gets into an unknown star system, most of technologies are lost and now you have to colonise planets, fight with aliens and find your way back home. This game combines some planet-building with space exploration and ground battles (there are also battles in space but they’re fought without your involvement). And since it has a story you have events like getting a chance to get some technology or break the alliance between your enemies. So it’s an interesting mix overall and it explains why I still return to it from time to time. Sadly the game was programmed in traditional Hungarian manner (remember, Hungarians are responsible for such popular software as Windows 95 or MPlayer) and its intro (a separate program) sometimes crashes and sometimes it even makes DosBox segfault. The main game is also prone to corruptions and crashes (yet I still play it sometimes).

Anyway, today I’ve stumbled upon a page of one guy who reverse-engineered image format used in this game just by fiddling with it. It turned out to be compressed with RLE similar to the one used in PCX (0x00-0xBF – normal pixel, 0xC0-0xFF – run of next byte value 0-63 times). Since the game had some animations as well I decided to look at them.
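The still-image scheme described above fits in a few lines. This is a sketch under one assumption: that the run length is simply the code minus 0xC0 (which gives the 0-63 range mentioned). `unpack_rle` is a made-up name.

```python
def unpack_rle(data):
    """Unpack PCX-like RLE: 0x00-0xBF is a literal pixel,
    0xC0-0xFF repeats the following byte (code - 0xC0) times."""
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        i += 1
        if code < 0xC0:
            out.append(code)               # plain pixel
        else:
            if i >= len(data):             # truncated input, stop quietly
                break
            out.extend(bytes([data[i]]) * (code - 0xC0))
            i += 1
    return bytes(out)
```

Note that a pixel value of 0xC0 or above cannot be stored as a literal, so it has to be written as a run of one (e.g. `C1 C5` for a single 0xC5 pixel).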

So intro uses mostly still images split into 640×100 strips (so they can fit into one segment if you remember those) that are scrolled and faded in and out. And there’s a special animation format for some in-game animations similar to the picture format (as expected). Animation file is a series of frames (without palette) that are coded with similar RLE but there are some quirks not encountered in still images. First of all, frames are coded as differences and codes in range 0x80-0xBF are used to signal how many pixels to skip. Second, it turns out that codes 0x80 and 0xC0 are actually escape codes and are followed by 16-bit value of actual skip or run length (and in case of 0xC0 code a pixel value after that). Again, since the format is so simple it could be found just by looking inside the animation files and messing with a decoder.
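The animation variant can be sketched the same way. Several details below are guesses rather than documented facts: literals are taken to be the codes below 0x80, short counts are the code minus its range base, the 16-bit escape values are read as little-endian, and the frame is treated as one linear buffer. `apply_delta` is a hypothetical name.

```python
def apply_delta(frame, data):
    """Apply one RLE-coded difference frame to `frame` (a bytearray) in place.

    Assumed code ranges: 0x00-0x7F literal pixel, 0x81-0xBF short skip,
    0x80 + u16 long skip, 0xC1-0xFF short run, 0xC0 + u16 + pixel long run.
    """
    pos = 0
    i = 0
    while i < len(data) and pos < len(frame):
        code = data[i]
        i += 1
        if code < 0x80:                        # literal pixel
            frame[pos] = code
            pos += 1
        elif code < 0xC0:                      # skip unchanged pixels
            if code == 0x80:                   # escape: 16-bit skip length
                count = data[i] | (data[i + 1] << 8)
                i += 2
            else:
                count = code - 0x80
            pos += count
        else:                                  # run of a single value
            if code == 0xC0:                   # escape: 16-bit run length
                count = data[i] | (data[i + 1] << 8)
                i += 2
            else:
                count = code - 0xC0
            pixel = data[i]
            i += 1
            end = min(pos + count, len(frame))
            frame[pos:end] = bytes([pixel]) * (end - pos)
            pos = end
    return frame
```

Since skipped pixels keep their old value, the same buffer is simply reused from frame to frame.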

As for the other games mentioned in the beginning, Civ2 has GIF files mostly hidden inside resource .dlls plus Indeo 4 video (with transparency even!) and Settlers II and WarCraft II have videos in Smacker format.

Having said that, my pointless diversion to looking at game formats is over, back to doing nothing!

Actimagine VX: another imperfect decoder

Thursday, August 13th, 2020

So I’ve released my decoder for Actimagine VX and it’s far from perfect.

First problem is audio. The codec itself is not that tricky (it turned out to be some LPC codec that takes 5-10 16-bit words per frame to code pulses and filter for 128-sample frame), but its data is stored right after video frame data so in order to decode audio first you need to decode video frame and feed the remains of input buffer to the audio decoder. Since I can’t do that in a sane way I could not test the decoder either and it’s there for informative purposes only.

The second problem is obviously video. I’ve managed to decode bitstream fine but reconstructed images are not bit-exact and in case of plane prediction this leads to ugly artefacts (essentially the target value wraps around and you have gradients from white to black or vice versa instead of almost flat dark or white regions). I’ve introduced a clipping which seems to help but this is not right and maybe I’ll fix it one day. Maybe even before Bink2.

And finally there are some problems with the demuxer. In theory VX files may have multiple tracks but my demuxer might not handle them at all and if it does then it’ll simply ignore anything but the first video stream.

So VX support is far from perfect but it serves its goal of proving that the format works as expected. And if it’s useful to anybody then it’s even better.

Some words about Bink2

Sunday, August 9th, 2020

As you may know (but definitely not care), NihAV has some limited support for Bink2 video. The problem in fixing it is that known samples are usually 720p video or more which makes it hard to debug decoding past the first few frames (okay, older versions have smaller known videos so they’re likely to be fixed sooner). And of course the encoder is available only to the RAD customers to which I don’t belong. So as a result I’ve decided to look at Actimagine VX codec once again.

I’ve looked at it four years ago but I could just study it but not write a decoder because of the binary. Essentially this codec happens on BigN DS consoles so you have to deal with raw ARM7 or ARM9 binary that (as it turns out) sets up its own segments (and the problems arise when you see absolute addresses to the areas not present there). So you load binary at addresses e.g. 0x2000000-0x20e1030 but in reality it contains also segments 0x1ffe800-0x1fff000 and 0x27e0000-0x27e4000. Thankfully Ghidra can not just load raw ARM binary but also add aliases to data as new segments. This allowed me to work on the decoder again and now I have more or less complete understanding of it and semi-working decoder for it as well, here’s an example:

Sample decoded frame.

Essentially it’s a simplified variant of H.264 with the following features: frames are split into 16×16 macroblocks that can be further recursively divided horizontally or vertically down to 2×2 blocks. Block can be coded in 24 different modes that boil down to full-pel motion compensation from one of three previous frames (without a motion vector, with motion vector, or with motion vector and an offset value that should be added to each pixel), intra prediction on whole block or intra prediction in 4×4 blocks. Whether residue is coded is also part of the mode (e.g. mode 11 is intra prediction without residue and mode 22 is intra prediction with residue). Residue is coded in 8×8 blocks comprising six 4×4 coefficient blocks, each block is coded in a way reminiscent of H.264: there are numbers for total number of non-zero coefficients, number of last non-zero coefficients being plus-minus one and number of zeroes dispersed between non-zero coefficients. Those are coded with variable-length codes that I could not access earlier, which was the blocker, but not any more.

And there’s one curious feature of this codec that made it worth REing: instead of using plane prediction like H.264, this codec fills block in a recursive way. It interpolates bottom-right corner as an average of top-right and bottom-left neighbour pixels (e.g. [15,-1] and [-1,15] for 16×16 block; it also adds a delta to it in certain decoding modes), then it calculates halfway-bottom right and halfway-right bottom pixels (e.g. [15,7] and [7,15] for 16×16 block), then a centre pixel, and then repeats the process for each quarter (or half for some rectangular blocks). This is less computationally intensive than ordinary plane prediction and it seems to give nice results too.
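To make the recursion easier to picture, here is a rough sketch for square blocks only. Only the bottom-right corner rule comes from the description above; the rounding, the way the two edge midpoints and the centre pixel are averaged, and the order in which quarters are visited are all assumptions. The optional delta and the rectangular case are left out, and `predict_block` is a hypothetical name.

```python
def predict_block(top, left, size):
    """Fill a size×size block from its neighbours.

    top[x] is the pixel at [x,-1], left[y] is the pixel at [-1,y].
    Returns a list of rows. The averaging rules are guesses, see text.
    """
    # p is shifted by one so that row 0 / column 0 hold the neighbours
    p = [[None] * (size + 1) for _ in range(size + 1)]
    for x in range(size):
        p[0][x + 1] = top[x]
    for y in range(size):
        p[y + 1][0] = left[y]

    def avg(a, b):
        return (a + b + 1) >> 1

    # bottom-right corner from the top-right and bottom-left neighbours
    p[size][size] = avg(top[size - 1], left[size - 1])

    def fill(x0, y0, n):
        # n×n area starting at (x0, y0); its bottom-right pixel is already set
        if n == 1:
            return
        x1, y1 = x0 + n - 1, y0 + n - 1
        xm, ym = x0 + n // 2 - 1, y0 + n // 2 - 1
        p[ym][x1] = avg(p[y0 - 1][x1], p[y1][x1])   # halfway down the right edge
        p[y1][xm] = avg(p[y1][x0 - 1], p[y1][x1])   # halfway along the bottom edge
        p[ym][xm] = avg(p[ym][x0 - 1], p[ym][x1])   # centre pixel
        h = n // 2
        fill(x0, y0, h)            # top-left quarter
        fill(x0 + h, y0, h)        # top-right quarter
        fill(x0, y0 + h, h)        # bottom-left quarter
        fill(x0 + h, y0 + h, h)    # bottom-right quarter

    fill(1, 1, size)
    return [row[1:] for row in p[1:]]
```

Every pixel costs one addition and one shift, which is where the saving over a true plane fit would come from.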

I mentioned before that my decoder is far from perfect (and you can see it for yourself on that picture) but I know how to debug and improve it. I’m not trying to say that piracy is okay, but being able to find some .nds image with a game that has VX videos and using it with DeSmuME with GDB stub would help to debug the decoder but piracy is bad and so it’s not a proper way to do things.

As for audio counterpart, I should mention this: curiously enough there’s an opensource decoder for later MobiClip formats that seems to contain working Sx decoder for an audio used in VX files (it’s a pity the person who did it could not finish VX as well—why should I do the work myself instead of letting other people do my work for me?!). Unfortunately it’s mostly translated assembly so while it should work it’s mostly sub_XXX() doing various accesses to various positions of large byte array of decoder state. I’ll probably add it as well for completeness sake and document the formats properly after I fix the decoder (which should happen during this year too).

Monty Python & Quest for Holy Grail documented

Tuesday, May 19th, 2020

I’ve finally had time and documented my findings so if somebody is interested in it he can have at least something to start with.

Better VMD Support in NihAV

Thursday, April 16th, 2020

As a certain doctor from ScummVMTrek reminded me, the VMD format (developed by Coktel Vision that was bought by Sierra) was used in their own games too (and even more so, there VMDs were used for many animations where simple sprite would suffice as well) and they kept making some educational games way into 2000s. So I decided to look at those as well.

The Last Dynasty (which I’ve never played and am unlikely to ever play) features VMD that can be decoded mostly fine except that in 320×161 video you sometimes have sudden 640×322 frames near the end of video.

Some education game from Adi 4.0 generation. This one has some videos in 15-bit RGB format plus IMA ADPCM compressed audio track. But it can’t beat the weirdness of…

Urban Runner. Here we have a mix of 15-bit RGB VMD, 24-bit RGB VMD and VMDs with Indeo 3 video. And if you thought this was simple enough here’s another fun trick for you. Despite having different depths, all non-Indeo3 VMDs use the same compression methods, just in some cases the buffer should be interpreted as bytes, sometimes as 16-bit little-endian words and sometimes as triplets. So far so good. But they had a bright idea of sometimes storing the image dimensions in pixels and sometimes in bytes. As a result I look at VMD header to see what flavour it has there to see if I need to scale frame dimensions by bytes per pixel before decoding or not. And this game also features some videos that have 312×136 resolution except that the last frame is 624×272 (I had to allow my Indeo 3 decoder to change dimensions to handle that particular case).

At least it could be done mostly by guesswork (except for audio, I had to look into ADI4.exe using Ghidra to find out that it has IMA ADPCM now) and all files can be decoded fine now.

If somebody can provide me with some samples and binary for their latest generation I’d look at it as well.

Going to the 7th Level for the Grail!

Saturday, September 28th, 2019

I’ve spent two days on something that I wanted to do long time ago but postponed because of various reasons including complexity. But now thanks to Ghidra I’ve finally managed to pry into resources of Monty Python and the Quest for the Holy Grail and decode videos (and more!) stored there.

Probably it was VP7 (again) that made me look into it, but since Ghidra has decompiler for 16-bit code as well, I’ve tried it on libraries of surprisingly small game engine (pity that ScummVM does not even plan to support it but I’m in a similar situation with codecs). And what do you know, they have a special library dealing with resource files that included unpacking images (but not audio—there’s audio library for that).

First, let’s talk about resource file organisation and why it baffled me for too long. The files are organised into several chunks: header that contains offsets of the other chunks at the end of it (around bytes 0xE2..0x141), then you have actual table of contents (more about it later), then some small binary data chunk, then list of strings related to file names, then some chunk that looks like game script (in binary form), then palette and finally another list of strings that looks like variables list. It is no wonder I could not do anything useful with it since actual table of contents is hidden somewhere near the end of file but not exactly at the end. Also each game scene corresponds to its own archive which makes it clearer what to expect there (previous version used in Monty Python’s Complete Waste of Time even has a description right in the header and table of contents stored right after it, I might look at it later but no promises).

Now, files in the resource archive. The catalogue has only file type, its size and offset in the archive and nothing else. There are more than a dozen file types, 1 being images (both background and sprites), 2 being movies (the thing I was hunting), 4 is for MIDI tracks, 9 is for digitised audio, the rest I don’t care about.

Let’s move to simple things like music and audio. Music is stored in its custom format with four bytes per command except when high bit is set (then it’s delta time for next command). Why so? Probably because it’s being played by sending single MIDI commands and at least on Windows it requires packing them into 32-bit word anyway. Audio is stored in files with some header and either raw form or with IMA ADPCM compression. The notable thing is that it does not use any multiplications in any form—instead it has precomputed table for all eight possible values for each step.
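The multiplication-free table could look roughly like this. The per-code delta follows the usual IMA ADPCM shift-and-add arithmetic; the short `STEPS` list is only the beginning of the standard step table (a real decoder needs the full one), step index adaptation is omitted, and the function names are made up.

```python
# First few entries of the standard IMA step table, enough for a demo.
STEPS = [7, 8, 9, 10, 11, 12, 13, 14, 16]

def build_delta_table(steps):
    """For each step, precompute the delta for all eight magnitude codes."""
    table = []
    for step in steps:
        row = []
        for code in range(8):
            diff = step >> 3
            if code & 4:
                diff += step
            if code & 2:
                diff += step >> 1
            if code & 1:
                diff += step >> 2
            row.append(diff)
        table.append(row)
    return table

def decode_nibble(table, step_index, predictor, nibble):
    """Update the predictor with one 4-bit code using only a table lookup."""
    diff = table[step_index][nibble & 7]
    predictor += -diff if nibble & 8 else diff   # bit 3 is the sign
    return max(-32768, min(32767, predictor))    # clamp to 16-bit range
```

At decode time the inner loop is then a lookup, an add or subtract and a clamp.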

Images are another interesting topic. First of all, palette is stored as 944-byte file with 32-bit entries (so it’s just 236 colours) but somehow decoded files use it with bias of ten, i.e. decoded index 13 corresponds to palette entry 3, index 27 is for entry 17 etc etc. I have no idea what it does with first colours in the palette (though sometimes images are not colorised properly so maybe they need different palette bias or a different palette altogether).

Second, images employ two different compression schemes: RLE and LZW. RLE is rather trivial except that it uses opcode zero to signal end of line (and double zero for end of image):

Obvious words skipped

LZW-compressed image splits image into chunks that may be uncompressed or compressed with LZW using 10, 11 or 12 bits per index. And after all these years I was still able to correctly guess it was LZW and write a decoder that worked fine without resorting to any documentation or even studying the decompiled code (I just needed to realize that it reads sequences of n-bit indexes and does something with them to output bytes).
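The "read n-bit indexes and turn them into bytes" part, in its textbook form, is sketched below. This is generic LZW and not the game's exact variant: the LSB-first bit packing, the 256-entry initial dictionary and the absence of clear/end codes are all assumptions, and the chunk framing is not covered. Both function names are hypothetical.

```python
def read_indices(data, nbits):
    """Yield fixed-width indices from bytes, least significant bit first."""
    acc = 0
    have = 0
    for byte in data:
        acc |= byte << have
        have += 8
        while have >= nbits:
            yield acc & ((1 << nbits) - 1)
            acc >>= nbits
            have -= nbits

def lzw_decode(codes):
    """Textbook LZW: the dictionary starts with all single bytes."""
    table = [bytes([i]) for i in range(256)]
    out = bytearray()
    prev = None
    for code in codes:
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]        # index refers to the entry being built
        else:
            raise ValueError("bad LZW index")
        out += entry
        if prev is not None:
            table.append(prev + entry[:1])
        prev = entry
    return bytes(out)
```

A chunk would then be unpacked as `lzw_decode(read_indices(chunk, 10))`, with 11 or 12 in place of 10 depending on the chunk.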

And finally the movie format. Those are actually stored in two files—movie data and companion frame index (i.e. frame type, size and offset). After extracting frames I could not realize what to do with them until I noticed that frame type 2 all have the constant size of 11025 except for first few. Obviously they turned out to be IMA ADPCM compressed audio frames (and since they were the first frames in every movie it was hard to guess the format looking at semi-random data without header). Frame type 0 turned out to be the same image format as normal pictures.

Frame type 1 turned out to be inter-frames that use index 0 for pixels that should not be updated.
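In other words, updating the picture is a per-pixel select. A tiny sketch, assuming both frames are already unpacked into flat buffers of the same size (`apply_inter_frame` is a made-up name):

```python
def apply_inter_frame(prev, update):
    """Keep the old pixel wherever the inter-frame carries index 0."""
    return bytes(p if u == 0 else u for p, u in zip(prev, update))
```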

Unfortunately even if I now have a way to decode the files I cannot add direct support for them in NihAV since it requires at least three different files to play it (palette, frame index and frame data). At least now I know I can transcode them into something when the need arises.

Now let’s talk about Ghidra a bit. Without it this probably would not have happened since I’m not that young to spend time on translating tons of disassembly (and REC failed to do a good job). Ghidra support for 16-bit code is far from perfect with all that mess with far pointers consisting of two 16-bit words, external DLL functions linked by ordinals (so I had to match them by hoof and rename where possible) and general confusion with int treated as 4-byte in some contexts but 2-byte in others. And yet it did the job and helped me to understand how the game libraries work and that’s what really counts.

And as a bonus here’s a team photo extracted from concentration mini-game related to the witch scene (not the Simon Says clone with fires but probably Match Two clone that I’ve never seen in-game before):

MidiVid codec family

Thursday, September 26th, 2019

VP7 is such a nice codec that I decided to distract myself a little with something else. And that something else turned out to be MidiVid codec family. It turned out to be quite peculiar and somehow reminiscent of Duck codecs.

The family consists of three codecs:

  1. MidiVid — the original codec based on LZSS and vector quantisation;
  2. MidiVid Lossless — exactly what it says on the tin, based on LZSS and bunch of other technologies;
  3. MidiVid 3 — a codec based on simplified integer DCT and single codebook for all values.

I’ve actually added MidiVid decoder to NihAV because it’s simple (two hundred lines including boilerplate and tests) and way more fun than working on VP7 decoder. Now I’ll describe them and hopefully you’ll understand why it reminds me of Duck codecs despite not being similar in design.

MidiVid

This is a simple hold-and-modify video codec that had been used in some games back in PS2/Xbox era. The frame data can be stored either unpacked or packed with LZSS and it contains the following kinds of data: change mask for 8×8 blocks (in case of interframe—if it’s zero then leave block as is, otherwise decode new data for it), 4×4 block codebook data (up to 512 entries), high bits for 9-bit indices (if we have 257-512 various blocks) and 8-bit indexes for codebook.

The interesting part is that LZSS scheme looked very familiar and indeed it looks almost exactly like lzss.c from LZARI author (remember that? I still do), the only differences are that it does not use a pre-filled window and flags are grouped into a 16-bit word instead of a single byte.
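A sketch of such an unpacker, following the classic lzss.c parameters (4096-byte window, matches of 3 to 18 bytes, flag bit 1 meaning literal) with the two differences named above. The little-endian flag word, the LSB-first flag order and the initial write position carried over from lzss.c are assumptions, and `lzss_unpack` is a hypothetical name.

```python
N = 4096          # window size, as in lzss.c
F = 18            # longest match
THRESHOLD = 2     # matches are THRESHOLD + 1 bytes or longer

def lzss_unpack(data, out_size):
    window = bytearray(N)      # left zeroed: no pre-filled window
    r = N - F                  # initial write position (assumed as in lzss.c)
    out = bytearray()
    i = 0
    flags = 0
    bits_left = 0
    while len(out) < out_size:
        if bits_left == 0:                     # fetch the next 16 flags
            if i + 2 > len(data):
                break
            flags = data[i] | (data[i + 1] << 8)
            i += 2
            bits_left = 16
        is_literal = flags & 1
        flags >>= 1
        bits_left -= 1
        if is_literal:
            if i >= len(data):
                break
            c = data[i]
            i += 1
            out.append(c)
            window[r] = c
            r = (r + 1) & (N - 1)
        else:
            if i + 2 > len(data):
                break
            lo, hi = data[i], data[i + 1]
            i += 2
            pos = lo | ((hi & 0xF0) << 4)      # 12-bit window offset
            length = (hi & 0x0F) + THRESHOLD + 1
            for k in range(length):
                c = window[(pos + k) & (N - 1)]
                out.append(c)
                window[r] = c
                r = (r + 1) & (N - 1)
    return bytes(out[:out_size])
```

The zeroed window only matters for streams that reference positions nothing has been written to yet.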

MidiVid Lossless

This one is a special beast as it combines two completely different compression methods: the same LZSS as before and something used by a BWT-based compressor (to the point that frame header contains FTWB or ZTWB IDs).

I’m positively convinced it was copied from some BWT-based compressor not just because of those IDs but also because it seems to employ the same methods as some old BWT-based compressor except for the Burrows–Wheeler transform itself (that would be too much for the old codecs): various data preprocessing methods (signalled by flags in the frame header), move-to-front coding (in its classical 1-2 coding form that does not update first two positions that much) plus coding coefficients in two groups: first just zero/one/large using order-3 adaptive model and then values larger than one using single order-1 adaptive model. What made it suspicious? Preprocessing methods.
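As a reminder of what move-to-front does, here is the plain version of the inverse transform. It is deliberately not the 1-2 variant mentioned above (that one updates the first positions more conservatively and its exact rules are not spelled out here), so take it as the baseline idea only; `mtf_decode` is a made-up name.

```python
def mtf_decode(ranks, alphabet_size=256):
    """Plain move-to-front: each rank picks a symbol, which then moves to the front."""
    table = list(range(alphabet_size))
    out = bytearray()
    for rank in ranks:
        sym = table.pop(rank)
        out.append(sym)
        table.insert(0, sym)
    return bytes(out)
```

Recently used symbols get small ranks, which is what makes the output friendly to the adaptive models that follow.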

MVLZ has different kinds of preprocessing methods: something looking like distance coding, static n-gram replacement, table prediction (i.e. when data is treated as series of n-bit numbers and the actual numbers are replaced with the difference between previous ones) and x86 call preprocessing (i.e. that trick when you change function call address from relative into absolute for better compression ratio and then undo it during decompression; known also as E8-preprocessing because x86 call opcode is E8 <32-bit offset> and it’s easy to just replace them instead of adding full disassembler to the archiver). I had my suspicions about n-gram replacement (that one is quite stupid for video codecs and it only replaces some values with some binary values that look more related to machine code than video) but the last item was a dead give-away. I’m pretty sure that somebody who knows open-source BWT compressors of late 1990s will probably recognize it even from this description but sadly I’ve not been following it that closely being more attracted to multimedia.
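The E8 trick is simple enough to show in full. This is the generic form of the filter, not the exact one found in the codec: taking the address of the next instruction as the base and skipping the four operand bytes after each E8 are assumptions, and the function names are hypothetical.

```python
def _e8_transform(data, sign):
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:                     # x86 CALL rel32
            value = int.from_bytes(buf[i + 1:i + 5], "little")
            # base is the address of the next instruction, i.e. i + 5
            value = (value + sign * (i + 5)) & 0xFFFFFFFF
            buf[i + 1:i + 5] = value.to_bytes(4, "little")
            i += 5                             # do not rescan the operand
        else:
            i += 1
    return bytes(buf)

def e8_forward(data):
    """Compressor side: relative call targets become absolute."""
    return _e8_transform(data, +1)

def e8_inverse(data):
    """Decompressor side: absolute targets go back to relative."""
    return _e8_transform(data, -1)
```

Calls to the same function from different places end up with identical operand bytes after the forward pass, which is the whole point. On video data the filter just hits random 0xE8 bytes.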

MidiVid 3

This codec is based on some static codebook for packing all values: block types, motion vectors and actual coefficients. Each block in macroblock can be coded with one of four modes: empty (fill with 0x80 in case of intra), DC only, just few coefficients DCT, and full DCT. As usual various kinds of data are grouped and coded as single array.

Motion compensation is full-pixel and unlike its predecessor it operates in YUV420 format.


This was an interesting detour but I have to return back to failing to start writing VP7 decoder.

P.S. I’ll try to document them with more details in the wiki soon.
P.P.S. This should’ve been a post about railways instead but I guess it will have to wait.

Bink-b: Encoder

Friday, May 31st, 2019

Recently I’ve been contacted by some guy working on a mod campaign for Heroes of Might and Magic III. The question was about the encoder for videos there. And since the original one is not likely to exist, I just wrote a simple one that would take PGMYUV image sequence and encode it. Here’s the gzipped source.

It took a couple of evenings to do that mostly because I still have weak symptoms of creeping perfectionism (thankfully it’s treated with my laziness). BIKb does not have Huffman-coded bundles, so the simplest straightforward encoding would be: write block type bundle (13-bit size and 4-bit elements), write empty other bundles, write several bundles containing pixels and you’re done. There’s a proper approach: write a full-featured encoder that takes input in several formats and that encodes using all possible features selecting the best quality for the target bitrate. There’s a hacky approach—translate later versions of Bink into BIKb (and then you remember that it has different motion compensation scheme so this approach won’t work). I’ve chosen something simple yet with some effectiveness: write an encoder that employs only vector quantisation and motion compensation for non-overlapped blocks plus add a quality setting so users can play with output size/quality if they really need it.

So how does the block encoding work? Block truncation coding, the fast and good way to quantise block into two colours (many video codecs back in the day used it and only some dared to use vector quantisation for more than two different values per block). Essentially you just calculate average pixel value and select two values depending on how many pixels in the block are larger than average and by how much they deviate. And here’s where quality parameter comes into play: depending on it encoder sets the threshold above which block is coded as is (aka full mode) instead of as two colours and pattern in which they occur (of course if it’s a solid-colour fill it’s always coded as such). As I said, it’s simple but quite effective. Motion compensation is currently lossless i.e. encoder will try to find only the block that matches exactly (again, it can be improved but that would only lead to longer implementation times and even longer debugging times). This makes me appreciate the work on Smacker and Bink 1 video codecs and encoders for them even more.
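For illustration, here is the classic block truncation coding calculation together with a mode decision along the lines described above. It is not taken from the released encoder: the squared-error metric, the meaning of `threshold` and all names are assumptions.

```python
import math

def btc_encode(block):
    """Classic BTC for a flat list of 8-bit pixels.

    Returns (low, high, mask); mask[i] is 1 where the pixel is above the mean.
    """
    m = len(block)
    mean = sum(block) / m
    variance = max(sum(p * p for p in block) / m - mean * mean, 0.0)
    sigma = math.sqrt(variance)
    mask = [1 if p > mean else 0 for p in block]
    q = sum(mask)                          # pixels above the mean
    if q == 0:                             # nothing above the mean: solid block
        level = round(mean)
        return level, level, mask
    low = mean - sigma * math.sqrt(q / (m - q))
    high = mean + sigma * math.sqrt((m - q) / q)
    clamp = lambda v: max(0, min(255, round(v)))
    return clamp(low), clamp(high), mask

def choose_mode(block, threshold):
    """Pick fill / two-colour / full by comparing the BTC error to a threshold."""
    low, high, mask = btc_encode(block)
    if low == high:
        return "fill"
    error = sum((p - (high if bit else low)) ** 2 for p, bit in zip(block, mask))
    return "two-colour" if error <= threshold else "full"
```

A higher quality setting would map to a lower `threshold`, so more blocks fall through to full mode.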

Overall, it was a nice diversion from implementing Duck decoders for NihAV but I should probably return to it. The sooner it’s done the sooner I can move to something more exciting like finally experimenting with vector quantisation, or trying to write a player, or something else entirely. I avoid making plans but there are many possibilities at hoof so I just need to pick one.

Bink2: some words about loop filter

Sunday, April 14th, 2019

Since obviously I have nothing better to do, here’s a description of loop filter in Bink2 as much as I understand it (i.e. not much really).

First, the loop filter bases its decision on two factors: whether the motion vector difference between adjacent blocks is greater than two, and the number of coefficients coded in the block, which selects the filter strength (that one I don’t remember seeing before). The filter is the same in all cases (inter/intra, luma/chroma, edge/inside macroblock), only the number of pixels filtered varies between zero and two on each side (more on that later). This is nice and elegant design IMO.

Second, filtering is done after each macroblock, horizontal edges first, vertical edges after that, but not necessarily for all macroblocks, since a BIKi or BIKj encoder can signal “do not deblock macroblocks in these rows and columns” by transmitting a set of flags for columns and rows.

Third, in addition to normal filtering decoder can do something that I still don’t understand but it looks like whole-block overlapping in both directions (and it is performed in actual decoding but I don’t know what happens with the result of it).

And the filter itself is not that interesting (assuming we filter buf[0] buf[1] | buf[2] buf[3]):

    /* difference across the edge, rounded and scaled down */
    diff0 = (buf[2] - buf[1] + 8) >> 4;
    diff1 = (diff0 * 4 + 8) >> 4;
    /* strength selects how many pixels on each side are adjusted */
    if (left_strength >= 2)
        buf[0] = clip8(buf[0] + diff0);
    if (left_strength >= 1)
        buf[1] = clip8(buf[1] + diff1);
    if (right_strength >= 1)
        buf[2] = clip8(buf[2] - diff1);
    if (right_strength >= 2)
        buf[3] = clip8(buf[3] - diff0);

Strength is determined like this: 0 — more than 8 coefficients coded, 1 – MV difference or 4-7 coefficients coded, 2 — 1-3 coefficients coded, 3 — no coefficients coded.

Overall, the loop filter is nice and simple if you ignore the existence of some additional filter functions and very optimised implementation that is not that much fun to untangle.

Update: the alternative function seems to be some kind of block reconstruction based on DCs. In case it’s intra block with less than four coefficients coded it will take all neighbouring DCs, select those not differing by more than a frame-defined threshold and smooth the differences. I still don’t understand its purpose in full though.