Archive for the ‘Game Video’ Category

Going to the 7th Level for the Grail!

Saturday, September 28th, 2019

I’ve spent two days on something that I wanted to do a long time ago but postponed for various reasons including complexity. But now thanks to Ghidra I’ve finally managed to pry into the resources of Monty Python and the Quest for the Holy Grail and decode the videos (and more!) stored there.

Probably it was VP7 (again) that made me look into it, but since Ghidra has a decompiler for 16-bit code as well, I’ve tried it on the libraries of a surprisingly small game engine (pity that ScummVM does not even plan to support it but I’m in a similar situation with codecs). And what do you know, they have a special library dealing with resource files that includes unpacking images (but not audio; there’s an audio library for that).

First, let’s talk about resource file organisation and why it baffled me for too long. The files are organised into several chunks: header that contains offsets of the other chunks at the end of it (around bytes 0xE2..0x141), then you have actual table of contents (more about it later), then some small binary data chunk, then list of strings related to file names, then some chunk that looks like game script (in binary form), then palette and finally another list of strings that looks like variables list. It is no wonder I could not do anything useful with it since actual table of contents is hidden somewhere near the end of file but not exactly at the end. Also each game scene corresponds to its own archive which makes it clearer what to expect there (previous version used in Monty Python’s Complete Waste of Time even has a description right in the header and table of contents stored right after it, I might look at it later but no promises).

Now, files in the resource archive. The catalogue has only file type, its size and offset in the archive and nothing else. There are more than a dozen file types, 1 being images (both background and sprites), 2 being movies (the thing I was hunting), 4 is for MIDI tracks, 9 is for digitised audio, the rest I don’t care about.

Let’s move to simple things like music and audio. Music is stored in its own custom format with four bytes per command except when the high bit is set (then it’s the delta time for the next command). Why so? Probably because it’s being played by sending single MIDI commands and at least on Windows that requires packing them into a 32-bit word anyway. Audio is stored in files with some header and either in raw form or with IMA ADPCM compression. The notable thing is that the ADPCM decoder does not use any multiplications in any form; instead it has a precomputed table for all eight possible values for each step.
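To illustrate the table idea, here is a minimal sketch of precomputing the eight possible deltas for each step size, using the usual shift-and-add IMA ADPCM formula. The function names and the index adjustment table are the generic IMA ones and are assumptions here; the layout of the game's actual table is not known.

```python
# Standard IMA ADPCM step index adjustments for the 3-bit magnitude (assumed, not taken from the game).
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]


def build_diff_table(steps):
    """For every step size, precompute the delta for each of the eight 3-bit magnitudes."""
    table = []
    for step in steps:
        row = []
        for code in range(8):
            diff = step >> 3
            if code & 4:
                diff += step
            if code & 2:
                diff += step >> 1
            if code & 1:
                diff += step >> 2
            row.append(diff)
        table.append(row)
    return table


def decode_nibble(table, step_index, predictor, nibble):
    """Apply one 4-bit code with a table lookup and an add; return (sample, new step index)."""
    diff = table[step_index][nibble & 7]
    predictor += -diff if nibble & 8 else diff
    predictor = max(-32768, min(32767, predictor))
    step_index += INDEX_ADJUST[nibble & 7]
    step_index = max(0, min(len(table) - 1, step_index))
    return predictor, step_index
```

With the full 89-entry IMA step table passed as `steps` this gives an 89×8 table, so decoding a sample needs no arithmetic beyond one addition or subtraction.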

Images are another interesting topic. First of all, palette is stored as 944-byte file with 32-bit entries (so it’s just 236 colours) but somehow decoded files use it with bias of ten, i.e. decoded index 13 corresponds to palette entry 3, index 27 is for entry 17 etc etc. I have no idea what it does with first colours in the palette (though sometimes images are not colorised properly so maybe they need different palette bias or a different palette at all).
Second, images employ two different compression schemes: RLE and LZW. RLE is rather trivial except that it uses opcode zero to signal end of line (and double zero for end of image):

Obvious words skipped

An LZW-compressed image is split into chunks that may be uncompressed or compressed with LZW using 10, 11 or 12 bits per index. And after all these years I was still able to correctly guess it was LZW and write a decoder that worked fine without resorting to any documentation or even studying the decompiled code (I just needed to realize that it reads sequences of n-bit indexes and does something with them to output bytes).
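The palette bias described earlier can be sketched like this. The names `load_palette` and `lookup_colour` are made up for illustration, the meaning of the four bytes inside an entry is not assumed at all, and returning `None` for indices outside the biased range is only a placeholder since it is unclear what the game does with them.

```python
PALETTE_BIAS = 10  # from the observation above: decoded index 13 maps to palette entry 3


def load_palette(data):
    """Split a 944-byte palette file into 236 four-byte entries."""
    if len(data) != 944:
        raise ValueError("expected a 944-byte palette")
    return [tuple(data[i:i + 4]) for i in range(0, len(data), 4)]


def lookup_colour(palette, index):
    """Return the palette entry for a decoded pixel index, or None when it falls outside the palette."""
    entry = index - PALETTE_BIAS
    if 0 <= entry < len(palette):
        return palette[entry]
    return None
```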

And finally the movie format. Those are actually stored in two files: movie data and a companion frame index (i.e. frame type, size and offset). After extracting the frames I could not figure out what to do with them until I noticed that frames of type 2 all have a constant size of 11025 except for the first few. Obviously they turned out to be IMA ADPCM compressed audio frames (and since they were the first frames in every movie it was hard to guess the format looking at semi-random data without a header). Frame type 0 turned out to be the same image format as normal pictures.

Frame type 1 turned out to be inter-frames that use index 0 for pixels that should not be updated.
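A minimal sketch of applying such an inter-frame on top of the previous decoded picture, assuming both are already unpacked into flat buffers of palette indices of the same size (the function name is hypothetical):

```python
def apply_inter_frame(prev_frame, update):
    """Return a new frame where index 0 in `update` keeps the pixel from `prev_frame`."""
    if len(prev_frame) != len(update):
        raise ValueError("frame sizes differ")
    return bytes(new if new != 0 else old for old, new in zip(prev_frame, update))
```

One consequence of this scheme is that an inter-frame can never set a pixel to index 0 itself.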

Unfortunately even if I now have a way to decode the files I cannot add direct support for them in NihAV since it requires at least three different files to play it (palette, frame index and frame data). At least now I know I can transcode them into something when the need arises.

Now let’s talk about Ghidra a bit. Without it this probably would not have happened since I’m not young enough to spend time on translating tons of disassembly (and REC failed to do a good job). Ghidra’s support for 16-bit code is far from perfect with all that mess with far pointers consisting of two 16-bit words, external DLL functions linked by ordinals (so I had to match them by hoof and rename where possible) and general confusion with int treated as 4-byte in some contexts but 2-byte in others. And yet it did the job and helped me to understand how the game libraries work and that’s what really counts.

And as a bonus here’s a team photo extracted from concentration mini-game related to the witch scene (not the Simon Says clone with fires but probably Match Two clone that I’ve never seen in-game before):

MidiVid codec family

Thursday, September 26th, 2019

VP7 is such a nice codec that I decided to distract myself a little with something else. And that something else turned out to be MidiVid codec family. It turned out to be quite peculiar and somehow reminiscent of Duck codecs.

The family consists of three codecs:

  1. MidiVid: the original codec based on LZSS and vector quantisation;
  2. MidiVid Lossless: exactly what it says on the tin, based on LZSS and a bunch of other technologies;
  3. MidiVid 3: a codec based on simplified integer DCT and a single codebook for all values.

I’ve actually added MidiVid decoder to NihAV because it’s simple (two hundred lines including boilerplate and tests) and way more fun than working on VP7 decoder. Now I’ll describe them and hopefully you’ll understand why it reminds me of Duck codecs despite not being similar in design.

MidiVid

This is a simple hold-and-modify video codec that had been used in some games back in PS2/Xbox era. The frame data can be stored either unpacked or packed with LZSS and it contains the following kinds of data: change mask for 8×8 blocks (in case of interframe—if it’s zero then leave block as is, otherwise decode new data for it), 4×4 block codebook data (up to 512 entries), high bits for 9-bit indices (if we have 257-512 various blocks) and 8-bit indexes for codebook.

The interesting part is that the LZSS scheme looked very familiar and indeed it looks almost exactly like lzss.c from the LZARI author (remember that? I still do); the only differences are that it does not use a pre-filled window and flags are grouped into a 16-bit word instead of a single byte.
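For reference, here is a rough sketch of what an lzss.c-style unpacker with those two changes could look like. Everything about the bit layout is an assumption carried over from lzss.c (4096-byte ring buffer, 12-bit position plus 4-bit length, flag bit 1 meaning literal, writing starting at position N - F), and the little-endian, LSB-first flag word is a guess, so treat it as an illustration rather than the MidiVid format.

```python
N = 4096        # ring buffer size, as in lzss.c
F = 18          # maximum match length, as in lzss.c
THRESHOLD = 2   # matches are always longer than this, as in lzss.c


def lzss_unpack(src, out_size):
    """Unpack LZSS data with 16-bit flag words and no pre-filled window (layout assumed)."""
    window = bytearray(N)          # left zeroed: no space pre-fill
    wpos = N - F                   # assumption: same start position as lzss.c
    out = bytearray()
    pos = 0
    while len(out) < out_size:
        flags = src[pos] | (src[pos + 1] << 8)   # assumption: little-endian flag word
        pos += 2
        for _ in range(16):
            if len(out) >= out_size:
                break
            if flags & 1:          # literal byte
                run = [src[pos]]
                pos += 1
                mpos = None
            else:                  # (position, length) pair
                lo, hi = src[pos], src[pos + 1]
                pos += 2
                mpos = lo | ((hi & 0xF0) << 4)
                run = range((hi & 0x0F) + THRESHOLD + 1)
            for item in run:
                if len(out) >= out_size:
                    break
                byte = item if mpos is None else window[(mpos + item) % N]
                out.append(byte)
                window[wpos] = byte
                wpos = (wpos + 1) % N
            flags >>= 1
    return bytes(out)
```

Grouping sixteen flags per word instead of eight only changes how often the flag word is refilled; the copy logic stays the same.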

MidiVid Lossless

This one is a special beast as it combines two completely different compression methods: the same LZSS as before and something used by a BWT-based compressor (to the point that the frame header contains FTWB or ZTWB IDs).

I’m positively convinced it was copied from some BWT-based compressor not just because of those IDs but also because it seems to employ the same methods as some old BWT-based compressor except for the Burrows–Wheeler transform itself (that would be too much for the old codecs): various data preprocessing methods (signalled by flags in the frame header), move-to-front coding (in its classical 1-2 coding form that does not update the first two positions that much) plus coding coefficients in two groups: first just zero/one/large using an order-3 adaptive model and then values larger than one using a single order-1 adaptive model. What made it suspicious? Preprocessing methods.

MVLZ has different kinds of preprocessing methods: something looking like distance coding, static n-gram replacement, table prediction (i.e. when data is treated as a series of n-bit numbers and the actual numbers are replaced with the difference between previous ones) and x86 call preprocessing (i.e. that trick when you change function call address from relative into absolute for better compression ratio and then undo it during decompression; known also as E8-preprocessing because x86 call opcode is E8 <32-bit offset> and it’s easy to just replace them instead of adding a full disassembler to the archiver). I had my suspicions about n-gram replacement (that one is quite stupid for video codecs and it only replaces some values with some binary values that look more related to machine code than video) but the last item was a dead give-away. I’m pretty sure that somebody who knows open-source BWT compressors of the late 1990s will probably recognize it even from this description but sadly I’ve not been following it that closely being more attracted to multimedia.
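The E8 trick in its most naive form can be sketched as follows. This is the generic technique, not the MVLZ variant: the choice of base address (the end of the 5-byte instruction) and the absence of any range check are assumptions, and real archivers usually add a sanity check on the operand before converting it.

```python
import struct


def _e8_transform(data, sign):
    """Add (sign=+1) or subtract (sign=-1) the instruction end offset to every E8 operand."""
    buf = bytearray(data)
    pos = 0
    while pos + 5 <= len(buf):
        if buf[pos] == 0xE8:
            (value,) = struct.unpack_from("<I", buf, pos + 1)
            value = (value + sign * (pos + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", buf, pos + 1, value)
            pos += 5               # skip the operand so both directions scan identically
        else:
            pos += 1
    return bytes(buf)


def e8_encode(data):
    """Relative call targets become absolute ones (done before compression)."""
    return _e8_transform(data, +1)


def e8_decode(data):
    """Undo e8_encode after decompression."""
    return _e8_transform(data, -1)
```

Two calls to the same function from different places then carry identical operand bytes, which is what helps the compressor.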

MidiVid 3

This codec is based on some static codebook for packing all values: block types, motion vectors and actual coefficients. Each block in macroblock can be coded with one of four modes: empty (fill with 0x80 in case of intra), DC only, just few coefficients DCT, and full DCT. As usual various kinds of data are grouped and coded as single array.

Motion compensation is full-pixel and unlike its predecessor it operates in YUV420 format.


This was an interesting detour but I have to return back to failing to start writing VP7 decoder.

P.S. I’ll try to document them with more details in the wiki soon.
P.P.S. This should’ve been a post about railways instead but I guess it will have to wait.

Bink-b: Encoder

Friday, May 31st, 2019

Recently I’ve been contacted by some guy working on a mod campaign for Heroes of Might and Magic III. The question was about the encoder for videos there. And since the original one is not likely to exist, I just wrote a simple one that would take PGMYUV image sequence and encode it. Here’s the gzipped source.

It took a couple of evenings to do that mostly because I still have weak symptoms of creeping perfectionism (thankfully it’s treated with my laziness). BIKb does not have Huffman-coded bundles, so the simplest straightforward encoding would be: write block type bundle (13-bit size and 4-bit elements), write empty other bundles, write several bundles containing pixels and you’re done. There’s a proper approach: write a full-featured encoder that takes input in several formats and that encodes using all possible features selecting the best quality for the target bitrate. There’s a hacky approach—translate later versions of Bink into BIKb (and then you remember that it has different motion compensation scheme so this approach won’t work). I’ve chosen something simple yet with some effectiveness: write an encoder that employs only vector quantisation and motion compensation for non-overlapped blocks plus add a quality setting so users can play with output size/quality if they really need it.

So how does the block encoding work? Block truncation coding, the fast and good way to quantise a block into two colours (many video codecs back in the day used it and only some dared to use vector quantisation for more than two different values per block). Essentially you just calculate the average pixel value and select two values depending on how many pixels in the block are larger than average and by how much they deviate. And here’s where the quality parameter comes into play: depending on it the encoder sets the threshold above which a block is coded as is (aka full mode) instead of as two colours and the pattern in which they occur (of course if it’s a solid-colour fill it’s always coded as such). As I said, it’s simple but quite effective. Motion compensation is currently lossless i.e. the encoder will try to find only the block that matches exactly (again, it can be improved but that would only lead to longer implementation times and even longer debugging times). This makes me appreciate the work on Smacker and Bink 1 video codecs and encoders for them even more.
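As an illustration, here is a sketch of textbook moment-preserving block truncation coding together with a threshold-based mode decision. It is not taken from the encoder source: the function names, the squared-error metric and the mode labels are all assumptions.

```python
import math


def btc_quantise(block):
    """Quantise a list of 8-bit pixels into (low, high, mask) with classic BTC."""
    m = len(block)
    mean = sum(block) / m
    mask = [1 if p > mean else 0 for p in block]
    q = sum(mask)                      # number of pixels above the average
    if q == 0:                         # nothing above the mean: solid block
        return round(mean), round(mean), mask
    sigma = math.sqrt(sum((p - mean) ** 2 for p in block) / m)
    low = mean - sigma * math.sqrt(q / (m - q))
    high = mean + sigma * math.sqrt((m - q) / q)
    clip = lambda v: max(0, min(255, round(v)))
    return clip(low), clip(high), mask


def choose_mode(block, threshold):
    """Pick a coding mode; `threshold` plays the role of the quality setting (hypothetical)."""
    if min(block) == max(block):
        return "fill"
    low, high, mask = btc_quantise(block)
    error = sum((p - (high if bit else low)) ** 2 for p, bit in zip(block, mask))
    return "two-colour" if error <= threshold else "full"
```

A lower threshold sends more blocks to full mode, which costs bits but keeps detail; a higher one favours the compact two-colour form.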

Overall, it was a nice diversion from implementing Duck decoders for NihAV but I should probably return to it. The sooner it’s done the sooner I can move to something more exciting like finally experimenting with vector quantisation, or trying to write a player, or something else entirely. I avoid making plans but there are many possibilities at hoof so I just need to pick one.

Bink2: some words about loop filter

Sunday, April 14th, 2019

Since obviously I have nothing better to do, here’s a description of loop filter in Bink2 as much as I understand it (i.e. not much really).

First, the loop filter makes its decision based on two factors: whether the motion vector difference between adjacent blocks is greater than two, or else it selects filter strength depending on the number of coefficients coded in the block (that one I don’t remember seeing before). The filter is the same in all cases (inter/intra, luma/chroma, edge/inside macroblock), only the number of pixels filtered varies between zero and two on each side (more on that later). This is nice and elegant design IMO.

Second, filtering is done after each macroblock, horizontal edges first, vertical edges after that, but not necessarily for all macroblocks, since a BIKi or BIKj encoder can signal “do not deblock macroblocks in these rows and columns” by transmitting a set of flags for columns and rows.

Third, in addition to normal filtering decoder can do something that I still don’t understand but it looks like whole-block overlapping in both directions (and it is performed in actual decoding but I don’t know what happens with the result of it).

And the filter itself is not that interesting (assuming we filter buf[0] buf[1] | buf[2] buf[3]):

    diff0 = buf[2] - buf[1] + 8 >> 4;
    diff1 = diff0 * 4 + 8 >> 4;
    if (left_strength >= 2)
        buf[0] = clip8(buf[0] + diff0);
    if (left_strength >= 1)
        buf[1] = clip8(buf[1] + diff1);
    if (right_strength >= 1)
        buf[2] = clip8(buf[2] - diff1);
    if (right_strength >= 2)
        buf[3] = clip8(buf[3] - diff0);

Strength is determined like this: 0 for more than 8 coefficients coded, 1 for an MV difference or 4-7 coefficients coded, 2 for 1-3 coefficients coded, 3 for no coefficients coded.
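For experimenting, the filter above can be transcribed into runnable form like this. It is a direct transcription of the snippet with C operator precedence made explicit; `filter_edge` is a made-up name and the snippet says nothing about how the strength values listed above map to the per-side strengths, so they are plain parameters here.

```python
def clip8(value):
    """Clamp to the 0..255 pixel range."""
    return max(0, min(255, value))


def filter_edge(buf, left_strength, right_strength):
    """Filter buf[0] buf[1] | buf[2] buf[3] across a block edge, modifying buf in place."""
    diff0 = (buf[2] - buf[1] + 8) >> 4   # '+' binds tighter than '>>' in C
    diff1 = (diff0 * 4 + 8) >> 4
    if left_strength >= 2:
        buf[0] = clip8(buf[0] + diff0)
    if left_strength >= 1:
        buf[1] = clip8(buf[1] + diff1)
    if right_strength >= 1:
        buf[2] = clip8(buf[2] - diff1)
    if right_strength >= 2:
        buf[3] = clip8(buf[3] - diff0)
    return buf
```

Python's `>>` rounds towards negative infinity for negative values, which matches an arithmetic shift in the original code.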

Overall, the loop filter is nice and simple if you ignore the existence of some additional filter functions and very optimised implementation that is not that much fun to untangle.

Update: the alternative function seems to be some kind of block reconstruction based on DCs. In case it’s intra block with less than four coefficients coded it will take all neighbouring DCs, select those not differing by more than a frame-defined threshold and smooth the differences. I still don’t understand its purpose in full though.

BMV: Complete!

Thursday, April 4th, 2019

So NihAV finally got Discworld Noir BMV support and I’ve tested it on all samples from the game to see if it works correctly. Here’s a sample frame:


(I still remember the song Samael plays there and have it somewhere ripped in its full MPEG audio layer II glory).

Now I want to talk about the format since it’s quite different from anything else. BMV used in Discworld II was simple but with two quirks: it employed integer coding using variable amount of nibbles (that were read as bytes but a nibble could be saved and used later) and it could decode frame either from the beginning to end or from end to beginning (reading frame data from the end too!). DW3 BMV is even stranger and let’s start with audio part. Audio codec is very simple: you have 41-byte block with one byte signalling which quantised values tables should be used for both channels and 32 indices for each channel packed into 16-bit words. The main peculiarity is that data is aligned to 16-bit and mode byte can be either in the beginning or at the end of the block. That’s a bit unusual but not strange. Well, it turns out it aligns for the absolute position in a file so my demuxer has to signal whether audio data was at even or odd position. And video is even stranger.

As I wrote previously, the video codec is 16-bit now and still employs nibble variable integer coding and copy/repeat/new pixels mode. Luckily there is no backwards decoding mode yet the codec is tricky without it. First of all, where previously there were just three plain modes now we have combinations of those with bytes or nibbles signalling what should be done (i.e. copy/repeat/put new pixels a fixed amount of times and then do the other operation another fixed amount of times). And they have different meaning depending on what the last operation was (copy/repeat/put for a fixed amount or with an arbitrarily large one). And on whether there’s a nibble left unread after the last operation or not. But that’s not all! While previously reading new pixels meant just reading a byte, 16-bit pixels can be compressed a bit more. As a result we read 1-3 bytes per pixel: first read the index byte and remap it; if it’s in range 00..F7 then return the pixel from an array, if it’s in range F8..FE then read another byte and use it as an index in one of seven secondary “palettes”; if it’s FF then simply read an explicit 2-byte value from the stream. The reference simply used an array pointing to the various functions performing this. And of course palettes can be updated in the beginning of each frame.
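The 1-3 byte pixel reading can be sketched as a single function instead of an array of function pointers. The argument names are hypothetical, and the little-endian order of the explicit 2-byte value is an assumption.

```python
def read_pixel(stream, pos, remap, primary, secondary):
    """Decode one 16-bit pixel starting at stream[pos]; return (pixel, new position)."""
    index = remap[stream[pos]]
    pos += 1
    if index <= 0xF7:
        # one byte: pixel comes straight from the main array
        return primary[index], pos
    if index <= 0xFE:
        # two bytes: pick one of seven secondary palettes, then index into it
        palette = secondary[index - 0xF8]
        return palette[stream[pos]], pos + 1
    # three bytes: explicit value (byte order assumed to be little-endian)
    return stream[pos] | (stream[pos + 1] << 8), pos + 2
```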

Surprisingly, decoder implementation takes about 28kB with a quarter of it being tables. That’s for both audio and video decoder. This is on par with other game decoders (GDV, Smacker and VMD) and feels significantly smaller than the reference (which is about 30kB in a stripped assembled .o file and over 200kB as assembly).

Overall it was hard to comprehend and tricky to debug too. Nevertheless now it’s over and I can probably move to TrueMotion 2X. Or whatever I decide to do when I’m bored enough.

BMV: moving forward

Saturday, March 30th, 2019

I’ve made some significant progress on REing Discworld Noir BMV.

First, I put the opcode meanings into a table (it’s probably the only case when I had to use a spreadsheet for REing) to figure them out.

Normal mode of operation has these opcodes:

  • 00**00*0: perform extra-long copy;
  • 00**00*1: perform extra-long invoking of pixel functions;
  • 00**xxx0: copy xxx-1 pixels;
  • 00**xxx1: invoke pixel function xxx-1 times;
  • xxxx000y: copy xxxx+3+y pixels;
  • xxxx001y: invoke pixel function xxxx+3+y times;
  • xxx0yyy0: copy yyy-1 pixels and then invoke pixel function xxx-3 times;
  • xxx0yyy1: invoke pixel function yyy-1 times and then repeat last value xxx-3 times;
  • xxx1yyy0: copy yyy-1 pixels and then repeat last value xxx-3 times;
  • xxx1yyy1: invoke pixel function yyy-1 times and then copy xxx-3 pixels.

Then depending on last operation performed mode is changed to: something special for 00****** opcodes, no change for repeat, “after copy mode” and “after pixel func mode” for obvious cases.

After copy mode opcodes:

  • xxxxxxx0: invoke normal mode opcode xxxxxxx1;
  • 00**00**: extended repeat;
  • 00**xxx1: repeat last value xxx-1 times;
  • xxxx00y1: repeat last value xxxx+3+y times;
  • xxx0yyy0: repeat last value yyy-1 times and copy xxx-3 pixels;
  • xxx0yyy1: repeat last value yyy-1 times and copy xxx-3 pixels;
  • xxx1yyy1: repeat last value yyy-1 times and invoke pixels function xxx-3 times.

After pixel function mode is simple: xxxxxxx0 opcode is after copy mode xxxxxxx1 opcode and xxxxxxx1 opcode is normal mode xxxxxxx0 opcode.

The special modes may have secondary opcodes that usually boil down to the same thing: either do some more of the same and proceed normally or calculate next opcode instead of reading it from the stream.

And now the second thing: I’ve managed to rip the relevant code from the disassembly, fix it up for NASM to handle, add some fixes to make it possible to invoke externally and link it against my own small program that parses a BMV file, decodes the first video frame and dumps it into a file. That approach works so I have something to test my new implementation against. Also since NASM makes all labels visible it’s easy to make the debugger report each opcode as it gets called.

To summarise, I have good understanding of the algorithm and I have a working binary specification. This should be enough to finish it soon (unless I get distracted by something else of course).

Moving with REing game codecs

Saturday, March 23rd, 2019

As I’ve mentioned before, NihAV now can decode Bink2 files more or less decently. I don’t have many samples but I can decode all samples I could find from KB2f to KB2j quite well (the only exception is KB2a — there’s only one partial sample known with no indication which game uses it and no version of RAD tools understands it either).

I’ve omitted support for full-resolution Bink2 files (no samples) and reconstruction is not perfect because there’s an in-loop deblocking filter with some additional crazy functions invoked in some cases. It’s messy and does not affect actual bitstream decoding so I’m not going to work on it now. Maybe if I get some inspiration later…

Anyway, I moved to REing Discworld Noir BMV format instead. While the game is nice, it’s hard to run on any modern OS and there’s almost no hope for its engine being reimplemented. So maybe I’ll be able to re-watch cutscenes from it… I’ve figured out the container and audio a long time ago, video is not that easy. It seems to be an upgrade of Discworld II BMV that outputs 16-bit video but the way it’s implemented is baffling.

While locating the functions responsible for the decoding was easy, understanding them was hard. For the disassembler. No, the code was recognized fine but its flow makes even disassembler freak out. Here’s how it works:

  1. The frame decoding function reads pixel values, fills certain arrays and patches functions that return pixel values (nice start, isn’t it?) and then the real frame decoding starts;
  2. Frame decoding is done like a state machine (which complements coroutines used elsewhere in the engine) with several tables for handling 256 opcodes (or less in some cases);
  3. As a result you read a byte, jump to one of the labels, perform the operation, read the next opcode, jump using the same or a different table;
  4. Except when it’s opcodes 00-3F, then you usually have to construct length word, perform some pixel output loop and then jump to another opcode handler which performs some operation and then jump to the address calculated by the previous operation;
  5. Of course pixel functions are some permuted array of 256 pointers to the functions of three different kinds: return fixed value (set by the decoding function in the beginning), read byte and return pixel value from the corresponding array or read new pixel value from the stream;
  6. And to make it all even better, all those operations are obviously not actual functions but small(er) chunks of assembly code that use fixed registers as arguments and they’re located both before and after decoding function “body”.

Anyway, I’ve made some progress and I reckon it will be possible to support this format in NihAV though maybe not soon enough.

NihAV: even more Bink2 support!

Wednesday, March 13th, 2019

After managing to decode the first frame of KB2g variant I had three options: try to decode the other frames, try to decode other variants or do nothing. While the third option is the most appealing and the first option is the most logical, I stuck with the second one. So now I can decode the first frame of KB2f variant of Bink2 as well. Unfortunately the only (partial) KB2a sample I know is not supported, probably it’s a beta version that was tried on one game like Bink version b. Beside a small surprise in one place bitstream decoding was rather simple. Inter-frame support should not be that hard but it might get messy because of the DC and MV prediction.

And while talking about REing Bink I should mention that I’ve tried Ghidra while doing KB2f work. It is a nice tool that sucks in some places (not having a good highlight for variables, decompiling SIMD code results in very questionable output, the system being Java-based and requiring a recent JDK, which is the worst issue really) but it works and produces decent results (including the decompiler). Also since it has 16-bit decompiler support maybe I’ll manage to figure out how those clips in Monty Python & the Quest for the Holy Grail are stored.

I should start documenting it too.

Insignificant update: okay, now it decodes inter-frame data correctly too and the only thing left is to make it reconstruct them correctly. Also I’ve updated codec information on Multimedia Wiki. Actually now it works quite okay so I’m not going to pursue it further. I have no real interest in Bink2 decoding after all.

NihAV: some Bink2 support

Sunday, March 10th, 2019

It took a long time but finally I can decode the first frame of Bink2 video (just KB2g flavour though but it’s a start).

At least the initial observations were correct: Bink2 codes data in 32×32 macroblocks, two codebooks for AC zero runs, one codebook for motion vector components, simple codes with unary prefix for the rest.

If you wonder why it took so long, that’s because I’m lazy and spend an hour or less a day on it. Also while the codec is simple in design it’s a bit complicated in implementation. While the previous version relied on the format sub-version to decide which feature to use, Bink2 uses frame flags to decide which feature to use. For instance, flag 0x1000 signals that there are two bit arrays coded that tell when to read an additional flag during CBP decoding that tells which one of two codebooks should be used during AC decoding later. And flag 0x2000 essentially tells to use different bitstream decoding (like motion vectors decoding or block type decoding). Or the fact that it employs DC and MV prediction that usually has four cases (top-left macroblock, top block, left block, some block inside) plus WMV1-like handling of DC prediction in inter-frames (i.e. it calculates DC for inter blocks and uses them for prediction). And of course DC prediction for inter blocks works a bit differently. Plus it tries to track internal state by packing all flags into a 32-bit word and updating it for each block (two bits are for signalling the top row, one for the macroblock being the leftmost one, some bits are copied from frame flags etc etc). So there are a lot of nuances to take care of.

And that’s not counting the fact that current Bink2 player can’t decode versions prior to KB2g at all. Since I have some KB2f samples along with an old Bink player that can handle them, I guess I’ll support them eventually.

A Rant on Actimagine VX Codec

Saturday, July 2nd, 2016

Well, since I don’t do anything useful these days and just give rants on subjects nobody cares about here’s another one.

This codec (there’s also VX container with IIRC PCM audio to accompany video) can be named a Very Mobile H.264 Rip-off. Why? Because the only available binary specification I got seems to come from some game, in ARMv6 I guess, and has stride hardcoded to 256 pixels which would be appropriate only for some hand-held consoles. As for the second part: it uses Elias Gamma’ codes (like H.264 does, under a different name) and suspiciously similar spatial prediction. Obviously it also differs a lot because it’s intended for low-power devices and low resolutions too.
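For those who don't remember the code, here is a small sketch of reading it in the zeros-first form, which is what H.264 calls unsigned Exp-Golomb once you subtract one. The bit source is simply a list of 0/1 values; the actual bit order and value mapping used by Actimagine VX are not covered here, so this shows only the generic code.

```python
def read_gamma(bits, pos=0):
    """Read one zeros-first Elias gamma code from a list of bits; return (value, new position)."""
    zeros = 0
    while bits[pos] == 0:          # unary prefix: count leading zero bits
        zeros += 1
        pos += 1
    value = 0
    for _ in range(zeros + 1):     # the leading 1 plus `zeros` more bits
        value = (value << 1) | bits[pos]
        pos += 1
    return value, pos


def read_ue(bits, pos=0):
    """H.264-style ue(v): the same code shifted so that the shortest codeword means zero."""
    value, pos = read_gamma(bits, pos)
    return value - 1, pos
```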

So, it operates on 16×16 macroblocks, each one can be coded with one of 24 possible modes that are really just combinations of one of the following techniques with optional residue coding:

  • splitting into 16×8 or 8×16 sub-blocks and processing them in the same way;
  • copying data from one of the previous three frames with or without motion vector adjustment (it’s full-pixel only);
  • copying data from the previous (frame?) block with some offset added to it (actually three offsets—one per component) and motion vector optionally;
  • applying intra prediction that also comes in two flavours—four modes applied to any block size or nine modes applied for 4×4 luma blocks (still only four modes for chroma).

Residual 16×16 block is coded as 5-bit CBP (again, Elias Gamma’ code mapped to CBP value) for four 4×4 luma blocks and two 4×4 chroma blocks. Coefficients are coded like this:

  1. Mode from Xine table (it’s predicted from neighbouring blocks too) that defines how many coded coefficients are there and how many of them are ones;
  2. Signs for known ones;
  3. Elias Gamma’ codes for other coefficients and their signs;
  4. Zero run value for skips between elements (from Xine table depending on maximum coefficient level seen so far).

In other words—nothing like H.264 CAVLC mode at all.

And if you think it was fun to RE I can tell you it was not and there are still challenges to overcome. First, the specification is badly written and optimised so much that the decompiler is almost worthless there (for example, refilling bits is done by jumping to the end of the function that reads unsigned Elias Gamma’ code), functions expect certain registers to be used for the state (like block functions expect R11 and R12 to contain the motion vector, bitstream reading functions operate on the context stored in R1-R3 and return the result in R6 etc etc), Hex-Rays also can’t decompile anything with switch statements and block decoding functions are full of them, it often decompiles a function just to the first function call and ignores the rest of the function code (happened to me on x86 too once where it decided to decompile only the head of the main Smacker video decompression function without the block decoding loop, that’s why I trust decompilers even less than compilers). Second, the specification seems to miss data for lookup tables used in coefficients decoding.

So if you want to have it fully REd find a better specification and/or a more patient and persistent man to deal with it. As for me, it’s dead, Jim®.