Archive for the ‘Game Video’ Category

A look at compressed game video formats

Friday, March 22nd, 2024

In order to distract myself from the thoughts why most politicians have their heads so deep in their asses that French ones out of all people seem to be the bravest, and what can I do to help destroying russia beside regular donations, here’s a post about completely unrelated REing work.

Since I had nothing better to do, I looked at the format use in Azrael’s Tear game and re-visited Talisman game. And while one of them is a 3D game from a British developer and another one is a 2D game from a German one, the cutscene formats used there have more in common than one would suspect (and their design distinguishes them from the majority of the formats).

It is common for various game formats to represent frames as small blocks with motion compensation or raw data, but I can’t remember any other such format that would use data compression on container level instead of it being a part of e.g. video data compression (of course there’s Ogg Matroska that implements such feature, but beside it I can’t think of any format doing that). In both of these formats (with not so creative extensions ANI and MOV) static Huffman compression is employed to compress several chunks of different data type. In ANI (used in Talisman game) data is split into groups of frames that start with a chunk defining which one of the predefined Huffman trees should be used to pack them all and to what symbols the codes should be assigned. In MOV (used in the other game, naturally) audio and video frame blocks are grouped into larger chunks and those chunks may be optionally compressed using static Huffman tree transmitted in the beginning of chunk payload.

ANI format features another peculiarity: there are other codecs that use motion compensation plus rotation (like formats from Cryo Interactive IIRC) but I can’t think about any other format that performs motion compensation and replaces one of the pixels in a block with a new value. Of course it is not a revolutionary idea but I haven’t seen it implemented like that before.

And that’s why I like those old game formats: they may be not the most effective ones but they contain more originality than the modern formats. The main problem is finding such a format—there are too many games released (which are also sometimes too hard to find) and the majority of them uses FLIC or Smacker anyway (or Cinepak in AVI or MOV for newer ones). But sometimes I encounter a mention of a game in some review and get lucky. I hope such finds happen more often though…

Looking at Motion Pixels

Tuesday, October 24th, 2023

There is this very Sirius (or Sirius Publishing, more precisely) family of video codecs (plus one container format) apparently developed by two guys (who like to spam their name even in junk sections of AVI files). Also initially it had its own container format but later they’ve started to target AVI.

Another peculiarity of this format is that initially it targeted games but later was also used as a crappy Video CD alternative.

Back in the day Gregory Montoir REd the original game format for one of the game engine re-implementations he’s famous for and donated the code to FFmpeg as well. Since that time I was curious whether that code can be adapted to play MVI1 and MVI2 as well but the codec itself turned me off.

The codec itself is perverted, both in code and interface. Also it’s inherently interlaced. Normally video codecs in AVI can be recognized by their FOURCC and pass additional configuration parameters in the additional header data. Here they decided to use half of FOURCC to pass configuration flags to the codec and use stream handler FOURCC (that most apps ignore) to tell their decoder should be used to handle it. This alone would make me want to not support it ever, but the binary specification is worse.

Looks like the code consists mostly of handwritten assembly because I don’t know which compiler may generate this madness. There are many versions of the codecs, most of them are 16-bit and the 32-bit version is no better. For starters, it uses segments.

Not so many people remember DOS times and its memory models, even fewer remember them fondly. And almost nobody remembers that in 32-bit mode you can also use FS and GS registers to have custom addressing modes. Well, this codec uses them: it sets FS to the context pointer so context fields are accessed as mov EAX, dword ptr[1A8h] while global variables are accessed as mov EAX, dword ptr GS:[SYM] and of course no decompiler likes that. I was able to work around it in Ghidra by creating a new segment starting from zero but it’s still annoying.

Another thing is (ab)using registers to the full extent. Functions pass their parameters implicitly in the registers, using stack only to save those values before a loop or form a list of rectangles to process. And of course it uses this annoying (for the decompiler) feature as using the same register for two loop counter (e.g. top byte for the outer loop and low byte for the inner loop). As the result Ghidra can’t decompile it properly or even ignores whole blocks of the code because to its belief they can’t be invoked—and it’s still better than decompiling 16-bit version of MVI1 which made decompiler commit suicide. As the result some functions are easier to hand-translate from the assembly.

In either case looks like despite all the improvements it remains about the same as the initial version: data is coded as 5-bit YUV internally and stored using Huffman codes, quantisation and change maps (rectangles that tell which areas to update/fill). MVI2 can use ten different frame decoding modes that differ in how the deltas are coded but essentially it remains the same. They have not even gotten to introducing a proper motion compensation it seems.

So, now I’ve had a good long look at the codec, found nothing interesting there that was not known before and can forget about it. If only there was something more interesting to look at…

Encoding Bink Audio

Wednesday, October 11th, 2023

As I mentioned in the introduction post, Bink Audio is rather simple: you have audio frames overlapped by 1/16th of its size with the previous and the following frame, data is transformed either with RDFT (stereo mode) or DCT-II (per-channel mode), quantised and written out.

From what I can tell, there are about three revisions of the codec: version 'b' (and maybe 'd') was RDFT-only and first two coefficients were written as 32-bit floats. Later versions shaved three bits off exponents as the range for those coefficients is rather limited. Also while the initial version grouped output values by sixteen, later versions use grouping by eight values with possibility to code a run for the groups with the same bit width.

The coding is rather simple, just quantise bands (that more or less correspond to the critical bands for human ear), select bitwidth of the coefficients groups (that are fixed-width are not related to the band widths) and code them without any special tricks. The only trick is how to quantise the bands.

Since my previous attempts to write a proper psychoacoustic model for an encoder failed, I decided to keep it simple: the encoder simply tries all possible quantisers and selects the one with the lowest value of A log2 dist+λ bits. This may be slow but it works fast enough for my (un)practical purposes and the quality is not that bad either (as much as I can be trusted on judging it). And of course it allows to control bitrate in rather natural way.

There’s one other caveat though: Bink Audio frames are tied to Bink Video frames (unless it’s newer Bink Audio only container) and thus the codec should know the video framerate in order to match it. I worked around it by introducing yet another nihav-encoder hack to set audio timebase from the video so I don’t have to provide it by hand.

So that’s it. It was a nice experiment and I hope (but not expect) to think of something equally fun to do next.

Bink encoder: coefficients coding

Tuesday, October 10th, 2023

Somewhat unrelated update: I’ve managed to verify that the output of my decoder works in the Heroes of Might and Magic III properly even with sound after I fiddled with the container flags. The only annoyance is that because of DCT discrepancies sometimes there are artefacts in the form of white or black dots but who would care about that?

At last let’s talk about the one of the most original things in the Bink Video format (and considering the rest of the things it has, that’s saying something).

Bink encoder: doing DCT

Monday, October 9th, 2023

Again, this should be laughably trivial for anybody familiar with that area but I since I lack mathematical skills to do it properly, here’s how I wrote forward DCT for inverse one.

Bink video encoder tricks

Sunday, October 8th, 2023

As I mentioned in the introductory post, there are nine block coding modes and my encoder tries them all to see which is good (or good enough) to be used. What I have not mentioned is that some of those blocks have sixteen variations (quantisers for DCT-based blocks and scan patterns for RLE blocks), which makes search even longer.

First of all, I use the rather obvious approach to trying blocks: order them in a sequence, motion blocks types first, then simple ones (fill, two-colour pattern, RLE) with intra DCT and raw being the last. And if the block metric is low enough then the block is good enough and the rest of the modes should not be tried.

Second, there are two encoding modes: quality-based and bitrate-based. For bitrate-based mode I simply manipulate lambda ratio between block distortion and bits and that’s it. For quality mode I actually collected statistics on different files at different quality settings to see what block types are used more or less. I.e. on the highest quality setting intra blocks are not used at all while on low quality settings you don’t see lossless residue or raw blocks.

So I simply used the option to disable different block coding modes (introduced to make debugging other coding modes easier) and modified the list depending on quality setting.

Then I went even further and observed the statistics of the DCT block quantisers used depending on quality settings. As one could reasonably expect, low quality setting resulting in quantisers 12-15 (and rarely 11) while high quality setting used quantisers 0-10 the most. So limiting the quantisers depending on quality was the next logical step.

And here are some other tricks:

  • on lower quality levels I use non-zero threshold for RLE block so that e.g. a sequence 4, 3, 5, 4 will be treated as a run of fours;
  • in the previous version of the encoder I used Block Truncation Coding (probably the only possible application of it even if a bit unwise), now I’m simply calculating averages of the values above/below block mean value;
  • in rate-distortion metric I scale the bits value, doing otherwise often leads to the essentially free skip block being selected almost every time and blocky moving pictures are better than slightly less blocky still one.

Of course it is easy to come with more but it’s been enough for me.

Bink bitstream bundling

Saturday, October 7th, 2023

Bink Video organises frame data somewhat on per-row basis and separated into individual streams. That means that data is sent in portions in interleaved fashion: first it is block types, then it’s colour values for e.g. fill or pattern blocks, then it’s motion values and so on. Before each row decoding can start, decoder should check if it still has some data from that stream or read more. For example, in the beginning you may get just enough block types transmitted for one row, zero motion vectors and one DC value for a block somewhere in the middle of the stream; so for the second row you refill only block types, and DC values stream will be queried only after that single value in it has been used.

And to complicate the things further, there may be yet another kind of data stored in-between: RLE block information (scan index, copy/run bit and run lengths), DCT coefficients and lossless residue data.

It’s also worth noting that while version 'b' simply stored stream data in fixed-length bit-fields, the latter versions employed static codebooks to improve compression.

Anyway, how to write such streams correctly? My original encoder supported only several block types and was able to bundle all stream data together to transmit it once for the whole plane. But since I wanted to support all possible block types, I had to devise more flexible system. So I ended up with a structure that holds all data in per-row entries so when the plane data is gathered I can easily decide how many rows of data to send and when more data needs to be sent. Additional bitstream data is also stored there (in form of (code, length) pairs) and can be written for each row that has it.

Beside that I’ve introduced a structure to all stream data for one block. This makes it easy to calculate how many bits will be used to code it, output the data (just append its contents to the corresponding entries in the plane data) and even reconstruct the block from it (except for DCT-based ones). Actually I use three instances of it (one for the best block coding candidate, one for the current block coding and one for trying block coding variants if the mode permits it) but that’s a minor detail.

In either case, even if this approach to coding is not unique (TrueMotion codecs employed it as well, to give one example), it’s distinct enough from the other variants I’ve seen and was still quite fun to implement.

Bink encoding: format and encoder designs

Friday, October 6th, 2023

Here I’m going to give a brief review of how Bink formats (container, video and audio codecs) are designed and how it affected the overall encoder design in NihAV.

Documenting failure(s) of a Bink encoder

Thursday, October 5th, 2023

Since I really had nothing better to do, I decided to try writing Bink-b encoder. Again. The previous version was a simple program to compress image sequence into a somewhat working video file, the new version supports all possible features: motion compensation, all block types including DCT-based ones, quality- and bitrate-oriented compression modes and even audio.

I worked on the encoders not because I have any real need but rather in order to both understand better how Bink codecs work and what tricks can be employed in the decoder. Bink video supports different block coding modes, from rather ordinary DCT to RLE using a custom scan order and special mode for coding motion residues. Bink audio is a simple perceptual codec that should still give me some possibilities for experimentation (of course I remember how I failed to write AAC encoder, but maybe with such simple format the outcome will be not too horrible?).

In the following posts I’ll try to describe how some things in Bink format work, what difficulties I had to overcome and what the results are.

Eventually I want to test my implementation against the original decoder (by creating a custom heroes3.vid or video.vid for HoMM3). Also it is rather easy to enhance my work to produce latter-version Bink videos but I’m not sure if it’s worth doing that as there’s a good free encoder for it already (probably a much better one than I can ever create) and Bink format is not a good candidate for normal videos as it was created for game cutscenes with the only options to keep or end playing, no option for rewinding back at random place to repeat scene you missed (or skip some boring part).

In either case, it’s still a nice experience. After all, NihAV is about experimenting and learning how stuff works and Bink offered a lot of new things to try.

NihAV: adding SGA support

Saturday, September 2nd, 2023

Since I had nothing better to do this week I decided to finally add Digital Pictures SGA decoding support to NihAV. While there are many different formats described in The Wiki, I’ve decided to play only those not described there (namely $81/$8A, $85, $86 and $89).

In my previous post on this matter I mentioned that the formats I took interest in are using 8×8 tiles that may be subdivided into 8×4 or 4×4 parts and filled with several colours using a predefined pattern (or an arbitrary one for 8×8 tile if requested) plus some bits to select one of two possible colours for each tile pixel. The main difference between $81/$8A scheme and the others is that it codes all data in the same bitstream while the later versions split colours and opcode+pattern bits into two separate partitions (maybe they had plans for compressing it?) plus they store audio data inside the frame.

And here are some notes on the games (I think most of those are PC or Macintosh ports but it’s possible the same files were used in console versions of some of those games as well):

  • Double Switch—this one uses $81 compression (in still images, cutscenes embed them along with $A2 audio in $F1 chunks);
  • Quarterback Attack—this one uses $8A compression in $F9 chunks;
  • Night Trap$85 compression and megafiles (i.e. almost all cutscenes are stored in single NTMOVIE file that require some external index to access them). Also the PC release had a short documentary about the moral panic around that game (in the same format of course; in two resolutions even);
  • Corpse Killer$86 compression and one megafile for all cutscenes;
  • Supreme Warrior$89 compression, one megafile and no frame dimensions given. For most of the cutscenes it’s 256×160 but at the end (logo and maybe something else) it’s different. Additionally there are two audio tracks: some audio chunks contain twice as much data (and have high bit of size set), in that case the first half corresponds to English speech and the second half is Chinese; otherwise it’s the same for both versions (e.g. background music, fighters grunting, sound effects and so on).

Overall, it was an interesting experience even if I don’t care about the games themselves.