Archive for the ‘Bink’ Category

Bink-b: Encoder

Friday, May 31st, 2019

Recently I’ve been contacted by some guy working on a mod campaign for Heroes of Might and Magic III. The question was about the encoder for videos there. And since the original one is not likely to exist, I just wrote a simple one that would take PGMYUV image sequence and encode it. Here’s the gzipped source.

It took a couple of evenings to do that mostly because I still have weak symptoms of creeping perfectionism (thankfully it’s treated with my laziness). BIKb does not have Huffman-coded bundles, so the simplest straightforward encoding would be: write block type bundle (13-bit size and 4-bit elements), write empty other bundles, write several bundles containing pixels and you’re done. There’s a proper approach: write a full-featured encoder that takes input in several formats and that encodes using all possible features selecting the best quality for the target bitrate. There’s a hacky approach—translate later versions of Bink into BIKb (and then you remember that it has different motion compensation scheme so this approach won’t work). I’ve chosen something simple yet with some effectiveness: write an encoder that employs only vector quantisation and motion compensation for non-overlapped blocks plus add a quality setting so users can play with output size/quality if they really need it.

So how does the block encoding work? Block truncation coding, the fast and good way to quantise block into two colours (many video codecs back in the day used it and only some dared to use vector quantisation for more than two different values per block). Essentially you just calculate average pixel value and select two values depending on how many pixels in the block are larger than average and by how much they deviate. And here’s where quality parameter comes into play: depending on it encoder sets the threshold above which block is coded as is (aka full mode) instead as two colours and pattern in which they occur (of course if it’s a solid-colour fill it’s always coded as such). As I said, it’s simple but quite effective. Motion compensation is currently lossless i.e. encoder will try to find only the block that matches exactly (again, it can be improved but that would only lead to longer implementation times and even longer debugging times). This makes me appreciate the work on Smacker and Bink 1 video codecs and encoders for them even more.

Overall, it was a nice diversion from implementing Duck decoders for NihAV but I should probably return to it. The sooner it’s done the sooner I can move to something more exciting like finally experimenting with vector quantisation, or trying to write a player, or something else entirely. I avoid making plans but there are many possibilities at hoof so I just need to pick one.

Bink2: some words about loop filter

Sunday, April 14th, 2019

Since obviously I have nothing better to do, here’s a description of loop filter in Bink2 as much as I understand it (i.e. not much really).

First, the loop filter makes decision on two factors: motion vector difference between adjacent blocks is greater than two or it selects filter strength depending on number of coefficients coded in the block (that one I don’t remember seeing before). The filter is the same in all cases (inter/intra, luma/chroma, edge/inside macroblock), only the number of pixels filtered varies between zero and two on each side (more on that later). This is nice and elegant design IMO.

Second, filtering is done after each macroblock, horizontal edges first, vertical edges after that—but not necessarily for all macroblocks. Since BIKi or BIKj encoder can signal “do not deblock macroblocks in these rows and columns” by transmitting set of flags for columns and rows.

Third, in addition to normal filtering decoder can do something that I still don’t understand but it looks like whole-block overlapping in both directions (and it is performed in actual decoding but I don’t know what happens with the result of it).

And the filter itself is not that interesting (assuming we filter buf[0] buf[1] | buf[2] buf[3]:

    diff0 = buf[2] - buf[1] + 8 >> 4;
    diff1 = diff0 * 4 + 8 >> 4;
    if (left_strength >= 2)
        buf[0] = clip8(buf[0] + diff0);
    if (left_strength >= 1)
        buf[1] = clip8(buf[1] + diff1);
    if (right_strength >= 1)
        buf[2] = clip8(buf[2] - diff1);
    if (right_strength >= 2)
        buf[3] = clip8(buf[3] - diff0);

Strength is determined like this: 0 — more than 8 coefficients coded, 1 – MV difference or 4-7 coefficients coded, 2 — 1-3 coefficients coded, 3 — no coefficients coded.

Overall, the loop filter is nice and simple if you ignore the existence of some additional filter functions and very optimised implementation that is not that much fun to untangle.

Update: the alternative function seems to be some kind of block reconstruction based on DCs. In case it’s intra block with less than four coefficients coded it will take all neighbouring DCs, select those not differing by more than a frame-defined threshold and smooth the differences. I still don’t understand its purpose in full though.

Moving with REing game codecs

Saturday, March 23rd, 2019

As I’ve mentioned before, NihAV now can decode Bink2 files more or less decently. I don’t have many samples but I can decode all samples I could find from KB2f to KB2j quite well (the only exception is KB2a — there’s only one partial sample known with no indication which game uses it and no version of RAD tools understands it either).

I’ve omitted support for full-resolution Bink2 files (no samples) and reconstruction is not perfect because there’s an in-loop deblocking filter with some additional crazy functions invoked in some cases. It’s messy and does not affect actual bitstream decoding so I’m not going to work on it now. Maybe if I get some inspiration later…

Anyway, I moved to REing Discworld Noir BMV format instead. While the game is nice, it’s hard to run on any modern OS and there’s almost no hope for its engine being reimplemented. So maybe I’ll be able to re-watch cutscenes from it… I’ve figured out container and audio long time ago, video is not that easy. It seems to be an upgrade of Discworld II BMV that outputs 16-bit video but the way it’s implemented is baffling.

While locating the functions responsible for the decoding was easy, understanding them was hard. For the disassembler. No, the code was recognized fine but its flow makes even disassembler freak out. Here’s how it works:

  1. The frame decoding function reads pixel values, fills certain arrays and patches functions that return pixel values (nice start, isn’t it?) and then the real frame decoding starts;
  2. Frame decoding is done like a state machine (which complements coroutines used elsewhere in the engine) with several tables for handling 256 opcodes (or less in some cases);
  3. In result you read byte, jump to one of the labels, perform the operation, read the next opcode, jump using the same or different table;
  4. Except when it’s opcodes 00-3F, then you usually have to construct length word, perform some pixel output loop and then jump to another opcode handler which performs some operation and then jump to the address calculated by the previous operation;
  5. Of course pixel functions are some permuted array of 256 pointers to the functions of three different kinds: return fixed value (set by the decoding function in the beginning), read byte and return pixel value from the corresponding array or read new pixel value from the stream;
  6. And to make it all even better, all those operations are obviously not actual functions but small(er) chunks of assembly code that use fixed registers as arguments and they’re located both before and after decoding function “body”.

Anyway, I’ve made some progress and I reckon it will be possible to support this format in NihAV though maybe not soon enough.

NihAV: even more Bink2 support!

Wednesday, March 13th, 2019

After managing to decode the first frame of KB2g variant I had three options: try to decode the other frames, try to decode other variants or do nothing. While the third option is the most appealing and the first option is the most logical, I stuck with the second one. So now I can decode the first frame of KB2f variant of Bink2 as well. Unfortunately the only (partial) KB2a sample I know is not supported, probably it’s a beta version that was tried on one game like Bink version b. Beside a small surprise in one place bitstream decoding was rather simple. Inter-frame support should not be that hard but it might get messy because of the DC and MV prediction.

And while talking about REing Bink I should mention that I’ve tried Ghidra while doing KB2f work. It is a nice tool that sucks in some places (not having a good highlight for variables, decompiling SIMD code results in very questionable output, the system being Java-based and requiring recent JDK—that’s the worst issue really) it works and produces decent results (including the decompiler). Also since it has 16-bit decompiler support maybe I’ll manage to figure out how those clips in Monty Python & the Quest for the Holy Grail are stored.

I should start documenting it too.

Insignificant update: okay, now it decodes inter-frame data correctly too and the only thing left is to make it reconstruct them correctly. Also I’ve updated codec information on Multimedia Wiki. Actually now it works quite okay so I’m not going to pursue it further. I have no real interest in Bink2 decoding after all.

NihAV: some Bink2 support

Sunday, March 10th, 2019

It took a long time but finally I can decode the first frame of Bink2 video (just KB2g flavour though but it’s a start).

At least the initial observations were correct: Bink2 codes data in 32×32 macroblocks, two codebooks for AC zero runs, one codebook for motion vector components, simple codes with unary prefix for the rest.

If you wonder why it took so long—that’s because I’m lazy and spend an hour or less a day on it. Also while the codec is simple in design it’s a bit complicated in implementation. While previous version related on format sub-version to decide which feature to use, Bink2 uses frame flags to decide which feature to use. For instance, flag 0x1000 signals that there are two bit arrays coded that tell when to read an additional flag during CBP decoding that tells which one of two codebooks should be used during AC decoding later. And flag 0x2000 essentially tells to use different bitstream decoding (like motion vectors decoding or block type decoding). Or the fact that it employs DC and MV prediction that usually has four cases (top-left macroblock, top block, left block, some block inside) plus WMV1-like handling of DC prediction in inter-frames (i.e. it calculated DC for inter blocks and uses them for prediction). And of course DC prediction for inter blocks works a bit different. Plus it tries to track internal state by packing all flags into 32-bit word and updating it for each block (two bits are for signalling top row, one—for macroblock being the leftmost one, some bits are copied from frame flags etc etc). So there’s a lot of nuances to take care of.

And that’s not counting the fact that current Bink2 player can’t decode versions prior to KB2g at all. Since I have some KB2f samples along with an old Bink player that can handle them, I guess I’ll support them eventually.

Bink2: Giving Up

Tuesday, June 17th, 2014

Short story: boring.

As I wrote more than a year ago the codec itself is rather simple and not that interesting (rather simple VLC coding plus DCT). Version KB2f was floating point, KB2g­—KB2i have switched to 16-bit integers wherever possible (including DCT, see links below). And I don’t really have content that I’d like to watch and unlikely to ever have it since my favourite games were released mostly before 2000 (that reminds me of Discworld Noir BMV format to RE…).

So here’s the list of what I’ve not done so far:

  • there are bugs in bitstream parsing, especially for KB2f;
  • I was too lazy to add all integer tables for newer versions, so it still uses floating point code for older version;
  • DC prediction (it uses median prediction with weird neighbours selection);
  • MV prediction;
  • motion compensation functions;
  • loop filtering;
  • definitely more of issues to resolve.

I’ll put known details to MultimediaWiki later.

All essential details about Bink 2g DCT are given here and here.

For those interested compressed patch is here.

P.S. I reserve right to look at newer versions if they appear and complain about them being equally boring.

Bink2 Again

Tuesday, May 13th, 2014

So in some previous posts I’ve described Bink2 bitstream. What’s the catch? It applies only to KB2f files, newer versions (KB2g­—KB2i) use different bitstream format:

  • Block types are coded as an index in LRU list with an unary code;
  • One quantiser for the whole macroblock instead of individual per-component quantisers;
  • New CBP coding method for luma (it depends on number of bits set in the previous CBP and such)
  • DCs are coded as unary codes + additional bits instead of fixed bits + additional bits
  • AC coding now has skips coded before coefficients (previously first coefficient was always coded).

Implementations welcome.

Bink2: Inter Block Residue

Saturday, January 18th, 2014

Inter block residue decoding is not different from intra block decoding except that DCs are expected to be in -1023…1023 range instead of 0…2047 and quantisation matrix for luma is different.

Posts about reconstruction process might follow.

Bink2: Intra Block — Chroma

Saturday, January 18th, 2014

Chroma block coding is similar to luma but with some changes since there are only 4 blocks coded here.

Thus, CBP is coded as two nibbles (real CBP and VLC switch) and it does not try to reuse nibbles from last CBP in code.

There are only 4 DCs here but they are decoded the same way.

AC block decoding is completely the same.

Bink2: Intra Block — Luma

Saturday, January 18th, 2014

Intra luma block in Bink2 contains the following elements: CBP, quantiser, DCs and ACs.

CBP is coded as 32-bit bitmask depending on the previous CBP value. Internally top half is coded depending on bottom one and the whole bitmask is coded in nibbles starting from LSB.
Lower half decoding depends on the control bits:

  • 11 — simply return last CBP
  • 10 — use low 16 bit from last CBP
  • 0 — decode 4 low nibbles of CBP. Initial nibble value is set to (last_cbp >> 4) & 0xF, if the next bit is 1 then don’t change it, otherwise read new value from the bitstream (4 bits of course).

Now we can use these low 16 bits of CBP to restore high 16 bits. This is also done by nibbles and decoding depends on them (why nibbles? Because blocks are coded in groups of four).


pat4 = (last_cbp >> 20) & 0xF;
ref = cbp;
for (i = 0; i < 4; i++) {   if (!ones_count[ref & 0xF]) {    pat4 = 0;   } else if (ones_count[ref & 0xF] || getbit()) {    pat4 = 0;    for (bit = 1; bit < 0x10; bit <<= 1)     if ((ref & bit) && getbit())      pat4 |= bit;   } else {    pat4 &= ref & 0xF;   }   cbp |= pat4 << 16 + i * 4;   ref >>= 4
}

Essentially it decides what set bits to copy from the first part. And top 16 bits are not really a coded block pattern, it just tells decoder to use an alternative set of VLC codes in AC decoding.

Quantiser is coded with static VLC (plus sign bit for nonzero value) as a difference to the previous quantiser.

Quantisation table for DC: 4 4 4 4 4 6 7 8 10 12 16 24 32 48 64 128

16 DCs (coded with the same scheme as motion vector described in the previous post)

16 blocks of AC coefficients coded in groups of four. Each AC block is coded as (value, skip) pair where value is coded with static VLC that gives small levels (0-3) or number of bits for raw value to read. Skips have one peculiarity too: value coded with static VLC defines either skip (for values 0-10), escape value (when you got you need to read 6 bits with real skip value), end of block value and that the following 8 AC coefficients won’t have skip values coded after them.

Scan order is strange, here are first 8 indices from it: 0, 2, 1, 8, 9, 17, 10, 16, 24, 3, 18, 25, 32, 11, 33