Somewhat unrelated update: I’ve managed to verify that the output of my decoder works in the Heroes of Might and Magic III properly even with sound after I fiddled with the container flags. The only annoyance is that because of DCT discrepancies sometimes there are artefacts in the form of white or black dots but who would care about that?
At last let’s talk about the one of the most original things in the Bink Video format (and considering the rest of the things it has, that’s saying something).
While most DCT-based codecs use zero-run form of coding (even the modern ones), Bink Video uses tree-based coding instead. Here is how the tree for DCT coefficients looks like (I’ve used nested list instead of a drawing for the lack of the drawing skills):
- node for coefficients 4-23:
- node for coefficients 4-7:
- coefficient 4
- coefficient 5
- coefficient 6
- coefficient 7
- node for coefficients 8-23:
- node for coefficients 8-11:
- coefficient 8
- coefficient 9
- coefficient 10
- coefficient 11
- node for coefficients 12-15:
- …
- node for coefficients 16-19:
- …
- node for coefficients 20-23:
- …
- node for coefficients 8-11:
- node for coefficients 4-7:
- node for coefficients 24-43:
- (same structure as above)
- node for coefficients 44-63:
- (same structure as for coefficients 4-23)
- coefficient 1
- coefficient 2
- coefficient 3
The difference between lossless residue tree and DCT coefficients tree is that the former has a node for coefficients 0-3 at the end instead of three explicit coefficients. In either case the coefficient number here is in scan order (which differs from the usual zigzag as one could reasonably expect by now).
So, what is done with that structure? For DCT coefficients the number of bits to code the maximum absolute value is transmitted and then for each decreasing value a pass through the tree is made. During that pass the list of nodes is traversed, unexpanded nodes require a bit flag to mark whether they should be expanded or skipped, expanded nodes are fully skipped, leaves also have use/skip bit. For the performance reasons the state is stored as a list, expanded node is replaced by its first child node and the rest of them are appended at the end of the list; leaves are prepended to the beginning of the list (except for those that are to be used immediately).
Here is an example to help understand it better. Suppose we have 4-bit coefficient #2 and 2-bit coefficients #24 and #28.
- transmit 4 bits as the maximum coefficient size;
- form the list:
(node 4-23) (node 24-43) (node 44-63) (coef 1) (coef 2) (coef 3)
- set current depth to 4 and iterate the list:
- transmit zero bits for the first four entries as we ignore them;
- transmit set bit to signal that we code coefficient #2, its absolute value and sign;
- remove that entry from the list (or mark as removed);
- transmit zero bit for the last entry in the list.
- set current depth to 3 and iterate the list:
- transmit zero bits as we do not do anything at this level;
- set current depth to 2 and iterate the list:
- transmit zero bit to skip the first node;
- transmit set bit to expand the second node;
- list now looks like
(node 4-23) (node 28-43) (node 44-63) (coef 1) (deleted) (coef 3)
plus coefficients 24-27 we need to transmit or add to the list; - transmit zero bit for coefficient #24, write its value and sign;
- transmit set bits for coefficients 25-27 to be queued;
- list now looks like
(coef 27) (coef 26) (coef 25) (node 4-23) (node 28-43) (node 44-63) (coef 1) (deleted) (coef 3)
; - the current node still needs to be expanded, send a bit for that;
- list now looks like
(coef 27) (coef 26) (coef 25) (node 4-23) (node 28-31) (node 44-63) (coef 1) (deleted) (coef 3) (node 32-35) (node 36-39) (node 40-43)
; - the current node still needs to be expanded, send a bit for that;
- since it’s a node for four values, mark it as deleted and iterate over coefficients;
- transmit zero bit for coefficient #28, write its value and sign;
- since all coefficients are coded, end the operation.
Of course the DC value as well as the number of coded coefficients is transmitted in a separate stream.
The lossless residue coding differs a bit: instead of full coefficients the residue values are transmitted as masks (e.g. value 9 is bits 4 and 1 set, which means two masks to be transmitted). As the result before traversing the tree at each bitdepth decoder has to look up through the list of already present values and decode bits for them. So e.g. if we have value -9 then at bitdepth 3 it will be decoded as -8 (mask present plus sign bit), at bitdepths 2 and 1 zero bits will be read for it before traversing the tree and at bitdepth 0 the least significant bit will be read (so -8 – 0*4 – 0*2 – 1*1 = -9). Similar to the DCT coefficients decoding, as soon as we know that there will be no more set bits the decoding process stops.
As you can see, this approach is rather unorthodox (and would be more at home in wavelet-based formats) and I fully understood how it works only during adding support for residue/DCT blocks in my (second) Bink-b encoder. NihAV
is about learning stuff and learning stuff I did.
Now we’re done with Bink Video and it’s time to document Bink Audio.