Again, this should be laughably trivial for anybody familiar with that area but I since I lack mathematical skills to do it properly, here’s how I wrote forward DCT for inverse one.
First of all I’d like to mention that old versions of Bink use floating-point DCT implementation but since I did newer versions first, I ended up re-using the same integer implementation they use (as it should be almost identical).
Anyway, I have working IDCT, how to make DCT from it?
First of all I need to get its coefficients, which I do by feeding [0, 1000, 0, ... 0]
… [0, 0, ..., 0, 1000]
to the 1D IDCT thus obtaining its transform matrix (output coefficients are divided by 1000 for convenience):
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.847 0.567 0.198 -0.198 -0.567 -0.847 -1.000 1.000 0.414 -0.414 -1.000 -1.000 -0.414 0.414 1.000 1.000 -0.235 -1.180 -0.668 0.668 1.180 0.235 -1.000 1.000 -1.000 -1.000 1.000 1.000 -1.000 -1.000 1.000 1.000 -1.767 0.352 1.495 -1.495 -0.352 1.767 -1.000 1.000 -2.415 2.415 -1.000 -1.000 2.415 -2.415 1.000 1.000 -2.848 4.262 -5.027 5.027 -4.262 2.848 -1.000
Then I inverse this matrix (using octave
but any solver should do that). Here it is, scaled by eight, rounded and transposed for clarity (somebody else would’ve recognized the DCT type and knew what matrix it would be without doing all that):
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.925 1.631 1.090 0.383 -0.383 -1.090 -1.631 -1.925 1.707 0.707 -0.707 -1.707 -1.707 -0.707 0.707 1.707 1.382 -0.324 -1.631 -0.924 0.924 1.631 0.324 -1.382 1.000 -1.000 -1.000 1.000 1.000 -1.000 -1.000 1.000 0.617 -1.090 0.217 0.924 -0.924 -0.217 1.090 -0.617 0.293 -0.707 0.707 -0.293 -0.293 0.707 -0.707 0.293 0.076 -0.217 0.324 -0.383 0.383 -0.324 0.217 -0.076
Each row in the matrix corresponds to the weights for input coefficients to apply to get the output coefficients. I.e. DC coefficient is the sum of all input coefficients (as expected), fourth output coefficient is sum of input coefficients 0, 3, 4 and 7 minus sum of all other coefficients, sixth output coefficient is the sum of input coefficients that use 1±√½ weights (with different signs) and so on.
Then you notice symmetry (i.e. it’s always coefficient 0 plus-minus coefficient 7, coefficient 1 plus-minus coefficient 6 and so on) so you can work with those sums/differences and reuse them for the calculations of different coefficients again. Also calculating output coefficients 2 and 6 can be done by multiplying some sums/differences by √½ and adding them to unmodified sums/differences.
After outputting with those there are only odd output coefficients left to deal with:
1.925 1.631 1.090 0.383 -0.383 -1.090 -1.631 -1.925 1.382 -0.324 -1.631 -0.924 0.924 1.631 0.324 -1.382 0.617 -1.090 0.217 0.924 -0.924 -0.217 1.090 -0.617 0.076 -0.217 0.324 -0.383 0.383 -0.324 0.217 -0.076
There you may also notice a relation between 1.925, 0.076, -0.924 and 0.924. Yes, it’s also essentially just one coefficient plus-minus one (they’re slightly different here because of the rounding). The same is true for 1.382, 0.617, 0.383 and -0.383. All is left is ±1.631, ±1.090, ±0.324 and ±0.217. Somebody with relevant knowledge might say that employing a single rotation here will halve the number of constants but I’ve never learned such details. In either case it’s decently fast (especially compared to the matrix multiplication).
So, now we have working 1D DCT, it should be made into fixed-point one by selecting proper precision. Since Bink IDCT uses 12-bit constants I used the same and added 6 and 7 bits of precision at each DCT stage (since Bink uses 13-bit quantisers). Maybe the result is not great, it still seems to work reasonably well.
Next time I’m going to tell how block coefficients are actually coded. It’s not a very complicated scheme but it’s very original.