While there’s nothing much to write about the encoder itself (it should be released and forgotten soon), it’s worth recording how some magic numbers in the code (those not coming from the specification) were obtained. Spoiler: it’s mostly statistics and gut feeling.
Yes, essentially all magic numbers are derived as “generate a lot of statistics during the encoding process and pick a value or two that feel right”. For example, the thresholds for stopping motion search or macroblock type search were derived after I looked at the values for various macroblocks and said to myself “those values seem to be common for good blocks, so let’s round them up to a power of two and use that as a threshold”.
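The “round it up to a power of two” rule of thumb can be sketched like this. It is only an illustration under assumptions: the metric (say, a SAD-like distortion value) is collected for blocks that ended up being good, the median stands in for the “common” value, and the function names are made up rather than taken from NihAV.

```python
import math
from statistics import median
from typing import Sequence


def round_up_pow2(value: int) -> int:
    """Smallest power of two >= value (1 for value <= 1)."""
    if value <= 1:
        return 1
    return 1 << (value - 1).bit_length()


def derive_threshold(good_block_metrics: Sequence[float]) -> int:
    """Turn metric values observed on blocks judged 'good' into a threshold.

    Assumption: the median is used as the 'common' value; it is rounded up
    to an integer and then to the next power of two.
    """
    typical = math.ceil(median(good_block_metrics))
    return round_up_pow2(typical)
```

An encoder would then stop the search early once a candidate’s metric falls below the returned threshold.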
But in most cases I had to encode videos of different types at all quantisers, collect statistics (usually with an AWK script) and make decisions based on them. For example, to guesstimate the number of bits an encoded block will take, I collected statistics on how many bits on average it takes to encode a block of a certain type with a given number of non-zero coefficients, made them into a table, filled some gaps and used it for predictions. It seems to work reasonably well.
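A minimal sketch of such a table, assuming samples of (block type, NZ count, bits) gathered during encoding. How the gaps were actually filled isn’t stated, so linear interpolation between known neighbours (and copying the nearest known value at the edges) is my assumption, as are all the names.

```python
from collections import defaultdict


def fill_gaps(row):
    """Fill None entries by linear interpolation between known neighbours;
    entries before the first or after the last known value copy the nearest one."""
    known = [i for i, v in enumerate(row) if v is not None]
    if not known:
        return [0.0] * len(row)
    out = list(row)
    for i in range(len(row)):
        if out[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = row[right]
        elif right is None:
            out[i] = row[left]
        else:
            t = (i - left) / (right - left)
            out[i] = row[left] + t * (row[right] - row[left])
    return out


def build_bit_table(samples, max_nz):
    """samples: iterable of (block_type, nz_count, bits) tuples.
    Returns {block_type: [average bits for nz = 0..max_nz]} with gaps filled."""
    sums = defaultdict(lambda: [0] * (max_nz + 1))
    counts = defaultdict(lambda: [0] * (max_nz + 1))
    for btype, nz, bits in samples:
        sums[btype][nz] += bits
        counts[btype][nz] += 1
    table = {}
    for btype in sums:
        row = [sums[btype][i] / counts[btype][i] if counts[btype][i] else None
               for i in range(max_nz + 1)]
        table[btype] = fill_gaps(row)
    return table


def estimate_bits(table, block_type, nz_count):
    """Predict bits for a block; NZ counts beyond the table use the last entry."""
    row = table[block_type]
    return row[min(nz_count, len(row) - 1)]
```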
Similarly, the choice of codebook superset was made after seeing statistics on how many bits on average it took to encode a block with a certain number of non-zero coefficients. One superset is good for blocks with a low number of NZ coefficients, another one is good when you have blocks with both low and high numbers of zeroes, and the third one is in-between. So I ended up calculating the cumulative distribution of NZ coefficients per block and selecting the superset depending on where the third quartile of that distribution (the NZ count that 75% of blocks do not exceed) falls.
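Here is a sketch of the third-quartile selection. The histogram and cumulative count follow the description above, while the two NZ limits and the ordering of superset indices are hypothetical placeholders that would have to come from the actual statistics.

```python
def third_quartile_nz(nz_counts):
    """Smallest NZ count c such that at least 75% of blocks have <= c non-zero
    coefficients, i.e. where the cumulative distribution reaches 0.75.
    nz_counts must be a non-empty list of per-block NZ counts."""
    max_nz = max(nz_counts)
    hist = [0] * (max_nz + 1)
    for nz in nz_counts:
        hist[nz] += 1
    total = len(nz_counts)
    cumulative = 0
    for nz, count in enumerate(hist):
        cumulative += count
        if cumulative * 4 >= total * 3:  # integer form of cumulative/total >= 0.75
            return nz
    return max_nz  # not reached: cumulative equals total at the last bin


# Hypothetical boundaries, not taken from any real encoder.
LOW_NZ_LIMIT = 4
HIGH_NZ_LIMIT = 12


def select_superset(nz_counts):
    """Return a superset index, ordered (by assumption) from the one suited to
    the sparsest blocks to the one suited to the densest mix."""
    q3 = third_quartile_nz(nz_counts)
    if q3 <= LOW_NZ_LIMIT:
        return 0
    if q3 <= HIGH_NZ_LIMIT:
        return 1
    return 2
```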
With rate control it’s a bit trickier. You have to estimate the frame size from its complexity and type. So again, I encoded the usual test set with different quantisers, got average size-to-complexity ratios for each quantiser and frame type and fed them to Octave. It turned out that an incantation like polyfit(linspace(0, 31, 32), ratios, 2) gives coefficients for a quadratic approximation of those ratios (of course you can calculate a cubic or any other degree of polynomial approximation as well). Maybe it’s not perfect, but it’s good enough for my needs (especially with some rounding to have shorter coefficients in the formula).
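For readers without Octave at hand, here is a pure-Python stand-in for polyfit(..., 2) (a least-squares quadratic fit via the normal equations) plus the size prediction it feeds. Keeping one coefficient set per frame type follows the text, and the predicted size is simply complexity times the fitted ratio; the function names are made up for illustration.

```python
def polyfit2(xs, ys):
    """Least-squares quadratic fit returning [a, b, c] for a*x^2 + b*x + c,
    highest power first like Octave's polyfit(x, y, 2)."""
    s = [sum(x ** k for x in xs) for k in range(5)]                    # sums of x^0..x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of x^k * y
    # Normal equations for unknowns [a, b, c], as a 3x4 augmented matrix.
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    n = 3
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_rate_model(ratios_by_type):
    """{frame_type: [ratio for quant 0..N-1]} -> {frame_type: [a, b, c]}"""
    return {ft: polyfit2(list(range(len(r))), r) for ft, r in ratios_by_type.items()}


def predict_frame_size(coeffs, quant, complexity):
    """Frame size estimate = complexity * fitted size-to-complexity ratio."""
    a, b, c = coeffs
    ratio = a * quant * quant + b * quant + c
    return complexity * ratio
```

With ratios measured for quantisers 0 to 31, polyfit2(list(range(32)), ratios) should give the same three coefficients as the Octave call up to floating-point error, and they can then be rounded for a shorter formula.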
Of course this is far from the best way to do it (and probably far from a good practical way as well), but it’s simple enough to implement without spending too much time on research. As I said before, my goal is to learn how it’s done and not to compete with anything else. NihAV is not supposed to be useful after all (even if it comes in handy occasionally).