VP7 encoder: various bits

As the world tries to avert its attention from an insane dictator re-enacting 1939 (it gets funnier since I observe it from Germany), I should also do something to take my mind off constantly worrying about my parents and other relatives in one of the Ukrainian cities under attack. Hence this significantly less unpleasant thing.

Now my encoder is conceptually done; all that is left to do is to fix a leftover bug or two, improve a thing or two, clean the code up and integrate it nicely with the rest of the nihav-duck crate by splitting off the parts shared with the VP6 encoder. Meanwhile I can talk about some things implemented since the last time—and some that were not.

So what has not been done:

  • trellis quantisation search—this is not a very good codec for that kind of thing after all (and the libavcodec decoder does not like macroblock features anyway). I should still probably write about it in a separate post though;
  • better heuristics for estimating coded block size—I should probably implement proper block context prediction though, and maybe add a special case for empty blocks;
  • better heuristics for macroblock mode selection—by a quick estimate my VP7 encoder is about 2.5 times slower than my VP6 encoder, and 40% of the time is spent in motion compensation (it is used to reconstruct the macroblock as it will look in the decoder, but mainly it is invoked during motion estimation);
  • proper EPZS implementation that takes motion vectors from previous frame into account;
  • proper MB tree implementation;
  • automatic best intra or golden frame selection.

So what has been done since the last time?

There’s rudimentary rate control now, based on a simple idea: you maintain a bit budget that is increased for every second of processed video and decreased with every frame encoded. From this budget a part is allocated to a single frame (one unit for an intra frame, 0.85 of a unit for an inter frame) and a quantiser is estimated from the amount of bits required to code one macroblock (in an intra or an inter frame). The quantiser selection was done in a simple way: I encoded a bunch of test samples of different sizes and content types with fixed quantisers, calculated the average amount of bits required to code a frame, and approximated it with a two-piece function—low quantisers use a quadratic approximation (obtained by feeding the values to the polyfit function in Octave and rounding the coefficients a bit), while the remaining quantisers use a logarithmic function of the form N - a*ln(quant - b) that I found by trial and error (and by looking at what seemed close enough to a plot of the empirical values). And for the cases when the prediction does not work well enough there’s a lambda parameter to adjust.
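
To make the scheme concrete, here is a rough Rust sketch of that rate control loop. The fit coefficients, the crossover quantiser, the inter-frame cost factor and the per-frame budget unit are illustrative placeholders of mine, not the values from the actual encoder:

```rust
struct RateControl {
    bitrate: u32, // target bits per second
    fps:     f64, // frames per second
    budget:  i64, // currently available bits
}

impl RateControl {
    fn new(bitrate: u32, fps: f64) -> Self {
        Self { bitrate, fps, budget: i64::from(bitrate) }
    }
    // estimated bits needed to code one macroblock at the given quantiser:
    // the two-piece approximation—quadratic for low quantisers,
    // N - a*ln(quant - b) for the rest (all coefficients are made up here)
    fn bits_per_mb(quant: usize, is_intra: bool) -> f64 {
        let q = quant as f64;
        let base = if quant < 12 {
            1000.0 - 40.0 * q + 0.8 * q * q // hypothetical quadratic fit
        } else {
            950.0 - 150.0 * (q - 4.0).ln()  // hypothetical logarithmic fit
        };
        if is_intra { base } else { base * 0.6 } // inter MBs usually cost less
    }
    // pick the lowest quantiser whose estimated cost fits the frame's share
    fn get_quant(&self, is_intra: bool, num_mbs: usize) -> usize {
        // one budget unit per intra frame, 0.85 of a unit per inter frame
        let unit = (self.budget as f64 / self.fps).max(0.0);
        let frame_bits = if is_intra { unit } else { unit * 0.85 };
        let target = frame_bits / (num_mbs as f64);
        (0..128).find(|&q| Self::bits_per_mb(q, is_intra) <= target)
            .unwrap_or(127)
    }
    // bookkeeping: add bits for every second of processed video,
    // subtract the actual size of every encoded frame
    fn next_second(&mut self) { self.budget += i64::from(self.bitrate); }
    fn frame_coded(&mut self, size_bits: usize) { self.budget -= size_bits as i64; }
}
```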

Then there’s loop filter strength selection. I ran some experiments: encoding a frame with different quantisers, reconstructing it with different loop filter strength and sharpness values, and measuring the squared error. It turned out that loop filter sharpness is better left at zero and loop filter strength should be around quant/20.
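
Expressed as code the rule is trivial; the clamping to the valid range is my addition:

```rust
// pick loop filter parameters from the frame quantiser:
// sharpness stays at zero, strength scales as roughly quant/20
fn loop_filter_params(quant: usize) -> (u8, u8) {
    let strength  = (quant / 20).min(63) as u8; // empirical quant/20 rule
    let sharpness = 0u8;                        // best left at zero
    (strength, sharpness)
}
```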

And finally there’s an MB tree parody, implemented mostly because it’s fashionable to have an MB tree feature in your video encoder, and as an excuse to try alternative quantisers. The idea behind MB tree is very simple: if a macroblock is referenced many times in the subsequent frames (directly, or via a macroblock that refers to the original one) then it must be important and should be coded with better quality. I implemented it for I-frames by buffering a couple of frames and calculating a macroblock reference map. For the I-frame each macroblock refers to itself and has a weight of one; for each following frame a macroblock-level motion vector is searched for, the reference macroblock may be updated, and the counter of the original I-frame macroblock is increased. Then all macroblocks with a more-than-average number of references are coded with a decreased quantiser (using the confusingly named macroblock features feature for that). This also allowed me to test asynchronous encoding, i.e. one where frames are not coded immediately but may be buffered first.
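
Here is a simplified sketch of that bookkeeping. It assumes one full-pel macroblock-level motion vector per macroblock is already available from the search; the names and structure are mine, not the encoder’s:

```rust
struct MBTree {
    mb_w:    usize,      // frame width in macroblocks
    mb_h:    usize,      // frame height in macroblocks
    ref_map: Vec<usize>, // which I-frame MB each current MB traces back to
    counts:  Vec<u32>,   // reference counters for the I-frame MBs
}

impl MBTree {
    fn new(mb_w: usize, mb_h: usize) -> Self {
        let num = mb_w * mb_h;
        Self {
            mb_w, mb_h,
            ref_map: (0..num).collect(), // every I-frame MB refers to itself...
            counts:  vec![1; num],       // ...with an initial weight of one
        }
    }
    // process one buffered inter frame: follow each macroblock's motion
    // vector to its reference macroblock, update the chain and bump the
    // counter of the original I-frame macroblock
    fn add_frame(&mut self, mvs: &[(i32, i32)]) {
        let mut new_map = self.ref_map.clone();
        for (idx, &(mv_x, mv_y)) in mvs.iter().enumerate() {
            // convert a full-pel MV into macroblock units (16x16 MBs)
            let x = (idx % self.mb_w) as i32 + mv_x / 16;
            let y = (idx / self.mb_w) as i32 + mv_y / 16;
            if x >= 0 && (x as usize) < self.mb_w && y >= 0 && (y as usize) < self.mb_h {
                let src = (y as usize) * self.mb_w + (x as usize);
                new_map[idx] = self.ref_map[src];
                self.counts[self.ref_map[src]] += 1;
            }
        }
        self.ref_map = new_map;
    }
    // macroblocks referenced more often than average get the alternative
    // (lower) quantiser via the macroblock features mechanism
    fn important_mbs(&self) -> Vec<bool> {
        let avg = self.counts.iter().sum::<u32>() / (self.counts.len() as u32);
        self.counts.iter().map(|&c| c > avg).collect()
    }
}
```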

All in all this could’ve gone much better, but there’s no limit to perfection anyway. As I mentioned in the beginning, there are some things left to polish, and I’ll try to see if any of those ideas are worth applying to the VP6 encoder as well.