I’ve finally finished implementing the rest of the features required for interframes: motion estimation, selection of the previous or golden frame as the reference (along with golden frame support itself), and four motion vectors per macroblock are now supported. How I implemented fast motion search deserves a separate post that I hope to write at the weekend; the rest should be covered in this post.
First of all I’d like to say that the idea of having three pre-filtered images for faster search did not work out, because of the very specific way VP3-6 work: they take a 12×12 block from the reference frame and filter only one horizontal and one vertical edge there (depending on the source position it may be just one of the edges or none at all). As a result you either need an outrageously large collection of filtered frames (with every combination of even or odd horizontal and vertical edges filtered or not), or you simply ignore that and use a fully (un)filtered frame as an approximate source and hope for the best. I decided to ignore all that and keep using the block interpolation function from the decoder. This may be slower but at least it works as it should. I wonder if this was one of the reasons why previous attempts at writing an alternative VP6 encoder failed.
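For illustration, here is a minimal sketch in Rust of scoring a motion vector candidate by calling a decoder-style block interpolation function instead of reading from a pre-filtered reference plane. All names here are made up and the interpolation stand-in is greatly simplified (the real routine fetches a 12×12 area and filters the relevant edges depending on the source position), so this is not the actual encoder code:

    struct Plane {
        data:   Vec<u8>,
        stride: usize,
    }

    // stand-in for the decoder's interpolation routine; here it just copies
    // pixels with the fractional MV part ignored to keep the example short
    fn interpolate_block(dst: &mut [u8; 64], refp: &Plane, x: usize, y: usize, mv: (i16, i16)) {
        let h = refp.data.len() / refp.stride;
        for j in 0..8 {
            for i in 0..8 {
                let sx = ((x + i) as isize + isize::from(mv.0)).clamp(0, refp.stride as isize - 1) as usize;
                let sy = ((y + j) as isize + isize::from(mv.1)).clamp(0, h as isize - 1) as usize;
                dst[j * 8 + i] = refp.data[sy * refp.stride + sx];
            }
        }
    }

    // sum of absolute differences between the source block and the
    // interpolated prediction, usable as the distortion metric during search
    fn mv_sad(src: &Plane, refp: &Plane, x: usize, y: usize, mv: (i16, i16)) -> u32 {
        let mut pred = [0u8; 64];
        interpolate_block(&mut pred, refp, x, y, mv);
        let mut sad = 0;
        for j in 0..8 {
            for i in 0..8 {
                let s = i32::from(src.data[(y + j) * src.stride + (x + i)]);
                let p = i32::from(pred[j * 8 + i]);
                sad += (s - p).unsigned_abs();
            }
        }
        sad
    }

The point is that the interpolation call produces the same prediction the decoder would, at the cost of doing the edge filtering per candidate instead of once per frame.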
Anyway, by forcing various decisions I managed to test an inter frame consisting of various macroblock types: intra macroblocks, inter macroblocks using the previous frame as the reference with one or four motion vectors, and inter macroblocks using the golden frame as the reference. So far I’ve used simple rules for the decisions: if an inter MB has less distortion or fewer non-zero coefficients than the intra one, pick it; if its difference from the source MB is too large, try a four-MV macroblock to see if it improves distortion and has even fewer non-zero coefficients, or maybe a golden-frame-referenced MB would be better than plain inter. Similarly, the current frame will be selected as the new golden frame when it is coded with more intra MBs than MBs of any other kind (and if there are more than 75% of them I’ll probably force intra frame coding).
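In code-like form the current per-macroblock rules look roughly like the following sketch (the threshold and all names are made up, the real logic may differ in details):

    #[derive(Clone, Copy, PartialEq)]
    enum MBType { Intra, Inter, InterFourMV, InterGolden }

    struct Candidate {
        mb_type:   MBType,
        dist:      u32, // distortion against the source macroblock
        nz_coeffs: u32, // number of non-zero coefficients after quantisation
    }

    fn pick_mb_type(intra: Candidate, inter: Candidate, inter4mv: Candidate, golden: Candidate, dist_threshold: u32) -> MBType {
        // prefer inter over intra when it is closer to the source or cheaper to code
        let mut best = if inter.dist < intra.dist || inter.nz_coeffs < intra.nz_coeffs {
            inter
        } else {
            intra
        };
        // if the single-MV prediction is still too far off, see whether four MVs
        // or the golden frame reference improve things
        if best.mb_type == MBType::Inter && best.dist > dist_threshold {
            if inter4mv.dist < best.dist && inter4mv.nz_coeffs < best.nz_coeffs {
                best = inter4mv;
            }
            if golden.dist < best.dist {
                best = golden;
            }
        }
        best.mb_type
    }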
The main things left to do are making proper decisions for selecting one macroblock type over another; for that I need to experiment with rate-distortion optimisation and, more importantly, introduce some heuristic to estimate the coded block size. Then I should introduce some rate control instead of a fixed quantiser (estimating encoded data size will come in handy there as well). And maybe finally write Huffman-based block coefficient coding, but that one should be rather trivial.
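Just to make the idea concrete, here is a purely hypothetical sketch of how a size estimate could drive quantiser selection; the formula is a placeholder rather than something derived from real statistics, and the 0–63 quantiser range is an assumption:

    // placeholder model: per-block overhead plus a per-coefficient cost
    // that shrinks as the quantiser grows
    fn estimate_block_bits(quant: u8, nz_coeffs: u32, is_intra: bool) -> u32 {
        let base = if is_intra { 40 } else { 24 };
        base + nz_coeffs * (160 / (u32::from(quant) + 4))
    }

    // pick the lowest quantiser whose estimated frame size fits the bit budget;
    // in practice the coefficient counts themselves depend on the quantiser,
    // which is exactly why real statistics need to be collected first
    fn pick_quantiser(target_frame_bits: u32, blocks: &[(u32, bool)]) -> u8 {
        for q in 0..64u8 {
            let est: u32 = blocks.iter()
                .map(|&(nz, intra)| estimate_block_bits(q, nz, intra))
                .sum();
            if est <= target_frame_bits {
                return q;
            }
        }
        63
    }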
I guess the main thing I need to do now is to collect statistics for different encoded block types (inter/intra, different quantisers and numbers of non-zero coefficients), make some formula out of them and see how it all fares. This will probably be the most boring part of the encoder development but at least it should not take a lot of my attention while the encoder collects the data.
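The statistics gathering itself can be as dull as the following sketch suggests (hypothetical names, just to show what would be recorded per block and how it could be queried when fitting a formula):

    struct BlockStats {
        quant:      u8,
        is_intra:   bool,
        nz_coeffs:  u16,
        coded_bits: u32,
    }

    #[derive(Default)]
    struct StatsCollector {
        records: Vec<BlockStats>,
    }

    impl StatsCollector {
        fn add(&mut self, quant: u8, is_intra: bool, nz_coeffs: u16, coded_bits: u32) {
            self.records.push(BlockStats { quant, is_intra, nz_coeffs, coded_bits });
        }
        // average coded size of blocks matching the given quantiser and type,
        // i.e. the raw material for whatever formula ends up being fitted
        fn average_bits(&self, quant: u8, is_intra: bool) -> Option<f64> {
            let sel: Vec<u32> = self.records.iter()
                .filter(|r| r.quant == quant && r.is_intra == is_intra)
                .map(|r| r.coded_bits)
                .collect();
            if sel.is_empty() {
                None
            } else {
                Some(f64::from(sel.iter().sum::<u32>()) / sel.len() as f64)
            }
        }
    }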