I’ve more or less completed a basic failure of a VP7 encoder. Now it can encode inter frames using various sizes of motion compensation (and the resulting file can be decoded too!). There’s still a lot of work to be done (rate control, MB features and multiple-frame analysis) but there are some things I can talk about already.
As I wrote in the previous post, there are too many coding parameters to try, so if you want a reasonable encoding done in reasonable time you need to cut corners (or “employ heuristics” if you want to sound more scientific) in various ways. So here I want to present what has been done in my encoder to make it run fast.
First of all, there’s macroblock type selection. Instead of trying all types and picking the best one like I did in my VP6 encoder, I process them in the following order, stopping as soon as the result is good enough:
- at first I try an inter macroblock using the last frame as the reference (for intra frames only intra modes are tried, naturally);
- then it’s time for an inter macroblock using the golden frame as the reference;
- if the macroblock is complex I can try splitting it and using four motion vectors (if not then I skip straight to intra prediction);
- if some of the 8×8 luma blocks still show a large difference I can try splitting them further into 4×4 motion blocks;
- then it’s time to try the 16×16 intra prediction modes (first I check whether the luma part is flat; if it is, I don’t even bother trying modes besides DC prediction);
- and finally the 4×4 intra prediction modes can be tried.
Of course I terminate the various searches early if the calculated macroblock metric is small enough that no further searches are necessary, or if it exceeds the current best value.
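The ordering above can be sketched as a plain sequential search with an early exit. This is a hypothetical illustration, not NihAV’s actual code; the `MBType` variants and `select_mb_type` function are made-up names, and each candidate is represented as a closure returning its rate-distortion score:

```rust
// Illustrative macroblock types, roughly in the order they are tried.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MBType { InterLast, InterGolden, FourMV, Intra16, Intra4 }

// Try candidates in order; stop as soon as one scores below `good_enough`.
// Later (costlier) candidates are never even evaluated in that case.
// Assumes `candidates` is non-empty.
fn select_mb_type(
    candidates: &[(MBType, &dyn Fn() -> u32)],
    good_enough: u32,
) -> (MBType, u32) {
    let mut best = (candidates[0].0, u32::MAX);
    for &(mode, eval) in candidates {
        let score = eval();
        if score < best.1 {
            best = (mode, score);
        }
        if best.1 <= good_enough {
            break; // the result is good enough, skip the remaining modes
        }
    }
    best
}
```

The point of this structure is that the cheap and most likely modes sit at the front of the list, so in the common case the expensive intra searches never run.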
The metric is calculated as the difference between the coded and the original block plus the number of nits multiplied by lambda (if you forgot what nits mean here, see my post on VP6 rate control). The proper way to do it would be to code the block (or whatever) using all the proper contexts, code the rest of the data, calculate the proper probabilities and estimate how many actual bits were used to code the block. Sounds rather impossible, doesn’t it? That is why I simply re-use the probabilities from the previous frame (or the default ones when it’s an intra frame). Also I don’t bother keeping track of the contexts required to code various blocks properly and simply use some arbitrary context values instead.

You can go even further and, instead of the DCT→quant→(estimate block coding cost)→dequant→IDCT→(calculate the difference between the coded and the initial block and the final distortion value) pipeline, probably use some heuristic to estimate the coding cost and distortion value straight from the quantised block (I did that in my VP6 encoder).
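For reference, the metric boils down to the usual rate-distortion score. A minimal sketch, assuming 8-bit probabilities like the ones VP7’s boolean coder uses (where `prob` is the scaled probability of a zero bit); the function names are mine, not NihAV’s:

```rust
// Cost of coding one bit in nits (natural-log information units): -ln(p).
// `prob` is the probability of a zero bit, scaled to 0..255 as in the
// VP7 boolean coder; these values would come from the previous frame.
fn bit_cost_nits(prob: u8, bit: bool) -> f32 {
    let p0 = (prob as f32) / 256.0;
    let p = if bit { 1.0 - p0 } else { p0 };
    -p.ln()
}

// Combined metric: distortion (e.g. sum of squared differences between
// the source and the reconstructed block) plus lambda times the rate.
fn rd_score(distortion: u32, rate_nits: f32, lambda: f32) -> f32 {
    distortion as f32 + lambda * rate_nits
}
```

With a 50/50 probability (`prob = 128`) a bit costs ln 2 ≈ 0.693 nits, i.e. exactly one bit of information, which is a quick sanity check for the formula.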
There are some optimisations in the motion estimation area as well. First of all, I ditched brute-force search in favour of the successive elimination algorithm (which uses a trick with block sums to skip testing knowingly bad positions; I’ll rant about not finding it earlier next time) and some approximation of the EPZS method, where the smaller blocks are searched by testing the neighbouring motion vectors first and refining the search from the best candidate position.
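The successive elimination trick rests on a simple bound: SAD(a, b) ≥ |sum(a) − sum(b)|, so any candidate whose block sum differs from the source block’s sum by at least the current best SAD cannot win and the full SAD need not be computed. A small sketch of the idea (helper names are hypothetical, and real implementations precompute the candidate sums incrementally):

```rust
fn block_sum(blk: &[u8]) -> u32 {
    blk.iter().map(|&x| x as u32).sum()
}

fn sad(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b.iter())
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

// Returns the index of the best candidate block, rejecting hopeless
// candidates from their block sum alone before doing the full SAD.
fn sea_search(src: &[u8], candidates: &[&[u8]]) -> usize {
    let src_sum = block_sum(src);
    let mut best_sad = u32::MAX;
    let mut best_idx = 0;
    for (i, cand) in candidates.iter().enumerate() {
        // Lower bound on SAD from the sums; if it already exceeds the
        // current best, the candidate cannot possibly be better.
        if src_sum.abs_diff(block_sum(cand)) >= best_sad {
            continue;
        }
        let d = sad(src, cand);
        if d < best_sad {
            best_sad = d;
            best_idx = i;
        }
    }
    best_idx
}
```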
And there is a lot of room for improvement in the parts that I still don’t know how to handle, namely selecting loop filter strength and sharpness, and the various quantiser offsets. Most likely I won’t be able to find out how to do those things properly and my encoder will be a failure. Never mind, it’s going to be a failure anyway. The question is whether I’ll be able to learn something useful from the experience.
speaking of the VP6 encoder, i would like to suggest something here.
so, about the whole “macroblock type selection” thing. you explained how you’re going to optimize your VP7 encoder so that said selection goes faster, instead of going through all types and picking the best one as in the VP6 encoder.
is there any way for an end user who’s just going to use nihav anyway (like me, since i feel like the only one aside from you who actually uses it, though in my case it’s just to test the VP6 encoder) to just “select” an “option” that would speed up VP6 encoding as much as possible?
generally speaking, is there any way to make “faster VP6 encoding” an option in nihav?
sorry if this post comes off as incomprehensible, i do kinda use nihav rather regularly just for VP6 encoding stuff. it was (and still is) the main draw for me.
When I wrote the VP6 encoder, speed was not my main concern (i.e. it should encode video in somewhat reasonable time, but not as fast as possible). It was intended for me to learn and try some encoding concepts, and not much more after all.
I guess it can be sped up a bit, but that would require changing its design: in VP7 I try the various block types sequentially (and terminate early), while in VP6 I first fill arrays with intra- and inter-coded MBs (golden frame as well, so it’s three arrays) and then select the best candidate (and try four-MV mode if inter is selected). The same applies to other parts of it: they can probably be sped up, but only with some design changes, and currently I’m trying that kind of design in the VP7 encoder.
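To make the contrast concrete, the VP6-style selection amounts to scoring every macroblock all three ways up front and then picking per macroblock, so nothing is skipped. A hypothetical sketch (names are illustrative, not NihAV’s):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum RefKind { Intra, Last, Golden }

// Each slice holds one RD score per macroblock for that coding choice;
// all three have already been fully computed, unlike the VP7 approach
// where later candidates may never be evaluated at all.
fn pick_best(intra: &[u32], last: &[u32], golden: &[u32]) -> Vec<RefKind> {
    intra.iter().zip(last).zip(golden)
        .map(|((&i, &l), &g)| {
            if i <= l && i <= g { RefKind::Intra }
            else if l <= g { RefKind::Last }
            else { RefKind::Golden }
        })
        .collect()
}
```

Speeding this design up means not computing some of those arrays in the first place, which is exactly the restructuring the reply describes.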
that’s… fairly interesting, to say the least.
that said, i’m looking forward to whatever you do with your VP7 encoder.