Finally NihAV got full-feature VP7 decoding support (well, except one very exotic case for a very exotic mode) so now I can move to other things like actually making various decoders bit-exact, fixing other bugs in them, adding missing pieces of code for the player and even documenting stuff. I hope to give a presentation of my work at VDD 2020 or FOSDEM 2021 (whichever accepts it) and I want to have something decent to present by then.
Anyway, here’s a review of VP7.
First of all, I’d like to note the complete lack of cooperation from Baidu. Back in the day when they had just bought On2 and presented VP8, people asked them to release VP7 as open source. To which they replied that their developers were working on adding new features to the codec and didn’t have time to locate the old sources. Even better, we have managed to find the VP6 and VP7 specifications on some Chinese site, and Baidu people could not even bother to release that. As it was said in a certain series of games, press <F>+<U> to pay respect.
Well, maybe they were too ashamed of it because it looks like half-finished VP8 specification without source code in the appendix. It has the same passages written by Captain Obvious like this passage:
A reasonable “divide and conquer” approach to implementation of a decoder is to begin by decoding streams composed exclusively of key frames. After that works reliably, interframe handling can be added more easily than if complete functionality were attempted immediately.
And very informative things like:
const Prob kfUVmodeProb [numUVmodes - 1] = { ??, ??, ??};
Nevertheless, it was my primary source along with the binary specification which obviously has all those missing tables, pieces of code and more.
So let’s move to the overall VP7 description. VP7 is a Duck rip-off of H.264: the same coding method as used in VP5 and VP6 (to the point where I could reuse bits for decoding coefficients and motion vectors from my VP5/6 decoder) with the overall structure of H.264 (4×4 blocks clustered into 16×16 macroblocks, optional separate 4×4 block for luma DCs, spatial prediction and such). It’s funny that if you consider how it does certain things then VP8 looks like a very minor improvement on VP7 (fixed probabilities are internally calculated from the fixed tree weights, golden frame can be partially updated after decoding a frame and other details). Another fun fact is that the VP7 decoder recognizes three FOURCCs: VP70, VP71 and ONYX (and you can still see onyxc_int.h in libaom). VP71 is claimed to be an error-resilient profile: it has some small changes in the frame header and it can signal to preserve scan order and some frequencies (that cannot be changed regardless) and restore them after the frame has been decoded.
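The "preserve and restore after the frame has been decoded" behaviour can be sketched roughly like this. This is only an illustration in Rust, not code from NihAV or from the specification: the AdaptiveState type, its fields, their sizes and the decode_frame_with_restore function are all made-up names, and what exactly gets saved is an assumption.

```rust
/// Hypothetical per-stream state that a frame may modify while decoding.
/// Field names and sizes are assumptions made for this illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct AdaptiveState {
    /// Coefficient scan order for a 4x4 block.
    pub scan: [usize; 16],
    /// Stand-in for the "some frequencies" mentioned in the post.
    pub freqs: Vec<u32>,
}

/// Runs `decode` on the state; if `preserve` is set, the state is put back
/// to what it was before the frame once decoding has finished.
pub fn decode_frame_with_restore<F>(state: &mut AdaptiveState, preserve: bool, decode: F)
where
    F: FnOnce(&mut AdaptiveState),
{
    // Take a snapshot only when the frame header asks for it.
    let saved = if preserve { Some(state.clone()) } else { None };
    decode(state);
    if let Some(old) = saved {
        *state = old;
    }
}
```

With preserve set, whatever the frame did to the scan order is invisible to the next frame, which is the kind of thing you would want if frames can get lost.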
The nastiest part there is so-called macroblock features: if they are enabled then each macroblock can have something in its decoding process altered. The first feature forces the decoder to use a different quantiser for the current macroblock, the second one tells the decoder to use a different loop filter strength, the third one tells the decoder to update the golden frame with this macroblock and the last feature tells the decoder to reconstruct the macroblock in a weird way. There are two parts there: residue reconstruction (i.e. how to apply reconstructed 4×4 blocks to the macroblock) and motion prediction. Residue reconstruction has four modes: normal, interlaced, very interlaced (i.e. using every fourth line instead of every second one) and kinda extremely interlaced (each 4×4 subblock is treated as a 16×1 line instead). Motion prediction has all these modes and also warped motion (when you copy not from a rectangle but rather from a parallelogram with a 45-degree angle). I can understand many things but “copy from a line like it’s a 4×4 block” is too much for me and thus I left it unimplemented (not that I have any samples for it either).
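To make the interlaced residue modes a bit more concrete, here is a small Rust sketch of where each row of a 4×4 luma block could land inside the 16-line macroblock. It is an illustration under assumptions, not the actual NihAV code or the layout from the specification: the ResidueMode and dest_line names are invented, and the order in which blocks are assigned to fields is a guess. Only the line steps (1, 2 and 4) come from the description above, and the 16×1 mode is left out just like in the decoder.

```rust
/// Residue reconstruction modes as named in the post (identifiers are made up).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ResidueMode {
    Normal,
    Interlaced,
    VeryInterlaced,
    /// Each 4x4 subblock treated as a 16x1 line; not handled in this sketch.
    Line,
}

impl ResidueMode {
    /// Distance between consecutive lines written by one block.
    fn line_step(self) -> Option<usize> {
        match self {
            ResidueMode::Normal => Some(1),
            ResidueMode::Interlaced => Some(2),
            ResidueMode::VeryInterlaced => Some(4),
            ResidueMode::Line => None,
        }
    }
}

/// Destination line (0..16) inside the macroblock for row `row` (0..4)
/// of a block in block row `blk_y` (0..4).
/// Assumed layout: the macroblock is split into `step` fields and the
/// block rows fill field 0 first, then field 1 and so on.
pub fn dest_line(mode: ResidueMode, blk_y: usize, row: usize) -> Option<usize> {
    assert!(blk_y < 4 && row < 4);
    let step = mode.line_step()?;
    let n = blk_y * 4 + row;          // line index in normal (progressive) order
    let lines_per_field = 16 / step;
    let field = n / lines_per_field;  // which field this line belongs to
    let line_in_field = n % lines_per_field;
    Some(field + step * line_in_field)
}
```

For example, under this assumed layout dest_line(ResidueMode::VeryInterlaced, 1, 2) gives Some(9): the second block row writes lines 1, 5, 9 and 13.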
But other than that I have all features implemented (even though I don’t have samples to test them) and decoder produces recognizable picture so I’m happy with that and can move to other things as I stated in the beginning.
And in the unlikely case somebody reads this post and asks that question—no, I’m not going to implement VP8/VP9/AV1 decoder. Those are not Duck codecs and I’d rather reverse engineer some obscure codec instead.
I do not have time to read full post. Keep REing all obscure codecs except sheervideo!
I’ve REd SheerVideo once, that should be enough. But there may be still some third-party QuickTime codecs left to RE. That means I’ll have to add MOV demuxer eventually as well.
Yes, media101 codec and BeHereiVideo
All things must come to an end. Great job. Looking forward to your talk.