VP6 encoder design

This is the penultimate post in the series (there shall be another post on how to use the encoder, but if there’s no interest I can simply skip it, making this the last post in the series). As promised before, here I’ll present the layout and the details of my encoder.

First of all I’d like to mention the principles on which my encoder is built: simplicity and laziness. If something can be done in a simple way and works well enough then I don’t pursue it further. That is why it processes frames one by one without trying to look ahead to see if there’s some good partitioning scheme (in other words, which frames to code as intra or assign as a golden frame) or if something can be coded more efficiently if you have statistics for several frames. The encoder gets a frame and encodes it, that’s all.

Second, it’s worth remembering that the encoder design is driven by the codec design. VP6 uses a simple scheme with a single quantiser per frame, up to two reference frames (previous and golden), the standard 8×8 DCT in 16×16 macroblocks and data encoded using fixed probabilities kept from the previous frame or partially updated. This makes the encoder simple, but you essentially need to process the full frame before you can encode it (unless you stick to the default probabilities), while with other codecs you can often encode blocks as you go without needing to know how the following macroblocks will be coded (of course it’s not the best compression mode but it’s useful for low-latency encoding).

So, high-level encoding looks like this:

  1. encoder decides whether the current frame should be coded as intra or inter (by checking if it’s the first frame in a group of N frames);
  2. frame quantiser is either estimated by rate control (from the target frame size and estimated frame sizes with different quantisers) or it’s a fixed quantiser (or if there’s neither a fixed quantiser nor a target bitrate, 42 is selected for obvious reasons);
  3. source frame data is read;
  4. for inter frames motion estimation is performed on the previously coded frame and on the golden frame (unless the previous frame was the golden frame as well);
  5. various kinds of macroblocks are prepared (with intra, inter 1MV, inter 4MV or golden frame data—depending on frame type of course);
  6. for inter frames there’s a macroblock type selection process;
  7. on prepared macroblock data DC and MV prediction is applied and the actual inter macroblock types are determined (e.g. inter MB with near MV or inter MB with golden frame reference and zero MV);
  8. prepared macroblocks are fed to the model estimator and optimal models for the frame are calculated;
  9. frame header is finally written out;
  10. models are encoded as updates (and only when coding an update saves bits, see this post for the details);
  11. the rest of the data (macroblock types, motion vectors, block coefficients) is encoded;
  12. the current encoded frame is reconstructed so that the encoder uses the same previous frame reference as a decoder would (and it can also be assigned as a new golden frame reference);
  13. repeat for the next frame.
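
Steps 1 and 2 above can be sketched roughly like this. It is only an illustration under my own assumptions, not the actual NihAV code: all names are made up, a fixed quantiser is assumed to take precedence over rate control, and the rate control is reduced to "pick the quantiser whose estimated frame size is closest to the target" with the size estimator passed in as a closure.

```rust
/// Frame kinds decided in step 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameType {
    Intra,
    Inter,
}

/// Fallback quantiser from step 2 when nothing else is configured.
pub const DEFAULT_QUANT: u8 = 42;

/// Step 1: the first frame in every group of `key_int` frames is intra.
pub fn frame_type(frame_no: u64, key_int: u64) -> FrameType {
    // `max(1)` merely guards this sketch against a zero interval.
    if frame_no % key_int.max(1) == 0 {
        FrameType::Intra
    } else {
        FrameType::Inter
    }
}

/// Step 2: fixed quantiser, else rate control, else the default.
/// `estimate_size` is a hypothetical callback returning the estimated
/// frame size for a given quantiser index (assumed range 0..=63).
pub fn choose_quant<F: Fn(u8) -> u32>(
    fixed: Option<u8>,
    target_size: Option<u32>,
    estimate_size: F,
) -> u8 {
    // Assumption: an explicitly requested quantiser wins over rate control.
    if let Some(q) = fixed {
        return q;
    }
    match target_size {
        // Pick the quantiser whose estimate lies closest to the target.
        Some(target) => (0u8..64)
            .min_by_key(|&q| estimate_size(q).abs_diff(target))
            .unwrap_or(DEFAULT_QUANT),
        None => DEFAULT_QUANT,
    }
}
```

With no fixed quantiser and no target size, `choose_quant(None, None, |_| 0)` simply falls back to 42.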

Internally the encoder keeps the following information during encoding:

  • models from the previous frame and current statistics;
  • source frame data (used to generate intra MBs and to calculate distortion);
  • intra MB data (transformed and quantised coefficients);
  • inter MB data (transformed and quantised coefficients, raw reference frame data, motion vectors);
  • inter MB data with four motion vectors per macroblock (the same format as above);
  • golden frame MB data (the same format as above);
  • macroblock types (used to select which transformed coefficients to code);
  • coded motion vectors (i.e. with predicted component removed from them).
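
As a data structure, the list above might look something like the following. This is a hypothetical layout for illustration only: the type and field names are mine, the macroblock type list is cut down to four variants, the models and statistics are left out, and I assume six 8×8 blocks per macroblock (four luma and two chroma, as in YUV 4:2:0) with four vectors stored even for single-MV macroblocks.

```rust
/// Motion vector (units are left unspecified in this sketch).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct MV {
    pub x: i16,
    pub y: i16,
}

/// Four luma and two chroma 8x8 blocks per 4:2:0 macroblock.
pub const BLOCKS_PER_MB: usize = 6;

/// Simplified macroblock kinds; the real codec distinguishes more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MBType {
    Intra,
    Inter,
    InterFourMV,
    Golden,
}

/// Intra candidate: transformed and quantised coefficients.
#[derive(Clone)]
pub struct IntraMB {
    pub coeffs: [[i16; 64]; BLOCKS_PER_MB],
}

/// Inter candidate: coefficients, raw reference pixels, motion vectors.
#[derive(Clone)]
pub struct InterMB {
    pub coeffs: [[i16; 64]; BLOCKS_PER_MB],
    pub reference: [[u8; 64]; BLOCKS_PER_MB],
    pub mvs: [MV; 4], // all equal when a single MV is used
}

/// Per-frame working data (models and statistics omitted).
pub struct FrameState {
    pub source: Vec<u8>,        // source pixels, 384 bytes per macroblock
    pub intra: Vec<IntraMB>,    // intra candidates
    pub inter: Vec<InterMB>,    // previous-frame candidates, one MV
    pub four_mv: Vec<InterMB>,  // previous-frame candidates, four MVs
    pub golden: Vec<InterMB>,   // golden-frame candidates
    pub mb_types: Vec<MBType>,  // which candidate gets coded
    pub coded_mvs: Vec<[MV; 4]>, // MVs with the prediction removed
}

impl FrameState {
    /// Allocates zeroed storage for a frame of `mb_w` x `mb_h` macroblocks.
    pub fn new(mb_w: usize, mb_h: usize) -> Self {
        let n = mb_w * mb_h;
        let inter = InterMB {
            coeffs: [[0; 64]; BLOCKS_PER_MB],
            reference: [[0; 64]; BLOCKS_PER_MB],
            mvs: [MV::default(); 4],
        };
        Self {
            source: vec![0; n * BLOCKS_PER_MB * 64],
            intra: vec![IntraMB { coeffs: [[0; 64]; BLOCKS_PER_MB] }; n],
            inter: vec![inter.clone(); n],
            four_mv: vec![inter.clone(); n],
            golden: vec![inter; n],
            mb_types: vec![MBType::Intra; n],
            coded_mvs: vec![[MV::default(); 4]; n],
        }
    }
}
```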

Inter macroblocks keep reference frame data so it’s easy to reconstruct the coded macroblock (for both distortion calculation and coded frame reconstruction). During the motion estimation phase, when the best MV for the macroblock is determined, the reference data is saved along with the residue and the actual motion vector.
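
To show why keeping the reference data is convenient, here is a minimal sketch of reconstructing one 8×8 block. It assumes the residue has already been dequantised and inverse-transformed back into the pixel domain (that part is skipped here), and the function name is made up.

```rust
/// Rebuilds an 8x8 block from the saved reference pixels and a decoded
/// residue. Assumption: `residue` is already in the pixel domain, i.e.
/// dequantisation and the inverse DCT happened before this call.
pub fn add_residue(reference: &[u8; 64], residue: &[i16; 64]) -> [u8; 64] {
    let mut out = [0u8; 64];
    for ((o, &r), &d) in out.iter_mut().zip(reference.iter()).zip(residue.iter()) {
        // Widen before adding, then clamp to the valid 8-bit pixel range.
        *o = (i32::from(r) + i32::from(d)).clamp(0, 255) as u8;
    }
    out
}
```

The same output can then be compared against the source block for the distortion calculation and copied into the reconstructed frame.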

The macroblock type selection process is simple: the rate control module keeps a lambda value for the RDO decision (see my previous post for more details), so the best of intra MB, inter MB referencing the previous frame and inter MB referencing the golden frame is selected based on their distortion and estimated cost (which is based on the number of non-zero coefficients and the quantiser value). Additionally, if an inter MB is selected and its distortion is over a certain threshold, an approach with four motion vectors per macroblock is tried (this way I don’t calculate it for too complex or still macroblocks).
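
In code, the selection described above could look roughly like this. Again this is a sketch with made-up names, not the real implementation. The bit estimate is a placeholder (non-zero coefficient count multiplied by `nz_cost`, a hypothetical per-coefficient cost assumed to be derived from the quantiser elsewhere), and I assume that the four-MV variant is tried only for the previous-frame inter MB and kept only if its cost turns out lower.

```rust
/// Outcome of the selection (simplified set of macroblock kinds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Choice {
    Intra,
    Inter,
    Golden,
    FourMV,
}

/// What is known about one candidate coding of a macroblock.
#[derive(Debug, Clone, Copy)]
pub struct Candidate {
    pub dist: u32,    // distortion against the source macroblock
    pub nonzero: u32, // number of non-zero quantised coefficients
}

/// Classic RDO cost: distortion + lambda * estimated bits.
/// The bit estimate here is a placeholder, not the real formula.
fn rd_cost(c: Candidate, lambda: f32, nz_cost: u32) -> f32 {
    c.dist as f32 + lambda * (c.nonzero * nz_cost) as f32
}

/// Picks the cheapest candidate; `four_mv` is evaluated lazily so the
/// four-MV search runs only when it is actually needed.
pub fn select_mb_type<F: FnOnce() -> Candidate>(
    intra: Candidate,
    inter: Candidate,
    golden: Option<Candidate>, // None when there is no separate golden frame
    four_mv: F,
    lambda: f32,
    nz_cost: u32,
    four_mv_thr: u32,
) -> Choice {
    let mut best = (Choice::Intra, rd_cost(intra, lambda, nz_cost));
    let inter_cost = rd_cost(inter, lambda, nz_cost);
    if inter_cost < best.1 {
        best = (Choice::Inter, inter_cost);
    }
    if let Some(g) = golden {
        let cost = rd_cost(g, lambda, nz_cost);
        if cost < best.1 {
            best = (Choice::Golden, cost);
        }
    }
    // Try four MVs only for an inter MB that is still noticeably distorted.
    if best.0 == Choice::Inter && inter.dist > four_mv_thr {
        let cost = rd_cost(four_mv(), lambda, nz_cost);
        if cost < best.1 {
            best = (Choice::FourMV, cost);
        }
    }
    best.0
}
```

Passing the four-MV candidate as a closure mirrors the laziness principle: a macroblock that ends up intra (too complex) or inter with low distortion (still) never pays for the extra motion search.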

The other fine details are either the same as required for the decoder (like DC or MV prediction) or have been described in the previous posts.

I hope this demonstrates that writing a non-toy encoder is not as hard as you imagined. I did it mostly to play with the concepts and learn how it all works. Some determined guy can spend a bit more time on research and experimenting and make an even better encoder for some format.

8 Responses to “VP6 encoder design”

  1. eiken28 says:

    OK, what if i wanted to use your VP6 encoder? i’m definitely interested in it for sure.

  2. Kostya says:

    In theory you can simply compile nihav-encoder, feed it an input file (currently Y4M should work the best, maybe something else) and get an output AVI file (or Electronic Arts VP6 file, I have a patch for that already; or maybe FLV in the future).

    This can be done right now and if you can describe how and for what purpose you want to use my encoder then I can reflect that in a future “how to encode VP6” guide and maybe make some improvements to the overall code to make your use case easier.

  3. Paul says:

    Oh noes, yet another baidu encoder!

  4. eiken28 says:

    eh, personal stuff. i just want to see how an VP6 video looks like through a different encoder. i don’t want to rely on proprietary stuff. i also want to take advantage of the video quality that VP6 offers so the final video looks cleaner to my tastes.

    with that being said, i tried to transcode a video into a VP6 file using your encoder i just compiled, and found out that it doesn’t necessarily support other, uhh, lossless codecs, like other yuv variants beyond 4:2:0 (4:2:2, 4:4:4, etc) or RGB32.

  5. Kostya says:

    considering the complete lack of cooperation from Baidu on it (IIRC people asked and got reply that they’re not interested to work on releasing anything), I consider it to be Duck abandonware.

  6. Kostya says:

    Since VP6 does not support anything beside YUV 4:2:0 it’s not a big loss (and I guess for now the easiest way is to use Y4M as an input). I tried to ask in posts what format people would like for an input but got no replies so far.

    Something like nihav-encoder --input file.y4m --output vp6.avi -an --ostream0 encoder=vp6,quant=42 should work for your needs (alternatively you can set bitrate=300000 instead of quant=42 and see how rate control fails to keep it).

  7. eiken28 says:

    alright, the format i would like for an input would be an avi file containing uncompressed RGB32 video.
    as for the bitrate option i had no idea that even existed.

  8. Kostya says:

    I’ll see what I can do, meanwhile you can convert it to YUV4MPEG format using some tool (like FFmpeg) and perform the encoding.