VP6 — simple interframe encoder done

As I said in the previous post detailing the roadmap, there’s a lot to do for an interframe encoder. Now I have the basics implemented but there’s a lot more to do.

What I've done is a simple encoder that performs a very limited motion search, makes a simple decision on which block type to use (intra or inter) and encodes it all using custom probabilities. Let's see how it was done:

  • custom macroblock type encoding essentially requires transmitting a table with probabilities of an MB type appearing after the same or a different MB type. So I had to count how many times one MB type appeared after another, sum it up in a table and normalise it so the sum does not exceed 256 (a small sketch of this step follows after the list). Then I re-calculated the trees to see how many bits coding all macroblock types would take and whether it's worth bothering with custom encoding. But that's not all: those probabilities are stored as an entry in a vector codebook plus optional difference coding. In my test on a very small file, encoding MB types with a selected vector instead of the default values shaved off a significant number of bits. Encoding deltas shaved off another 2-5 bits, but it costs 50-60 bits to transmit those deltas, so it should not be used too often, I suppose;
  • motion search is done by performing an exhaustive full-pel search up to three pixels in each direction (see the second sketch after the list). For now I simply reuse the MC routines from my VP6 decoder to get a properly filtered (and, in the future, interpolated) block, but I want to create a fully filtered frame from the start and use it for the search (or rather three filtered frames, because VP6 filters edges only if a motion vector component is not a multiple of 8 pixels). There is also the question of faster motion search; I'll dedicate a separate post to it;
  • deciding when to use an intra or an inter macroblock is a complex problem that should be covered in a post about rate-distortion optimisation, if I ever get to it. On one hand you have two differently coded blocks that need different numbers of bits and give different errors between the coded and the original block (that's what RDO is supposed to solve); on the other hand, because of the inter-dependencies between blocks, you never know how many bits it will take to encode the current block until you have calculated the probabilities for coding coefficients and zero runs for all blocks. I guess it's better to rely on a heuristic that predicts how many bits a block with this many coefficients will take, but I need to gather encoding statistics first in order to derive such a rule. Meanwhile I use a very simple rule (sketched below): if the inter block has less distortion or fewer coded coefficients, pick it, otherwise code it as an intra block. This seems to work fine so far.
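
Here is a minimal Rust sketch of the statistics-gathering step from the first point: count which MB type follows which and scale the counts so the sum stays within 256. This is not the actual NihAV code; grouping the counts by the preceding MB type and the exact scaling are assumptions made purely for illustration.

```rust
// A toy sketch, not the actual NihAV encoder code: count how often each MB
// type follows each other type and scale the counts so that a row never sums
// to more than 256. MB_TYPES and the per-row normalisation are assumptions.
const MB_TYPES: usize = 10;

/// Count MB type transitions and turn them into probabilities whose per-row
/// sum does not exceed 256.
fn mb_type_probs(mb_types: &[usize]) -> [[u8; MB_TYPES]; MB_TYPES] {
    let mut counts = [[0u32; MB_TYPES]; MB_TYPES];
    for pair in mb_types.windows(2) {
        counts[pair[0]][pair[1]] += 1;
    }
    let mut probs = [[0u8; MB_TYPES]; MB_TYPES];
    for (ctx, row) in counts.iter().enumerate() {
        let total: u32 = row.iter().sum();
        if total == 0 {
            continue; // this context never occurred, keep the defaults
        }
        for (i, &cnt) in row.iter().enumerate() {
            // floor division keeps the row sum at or below 256
            probs[ctx][i] = (cnt * 256 / total).min(255) as u8;
        }
    }
    probs
}

fn main() {
    // toy sequence of macroblock types from a hypothetical frame
    let mbs = [0usize, 0, 1, 0, 2, 2, 0, 1, 1, 0];
    let probs = mb_type_probs(&mbs);
    println!("probabilities after MB type 0: {:?}", &probs[0][..3]);
}
```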
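
The exhaustive full-pel search from the second point could look roughly like this. Again a sketch under assumptions (8×8 blocks, a single plane, plain SAD as the cost metric), not the real routine, which obtains filtered reference blocks through the decoder's MC code.

```rust
// A sketch of an exhaustive +/-3 pixel full-pel search using SAD; block size,
// frame layout and the cost metric are assumptions for illustration only.
const SEARCH_RANGE: isize = 3;
const BLOCK_SIZE: usize = 8;

fn sad(cur: &[u8], refp: &[u8], stride: usize) -> u32 {
    let mut sum = 0;
    for y in 0..BLOCK_SIZE {
        for x in 0..BLOCK_SIZE {
            let a = i32::from(cur[y * stride + x]);
            let b = i32::from(refp[y * stride + x]);
            sum += (a - b).unsigned_abs();
        }
    }
    sum
}

/// Search the reference frame around block position (bx, by) and return the
/// best motion vector together with its SAD.
fn full_pel_search(
    cur: &[u8], refp: &[u8], stride: usize,
    width: usize, height: usize, bx: usize, by: usize,
) -> (isize, isize, u32) {
    let cur_blk = &cur[by * stride + bx..];
    let mut best = (0, 0, u32::MAX);
    for dy in -SEARCH_RANGE..=SEARCH_RANGE {
        for dx in -SEARCH_RANGE..=SEARCH_RANGE {
            let rx = bx as isize + dx;
            let ry = by as isize + dy;
            // skip candidates that would read outside the reference frame
            if rx < 0 || ry < 0
                || rx as usize + BLOCK_SIZE > width
                || ry as usize + BLOCK_SIZE > height
            {
                continue;
            }
            let ref_blk = &refp[ry as usize * stride + rx as usize..];
            let cost = sad(cur_blk, ref_blk, stride);
            if cost < best.2 {
                best = (dx, dy, cost);
            }
        }
    }
    best
}

fn main() {
    // toy 16x16 frames with a single bright pixel: in the current frame the
    // spot is at (5, 5), in the reference at (7, 7), so for the block at
    // (4, 4) the search should find the motion vector (2, 2) with SAD 0
    let (w, h) = (16usize, 16usize);
    let mut cur = vec![128u8; w * h];
    let mut refp = vec![128u8; w * h];
    cur[5 * w + 5] = 200;
    refp[7 * w + 7] = 200;
    let (mv_x, mv_y, cost) = full_pel_search(&cur, &refp, w, w, h, 4, 4);
    println!("best MV ({}, {}) with SAD {}", mv_x, mv_y, cost);
}
```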
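
And the intra/inter decision rule from the last point fits into a couple of lines; the type and field names here are invented for the example.

```rust
// Hypothetical per-block statistics; the real encoder tracks more than this.
struct BlockCost {
    distortion: u32,     // error between the reconstructed and the source block
    coded_coeffs: usize, // non-zero coefficients left after quantisation
}

/// Prefer the inter candidate when it gives less distortion or fewer coded
/// coefficients; otherwise fall back to intra coding.
fn pick_inter(intra: &BlockCost, inter: &BlockCost) -> bool {
    inter.distortion < intra.distortion || inter.coded_coeffs < intra.coded_coeffs
}
```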

Now what’s left to do:

  1. fast motion search, as mentioned above;
  2. four motion vectors per macroblock mode (which relies on having fast motion search first; otherwise it’d be too slow to test);
  3. golden frame source support (same requirement as the above for the same reason);
  4. golden frame selection (I'll probably stick to some simple rule: if a frame differs enough from the previous one, either by more than half of the macroblocks being intra or by the overall motion difference being too high, it should be selected as a golden frame; a sketch of such a rule follows after the list);
  5. and of course selecting frame quantiser (which should be described in a post about rate control);
  6. maybe also Huffman-coded block coefficients (this should be easy but I’m not sure yet it’s worth bothering with).
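
For item 4 the tentative rule could be as simple as this; the motion-difference metric and its threshold are placeholders rather than anything taken from the encoder.

```rust
// A sketch of the tentative golden frame selection rule; "motion difference"
// and its threshold stand in for whatever metric ends up being used.
fn select_golden(intra_mbs: usize, total_mbs: usize,
                 motion_diff: u64, motion_threshold: u64) -> bool {
    // more than half of the macroblocks are intra, or the frame moved too much
    intra_mbs * 2 > total_mbs || motion_diff > motion_threshold
}
```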

This should keep me occupied for a long time (but not necessarily with the encoder).

7 Responses to “VP6 — simple interframe encoder done”

  1. Attila says:

    Hi! I can’t believe you are working on this _right now_, and are making such good progress!
    You see, I’m working on Ruffle, and within that, trying to add VP6 video support.
    Now, there is a perfectly fine decoder in FFmpeg, but I ran into some unknowns (namely, where the chroma samples are located relative to the luma samples). And to find this out, I wanted to create some simple test videos.
    But all I could find was https://github.com/avivklas/vp6-encoder – which “worked”, but produced VP62 video, which might not be embeddable into an SWF as is… And it’s not the most convenient or flexible solution either.
    So, is there any chance the encoder you are working on will be available for us for testing purposes? Or perhaps even open source?
    Thanks!

  2. Kostya says:

    Of course it will be open-source and I even want to make a separate page describing how to build and use nihav-encoder with my VP6 encoder (there was some interest in it from the Red Alert modding community). Though currently NihAV lacks support for raw or losslessly packed video to feed to the encoder (suggestions for a codec or format are welcome).

    BTW the VP6 specification (which you can find on wiki.multimedia.cx) lacks that information about chroma sample positions.

  3. Attila says:

    That’s awesome, thank you so much!

    Yeah, I looked everywhere for this tiny piece of info about sample positions, but no mention of it anywhere… So only experimentation and observation remain, I guess!

  4. Attila says:

    And is your encoder already available somewhere for trying perhaps?
    Right now I’m only trying to encode some single-frame 16×16 (or similarly sized) test videos, and based on what you wrote, this project should already be more than capable of doing this.

  5. Kostya says:

    Not yet, maybe I’ll try to make something for you tomorrow but that’s not guaranteed.

  6. Attila says:

    Whoa, guaranteed or not, if “tomorrow” is even an option, that sounds super soon!

  7. Attila says:

    And to you, reader from the far future: If, for whatever reason not yet known, this project does not work out for you, this is the roundabout way I could manage to put a single-frame VP6 video into an SWF from a PNG:

    > docker run -v $PWD:/videos avivklas/vp6-encoder -q 5 -i /videos/testframe.png -o /videos/testvideo.avi
    > ffmpeg -i testvideo.avi -vcodec copy testvideo.flv
    > ffmpeg -i testvideo.flv -vcodec copy -r 25 testvideo.swf

    testframe.png is supposed to be in the current directory.

    Might work for videos as well, not just single frames, and in that case, the “-r 25” argument might not be needed.