Revisiting MSVideo1 encoder

Recently somebody asked me a question about my MS Video 1 encoder (not the one in NihAV though) and I decided to see whether my current encoder could be improved, so I took Ghidra and went to read the binary specification.

Essentially it did what I expected: it understands quality only, for which it calculates thresholds for skip and fill blocks that are applied immediately; clustering is done in the usual K-means way, and the only curious trick is that it uses luminance for that.
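
Neither the original threshold formula nor mine appears below; this is only a minimal sketch, with hypothetical names and a non-empty block assumed, of how quality-derived thresholds could drive the skip/fill decision using luma as the distortion metric:

```rust
enum BlockMode { Skip, Fill, Coded }

/// Classify a block given quality-derived thresholds: cheap enough to skip,
/// flat enough to fill with one colour, or coded with 2/8 colours.
fn classify_block(cur_luma: &[u32], prev_luma: &[u32],
                  skip_threshold: u32, fill_threshold: u32) -> BlockMode {
    // distortion against the co-located block of the previous frame
    let skip_dist: u32 = cur_luma.iter().zip(prev_luma.iter())
        .map(|(&a, &b)| a.abs_diff(b))
        .sum();
    if skip_dist <= skip_threshold {
        return BlockMode::Skip;
    }
    // distortion against a single average value (i.e. a fill block)
    let avg = cur_luma.iter().sum::<u32>() / cur_luma.len() as u32;
    let fill_dist: u32 = cur_luma.iter().map(|&v| v.abs_diff(avg)).sum();
    if fill_dist <= fill_threshold {
        BlockMode::Fill
    } else {
        BlockMode::Coded
    }
}
```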

So I decided to use that idea for improving my own encoder. I ditched the generic median cut in favour of specially crafted clustering into two groups (I select the largest cluster axis—be it luma, red, green or blue—split the 4 or 16 pixels into two groups by whether they are above the average or not, and calculate the average of each group). This made encoding about twice as fast. I’ve also fixed a bug with 8-colour blocks so now it encodes data properly (previously it would result in a distorted block). And of course I’ve finally made quality affect the encoding process (also by generating thresholds, but with a different formula—unlike the original, my encoder uses no floating-point maths anywhere).
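
A minimal sketch of that two-group split, assuming a simple pixel-quad type and non-empty blocks; the names and the exact averaging details are my own, not the encoder's actual code:

```rust
#[derive(Clone, Copy, Default)]
struct Pixel { r: u32, g: u32, b: u32, y: u32 } // y = precomputed luma

fn channel(p: &Pixel, axis: usize) -> u32 {
    match axis { 0 => p.y, 1 => p.r, 2 => p.g, _ => p.b }
}

/// Split the 4 or 16 block pixels into two groups along the channel with
/// the largest spread and return the average colour of each group.
fn split_two(pixels: &[Pixel]) -> (Pixel, Pixel) {
    // pick the axis (luma, red, green or blue) with the widest value range
    let mut axis = 0;
    let mut best_range = 0;
    for a in 0..4 {
        let min = pixels.iter().map(|p| channel(p, a)).min().unwrap();
        let max = pixels.iter().map(|p| channel(p, a)).max().unwrap();
        if max - min > best_range {
            best_range = max - min;
            axis = a;
        }
    }
    // the average along that axis serves as the split threshold
    let avg = pixels.iter().map(|p| channel(p, axis)).sum::<u32>()
        / pixels.len() as u32;
    // accumulate sums for the "below or equal" and "above" groups
    let mut sums = [[0u32; 4]; 2];
    let mut counts = [0u32; 2];
    for p in pixels {
        let grp = usize::from(channel(p, axis) > avg);
        sums[grp][0] += p.r;
        sums[grp][1] += p.g;
        sums[grp][2] += p.b;
        sums[grp][3] += p.y;
        counts[grp] += 1;
    }
    let average = |s: [u32; 4], n: u32| {
        let n = n.max(1); // a fully flat block leaves one group empty
        Pixel { r: s[0] / n, g: s[1] / n, b: s[2] / n, y: s[3] / n }
    };
    (average(sums[0], counts[0]), average(sums[1], counts[1]))
}
```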

Also I’ve added palette mode support. The idea is simple: internally I operate on pixel quads (red, green, blue, luma), so for palette mode I just need to replace the actual pixel value with the index of the most similar palette entry. For that task I reused one of the approaches from my palettiser (it should be faster than iterating over the whole palette every time). Of course the proper way would be to map the colours first so that distortion is calculated on the mapped values (because the first suitable colour may be far from perfect), but I decided not to pursue this task further, even if it sometimes results in badly coded features. It’s still not a serious encoder after all.
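
The mapping itself boils down to finding the most similar palette entry for each pixel. Here is a minimal sketch using a plain linear search over the palette rather than the faster palettiser-based lookup mentioned above, with hypothetical names:

```rust
/// Brute-force nearest palette entry by squared RGB distance.
fn nearest_palette_entry(r: i32, g: i32, b: i32, pal: &[[u8; 3]]) -> u8 {
    let mut best_idx = 0usize;
    let mut best_dist = i32::MAX;
    for (idx, entry) in pal.iter().enumerate() {
        let dr = r - i32::from(entry[0]);
        let dg = g - i32::from(entry[1]);
        let db = b - i32::from(entry[2]);
        let dist = dr * dr + dg * dg + db * db;
        if dist < best_dist {
            best_dist = dist;
            best_idx = idx;
        }
    }
    best_idx as u8
}
```

The returned index then takes the place of the pixel value in the quad, so the rest of the encoder (clustering, thresholds) keeps working unchanged.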

Now this member of the early 90s video codecs ruling triumvirate should be good enough. The Cinepak encoder is still rather primitive, so I’ll have to re-check it. The Indeo 3 encoder seems to produce buggy output on complex high-motion scenes (I suspect it’s related to the number of motion vectors exceeding the limit), but it will have to wait until I rewrite the decoder. And then hopefully more interesting experiments will happen.

7 Responses to “Revisiting MSVideo1 encoder”

  1. How does your cinepak encoder compare to the one in lavc? I think the main issue is the slowness in computing codebooks. I fixed the broken R-D logic recently so that it is actually usable, motivated by an email from an actual user!

  2. Kostya says:

    I haven’t tried comparing them but I’m pretty sure yours is superior in every aspect except maybe for the selection of vector quantisation method.

    Mine was written mostly to apply vector quantisation somewhere; I’ve yet to improve it into something tunable and flexible. Maybe after I fix the Indeo 3 stuff…

  3. Paul says:

    The ADPCM Covox wiki entry about the decoding procedure does not seem fully correct; judging by an older revision, the current version appears to have been vandalised with dubious claims.

  4. Kostya says:

    I suspect it was confusion caused by there being several formats for a thing that should not have any ADPCM in the first place. It would be really good if you clarified things once and for all.

  5. Paul says:

    For 16-bit formats Ghidra is useless, and IDA does not have a decompiler, so I’m lost.

  6. Kostya says:

    My experience is different, but you need to know way more about assembly language and segmented memory models in order to understand the code. And maybe older decompilers like REC can help you better there.

  7. Paul says:

    I use radare2, and its disassembly matches the uploaded .ASM file, and from there I see nowhere that I made a mistake in the C implementation.