Archive for the ‘Various Video Codecs’ Category

Revisiting MSVideo1 encoder

Wednesday, February 8th, 2023

Recently somebody asked me a question about my MS Video 1 encoder (not the one in NihAV though) and I decided to see whether my current encoder could be improved, so I took Ghidra and went to read the binary specification.

Essentially it does what I expected: it understands only quality, for which it calculates thresholds for skip and fill blocks to be applied immediately; clustering is done in the usual K-means way and the only curious trick is that it uses luminance for that.

So I decided to use that idea to improve my own encoder. I ditched the generic median cut in favour of specially crafted clustering into two groups (I select the largest cluster axis—be it luma, red, green or blue—split the 4 or 16 pixels into two groups by whether they are above the average or not, and calculate the average of those two groups). This made encoding about two times faster. I've also fixed a bug with 8-colour blocks so now it encodes the data properly (previously it would result in a distorted block). And of course I've finally made quality affect the encoding process (also by generating thresholds, but with a different formula—unlike the original, my encoder uses no floating-point maths anywhere).
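The clustering step can be sketched roughly like this (a hypothetical Rust helper for illustration, not the actual NihAV code; `split_two` and the pixel-quad layout are my own names):

```rust
// Hypothetical sketch of the two-group clustering described above: pick the
// component with the largest spread, split the pixels by whether that
// component is above the average, then average each group.
fn split_two(pixels: &[[i32; 4]]) -> ([i32; 4], [i32; 4]) {
    // find the axis (luma, red, green or blue) with the largest spread
    let mut axis = 0;
    let mut best_spread = -1;
    for c in 0..4 {
        let min = pixels.iter().map(|p| p[c]).min().unwrap();
        let max = pixels.iter().map(|p| p[c]).max().unwrap();
        if max - min > best_spread {
            best_spread = max - min;
            axis = c;
        }
    }
    // the average along that axis is the split threshold
    let avg = pixels.iter().map(|p| p[axis]).sum::<i32>() / pixels.len() as i32;
    let (mut lo, mut hi) = ([0i32; 4], [0i32; 4]);
    let (mut nlo, mut nhi) = (0i32, 0i32);
    for p in pixels {
        let (acc, n) = if p[axis] > avg { (&mut hi, &mut nhi) } else { (&mut lo, &mut nlo) };
        for c in 0..4 {
            acc[c] += p[c];
        }
        *n += 1;
    }
    for c in 0..4 {
        if nlo > 0 { lo[c] /= nlo; }
        if nhi > 0 { hi[c] /= nhi; }
    }
    (lo, hi)
}

fn main() {
    // two dark and two bright pixels: the split should recover both averages
    let px = [[0, 0, 0, 0], [0, 0, 0, 0], [100, 100, 100, 100], [100, 100, 100, 100]];
    let (lo, hi) = split_two(&px);
    assert_eq!(lo, [0, 0, 0, 0]);
    assert_eq!(hi, [100, 100, 100, 100]);
}
```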

Also I’ve added palette mode support. The idea is simple: internally I operate on pixel quads (red, green, blue, luma) so for palette mode I just need to replace an actual pixel value with the index of the most similar palette entry. For that task I reused one of the approaches from my palettiser (it should be faster than iterating over the whole palette every time). Of course the proper way would be to map the colour first so that the proper distortion is calculated (because the first suitable colour may be far from perfect) but I decided not to pursue this further, even if it sometimes results in badly-coded features. It’s still not a serious encoder after all.
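For illustration, a naive nearest-entry search over such quads might look like the sketch below (the real encoder reuses a faster scheme from the palettiser instead of scanning the whole palette every time; `find_nearest` is a made-up helper):

```rust
// Naive nearest-palette-entry search for illustration only; distance is
// plain squared error over the (red, green, blue, luma) quad.
fn find_nearest(pal: &[[i32; 4]], pix: [i32; 4]) -> usize {
    let mut best = 0;
    let mut best_dist = i32::MAX;
    for (idx, entry) in pal.iter().enumerate() {
        let dist: i32 = entry.iter().zip(pix.iter()).map(|(a, b)| (a - b) * (a - b)).sum();
        if dist < best_dist {
            best_dist = dist;
            best = idx;
        }
    }
    best
}

fn main() {
    let pal = [[0, 0, 0, 0], [255, 255, 255, 255], [128, 0, 0, 38]];
    assert_eq!(find_nearest(&pal, [250, 250, 250, 250]), 1);
    assert_eq!(find_nearest(&pal, [10, 10, 10, 10]), 0);
}
```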

Now this member of the ruling triumvirate of early-90s video codecs should be good enough. The Cinepak encoder is still rather primitive so I’ll have to re-check it. The Indeo 3 encoder seems to produce buggy output on complex high-motion scenes (I suspect it’s related to the number of motion vectors exceeding the limit) but that will have to wait until I rewrite the decoder. And then hopefully more interesting experiments will happen.

Indeo 3 encoder: done

Monday, February 6th, 2023

I’ve done what I wanted with the encoder; it seems to work, so I declare it finished. It can encode videos that other decoders can decode, it has some adjustable options and even a semblance of rate control.

Of course I’ll return to it if I ever use it and find some bugs, but for now I’ll move on to other things. For instance, the Indeo 3 decoder needs to be rewritten now that I understand the codec better. Also I have some ideas for improving the MS Video 1 encoder. And there’s TrueMotion 1 that I wanted to take a stab at. And there are some non-encoder things as well.

There’s a lot of stuff to keep me occupied (provided that I actually get myself occupied with it in the first place).

Indeo 3: codebooks

Saturday, February 4th, 2023

As you probably remember, Indeo 3 has 21 codebooks. In theory you’d expect them to correspond to coarser quantisers; in reality it’s not that easy. For starters, codebooks 8-15 trigger requantisation of the reference, i.e. in intra mode the top line used for prediction is replaced with coarser values. Yes, it really modifies previously decoded data. And in inter mode it does the same on the previous frame for the first line of the reference block. I’ve decided to enable codebooks 8-15 only for intra mode and not even attempt to use codebooks 16-20 at all. So, what can I achieve with those?

I’ve started experimenting with rate control so I encoded various kinds of samples (albeit small and short) and here are the results:

  • codebook set 0-7 and 8-15 give about the same frame sizes (i.e. it does not matter if you take e.g. codebook 2 or 10);
  • an average intra frame size decreases with the codebook number but with inter frames some codebooks result in larger frames (sometimes codebook 2 resulted in larger P-frames than any other codebook but codebook 6; in another case codebook 5 gave the smallest frames);
  • not forcing a codebook noticeably improves compression of P-frames compared to always using codebook 0 and has almost no effect on I-frames;
  • I-frame to P-frame size ratio varies greatly depending on the content: for realistic content with a lot of changes it is about 1:1, for videos with low motion and few changes it can get to 1:3 or even more.

Maybe the compression ratio can be improved by fiddling with the (completely arbitrary) thresholds I use for some decisions (e.g. if the cell should be coded or marked as skipped). I’ve made them options so all zero people who want to play with it should be able to do that.

So far I think I’ll implement rate control in a simple manner: all frames will be treated as potentially equal in size, the codebook number will be adjusted depending on the expected and obtained frame sizes, and if a frame overshoots I’ll try to re-encode it with a neighbouring codebook (as this may change the frame size drastically).
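The adjust-and-retry idea can be sketched like this (all names and thresholds here are hypothetical, purely to illustrate the scheme):

```rust
// Hypothetical rate control sketch: every frame gets the same size budget
// and the codebook index is nudged towards coarser or finer codebooks
// depending on how the produced size compares to that budget.
fn adjust_codebook(cur_cb: u8, frame_size: usize, target_size: usize) -> u8 {
    if frame_size > target_size + target_size / 8 {
        // overshoot: switch to a coarser codebook (may change the size drastically)
        (cur_cb + 1).min(7)
    } else if frame_size < target_size - target_size / 4 {
        // plenty of headroom: try a finer codebook for the next frame
        cur_cb.saturating_sub(1)
    } else {
        cur_cb
    }
}

fn main() {
    assert_eq!(adjust_codebook(3, 2000, 1000), 4); // overshoot -> coarser
    assert_eq!(adjust_codebook(3, 500, 1000), 2);  // undershoot -> finer
    assert_eq!(adjust_codebook(3, 1000, 1000), 3); // close enough -> keep
}
```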

I’ll write about the results when I have them.

So, are video codecs really done?

Friday, February 3rd, 2023

Yesterday Derek’s talk at Demuxed got to me for about the fourth time and I was asked about my opinion on it as well. I can take the hint (eventually), so here’s what I think.

Unlike Derek I’m a major nobody with some interest in how codecs work, to the point that I’m not afraid to look at their binary specification and sometimes even implement a decoder. Anyway, I’ll try to give a short summary of the points he presents and what I think about them.

Indeo 3: cell coding

Monday, January 30th, 2023

So we have partitioned the frame and now have to code the cell data. How do we pick the best parameters in this case?

The patent suggests calculating vertical and horizontal differences (i.e. differences between top-bottom and left-right neighbours) and, depending on how large they are, selecting one of the modes. Codebook selection is not reviewed at all. The reference encoder calculates those differences and uses them to set both the cell coding mode and the codebook. I.e. if both differences are large use mode 0 (fine-grained coding), if only one difference is large use mode 3 or 11, otherwise use mode 10. And the ratio of the differences is clipped, multiplied by a magic factor, then by a rate control factor, and used as an index into a special magic table to select the codebook.

Since my goal is to learn something new instead of simply replicating something existing, I took a completely different approach (one that should contain less magic). Mode selection is done by comparing differences and amending the choice if I decide to use two codebooks. I used the fact that the first eight codebooks mostly have differences of the form kN+1 and the next eight codebooks have differences of the form kM. So I simply calculate for each codebook how many delta values are represented best by those formulas and select the best fitting one. Also I calculate it separately for the even and odd lines (the histograms can be merged later to give the total statistics) so I can select the appropriate codebook or codebook pair for the coding mode. Maybe I’ll have to adjust the scheme for the rate control but that will happen later. Side note: Indeo 3 specifies a per-frame set of 16 codebook pairs that all cells should use and a global codebook index offset, so single-codebook modes may use five additional codebooks; the set seems to be static with a regular structure and I’m not sure the global codebook index offset is ever used.
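The counting idea can be sketched as follows (the `(step, offset)` pairs here are made-up stand-ins for the real per-codebook delta forms, and the helper names are mine):

```rust
// Count how many deltas fit the form k*step + offset for a given codebook.
fn score(deltas: &[i32], step: i32, offset: i32) -> usize {
    deltas.iter().filter(|&&d| (d - offset).rem_euclid(step) == 0).count()
}

// Pick the codebook whose delta form fits the line's deltas best.
fn best_codebook(deltas: &[i32], forms: &[(i32, i32)]) -> usize {
    let mut best = 0;
    let mut best_score = 0;
    for (idx, &(step, offset)) in forms.iter().enumerate() {
        let s = score(deltas, step, offset);
        if s > best_score {
            best_score = s;
            best = idx;
        }
    }
    best
}

fn main() {
    // deltas of the form 3k+1 should match the (step 3, offset 1) form
    let deltas = [1, 4, 7, 10];
    assert_eq!(best_codebook(&deltas, &[(3, 1), (4, 0)]), 0);
}
```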

That’s it. The rest of the things should be rather trivial: I’ve written how to perform motion search before, rate/quality control has never been great in the original codec (maybe I’ll report how I did it when I get to it), zero run compression is nothing special either. There’s not much to write until I fix some bugs, improve compression, introduce rate control and validate it against the reference decoder. And that will take a long time…

Indeo 3: splitting the frame

Saturday, January 28th, 2023

As mentioned in the previous post, Indeo 3 splits a frame into cells using binary trees and codes them using one of several possible modes. In reality it’s more complex: there’s a primary tree that splits the frame into regions and tells how to code them (intra or inter), and those regions themselves can be split using another binary tree to tell which coding method to use (or to skip decoding entirely). See, it had tree coding, prediction units and coding units two decades before H.265! And slices as well: it divides data into strips 160 pixels wide too.

Splitting the frame optimally is a practically impossible task (because of its combinatorial complexity). In reality though it’s much simpler: first we split the plane into 160-pixel wide (or 40-pixel wide for chroma) strips, then split them along the largest dimension until we get cells of the maximum acceptable size (which seems to be 767 pixels, though the encoder seems to handle up to 2048 pixels in a coded cell). Then it’s up to the secondary cell coding.
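The split-along-the-largest-dimension loop can be sketched like this (a hypothetical helper, not the reference code):

```rust
// Keep halving a cell along its largest dimension until its area fits
// the limit, collecting the resulting leaf cells.
fn split_cell(w: usize, h: usize, max_area: usize, out: &mut Vec<(usize, usize)>) {
    if w * h <= max_area {
        out.push((w, h));
    } else if w >= h {
        split_cell(w / 2, h, max_area, out);
        split_cell(w - w / 2, h, max_area, out);
    } else {
        split_cell(w, h / 2, max_area, out);
        split_cell(w, h - h / 2, max_area, out);
    }
}

fn main() {
    // a 160x32 strip with a 767-pixel cell limit ends up as eight 20x32 cells
    let mut cells = Vec::new();
    split_cell(160, 32, 767, &mut cells);
    assert_eq!(cells.len(), 8);
    assert!(cells.iter().all(|&(w, h)| w * h <= 767));
}
```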

From what I could gather from the encoder, it also tries to split secondary cells if they’re above the limit, but the limit is the same value as in the reference encoder even though it could have been set separately.

Since my goal is to learn something new instead of re-creating something existing, I use a different approach: the initial mode is selected by the relation between horizontal and vertical differences (if both are too high I try to split the cell and try again). Similarly for inter mode I first check whether the cell can be coded as inter (and whether splitting it will make at least one of the sub-cells code as inter) and if not I resort to intra coding.

There is probably a better way than brute force to find out the optimal splitting but for lack of it a simple heuristic should do.

Cell coding mode and codebook selection is a topic best left for the next time.

Indeo 3 overview

Friday, January 27th, 2023

The overall idea behind this codec is simple: a frame is split into cells of variable size (the patent says “a roughly regular grid of cells”) using a binary tree, and each cell can then be coded either in intra mode (differences to the previous line) or inter mode (differences to some region in the previous frame). Coding is done by splitting the cell into 4×4, 4×8, 8×8 or 8×4 blocks and using one or two of the 21 codebooks to code pairs of differences (with some tricks to compress small differences and zero runs even further).

The patent describes thirteen different modes, the decoders I know about support only some of those:

  • mode 0—code 4×4 blocks using a single codebook;
  • mode 1—code 4×4 blocks using two different codebooks for even and odd lines;
  • mode 2—code 4×4 blocks using two different codebooks but the second one is used only for the second line (no known decoder supports that mode);
  • mode 3—code 4×8 block using a single codebook by coding differences to the even lines and interpolating the odd ones;
  • mode 4—the same as mode 3 but with two codebooks;
  • mode 5—very similar to mode 3 but with a possibility to add a correction to the interpolated lines (since it involves writing single bits, which no other part of the codec does, no known implementation supports it);
  • mode 6—like mode 5 but with two codebooks (and equally unsupported by anything known);
  • mode 7—code 4×4 blocks with bit flags for telling which dyad to code (no known decoder supports this);
  • mode 8—the same as mode 7 but with two codebooks (of course it’s unsupported);
  • mode 9—the same as mode 7 but with the second codebook specially for the second line (equally unsupported);
  • mode 10—code an 8×8 block using a single codebook by either duplicating pixels on even lines and interpolating odd lines (for intra) or scaling each delta to a 2×2 block (in inter mode);
  • mode 11—code 4×8 (inter only) block using corrector repeated for each odd line;
  • mode 12—mode 11 with two codebooks (only VfW version supports it).

Considering the internal implementation details (e.g. using arrays for opcode handling or not), I’d say that the QuickTime and XAnim versions of the decoder are based on the same porting kit code supplied by Intel while the Video for Windows version uses a different codebase (it’s not just the encoder being present and mode 12 support, it’s also the way many tables are generated at runtime while they are static in the other decoders, the opcode tables not being used, and other minor things).

But before we start to code cells we need to perform the initial frame splitting and the next post should be about that.

Starting yet another useless encoder

Thursday, January 26th, 2023

Even before I started to write my series of posts on FFhistory, I had another work already in progress which I’m now making public in order not to chicken out (as I have several times already). I’m talking about an Indeo 3 encoder.

Why Indeo 3 of all possible things? It’s not your conventional DCT-based codec, and yet it’s widespread enough to be of some limited use for me (being present in AVI, MOV and VMD containers; only Cinepak is more ubiquitous). I’m not as good as Mike Melanson but I’m willing to try my hoof at it.

The funny thing is, there’s an opensource decoder for it and even a decent description in US patent 5,386,232 from 1995 (so it has expired already and anybody can write an encoder for it). The problem is that those two sources don’t match each other and somewhat disagree with the official binary specification (I’m pretty sure that both Indeo3 decoders were REd from the XAnim module). And Ghidra does not like the VfW binary (maybe it’ll like the version inside QT6 better) so I can’t easily refer to it either.

Anyway, I attempted and gave up writing an encoder for Indeo 3 several times because of its perceived combinatorial complexity. First you need to split the frame recursively into blocks—how do you select them? Then you need to select one of the coding modes (again, how?) and codebooks (same question). Trying to think of a reasonable way to implement it all made me shudder and give up until I finally read the format description and persisted enough to write at least something working (side note: I also have the same problem with the TrueMotion 1 encoder which I also want to write one day, hopefully it’ll be easier now).

Also I tried to look into the encoder implementation and found it to be a bunch of magic numbers at work. I’m not joking: during initialisation it seems to set several dozen various integers and floats and use them for various coding decisions (at least what I could understand from it is that codebook selection is tied to the internal quantiser parameter, which is calculated depending on bitrate/quality—and various magic numbers).

So I want to document how this codec works, what differs between the various descriptions of it, and how my encoder decides what to use in different situations. This should amount to another dozen posts that nobody will read.

Looking again at LSVX

Tuesday, November 29th, 2022

Recently Paul B. Mahol asked me to take a look at the LSVX codec (aka Lightning Strike Video Codec from Espre). Since the guy is working on a Bink2 decoder for all of us, he deserves some respect. So here’s what I found.

Previously I glanced at it, found that it’s based on tmndec 3.0 (kinda the reference decoder for H.263), concluded it was an ordinary H.263 rip-off and didn’t look further. It turned out to be more interesting though.

LSVX frames start with a 5-byte header where the second byte tells the frame type (0x01, 0x08 and 0x09 are for normal frames, 0x05 is for skip frames; additionally, if the first two bytes of the header are 0x78 0x01 then the header is followed by another eight bytes, usually with “lsvxv2.0” in them). Then an almost normal H.263 stream follows—at least on a longer look it seems to be the standard H.263 stream without any modifications to the headers (except maybe in motion vector decoding). But there’s a catch! Key frames may have picture type 7 (a reserved code in the standard) and then they’re coded with wavelets.
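The header layout described above can be sketched as a small parser (the enum and function names are mine, not anything from the codec):

```rust
// Sketch of the LSVX frame header: five bytes with the frame type in the
// second byte; if the header starts with 0x78 0x01 it is followed by eight
// more bytes (usually "lsvxv2.0").
#[derive(Debug, PartialEq)]
enum FrameKind {
    Normal,
    Skip,
    Unknown,
}

// Returns the frame kind and the total header size, or None on short data.
fn parse_header(data: &[u8]) -> Option<(FrameKind, usize)> {
    if data.len() < 5 {
        return None;
    }
    let kind = match data[1] {
        0x01 | 0x08 | 0x09 => FrameKind::Normal,
        0x05 => FrameKind::Skip,
        _ => FrameKind::Unknown,
    };
    let hdr_size = if data[0] == 0x78 && data[1] == 0x01 { 5 + 8 } else { 5 };
    if data.len() < hdr_size {
        return None;
    }
    Some((kind, hdr_size))
}

fn main() {
    assert_eq!(parse_header(&[0x00, 0x05, 0, 0, 0]), Some((FrameKind::Skip, 5)));
    let ext = [0x78, 0x01, 0, 0, 0, b'l', b's', b'v', b'x', b'v', b'2', b'.', b'0'];
    assert_eq!(parse_header(&ext), Some((FrameKind::Normal, 13)));
}
```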

The wavelet coding is rather straightforward: you have a picture split into many bands, each of them quantised and coded with a semi-adaptive binary coder. By that I mean that it uses models with fixed probabilities but selects a different context for different bits depending on previously decoded bits, and in some cases the entire set of probabilities may be switched mid-decoding. Besides that there are no special tricks like zero-tree coding or fancy coefficient prediction.

Maybe I’ll write a decoder for it after all.

Looking at KQ6 Mac videos

Friday, November 25th, 2022

The terrorist country proves that it is recognized as one and keeps targeting civilians instead of fighting a war. So nothing new but it would still be nice to see its demise soon. Meanwhile I keep doing small things to distract myself from all this.

Since I have nothing better to do, I watch reviews of various games including the ones I know well. And one of those reviews mentioned that the Macintosh version of King’s Quest VI: Heir Today, Gone Tomorrow had a peculiar intro. Actually every version of the game has something peculiar about its intro: the DOS version uses Sierra’s own RLE-based SEQ format, the Windows version uses standard MS Video 1 in AVI (it was my first sample with palette change messages), and the Amiga version is altogether a reimplementation by Revolution Software on their Virtual Theatre engine (maybe ScummVM will support it one day for all three and a half fans waiting for that). So, what’s with the Macintosh version?

First of all, the files are QuickTime movies in the original Macintosh format where frame data is stored in the data fork and the movie header is stored inside the resource fork. Since not all modern OSes support such files natively (or conveniently), I’ve hacked in support for such movies in MacBinary format, which keeps all forks in one file. And what do we have inside?

Inside the files are video streams packed with Cinepak. One of the peculiarities is that they have a palette specified in the video header, in a format different from the conventional MOV colour atom format, let alone the fact it should not be present at all. I understand that for Cinepak, and even more so for Indeo 3 (I really should write an encoder for it one day), it was common to provide a palette so the output could be rendered in 256-colour mode, but in that mode Cinepak simply coded palette indices, while here we have YUV420 output and a palette as a recommendation.

Then there’s a fun case with the tracks in KQ6Movie. I understand that they split the video and coded it in several tracks so they could use different palettes (and framerates, as it turns out) for different segments. And those tracks are not in order. Tracks 0 and 1 seem to be the very beginning, track 2 corresponds to a scene somewhere in the middle and track 3 is the last intro scene. The other ten tracks are not in order either. Maybe there is some information hidden in the header telling the order but I’m too lazy to find it out (let alone implement it).

All in all, this was unexpectedly weird.