Archive for the ‘Various Video Codecs’ Category

Looking at NUVision

Friday, July 7th, 2023

Since I still have nothing better to do, I decided to look at some obscure video codec I had laying around for really long time. And it turned out to be simple yet rather original.

Unlike many other codecs, this one codes YUV 4:2:2 (at least it looks like that) line per line in chunks of 24, 16 or 8 elements (essentially 24-pixel chunks plus shorter tail and the line width being a multiple of eight). Each chunk can be coded using one of four modes (leave it as is, decode and apply delta, copy chunk from the previous line with or without delta). Delta values are coded as chunk quantiser, delta values (-q, 0, q, escape) plus escape values. And since those mode/delta values can fit into two bits, they’re packed together into 16-bit words.

If you look at it, there’s nothing really inventive: short slices are present in many codecs, lossy delta coding is common too. But together they create a combination that I’ve not seen anywhere else. And that’s why looking at older codecs is pleasing: beside seeing rip-offs of the standard codecs sometimes you also encounter such original ideas as well.

One last experiment with Cinepak encoder

Saturday, June 17th, 2023

I’ve remembered that back in the day there was an encoder for RoQ format (the format that uses a codebook with 2×2 YUV vectors, what a coincidence!) called Switchblade and it was using NeuQuant before it was integrated into FFmpeg where it started to use ELBG. So I decided to give it a try.

If you have forgotten, NeuQuant is an application of Kohonen neural network to the task of generating palette for an image. I’ve implemented that kind already so I tried my hoof at adapting it for a larger vector size. Good thing: it works and it’s reasonably fast (2-3 times slower than median cut, faster than partitioned ELBG—and that’s the code that uses doubles for the majority of its calculations). Bad thing: the result quality is mediocre. The results obviously can be improved by adjusting various factors (wait, am I talking about neural network or string theory?) and changing the pseudo-random order in which the candidates are sampled but I don’t feel enthusiastic about tweaking all those parameters and see which ones work good for the widest selection of video sequences.

So I’m drawing a line here. It was a quick and failed experiment, I should find something better to do.

Yet another MOV quirk

Thursday, June 15th, 2023

Since I had nothing better to do I was browsing FMV games at archive.org and in one of them I found rather peculiar sample: avconv has wrong palette for the first half of it and nihav-tool has wrong palette in the second half of the clip. And I thought that MOV is not supposed to have palette changes at all.

It turned out they used a multiple sample descriptors trick: it’s possible to provide several codec descriptions for one track and use one or another for different frames. That file has two descriptors for the video track with different palettes. Mystery solved.

And it also solved another mystery with a different file from that game where some frames are not decoded properly. It turned out that it also has two sample descriptors for the video track: one is A**le Graphics and another one is Cinepak.

Back in the day I ranted that MOV is too flexible and this proves once again how true that is. Good thing I don’t have to care about supporting such files properly.

Further Cinepak experiments

Monday, June 5th, 2023

For having nothing better to do I kept experimenting with Cinepak encoder.

I considered implementing some variant of codebook decomposition scheme suggested by Tomas in the comments to the previous post but I’m still not sure if I should bother even if it looks promising. So I tried the old thresholds-based scheme instead.

And what do you know, it speeds things up considerably: my usual test sample gets encoded in 27-35 seconds (depending on thresholds) instead of 44 seconds in the usual mode. But since I don’t know what would be good thresholds I did the opposite and added a refinement mode: after deciding which codebook to use for which block I re-generate codebook using only those blocks that belong to it. Of course it increases processing time, for example that file it takes 75 seconds to encode with refinement—which is still 70% more time but still less than double (for comparison, in full ELBG mode it’s an increase from about 160 seconds to 270 seconds).

So by rough estimate selecting only relevant blocks for codebook generation shaves 20-40% off the encoding time. And splitting data into partitions and generating a codebook by parts made the process about three times faster. I suspect that with a proper approach to clustering vector quantisation can be made two-three times faster but I don’t think I want to experiment with that. I should call it a day and move to something else instead.

Quick experiments with Cinepak encoder vector quantisation

Saturday, June 3rd, 2023

Out of curiosity I decided to check how partitioning input before creating a codebook affects encoding speed. So I’ve added a mode to Cinepak encoder that partitions vectors by luma variance and creates a part of common codebook just for them. The other two modes are median cut (the simplest one but also with mediocre output) and ELBG (that uses median cut to create the initial codebook—also if it’s not full that means we have all possible entries and do not need to perform ELBG at all).

Here are rough results on encoding several different files (and using different number of strips): median cut worked for 11-14 seconds, ELBG took 110-160 seconds, new mode (I decided to call it fast) takes 43-62 seconds. I think even such approximate numbers speak for themselves. Also there’s an interesting side effect: because of the partitioning it tends to produce smaller codebooks overall.

And while we’re speaking about quantisation results, here’s the first frame of waterfall sample encoded in different modes:

median cut

fast

full ELBG

As you can see, median cut produces not so good images but maybe those artefacts will make people think about the original Cinepak more. Fast mode is much nicer but it still has some artefacts (just look at the left edge of the waterfall) but if you don’t pay too much attention it’s not much worse than full ELBG.

Are there ways to improve it even further? Definitely. For starters, the original encoder exploits the previous codebook to create a new one while my encoder always generates a new codebook from scratch (in theory I can skip median cut stage for inter strips but I suspect that ELBG will work much longer in this case). The second way is to fasten up the ELBG itself. From what I could see it spends most of the time determining to which cluster each of the points belong. By having some smarter structure (something like k-d tree and some caching to skip recalculating certain clusters altogether) it should be possible to speed it up in several times. Unfortunately in this case I value clarity more so I’ll leave it as is.

P.S. I may also try to see how using thresholds and block variance to decide its coding mode affects the speed and quality (as in this case we first decide how to code blocks and then form codebooks for them instead of forming codebooks first and then deciding which mode suits the current block better; and in this case we’ll have smaller sets to make codebooks from too). But I may do something different instead. Or nothing at all.

A quick glance at the original Cinepak encoder

Friday, May 26th, 2023

Since I don’t have anything to do with NihAV at the time (beside two major tasks that always make me think about doing anything else but them) I decided to look at what tricks did the original Cinepak encoder have.

Apparently it has essentially three settings: interval between key frames (with maximum and minimum values), temporal/spatial quality (for deciding which kinds of coding should be used) and neighbour radius (probably for merging close enough values before actual codebook is calculated).

Skip blocks are decided by sum of squared differences being smaller than the threshold (calculated from the time quality); V1/V4 coding is decided by calculating sum of 2×2 sub-block variances and comparing it against the threshold (calculated from spatial quality).

Codebook creation is done by grouping all blocks into five bins (by logarithm of the variance) and trying to calculate a smaller codebook for each bin independently (so together they’ll make up the full 256-entry codebook).

Overall even if I’m not going to copy that approach it was still interesting to look at.

Indeo 3 encoder: done

Thursday, February 16th, 2023

After fixing some bugs I think my encoder requires no further improvement (there are still things left out there to improve but they are not necessary). The only annoying problem is that decoding some videos with the original VfW binary gives artefacts. Looks like this happens because of its peculiar way to generate and then handle codebooks: corrector pairs and quads are stored as single 32-bit word and its top bit is also used to signal if it’s a dyad or a quad. And looks like for some values (I suspect that for large negative ones) the system does not work so great so while e.g. XAnim module does what is expected, here it mistakes a result for another corrector type and decodes the following bytestream in a wrong way. Of course I’m not going to deal with that annoyance and I doubt anybody will care.

Also I’ve pruned one stupid bug from my MS Video 1 encoder so it should be done as well. The third one, Cinepak, is in laughable state (well, it encodes data but it does not do that correctly let alone effectively); hopefully I’ll work on it later.

For now, as I see no interesting formats to support (suggestions are always welcome BTW), I’ll keep writing toy encoders that nobody will use (and hopefully return to improving Cinepak encoder eventually).

Indeo 3, the MP3 of video codecs

Tuesday, February 14th, 2023

I know that MPEG-4 ASP is a better-known candidate for this role, but Indeo 3 is a strong contender too.

Back in the day it was ubiquitous, patented, and as I re-implemented a decoder for it I discovered another fun similarity with MP3: checksums.

Each Indeo 3 frame has a checksum value embedded in it, calculated as XOR of all pairs of pixels. I had an intent to use it to validate the decoder output but after playing a bit I’ve given up. Some files agree on checksums, others disagree while the output from the reference decoder is exactly the same, in yet another files checksums are correct but byte-swapped and one file has only zeroes for checksums. This is exactly like MP3, and like there Indeo 3 decoders ignore that field.

Also I’ve encountered other fun quirks. For example, one Indeo file is 160×120 but its frame header claims it’s 160×240 (but you still have to decode it as 160×120). You’d think it’s the rule but I know some VMD files from Urban Runner game where the first or last frame are double the size. Another file errors out on the first frame because of the inappropriate opcode encountered (essentially “skip to the line 2” right after “skip to the line 2”) but it turns out that VfW decoder does not check that case and simply uses that opcode as a codebook index.

At least my new decoder should be good enough to iron out the obvious bugs from the encoder and after that I shall forget about that codec again for a long time.

Revisiting MSVideo1 encoder

Wednesday, February 8th, 2023

Recently somebody asked me a question about my MS Video 1 encoder (not the one in NihAV though) and I’ve decided to look if my current encoder can be improved, so I took Ghidra and went to read the binary specification.

Essentially it did what I expected: it understands quality only, for which it calculates thresholds for skip and fill blocks to be used immediately, clustering is done in the usual K-means way and the only curious trick is that it used luminance for that.

So I decided to use that idea for improving my own encoder. I ditched the generic median cut in favour of specially crafted clustering in two groups (I select the largest cluster axis—be it luma, red, green or blue—split 4 or 16 pixels into two groups by being above average or not and calculate the average of those two groups). This made encoding about two times faster. I’ve also fixed a bug with 8-colour blocks so now it encodes data properly (previously it would result in a distorted block). And of course I’ve finally made quality affect encoding process (also by generating thresholds, but with a different formula—unlike the original my encoder uses no floating-point maths anywhere).

Also I’ve added palette mode support. The idea is simple: internally I operate on pixel quads (red, green, blue, luma) so for palette mode I just need to replace an actual pixel value with the index of the most similar palette entry. For that task I reused one of the approaches from my palettiser (it should be faster than iterating over the whole palette every time). Of course the proper way would be to map colour first to have the proper distortion calculated (because the first suitable colour may be far from perfect) but I decided not to pursue this task further, even if it results in some badly-coded features sometimes. It’s still not a serious encoder after all.

Now this member of the early 90s video codecs ruling triumvirate should be good enough. Cinepak encoder is still rather primitive so I’ll have to re-check it. Indeo 3 encoder seems to produce buggy output on complex high-motion scenes (I suspect it’s related to the number of motion vectors exceeding the limit) but it will have to wait until I rewrite the decoder. And then hopefully more interesting experiments will happen.

Indeo 3 encoder: done

Monday, February 6th, 2023

I’ve done what I wanted with the encoder, it seems to work and so I declare it to be finished. It can encode videos that other decoders can decode, it has some adjustable options and even a semblance of rate control.

Of course I’ll return to it if I ever use it and find some bugs but for now I’ll move to other things. For instance, Indeo 3 decoder needs to be rewritten now that I understand the codec better. Also I have some ideas for improving MS Video 1 encoder. And there’s TrueMotion 1 that I wanted to take a stab at. And there are some non-encoder things as well.

There’s a lot of stuff to keep me occupied (provided that I actually get myself occupied with it in the first place).