While I added palettisation support to NihAV about six years ago, back then it was limited to per-frame palette generation. Since I had only two encoders supporting paletted input and both accepted palette changes, it worked fine. It was only when I decided to implement a MOV muxer that I really needed palettisation with a global palette, so I started experimenting with that.
First and foremost, the design. While per-frame palettisation is a part of NAScale, which handles video frame conversion from and to various formats, this palettisation mode is bolted onto nihav-encoder. It is implemented in two parts: an initial pass that decodes the input and generates a palette at the end, and the actual frame palettisation (which can work just fine without that pass if you tell it to use some pre-defined palette). I actually started with the second part and added palette generation later (for testing, the default QuickTime palette was enough). Then I went even further and extended palette generation to support a segmented mode (i.e. the palette may change, but not on every frame; I made the limit configurable, but it should be at least ten frames—storing a palette for each frame before processing is too much).
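The three operating modes described above could be sketched as a configuration type like the following (the names and the validation helper are my own invention for illustration, not NihAV's actual API):

```rust
// A sketch of the palettisation modes described above; names are hypothetical.
enum PaletteMode {
    // use a pre-defined palette, no analysis pass needed
    Fixed(Vec<[u8; 3]>),
    // an initial pass decodes the input and produces one global palette
    Global,
    // palette may change, but each segment must last at least min_len frames
    Segmented { min_len: usize },
}

// Enforce the "at least ten frames per segment" constraint mentioned above.
fn mode_is_valid(mode: &PaletteMode) -> bool {
    match mode {
        PaletteMode::Segmented { min_len } => *min_len >= 10,
        _ => true,
    }
}
```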
This mode is a bit fragile, since palettes are calculated for the decoded frames and not for the processed frames. And for palette segments it’s even worse, since each segment specifies the number of frames for which its palette is valid, so framerate conversion will make a mess of it. But the alternative leads to the madness that is libavfilter, and that’s a hard pass for me. I see filters more as a drug that makes multimedia projects shift their attention to them, so they become more and more about e.g. filter negotiation and complex graph support and less about playing actual media; consequently, making everything a filter is a sure sign that the project is on its way to becoming obsolete (yes, I don’t hold either DirectShow or GStreamer in high regard).
Anyway, after the overall design description it’s time to talk about implementation details. Palette generation for video differs from palette generation for a single image by the sheer amount of data it needs to take into account. So besides the “let’s waste memory and have a table of 64-bit counters for each possible colour” mode, I added another mode that puts colours into smaller buckets (the bucket index is calculated from the top bits of the components) and joins similar colours (e.g. those differing only in the two low bits of each component) when the number of entries gets too high. It is slower and not as accurate, but it consumes less memory, and performing vector quantisation on hundreds of thousands of entries is much faster than on millions. So it may have its uses.
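A minimal sketch of that bucketed histogram idea might look like this (the structure, constants and names here are my own guesses, not NihAV’s actual code; in particular the entry cap and bucket width are arbitrary):

```rust
use std::collections::HashMap;

const BUCKET_BITS: u32 = 4;         // top 4 bits of each component -> 4096 buckets
const MAX_ENTRIES: usize = 200_000; // arbitrary cap for this sketch

struct ColourHist {
    buckets: Vec<HashMap<u32, u64>>, // packed 0xRRGGBB -> pixel count
    entries: usize,
    merge_mask: u32,                 // shrinks as low bits get merged away
}

impl ColourHist {
    fn new() -> Self {
        Self {
            buckets: vec![HashMap::new(); 1 << (BUCKET_BITS * 3)],
            entries: 0,
            merge_mask: 0xFFFFFF,
        }
    }
    // bucket index is formed from the top bits of each component
    fn bucket_idx(clr: u32) -> usize {
        let r = (clr >> (16 + 8 - BUCKET_BITS)) & ((1 << BUCKET_BITS) - 1);
        let g = (clr >> (8 + 8 - BUCKET_BITS)) & ((1 << BUCKET_BITS) - 1);
        let b = (clr >> (8 - BUCKET_BITS)) & ((1 << BUCKET_BITS) - 1);
        ((r << (BUCKET_BITS * 2)) | (g << BUCKET_BITS) | b) as usize
    }
    fn add(&mut self, clr: u32) {
        let clr = clr & self.merge_mask;
        let bucket = &mut self.buckets[Self::bucket_idx(clr)];
        if *bucket.entry(clr).and_modify(|c| *c += 1).or_insert(1) == 1 {
            self.entries += 1;
        }
        if self.entries > MAX_ENTRIES {
            self.merge();
        }
    }
    // join colours differing only in the two low bits of each component;
    // a real implementation would keep dropping more bits if this is not enough
    fn merge(&mut self) {
        self.merge_mask &= 0xFCFCFC;
        self.entries = 0;
        for bucket in self.buckets.iter_mut() {
            let old = std::mem::take(bucket);
            for (clr, cnt) in old {
                *bucket.entry(clr & self.merge_mask).or_insert(0) += cnt;
            }
            self.entries += bucket.len();
        }
    }
}
```

Since masking drops only the low bits, merged colours stay in the same bucket, so the merge can be done per bucket without rehashing everything.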
Then there’s palette segment generation. I don’t do anything fancy: I simply calculate a coarse colour histogram to decide when to start a new segment. As the cut-off criterion I selected the ratio between the correlation of the group and new frame histograms and the auto-correlation of the new frame histogram. If they’re of about the same magnitude then I can add this frame to the group, update the group histogram and continue; otherwise I generate a palette for the just-finished segment and start a new one. It’s naïve but it seems to work reasonably well.
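In code the criterion could be sketched roughly like this (a 512-bin histogram with 3 bits per component and the threshold value are my assumptions, and the group histogram is assumed to be averaged over its frames so the magnitudes stay comparable):

```rust
// coarse histogram: 3 top bits per component -> 512 bins
fn coarse_hist(pixels: &[u32]) -> Vec<u64> {
    let mut hist = vec![0u64; 512];
    for &clr in pixels {
        let r = (clr >> 21) & 7;
        let g = (clr >> 13) & 7;
        let b = (clr >> 5) & 7;
        hist[((r << 6) | (g << 3) | b) as usize] += 1;
    }
    hist
}

// true if the new frame is similar enough to stay in the current segment
fn same_segment(group: &[u64], frame: &[u64], threshold: f64) -> bool {
    // correlation between the (averaged) group histogram and the frame histogram
    let corr: f64 = group.iter().zip(frame).map(|(&a, &b)| a as f64 * b as f64).sum();
    // auto-correlation of the new frame histogram
    let auto_corr: f64 = frame.iter().map(|&b| b as f64 * b as f64).sum();
    if auto_corr == 0.0 {
        return true;
    }
    // "about the same magnitude": the ratio stays within the threshold both ways
    let ratio = corr / auto_corr;
    ratio > threshold && ratio < 1.0 / threshold
}
```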
Palettisation itself consists of finding an appropriate palette entry for each input colour (I’m aware of dithering but haven’t bothered with it yet). I use the same three methods as back then: brute-force search, local search and k-d tree search. The last one is faster than the rest but gives horrible quality, so I don’t know whether I should improve it or throw it away. Local search (especially with a small cache for the last 32 results) is a nice trade-off between the other two. And brute-force search is implemented by filling a 16MB table which maps each possible input colour to a palette index; it is slow to generate, but it works extremely fast with a global palette and a reasonably large video (i.e. one with more than those sixteen million pixels). For segmented palettes it’s better to use local search though.
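The brute-force table is conceptually simple: one byte per each of the 2²⁴ possible colours, hence 16MB. A sketch under the assumption of plain Euclidean distance in RGB (NihAV may weight components differently):

```rust
// nearest palette entry by (unweighted) squared Euclidean distance in RGB
fn nearest(pal: &[[u8; 3]], r: i32, g: i32, b: i32) -> u8 {
    let mut best = 0u8;
    let mut best_dist = i32::MAX;
    for (idx, p) in pal.iter().enumerate() {
        let dr = r - p[0] as i32;
        let dg = g - p[1] as i32;
        let db = b - p[2] as i32;
        let dist = dr * dr + dg * dg + db * db;
        if dist < best_dist {
            best_dist = dist;
            best = idx as u8;
        }
    }
    best
}

// the 16MB table: a palette index for every possible 24-bit colour,
// so palettising a pixel is a single array lookup afterwards
fn build_lut(pal: &[[u8; 3]]) -> Vec<u8> {
    let mut lut = vec![0u8; 1 << 24];
    for clr in 0..(1u32 << 24) {
        let (r, g, b) = ((clr >> 16) as i32, ((clr >> 8) & 0xFF) as i32, (clr & 0xFF) as i32);
        lut[clr as usize] = nearest(pal, r, g, b);
    }
    lut
}
```

Building the table costs 2²⁴ nearest-colour searches up front, which is why it only pays off once the video has more pixels than that.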
That’s it. I realise that I’m the only user of such a feature, but it gave me something to play with and brought me some joy implementing it. After all, who else can claim he has converted a MovieCD into an animated GIF without using any 16- or even 32-bit code?