While NihAV had support for paletted formats before, now it has more use cases covered. Previously I could only decode a paletted format and convert the picture into some other format. Now it can handle palettes in standard containers like AVI and MOV and even palette changes in AVI (it's done via NASideData, which is essentially the same thing I NIHed more than nine years ago). In addition to that it can convert an image into a paletted format as well, and below I'd like to give a brief review of the methods employed.
I have always wanted to try vector quantisation, and adding conversion to paletted formats was a perfect opportunity to do so. It should be no surprise to you that the conventional algorithms for this are based on ideas from the 1980s.
Median cut (described in Color image quantization for frame buffer display by Paul Heckbert in 1982) is a very simple approach: you gather all pixels into a single box, split it on the largest dimension (e.g. if the maximum green value minus the minimum green value is greater than the red and blue differences, split the data into two boxes, one with green components below the average green value and one with green components above it), and apply this recursively to the resulting boxes until you get the desired number of boxes or can't split any more. This method is moderately fast and, unlike the other approaches, it does not need an initial palette.
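To make the split step more concrete, here is a minimal Rust sketch of the idea (my illustration rather than NihAV's actual code; it splits at the average value of the widest channel, as described above):

```rust
// Minimal median-cut sketch: repeatedly split the box with the widest
// channel range at the average value of that channel, then turn each
// box into a palette entry by averaging its pixels.
type Rgb = [u8; 3];

// Return (channel index, value range) for the channel with the widest spread.
fn widest_channel(px: &[Rgb]) -> (usize, u8) {
    (0..3).map(|c| {
        let min = px.iter().map(|p| p[c]).min().unwrap();
        let max = px.iter().map(|p| p[c]).max().unwrap();
        (c, max - min)
    }).max_by_key(|&(_, range)| range).unwrap()
}

fn median_cut(pixels: Vec<Rgb>, target: usize) -> Vec<Rgb> {
    let mut boxes = vec![pixels];
    while boxes.len() < target {
        // pick the splittable box with the largest channel range
        let pick = boxes.iter().enumerate()
            .filter(|(_, b)| b.len() > 1)
            .map(|(i, b)| { let (c, r) = widest_channel(b); (i, c, r) })
            .max_by_key(|&(_, _, r)| r);
        let (idx, ch) = match pick {
            Some((i, c, r)) if r > 0 => (i, c),
            _ => break, // every box is single-coloured, nothing left to split
        };
        let b = boxes.swap_remove(idx);
        let avg = (b.iter().map(|p| p[ch] as u64).sum::<u64>() / b.len() as u64) as u8;
        let (lo, hi): (Vec<Rgb>, Vec<Rgb>) = b.into_iter().partition(|p| p[ch] <= avg);
        boxes.push(lo);
        boxes.push(hi);
    }
    // each palette entry is the average colour of one box
    boxes.iter().map(|b| {
        let n = b.len() as u64;
        let mut sum = [0u64; 3];
        for p in b { for c in 0..3 { sum[c] += p[c] as u64; } }
        [(sum[0] / n) as u8, (sum[1] / n) as u8, (sum[2] / n) as u8]
    }).collect()
}
```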
NeuQuant is an algorithm proposed by Anthony Dekker in 1994, but it's based on the work of Teuvo Kohonen in the 1980s. Despite being based on a neural network, the algorithm works, and it's quite simple to understand. Essentially you have nodes corresponding to palette entries, and each new pixel is used to update the value of the nearest-matching palette entry as well as its neighbours (with decreasing weight, of course). It works fast and produces good results even on a partially sampled image, but its output is heavily affected by the order in which pixels are sampled. That's why for the best results you need to walk the image in pseudo-random order, while the other methods do not care about the order at all.
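Here is a toy self-organising-map training loop in Rust to show the principle. It is deliberately simplified compared to Dekker's carefully tuned implementation, and the learning rate, radius and decay schedule are arbitrary values I picked for the sketch:

```rust
// Toy SOM training in the spirit of NeuQuant: each sampled pixel pulls
// the best-matching network entry and its neighbours (in 1-D network
// order) towards itself, with the pull fading with distance and time.
fn train_som(pixels: &[[u8; 3]], net_size: usize) -> Vec<[f32; 3]> {
    // initialise the network along the grey diagonal
    let mut net: Vec<[f32; 3]> = (0..net_size)
        .map(|i| { let v = (i * 255 / (net_size - 1).max(1)) as f32; [v, v, v] })
        .collect();
    let mut alpha = 0.3f32;        // learning rate, decays over time
    let mut radius = net_size / 8; // neighbourhood radius, shrinks over time
    let n = pixels.len();
    // crude stand-in for pseudo-random sampling: walk with a prime stride
    // (this revisits pixels early if n happens to be a multiple of it)
    let stride = 7919;
    let mut pos = 0;
    for step in 0..n {
        let px = pixels[pos];
        pos = (pos + stride) % n;
        // find the best-matching unit (squared distance, as integer for Ord)
        let bmu = (0..net_size).min_by_key(|&i| {
            net[i].iter().zip(px.iter())
                .map(|(&a, &b)| { let d = a - b as f32; (d * d) as i64 })
                .sum::<i64>()
        }).unwrap();
        // pull the BMU and its neighbours towards the pixel
        let first = bmu.saturating_sub(radius);
        let last = (bmu + radius).min(net_size - 1);
        for i in first..=last {
            let dist = (i as i64 - bmu as i64).unsigned_abs() as f32;
            let w = alpha * (1.0 - dist / (radius as f32 + 1.0));
            for c in 0..3 {
                let delta = w * (px[c] as f32 - net[i][c]);
                net[i][c] += delta;
            }
        }
        // decay the learning rate and shrink the neighbourhood periodically
        if step % (n / 100 + 1) == 0 {
            alpha *= 0.97;
            radius = radius * 9 / 10;
        }
    }
    net // round the entries to 8 bits to get the final palette
}
```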
ELBG is an enhancement of the Linde-Buzo-Gray algorithm (from 1980), proposed in 2000. In the base algorithm you select random centroids (essentially the centres of the pixel clusters that will serve as palette entries in the end), calculate which pixels are closest to which centroid, calculate the proper centroid for each of these clusters (essentially an average of those pixels), and repeat the last two steps until the centroids don't move much (or until moving them does not improve the quantisation error by much). The enhancement comes from the observation that some clusters may have larger dispersion than others, so moving a centroid from a small cluster of pixels that lies near another small cluster over to a large cluster may reduce the quantisation error. In my experience this is the slowest of the three methods I tried, and the enhancement actually makes it about twice as slow (but of course I did not try to optimise any of these methods beside the very basic stuff). Also, since it constantly searches for the most suitable centroid for every pixel (more than once if you count the enhancement step), it gets really slow as the number of pixels grows. But it's supposed to give the best results.
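A bare Linde-Buzo-Gray refinement loop could look like the sketch below (again my illustration; the ELBG enhancement step of relocating centroids between clusters is omitted for brevity, and the convergence threshold is an arbitrary value):

```rust
// Bare LBG refinement: assign every pixel to its nearest centroid,
// move each centroid to the mean of its cluster, and repeat until
// the centroids barely move any more.
fn sqdist(a: [f32; 3], b: [f32; 3]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn lbg(pixels: &[[f32; 3]], mut centroids: Vec<[f32; 3]>, max_iter: usize) -> Vec<[f32; 3]> {
    for _ in 0..max_iter {
        let mut sums = vec![[0.0f32; 3]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        // assignment step: find the nearest centroid for every pixel
        for px in pixels {
            let mut best = 0;
            let mut best_d = f32::MAX;
            for (i, c) in centroids.iter().enumerate() {
                let d = sqdist(*px, *c);
                if d < best_d { best_d = d; best = i; }
            }
            for c in 0..3 { sums[best][c] += px[c]; }
            counts[best] += 1;
        }
        // update step: each centroid becomes the mean of its cluster
        let mut moved = false;
        for i in 0..centroids.len() {
            if counts[i] == 0 { continue; } // a cluster with no pixels keeps its centroid
            let n = counts[i] as f32;
            let new = [sums[i][0] / n, sums[i][1] / n, sums[i][2] / n];
            if sqdist(new, centroids[i]) > 0.25 { moved = true; } // arbitrary threshold
            centroids[i] = new;
        }
        if !moved { break; } // converged
    }
    centroids
}
```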
And here are some words about palette lookup. After the quantisation step is done you still have to assign a palette index to each pixel. The same paper by Heckbert lists three methods for doing that: the usual brute force, local search, and a k-d tree. Brute force obviously means comparing a pixel against every palette entry to find the closest one. Local search means matching a pixel only against the palette entries in its quantised vicinity (I simply keep a list of the nearest palette entries for each possible 15-bit quantised pixel value). A k-d tree means constructing a k-dimensional tree in about the same way as in median cut, but on the palette entries, with each node containing a component threshold (e.g. "if the red component is less than 120, take the left branch, otherwise take the right branch"). In my experience the k-d tree is the fastest one, but its results are not as good as local search.
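As a sketch of the lookup side, here is brute force combined with a 15-bit cache in Rust. Note that this is a simplification of the local search described above: instead of keeping a list of candidate entries per quantised value and matching the exact pixel against them, it caches a single answer per 15-bit value, which is cheaper but slightly less accurate:

```rust
// Brute-force nearest-entry search plus a cache indexed by the 15-bit
// quantised pixel value (5 bits per component).
struct PalLookup {
    pal: Vec<[u8; 3]>,
    cache: Vec<Option<u8>>, // one slot per possible 15-bit RGB value
}

impl PalLookup {
    fn new(pal: Vec<[u8; 3]>) -> Self {
        Self { pal, cache: vec![None; 1 << 15] }
    }
    // plain brute force: compare the pixel against every palette entry
    fn nearest_brute(&self, px: [u8; 3]) -> u8 {
        let mut best = 0;
        let mut best_d = u32::MAX;
        for (i, e) in self.pal.iter().enumerate() {
            let d: u32 = e.iter().zip(px.iter())
                .map(|(&a, &b)| { let d = a as i32 - b as i32; (d * d) as u32 })
                .sum();
            if d < best_d { best_d = d; best = i as u8; }
        }
        best
    }
    // cached lookup: quantise to 5 bits per component to form the key
    fn nearest(&mut self, px: [u8; 3]) -> u8 {
        let key = ((px[0] as usize >> 3) << 10)
                | ((px[1] as usize >> 3) << 5)
                |  (px[2] as usize >> 3);
        if let Some(idx) = self.cache[key] { return idx; }
        let idx = self.nearest_brute(px);
        self.cache[key] = Some(idx);
        idx
    }
}
```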
Output image quality may be improved further by applying dithering, but I did not care enough to play with it.
P.S. I've also made a generic version of the median cut and ELBG algorithms that can be used to quantise any kind of input data, and I want to use it in some encoder. But that's a story for another day. There's other new and questionably exciting stuff in NihAV that I want to blog about meanwhile.