Archive for the ‘Image Formats’ Category

NihAV: now with GIF support

Monday, September 11th, 2023

One would wonder why. The answer is simple: I merely wanted to play with LZW encoding and created a GIF encoder. Then I needed to debug it and thus ended up with a decoder as well. And while I was at it I added animated GIF support for both decoding and encoding.

I’m not going to repeat on how it works, it’s been written in all possible textbooks and tutorials on data compression (and then some), I’m more interested in implementing an efficient encoder for it. I should remind you that LZW compression was patented (at least by Sperry and IBM) and that patent introduced a lot of chaos in multimedia before MP3 or H.264 were a thing.

The algorithm simply tells you to find a match for the input string in the dictionary and add a new string to the dictionary. You can keep the actual strings in the dictionary (simple but not very effective), you can keep the reference to it in the input (simple and saves a lot on dictionary space but not very effective and you can’t feed it input by chunks), you can keep the same dictionary structure as the decoder and try to match input to the code sequence (the most compact way to store the dictionary but not so straightforward to use; more on a practical extension of this approach at the end of this post). Since I do not care about wasting whopping two megabytes of RAM on the dictionary, I implemented a simple trie: each node contains a code ending at that node and 256 pointers to the next next trie nodes (each code/pointer fits into 16-bit integer with zero as “invalid value” pointer). Of course it can be made more compact but I prefer simplicity and clarity.

Now, the question is whether the compression efficiency can be improved somehow. Since the decoder expects encoder to work in a certain manner we can’t change format. The tricks that work good for LZ77 like lazy matching do not work that great for LZ78: you have dictionary state changed at each step and you don’t know which entry will be more useful in the future. Also emitting shorter matches means we also litter the dictionary with duplicates, which does not help compression in the long run either.

Thus the only thing we can use is creative dictionary clearing. E.g. we can emit reset code after every N symbols encoded to keep the code length constant (that’s the trick various libraries used to produce the compliant stream without doing any compression and thus not infringing the patent), or you can keep the dictionary filled with what it was until the very end (and if data properties change your compression ratio hurts), or (as most encoders do) you simply reset the dictionary after it gets full. But can you do better?

I tried two heuristics. First, I tried segmenting data by having a sliding window and comparing the character occurrence in each half, if those differed by a lot then it was a good place to restart. From what I heard, this simple approach allows better compression for LZ77- and BWT-based compression schemes. In this case it helped only when compressing the data that could not be compressed at all (e.g. greyscale continuous-tone images), in other cases the results were worse than with the default strategy. The second approach was simple: keep the rolling average of last, say, sixteen match lengths and if it drops significantly when the dictionary is full then it’s time to clear it. Surprisingly enough it seemed to improve compression for several fractions of a percent in general cases (and sometimes worsened the ratio on hard-to-compress files by about the same amount) so I kept it.

And while talking about GIF and LZW, here are some words about FFmpeg support of GIF: it’s rather laughable. While encoding to GIF has been supported from the early days (from 2002, to be less vague), actual LZW compression support was introduced only in 2009 (using a work of some Summer of Code student from 2007; the notorious patent on LZW compression had expired by that time). A bit later in the same year the encoder was improved to work with paletted input instead of RGB24 (which was maimed to use fixed palette anyway). In the same time the original animated GIF encoder+muxer (not to be confused with now improved single-image GIF encoder) kept existing with all the original deficiencies (RGB24 input, fixed palette, no compression) so by encoding to .gif without specifying image2 muxer you’d get significantly worse results. I heard it’d been improved at last back in 2018 but I have not been caring before so why start now?

The interesting thing there is that LZW compression employed for both GIF and TIFF seems to be based on compress approach with a hash table. Essentially the dictionary is stored in the same form as in the decoder (a reference to the previous code plus suffix character) but indices are hash codes instead of sequential entries and the encoder in a decoder mirror mode: it traverses the table to find the code that corresponds to the current code plus a new character (and if it’s not found, the prefix code is output and new code pair is added to the dictionary). Also it seems to compress data slightly worse than the straightforward implementation in some cases (e.g. in my tests one 320×240 PGM file got expanded to 83159 bytes instead of 83132 bytes and another 112×288 PPM file got compressed to 10371 bytes instead of 10366 bytes; the difference is negligible but it’s there). I suspect this is caused by hash collisions i.e. in some rare cases a shorter code and a longer one have the same hash and the former gets selected even when the latter should be better.

In either case, this might have been not a very useful exercise but it was fun and somewhat educational too. Now only to find a thing to do next…

Visiting multimedia grave

Monday, October 31st, 2022

When people ask why I call the search division of Alphabet Inc Baidu, I answer that I do it in spite to muddle their search index and mostly because they remind me of a Chinese totalitarian company. And recent news only reaffirm such views.

As you should remember, Baidu is famous for its graveyard for the killed projects—it even has a separate alley for the messenger apps. And looks like it prepares a plot under a concrete duck for burying some multimedia formats (which makes it interesting to me).

The history of multimedia formats at Baidu essentially started with the purchase of On2 and releasing VP8 in WebMKV format. Then VP8 was mostly buried since VP9 was created (some of it remains hidden inside WebP format), VP9’s turn is near since VP10 is here to succeed it (under the name of AV1).

In the recent news though it turns out that Chrome is deprecating its support for JPEG XL, a format developed mostly at Baidu and the only one properly standardised. But as we all know, Chrome currently controls the Web and removing support for it means that the format will remain obscure. Kinda like in a Soviet joke where a foreign tourist asks in a shop why there’s no caviar and hears that there’s no demand for it—and as he observed for a whole day nobody asked for it indeed (in case it’s not obvious people in the USSR didn’t ask for caviar at the shops because they knew it would not be sold there; see also Baidu Stadia).

And because it was not enough, people spotted that WebP2 has changed its status to experimental, meaning that it won’t be supported either.

So we have, VP9 buried in favour of AV1, JPEG XL being buried in favour of AV1F, WebP2 being buried in favour of AV1F (which is AV1 still frames in MP4) and the original WebP is likely to follow the suit. Now consider that AV1 is recommended to be distributed inside MP4 instead of WebMKV and you’ll fear about the future of that container as well.

I guess now all is left for them to do is to adopt Baidu Lyra as non-experimental codec to purge Vorbis and Opus not created by them and then bury it in favour of AV1-based audio compression. That would make a nice collective grave of formats killed by Baidu to make space for AV1.

So, do you know when AV2 should arrive?

A Quick Look on DLI

Thursday, May 26th, 2016

So yesterday I had a quick look on DLI image format. It turns out to be somewhat related to video codecs (and JPEG of course): there’s 8×8 fast integer DCT approximation, quantisation and bit coding of the block. And bit coding is the most interesting part really—this format employs binary model with old-school arithmetic coding and context selection for the model; coefficients are coded as first an array of coded coefficients flags (plus a flag for last coded coefficient), each non-zero coefficient has an additional flag to signal whether it’s larger than one and in that case it’s coded as unary code (bits coded with arithmetic coder of course).

And I still don’t like the notion of “let’s make our video codec I-frames into an image codec” (the reverse of “motion <image format>” is not much better but at least it makes sense for intermediate formats). Images and video codecs have different use cases and required features but I think I ranted about it once.