Since I wanted to do something different, I decided to finally implement deflate support for NihAV, by which I mean compression support in addition to decompression. Here is how well it went.
As usual, my goal was to implement it in a mostly straightforward way with reasonable speed, instead of having something completely on par with zlib or better.
At first I implemented the simplest form of compression – copying data without compression (but with the proper headers and an ADLER-32 checksum at the end). Then I added a simple encoding with fixed codes that just outputs symbols as is; no compression yet, but at least it tests whether the bitstream is written correctly. Then I moved to dynamic codes. Then I added a brute-force search and started encoding matches. So by the end of the weekend I had something working already, and then I could make it faster and/or better.
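To make that first stage concrete, here is a minimal sketch of a zlib stream built from stored blocks only, with the ADLER-32 trailer at the end. This is my own illustration of the container format, not NihAV's code, and the function names are made up.

```rust
// Sketch: a valid zlib stream made of stored (uncompressed) blocks only.
fn adler32(data: &[u8]) -> u32 {
    const MOD: u32 = 65521;
    let (mut a, mut b) = (1u32, 0u32);
    for &byte in data {
        a = (a + u32::from(byte)) % MOD;
        b = (b + a) % MOD;
    }
    (b << 16) | a
}

fn zlib_store(src: &[u8]) -> Vec<u8> {
    let mut dst = vec![0x78, 0x01];                   // zlib header: deflate, 32K window
    let mut start = 0;
    loop {
        let end = (start + 65535).min(src.len());     // stored blocks hold at most 65535 bytes
        let chunk = &src[start..end];
        let last = end == src.len();
        dst.push(if last { 1 } else { 0 });           // BFINAL bit, BTYPE=00 (stored), byte-aligned
        let len = chunk.len() as u16;
        dst.extend_from_slice(&len.to_le_bytes());    // LEN
        dst.extend_from_slice(&(!len).to_le_bytes()); // NLEN, one's complement of LEN
        dst.extend_from_slice(chunk);
        start = end;
        if last { break; }
    }
    dst.extend_from_slice(&adler32(src).to_be_bytes()); // ADLER-32 trailer is big-endian
    dst
}
```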
Of course the first thing to remember is that you can reduce search time by using some structure for faster text search. I think a suffix trie is popular nowadays, but I settled for an old-fashioned hash over three bytes. Initially it was twice as slow: while the number of string comparisons decreased a hundredfold, updating the hash table on each step consumed too much time. So I switched to a linked-list hash that somewhat resembles FAT (i.e. for each position in the input you have a pointer to the next location with the same three-byte hash, plus an additional table pointing to the start of the chain for each hash key). And I calculated it once per large block, simply discarding matches outside of the desired range. Of course this can be done better, but it worked fast enough for me.
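Here is a rough sketch of that FAT-like scheme as I understand it; the struct and function names are hypothetical, and the real implementation certainly differs in the details.

```rust
// head[h] holds the latest position with 3-byte hash h, prev[pos] points to
// the previous position with the same hash (the FAT-like chain).
const HASH_BITS: u32 = 15;
const NO_POS: u32 = u32::MAX;

fn hash3(b: &[u8]) -> usize {
    let h = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
    (h.wrapping_mul(2654435761) >> (32 - HASH_BITS)) as usize
}

struct MatchFinder {
    head: Vec<u32>,
    prev: Vec<u32>,
}

impl MatchFinder {
    // Build the chains once for the whole block, as described above.
    fn build(data: &[u8]) -> Self {
        let mut head = vec![NO_POS; 1 << HASH_BITS];
        let mut prev = vec![NO_POS; data.len()];
        for pos in 0..data.len().saturating_sub(2) {
            let h = hash3(&data[pos..]);
            prev[pos] = head[h];
            head[h] = pos as u32;
        }
        Self { head, prev }
    }

    // Walk the chain for `pos` and return the best (distance, length),
    // discarding candidates outside the 32K back-window.
    fn longest_match(&self, data: &[u8], pos: usize, max_chain: usize) -> Option<(usize, usize)> {
        if pos + 3 > data.len() { return None; }
        let min_pos = pos.saturating_sub(32768);
        let mut cur = self.head[hash3(&data[pos..])];
        let mut best = (0usize, 0usize);
        let mut steps = 0;
        while cur != NO_POS && steps < max_chain {
            let cand = cur as usize;
            cur = self.prev[cand];
            if cand >= pos { continue; }  // chains cover the whole block, skip "future" positions
            if cand < min_pos { break; }  // positions only get older along the chain
            let len = data[cand..].iter().zip(&data[pos..]).take(258)
                                  .take_while(|&(a, b)| a == b).count();
            if len > best.1 { best = (pos - cand, len); }
            steps += 1;
        }
        if best.1 >= 3 { Some(best) } else { None }
    }
}
```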
Now the compression. There are three main strategies I tried: the naïve aka greedy one (you simply output the longest match you can find at the current step), the lazy one (you also check whether the next position produces an even better match and use it if possible; surprisingly enough this gives a significant benefit), and the theoretically optimal one (you construct a trellis and see which combination of matches and literals gives you the best coding; it has issues, but in theory it's the best one).
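The difference between greedy and lazy parsing fits in a few lines. The sketch below reuses the hypothetical MatchFinder from the previous snippet and only illustrates the idea, it is not the actual code.

```rust
// Greedy vs. lazy parsing in a nutshell.
enum Token {
    Literal(u8),
    Match { dist: usize, len: usize },
}

fn lazy_parse(data: &[u8], mf: &MatchFinder) -> Vec<Token> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        match mf.longest_match(data, pos, 128) {
            Some((dist, len)) => {
                // The lazy part: if the next position offers a strictly longer
                // match, emit a literal now and take that match on the next step.
                // A greedy parser would skip this check and always take the match.
                let next_is_better = mf.longest_match(data, pos + 1, 128)
                    .map_or(false, |(_, next_len)| next_len > len);
                if next_is_better {
                    out.push(Token::Literal(data[pos]));
                    pos += 1;
                } else {
                    out.push(Token::Match { dist, len });
                    pos += len;
                }
            }
            None => {
                out.push(Token::Literal(data[pos]));
                pos += 1;
            }
        }
    }
    out
}
```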
So why is it “theoretically optimal” and not just optimal? Because it needs an accurate bit cost, and you can’t know it until you produce all the symbols to be encoded and calculate the actual code lengths for them. Of course you can do this in an iterative process or employ a heuristic to predict the bit length somehow, but I simply used “9 bits for the symbol, and additionally 5 bits plus escape bits for the distance if it’s present”. I think in some cases it even produced larger files than the lazy approach.
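Written out as code (my reading of that heuristic, using the Token type from the previous sketch; the formula actually used may differ):

```rust
// Crude cost model: 9 bits per symbol, plus 5 bits and the distance extra bits
// whenever a distance is present. Length extra bits and the real code lengths
// are ignored, hence only "theoretically" optimal.
fn approx_cost(token: &Token) -> u32 {
    match token {
        Token::Literal(_) => 9,
        Token::Match { dist, .. } => {
            // extra bits of a deflate distance code: 0 for distances 1..=4,
            // then floor(log2(dist - 1)) - 1 for larger ones
            let extra = if *dist <= 4 {
                0
            } else {
                (usize::BITS - (*dist - 1).leading_zeros()) - 2
            };
            9 + 5 + extra
        }
    }
}
```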
Here is a list, from the top of my head, of things that can be improved (but I guess anybody who has written an LZ77-based compressor knows it better than me):
- method selection: sometimes copying the data verbatim is better (in the case of noise), or using fixed codes (because the overhead of transmitting the dynamic codes eats all the advantage); a crude sketch of this follows after the list;
- partitioning: currently I use 64kB blocks, but depending on the content (usually detected by variations in symbol frequencies) it may be better to cut the block earlier or make it larger. I played a bit with the block size, but changing it (in either direction) currently leads to compression ratio drops;
- faster search for the matching strings;
- heuristics for either faster optimal parsing or for making the other methods compress better.
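For the method-selection item, a possible shape of such a selector is sketched below. Everything here is hypothetical (it reuses the Token type from the earlier sketches), and the dynamic-code cost is left as an input since it needs the actual Huffman tree sizes.

```rust
// Estimate the block size under each coding mode and pick the cheapest one.
fn pick_block_type(tokens: &[Token], raw_len: usize, dynamic_bits: u32) -> u8 {
    // stored: roughly 5 bytes of block overhead plus the raw data
    let stored_bits = 8 * (raw_len as u32 + 5);
    // fixed codes: 8 or 9 bits per literal, a crude upper bound per match,
    // plus the 7-bit end-of-block code
    let fixed_bits: u32 = tokens.iter().map(|t| match t {
        Token::Literal(b) if *b < 144 => 8,
        Token::Literal(_)             => 9,
        Token::Match { .. }           => 8 + 5 + 5 + 13,
    }).sum::<u32>() + 7;
    // BTYPE values from the deflate spec: 0 = stored, 1 = fixed, 2 = dynamic
    if stored_bits <= fixed_bits && stored_bits <= dynamic_bits {
        0
    } else if fixed_bits <= dynamic_bits {
        1
    } else {
        2
    }
}
```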
Of course some of this can be sped up by simply using unsafe Rust so that no checks on array accesses are performed, but I don’t think it’s worth it now.
And finally, here are some benchmarks for the curious, performed on a source file of the program:
- copy: 32156 bytes (from 32145 bytes)
- fixed codes and greedy search: 7847 bytes, 80ms
- dynamic codes and greedy search: 6818 bytes, 80ms
- dynamic codes and lazy search: 6665 bytes, 100ms
- dynamic codes and “optimal” search: 6529 bytes, 690ms
- gzip -9 for reference: 6466 bytes, <10ms
As you can see, it’s not fast but it works. I also checked that the resulting compressed data decodes fine (plus I ran some tests on large files that get split into several blocks). Now all that’s left is to implement the ZMBV decoder and encoder.
You should check out zopfli if you want even better deflate compression. Perhaps steal some ideas from it. zopflipng is also useful just on its own.
Speaking of ZMBV, Matthew Fearnley and I have had discussions with the DosBox team about adding 1-, 2-, 4- and 24-bit support to the format. gzip did a good enough job on the lower bit depths, but 24-bit is something they might implement. If they do, they’ll likely use BGR byte order, since 32-bit is already BGR0. There are threads on [FFmpeg-devel] about this.
That’s an interesting suggestion, but it was not the goal. One can spend a lot of time trying to perfect the code, but I just wanted to try writing something usable yet still simple. But if I ever want to improve it, I know where to look besides that library.
As for ZMBV, I care about it mostly because it is a lossless codec that is rather simple and supports various bit depths (8/16/32 is enough for my needs, though I’d be happy to see 1/2/4/24 bpp supported officially).
Yes, I did suspect that making the perfect DEFLATE encoder wasn’t the goal here. This does give me the idea of linking libavcodec/zmbvenc.c with libzopfli1 for yet better compression. Something that could be done library-wide in lavc, come to think of it.
Looks like Zopfli is not a drop-in replacement for zlib, but it would be interesting to see how it fares in the case where the whole block is compressed at once. It may be the law of diminishing returns in action, where squeezing out additional kilobytes would make it twice as slow.