Kostya « Kostya's Boring Codec World

Author Archive

Bink video encoder tricks

Sunday, October 8th, 2023

As I mentioned in the introductory post, there are nine block coding modes and my encoder tries them all to see which is good (or good enough) to be used. What I have not mentioned is that some of those blocks have sixteen variations (quantisers for DCT-based blocks and scan patterns for RLE blocks), which makes search even longer.

First of all, I use the rather obvious approach to trying blocks: order them in a sequence, motion blocks types first, then simple ones (fill, two-colour pattern, RLE) with intra DCT and raw being the last. And if the block metric is low enough then the block is good enough and the rest of the modes should not be tried.

Second, there are two encoding modes: quality-based and bitrate-based. For bitrate-based mode I simply manipulate lambda ratio between block distortion and bits and that’s it. For quality mode I actually collected statistics on different files at different quality settings to see what block types are used more or less. I.e. on the highest quality setting intra blocks are not used at all while on low quality settings you don’t see lossless residue or raw blocks.

So I simply used the option to disable different block coding modes (introduced to make debugging other coding modes easier) and modified the list depending on quality setting.

Then I went even further and observed the statistics of the DCT block quantisers used depending on quality settings. As one could reasonably expect, low quality setting resulting in quantisers 12-15 (and rarely 11) while high quality setting used quantisers 0-10 the most. So limiting the quantisers depending on quality was the next logical step.

And here are some other tricks:

on lower quality levels I use non-zero threshold for RLE block so that e.g. a sequence 4, 3, 5, 4 will be treated as a run of fours;
in the previous version of the encoder I used Block Truncation Coding (probably the only possible application of it even if a bit unwise), now I’m simply calculating averages of the values above/below block mean value;
in rate-distortion metric I scale the bits value, doing otherwise often leads to the essentially free skip block being selected almost every time and blocky moving pictures are better than slightly less blocky still one.

Of course it is easy to come with more but it’s been enough for me.

Posted in binkencfail, NihAV | 2 Comments »

Bink bitstream bundling

Saturday, October 7th, 2023

Bink Video organises frame data somewhat on per-row basis and separated into individual streams. That means that data is sent in portions in interleaved fashion: first it is block types, then it’s colour values for e.g. fill or pattern blocks, then it’s motion values and so on. Before each row decoding can start, decoder should check if it still has some data from that stream or read more. For example, in the beginning you may get just enough block types transmitted for one row, zero motion vectors and one DC value for a block somewhere in the middle of the stream; so for the second row you refill only block types, and DC values stream will be queried only after that single value in it has been used.

And to complicate the things further, there may be yet another kind of data stored in-between: RLE block information (scan index, copy/run bit and run lengths), DCT coefficients and lossless residue data.

It’s also worth noting that while version 'b' simply stored stream data in fixed-length bit-fields, the latter versions employed static codebooks to improve compression.

Anyway, how to write such streams correctly? My original encoder supported only several block types and was able to bundle all stream data together to transmit it once for the whole plane. But since I wanted to support all possible block types, I had to devise more flexible system. So I ended up with a structure that holds all data in per-row entries so when the plane data is gathered I can easily decide how many rows of data to send and when more data needs to be sent. Additional bitstream data is also stored there (in form of (code, length) pairs) and can be written for each row that has it.

Beside that I’ve introduced a structure to all stream data for one block. This makes it easy to calculate how many bits will be used to code it, output the data (just append its contents to the corresponding entries in the plane data) and even reconstruct the block from it (except for DCT-based ones). Actually I use three instances of it (one for the best block coding candidate, one for the current block coding and one for trying block coding variants if the mode permits it) but that’s a minor detail.

In either case, even if this approach to coding is not unique (TrueMotion codecs employed it as well, to give one example), it’s distinct enough from the other variants I’ve seen and was still quite fun to implement.

Posted in binkencfail | Comments Closed

Bink encoding: format and encoder designs

Friday, October 6th, 2023

Here I’m going to give a brief review of how Bink formats (container, video and audio codecs) are designed and how it affected the overall encoder design in NihAV.
(more…)

Posted in binkencfail, NihAV | 2 Comments »

Documenting failure(s) of a Bink encoder

Thursday, October 5th, 2023

Since I really had nothing better to do, I decided to try writing Bink-b encoder. Again. The previous version was a simple program to compress image sequence into a somewhat working video file, the new version supports all possible features: motion compensation, all block types including DCT-based ones, quality- and bitrate-oriented compression modes and even audio.

I worked on the encoders not because I have any real need but rather in order to both understand better how Bink codecs work and what tricks can be employed in the decoder. Bink video supports different block coding modes, from rather ordinary DCT to RLE using a custom scan order and special mode for coding motion residues. Bink audio is a simple perceptual codec that should still give me some possibilities for experimentation (of course I remember how I failed to write AAC encoder, but maybe with such simple format the outcome will be not too horrible?).

In the following posts I’ll try to describe how some things in Bink format work, what difficulties I had to overcome and what the results are.

Eventually I want to test my implementation against the original decoder (by creating a custom heroes3.vid or video.vid for HoMM3). Also it is rather easy to enhance my work to produce latter-version Bink videos but I’m not sure if it’s worth doing that as there’s a good free encoder for it already (probably a much better one than I can ever create) and Bink format is not a good candidate for normal videos as it was created for game cutscenes with the only options to keep or end playing, no option for rewinding back at random place to repeat scene you missed (or skip some boring part).

In either case, it’s still a nice experience. After all, NihAV is about experimenting and learning how stuff works and Bink offered a lot of new things to try.

Posted in binkencfail | 6 Comments »

On emerging dictatorships

Saturday, September 16th, 2023

In some cases authoritarian governments come suddenly, usually as the result of a coup or a legal loophole (Article 48 of Weimar Constitution is an example codified in ages), in other cases the power grab goes gradually like in the famous frog-in-the-boiling-water metaphor. It’s easy to act surprised, as the things are almost the same as they were yesterday or last year, but usually there are tell-tale signs of a country going into authoritarianism. Here I’d like to remind about them.

Probably the most important of them is the judicial branch reform that de facto cancels its independency as the government controls who gets appointed as a judge. After you get loyal judges who’s going to stop your unlawful reforms? We had such situation in Ukraine during 2010-2014 (even without any factual reform; but more about Ukraine later), Poland tries to get it (and that’s disturbing), Hungary has done it (of course), Israel is doing it right now (which may end too bad for the country) and so on. But usually at this stage the things are so bad that the actual dictatorship (or demokratur) is merely the next step, are there any earlier signs?

Again, one of the important but rather late signs is unwillingness to part with power. It may happen in any circumstances but usually it’s parliamentary republic with a dominating party and the same prime minister for decades. There may be a president but he or she is a nominal figure while the prime minister remains at power (see Georgia, Germany, Hungary or Israel). Of course it may still not work out like with Merkel, who had to retire after sixteen years, without leaving a good successor candidate in her own party; IMO she should’ve got a prison sentence for serving russian interests but she got awarded a Grand Cross for that instead. I feel the maximum term for any such post should be about twelve years and anybody, who like the current (since 2010) Hungarian prime minister say they want to remain for another twenty years, should be put to prison immediately. South Korea has a good idea with its presidents, a good deal of them were either shot or imprisoned after their term was over. In Fourecks they imprison politicians right after they’re elected (so save time)—but alas it’s a fictional continent.

But usually it’s not one single person but somebody with a support of major party. And, surprisingly enough, those parties usually have something in common: they rely on populism, promoting “traditional values” to the extent of being against anything new, and very often seeing USA and “the West” as their enemies. In other words, promising to return the country to the times of past glory (or at least when the situation was not so bleak) and blaming foreigners and their agents on everything bad that has happened since those times (I’m sorry if that does not reduce the list of possible candidates by much, that’s the state of modern politics). I should probably add another thing to cheap populism: creating show by solving issues nobody had like renaming a country. I can understand why Congo does not want its capital named after the Belgian king who is responsible for mass genocide in the country (and for the same reason Ukraine renames its towns and streets), I can understand why a country renames itself from Rhodesia, I can understand when a country introduces a preferred name while still recognizing the old one (like Czechia or Sakartvelo). But demanding that your country called abroad differently without any apparent reason for renaming sounds like a cheap gesture to show your own people that your country is proud and how others have to comply with it (while drawing attention from more important issues). In case it’s not obvious I’m talking about The Country Formerly Known As Turkey and The Country Most Likely To Be Known As Modi Raj.

The very minor signs would be the corruption potential: if a party serves interests of an oligarch or an oligarch group it is naïve to expect from it to have no attempts to seize power and hold it as long as possible to exercise it for profit and to avoid prosecution. And a thing related to it—friendship with russia (usually paid from the russian side).

In the end I’d like to talk about Ukraine to serve as an example. There were two and a half attempts to grab power in its recent history. First attempt was made by the second president, Leonid Kuchma: back in the day he even traded some presidential privileges for some additional executive functions he could perform. Over the years he increased his power, often with a help of shady people like Pavlo Lazarenko (famous for being a Ukrainian prime minister, a citizen of Panama and building so wide corruption network that he served a prison sentence in the USA for it) or Yuriy Kravchenko (who turned police into his personal mob). Kuchma often solved problems by firing the current prime minister and selecting a new candidate. Still that has not saved him from the consequences of murdering Gongadze and the Cassette Scandal. Another fun fact is that Ukrainian oligarch Pinchuk had a nickname “Kuchma’s-son-in-law” (because he really was one). It is also said that he tried to game the system by trying to organise 2004 elections in such a way that both candidates would lose and he would remain by default. Those elections and rigged counting though caused so much public outrage (called the first Maidan) that they had to repeat the vote.

The second attempt was when viktor yanukovich assumed the post of the president and the Party of Regions (from which he came) got the majority in the parliament. Those were dark times when he used kangaroo courts to deal with his opponents (because no judge could be appointed at that time without approval from the President’s Office) and cancel previous constitutional reform limiting presidential power; people affiliated with him (“the Family”) could raid and steal other businesses and it is said they managed to steal a sixth of the country budget every year in addition to that. As you know, it ended by him selling Ukraine to russia, fleeing the country and starting the war that goes to this day. It is less remembered that during his last days of presidency his party tried to pass the same laws as in russia to limit the freedoms and allow to prosecute the protesters easily. It did not work out.

The half-attempt is when Volodymyr Zelensky became a president and his party got majority in the parliament (mostly thanks to his image). He had not so great decisions in the beginning (and the parliament supported them) so it could end badly. But then a führer decided to take Kyiv in three days and the Ukrainian nation united against the enemy once again. Since then Zelensky (and somewhat his party) act decently because it’s the matter of survival. Who knows how it will go after the victory but for now I have nothing else to say about him.

I hope this brief review of Ukrainian political history served an example of how important it is to stop authoritarianism when you can still do it without human casualties. Revolutions often take a much higher price…

Posted in Useless Rants | Comments Closed

NihAV: now with GIF support

Monday, September 11th, 2023

One would wonder why. The answer is simple: I merely wanted to play with LZW encoding and created a GIF encoder. Then I needed to debug it and thus ended up with a decoder as well. And while I was at it I added animated GIF support for both decoding and encoding.

I’m not going to repeat on how it works, it’s been written in all possible textbooks and tutorials on data compression (and then some), I’m more interested in implementing an efficient encoder for it. I should remind you that LZW compression was patented (at least by Sperry and IBM) and that patent introduced a lot of chaos in multimedia before MP3 or H.264 were a thing.

The algorithm simply tells you to find a match for the input string in the dictionary and add a new string to the dictionary. You can keep the actual strings in the dictionary (simple but not very effective), you can keep the reference to it in the input (simple and saves a lot on dictionary space but not very effective and you can’t feed it input by chunks), you can keep the same dictionary structure as the decoder and try to match input to the code sequence (the most compact way to store the dictionary but not so straightforward to use; more on a practical extension of this approach at the end of this post). Since I do not care about wasting whopping two megabytes of RAM on the dictionary, I implemented a simple trie: each node contains a code ending at that node and 256 pointers to the next next trie nodes (each code/pointer fits into 16-bit integer with zero as “invalid value” pointer). Of course it can be made more compact but I prefer simplicity and clarity.

Now, the question is whether the compression efficiency can be improved somehow. Since the decoder expects encoder to work in a certain manner we can’t change format. The tricks that work good for LZ77 like lazy matching do not work that great for LZ78: you have dictionary state changed at each step and you don’t know which entry will be more useful in the future. Also emitting shorter matches means we also litter the dictionary with duplicates, which does not help compression in the long run either.

Thus the only thing we can use is creative dictionary clearing. E.g. we can emit reset code after every N symbols encoded to keep the code length constant (that’s the trick various libraries used to produce the compliant stream without doing any compression and thus not infringing the patent), or you can keep the dictionary filled with what it was until the very end (and if data properties change your compression ratio hurts), or (as most encoders do) you simply reset the dictionary after it gets full. But can you do better?

I tried two heuristics. First, I tried segmenting data by having a sliding window and comparing the character occurrence in each half, if those differed by a lot then it was a good place to restart. From what I heard, this simple approach allows better compression for LZ77- and BWT-based compression schemes. In this case it helped only when compressing the data that could not be compressed at all (e.g. greyscale continuous-tone images), in other cases the results were worse than with the default strategy. The second approach was simple: keep the rolling average of last, say, sixteen match lengths and if it drops significantly when the dictionary is full then it’s time to clear it. Surprisingly enough it seemed to improve compression for several fractions of a percent in general cases (and sometimes worsened the ratio on hard-to-compress files by about the same amount) so I kept it.

And while talking about GIF and LZW, here are some words about FFmpeg support of GIF: it’s rather laughable. While encoding to GIF has been supported from the early days (from 2002, to be less vague), actual LZW compression support was introduced only in 2009 (using a work of some Summer of Code student from 2007; the notorious patent on LZW compression had expired by that time). A bit later in the same year the encoder was improved to work with paletted input instead of RGB24 (which was maimed to use fixed palette anyway). In the same time the original animated GIF encoder+muxer (not to be confused with now improved single-image GIF encoder) kept existing with all the original deficiencies (RGB24 input, fixed palette, no compression) so by encoding to .gif without specifying image2 muxer you’d get significantly worse results. I heard it’d been improved at last back in 2018 but I have not been caring before so why start now?

The interesting thing there is that LZW compression employed for both GIF and TIFF seems to be based on compress approach with a hash table. Essentially the dictionary is stored in the same form as in the decoder (a reference to the previous code plus suffix character) but indices are hash codes instead of sequential entries and the encoder in a decoder mirror mode: it traverses the table to find the code that corresponds to the current code plus a new character (and if it’s not found, the prefix code is output and new code pair is added to the dictionary). Also it seems to compress data slightly worse than the straightforward implementation in some cases (e.g. in my tests one 320×240 PGM file got expanded to 83159 bytes instead of 83132 bytes and another 112×288 PPM file got compressed to 10371 bytes instead of 10366 bytes; the difference is negligible but it’s there). I suspect this is caused by hash collisions i.e. in some rare cases a shorter code and a longer one have the same hash and the former gets selected even when the latter should be better.

In either case, this might have been not a very useful exercise but it was fun and somewhat educational too. Now only to find a thing to do next…

Posted in Image Formats, NihAV | 2 Comments »

BTC is bullshit

Monday, September 4th, 2023

Don’t get me wrong, the idea behind it is sound but it got overhyped and misapplied. Of course I’m talking about one of an early attempts at lossy image compression called block truncation coding, what did you think about?

For those who forgot about it (and is too lazy to read its description), the method replaces values in a block with two values below/above block mean value using that mean value, standard deviation and that number of values that were above the mean value. This algorithm is often quoted as being the one on which many video codecs (especially the ones used in various games but some standard ones like Micro$oft Video 1 and A**le Graphics(SMC) and Video(RPZA) as well) are built. And that part is a bullshit which many used to believe mostly because nobody who heard it took any time to evaluate that statement (I was no exception).

Actually I should’ve started doubting it much earlier as I’ve tried to apply it to colour quantisation (like in Video 1 encoder) and failed. The method is applicable to scalar values only (and pixels are vectors of three components in our case, you can map them to greyscale but how would you calculate two distinct colours to segment block into?) and its results are worse than using Linde-Buzo-Gray method for vector quantisation (which was presented in the paper the following year). Wikipedia has an article describing a proper image compression algorithm proposed in 1986 called color cell compression that definitely looks like the perfect candidate for all the following codecs: it describes compressing image by splitting it into 4×4 tiles, grouping pixels in those tiles using average luminance as the discriminator and calculating two colours to paint the tile as averages of those two groups. That’s how vector quantisation works and unlike BTC it does not require calculating square roots and can be implemented trivially using integer maths only. So it’s practical, gives better results (in terms of MSE of greyscale images when compared to BTC) and works on actual pixel data.

While BTC was innovative for its time and probably an important stepping stone for further methods, its relevance to the modern compression schemes is minimal (unlike colour cell compression) and calling it the base for the codecs with two-colour vector quantisation is as stupid as calling Cyrano de Bergerac the father of space flight because he mentioned travel to the Moon using gunpowder rockets in a novel of his.

Posted in Useless Rants | 6 Comments »

NihAV: adding SGA support

Saturday, September 2nd, 2023

Since I had nothing better to do this week I decided to finally add Digital Pictures SGA decoding support to NihAV. While there are many different formats described in The Wiki, I’ve decided to play only those not described there (namely $81/$8A, $85, $86 and $89).

In my previous post on this matter I mentioned that the formats I took interest in are using 8×8 tiles that may be subdivided into 8×4 or 4×4 parts and filled with several colours using a predefined pattern (or an arbitrary one for 8×8 tile if requested) plus some bits to select one of two possible colours for each tile pixel. The main difference between $81/$8A scheme and the others is that it codes all data in the same bitstream while the later versions split colours and opcode+pattern bits into two separate partitions (maybe they had plans for compressing it?) plus they store audio data inside the frame.

And here are some notes on the games (I think most of those are PC or Macintosh ports but it’s possible the same files were used in console versions of some of those games as well):

Double Switch—this one uses $81 compression (in still images, cutscenes embed them along with $A2 audio in $F1 chunks);
Quarterback Attack—this one uses $8A compression in $F9 chunks;
Night Trap—$85 compression and megafiles (i.e. almost all cutscenes are stored in single NTMOVIE file that require some external index to access them). Also the PC release had a short documentary about the moral panic around that game (in the same format of course; in two resolutions even);
Corpse Killer—$86 compression and one megafile for all cutscenes;
Supreme Warrior—$89 compression, one megafile and no frame dimensions given. For most of the cutscenes it’s 256×160 but at the end (logo and maybe something else) it’s different. Additionally there are two audio tracks: some audio chunks contain twice as much data (and have high bit of size set), in that case the first half corresponds to English speech and the second half is Chinese; otherwise it’s the same for both versions (e.g. background music, fighters grunting, sound effects and so on).

Overall, it was an interesting experience even if I don’t care about the games themselves.

Posted in Game Video, NihAV | Comments Closed

Tell me how you format output and I tell you what programming language you are

Friday, September 1st, 2023

Since K&R times it’s traditional to make a programming language print “Hello, world!” as the introduction to it. Recently I had a thought that formatting a number in output tells you the essentials of the programming language itself.

So if you want to print not just a constant string but, say, a number expanded to a certain length (e.g. printf("%4d\n", val);), how can you do it?

Essentially there are three major approaches: using WRITE with optional specifiers for each element, using one format string with arguments to follow and having just basic output with utilities to format elements using some pattern and string concatenation. And there’s C++.

The first approach can be found in the old venerable languages like FORTRAN or ALGOL. It is also somewhat funny that in Pascal its writeln() can’t be implemented in the language itself so the compiler has to convert it to a sequence of formatting and writes or string concatenations (exactly the third way). The same is true for Rust as well (which belongs to the second category) but at least println!() outright tells you that it’s not a normal function but rather a macro or a compiler plug-in.

The second approach even if not invented got popularised by C. You have a template string with recognizable percent signs to signal where arguments should be substituted and in what form. Of course this is not the safest approach since you may pass a template with not enough arguments and read garbage from the stack (and that’s not counting the dreaded %n). In the recent years another template format got popular (probably thanks to Python, with an influence from bash or PHP): instead of percent sign and type there are braces with an optional format arguments or variable name inside. Speaking of Python, you can use print with the old printf-style or new f-string formatting and it will put a newline at the end by default (reminding once again that Python is all about whitespaces). As for the implementation, in some languages it’s a built-in function (i.e. you can’t implement it in the language itself but the compiler/interpreter takes care of it), in C it’s implemented by parsing the stack more or less directly, in many modern languages variable-argument function simply passes those arguments as an array of the most generic object types (so you can implement something like printf() yourself). I think I can also mention here the third pattern format mostly used for formatting binary output—the one used in Perl pack() function (and whatever languages borrowed it), its syntax reminds me of the classic languages from the first category.

The third approach is too primitive so usually languages try to use some syntactic sugar to make it easier to use (see Pascal or Rust mentioned above). I should probably mention INTERCAL where outputting a single character might pose a challenge for an uninitiated but that’s part of the charm of the language.

And finally C++. In the original version of the language you were supposed to use cout << setw(4) << val << endl; and while it demonstrates the advantages of the operator overloading, in the same time it suffers from the extreme verbosity and side effects (that width modifier will be applied to all the following argument in all following prints until you change it). In the modern C++ you should use std::cout, std::setw() and std::endl() instead. As the result people mostly use printf() or some third-party formatting library (you may even get it with one of your string class implementations for free).

As you can see, the way you can print a formatted number (both the syntax and the implementation) can tell you a lot about the programming language in a rather short time. Of course there are exceptions but the lack of such functionality tells you at least about the kind of language you're dealing with.

Posted in Useless Rants | Comments Closed

Looking at Tex Murphy video formats

Wednesday, August 30th, 2023

Since I have nothing better to do, I decided to look at the formats used in various Tex Murphy games.

So here they are (the game from this millennium does not count):

Mean Streets—it uses raw images in ILBM-inspired format (so that even some chunk names are still the same) grouped with e.g. palette files and packed in ZIP using ancient compression methods (shrink and reduce, I wonder if anything but ancient PKZIP.EXE can unpack them all);
Martian Memorandum—the only VID file present looks like mostly raw image data interleaved with some metadata to tell which regions to update (though judging from ScummVM sources it may also employ RLE for sprites);
Under a Killing Moon—this one has new format starting with 32-bit size followed by signature “PTF” and after 64-byte header I see the suspiciously familiar chunk structure: 32-bit size, 16-bit signature 0xF1FA that contain some sub-chunks of its own… I refer you to The Multimedia Mike, he should be able to recognize the format;
The Pandora Directive—this one uses format that starts with “H2O” and seems to be Huffman-compressed RLE;
Tex Murphy: Overseer—this one decided not to be creative and used Smacker files (along with some FLICs).

So, despite its nine-year history (again, only the games from last millennium) those games have changed the format used for video segments yet they have not moved further than RLE.

Posted in Game Video | 2 Comments »

Kostya's Boring Codec World