Bink encoding: format and encoder designs

October 6th, 2023

Here I’m going to give a brief review of how Bink formats (container, video and audio codecs) are designed and how it affected the overall encoder design in NihAV.
Read the rest of this entry »

Documenting failure(s) of a Bink encoder

October 5th, 2023

Since I really had nothing better to do, I decided to try writing Bink-b encoder. Again. The previous version was a simple program to compress image sequence into a somewhat working video file, the new version supports all possible features: motion compensation, all block types including DCT-based ones, quality- and bitrate-oriented compression modes and even audio.

I worked on the encoders not because I have any real need but rather in order to both understand better how Bink codecs work and what tricks can be employed in the decoder. Bink video supports different block coding modes, from rather ordinary DCT to RLE using a custom scan order and special mode for coding motion residues. Bink audio is a simple perceptual codec that should still give me some possibilities for experimentation (of course I remember how I failed to write AAC encoder, but maybe with such simple format the outcome will be not too horrible?).

In the following posts I’ll try to describe how some things in Bink format work, what difficulties I had to overcome and what the results are.

Eventually I want to test my implementation against the original decoder (by creating a custom heroes3.vid or video.vid for HoMM3). Also it is rather easy to enhance my work to produce latter-version Bink videos but I’m not sure if it’s worth doing that as there’s a good free encoder for it already (probably a much better one than I can ever create) and Bink format is not a good candidate for normal videos as it was created for game cutscenes with the only options to keep or end playing, no option for rewinding back at random place to repeat scene you missed (or skip some boring part).

In either case, it’s still a nice experience. After all, NihAV is about experimenting and learning how stuff works and Bink offered a lot of new things to try.

On emerging dictatorships

September 16th, 2023

In some cases authoritarian governments come suddenly, usually as the result of a coup or a legal loophole (Article 48 of Weimar Constitution is an example codified in ages), in other cases the power grab goes gradually like in the famous frog-in-the-boiling-water metaphor. It’s easy to act surprised, as the things are almost the same as they were yesterday or last year, but usually there are tell-tale signs of a country going into authoritarianism. Here I’d like to remind about them.

Probably the most important of them is the judicial branch reform that de facto cancels its independency as the government controls who gets appointed as a judge. After you get loyal judges who’s going to stop your unlawful reforms? We had such situation in Ukraine during 2010-2014 (even without any factual reform; but more about Ukraine later), Poland tries to get it (and that’s disturbing), Hungary has done it (of course), Israel is doing it right now (which may end too bad for the country) and so on. But usually at this stage the things are so bad that the actual dictatorship (or demokratur) is merely the next step, are there any earlier signs?

Again, one of the important but rather late signs is unwillingness to part with power. It may happen in any circumstances but usually it’s parliamentary republic with a dominating party and the same prime minister for decades. There may be a president but he or she is a nominal figure while the prime minister remains at power (see Georgia, Germany, Hungary or Israel). Of course it may still not work out like with Merkel, who had to retire after sixteen years, without leaving a good successor candidate in her own party; IMO she should’ve got a prison sentence for serving russian interests but she got awarded a Grand Cross for that instead. I feel the maximum term for any such post should be about twelve years and anybody, who like the current (since 2010) Hungarian prime minister say they want to remain for another twenty years, should be put to prison immediately. South Korea has a good idea with its presidents, a good deal of them were either shot or imprisoned after their term was over. In Fourecks they imprison politicians right after they’re elected (so save time)—but alas it’s a fictional continent.

But usually it’s not one single person but somebody with a support of major party. And, surprisingly enough, those parties usually have something in common: they rely on populism, promoting “traditional values” to the extent of being against anything new, and very often seeing USA and “the West” as their enemies. In other words, promising to return the country to the times of past glory (or at least when the situation was not so bleak) and blaming foreigners and their agents on everything bad that has happened since those times (I’m sorry if that does not reduce the list of possible candidates by much, that’s the state of modern politics). I should probably add another thing to cheap populism: creating show by solving issues nobody had like renaming a country. I can understand why Congo does not want its capital named after the Belgian king who is responsible for mass genocide in the country (and for the same reason Ukraine renames its towns and streets), I can understand why a country renames itself from Rhodesia, I can understand when a country introduces a preferred name while still recognizing the old one (like Czechia or Sakartvelo). But demanding that your country called abroad differently without any apparent reason for renaming sounds like a cheap gesture to show your own people that your country is proud and how others have to comply with it (while drawing attention from more important issues). In case it’s not obvious I’m talking about The Country Formerly Known As Turkey and The Country Most Likely To Be Known As Modi Raj.

The very minor signs would be the corruption potential: if a party serves interests of an oligarch or an oligarch group it is naïve to expect from it to have no attempts to seize power and hold it as long as possible to exercise it for profit and to avoid prosecution. And a thing related to it—friendship with russia (usually paid from the russian side).

In the end I’d like to talk about Ukraine to serve as an example. There were two and a half attempts to grab power in its recent history. First attempt was made by the second president, Leonid Kuchma: back in the day he even traded some presidential privileges for some additional executive functions he could perform. Over the years he increased his power, often with a help of shady people like Pavlo Lazarenko (famous for being a Ukrainian prime minister, a citizen of Panama and building so wide corruption network that he served a prison sentence in the USA for it) or Yuriy Kravchenko (who turned police into his personal mob). Kuchma often solved problems by firing the current prime minister and selecting a new candidate. Still that has not saved him from the consequences of murdering Gongadze and the Cassette Scandal. Another fun fact is that Ukrainian oligarch Pinchuk had a nickname “Kuchma’s-son-in-law” (because he really was one). It is also said that he tried to game the system by trying to organise 2004 elections in such a way that both candidates would lose and he would remain by default. Those elections and rigged counting though caused so much public outrage (called the first Maidan) that they had to repeat the vote.

The second attempt was when viktor yanukovich assumed the post of the president and the Party of Regions (from which he came) got the majority in the parliament. Those were dark times when he used kangaroo courts to deal with his opponents (because no judge could be appointed at that time without approval from the President’s Office) and cancel previous constitutional reform limiting presidential power; people affiliated with him (“the Family”) could raid and steal other businesses and it is said they managed to steal a sixth of the country budget every year in addition to that. As you know, it ended by him selling Ukraine to russia, fleeing the country and starting the war that goes to this day. It is less remembered that during his last days of presidency his party tried to pass the same laws as in russia to limit the freedoms and allow to prosecute the protesters easily. It did not work out.

The half-attempt is when Volodymyr Zelensky became a president and his party got majority in the parliament (mostly thanks to his image). He had not so great decisions in the beginning (and the parliament supported them) so it could end badly. But then a führer decided to take Kyiv in three days and the Ukrainian nation united against the enemy once again. Since then Zelensky (and somewhat his party) act decently because it’s the matter of survival. Who knows how it will go after the victory but for now I have nothing else to say about him.

I hope this brief review of Ukrainian political history served an example of how important it is to stop authoritarianism when you can still do it without human casualties. Revolutions often take a much higher price…

NihAV: now with GIF support

September 11th, 2023

One would wonder why. The answer is simple: I merely wanted to play with LZW encoding and created a GIF encoder. Then I needed to debug it and thus ended up with a decoder as well. And while I was at it I added animated GIF support for both decoding and encoding.

I’m not going to repeat on how it works, it’s been written in all possible textbooks and tutorials on data compression (and then some), I’m more interested in implementing an efficient encoder for it. I should remind you that LZW compression was patented (at least by Sperry and IBM) and that patent introduced a lot of chaos in multimedia before MP3 or H.264 were a thing.

The algorithm simply tells you to find a match for the input string in the dictionary and add a new string to the dictionary. You can keep the actual strings in the dictionary (simple but not very effective), you can keep the reference to it in the input (simple and saves a lot on dictionary space but not very effective and you can’t feed it input by chunks), you can keep the same dictionary structure as the decoder and try to match input to the code sequence (the most compact way to store the dictionary but not so straightforward to use; more on a practical extension of this approach at the end of this post). Since I do not care about wasting whopping two megabytes of RAM on the dictionary, I implemented a simple trie: each node contains a code ending at that node and 256 pointers to the next next trie nodes (each code/pointer fits into 16-bit integer with zero as “invalid value” pointer). Of course it can be made more compact but I prefer simplicity and clarity.

Now, the question is whether the compression efficiency can be improved somehow. Since the decoder expects encoder to work in a certain manner we can’t change format. The tricks that work good for LZ77 like lazy matching do not work that great for LZ78: you have dictionary state changed at each step and you don’t know which entry will be more useful in the future. Also emitting shorter matches means we also litter the dictionary with duplicates, which does not help compression in the long run either.

Thus the only thing we can use is creative dictionary clearing. E.g. we can emit reset code after every N symbols encoded to keep the code length constant (that’s the trick various libraries used to produce the compliant stream without doing any compression and thus not infringing the patent), or you can keep the dictionary filled with what it was until the very end (and if data properties change your compression ratio hurts), or (as most encoders do) you simply reset the dictionary after it gets full. But can you do better?

I tried two heuristics. First, I tried segmenting data by having a sliding window and comparing the character occurrence in each half, if those differed by a lot then it was a good place to restart. From what I heard, this simple approach allows better compression for LZ77- and BWT-based compression schemes. In this case it helped only when compressing the data that could not be compressed at all (e.g. greyscale continuous-tone images), in other cases the results were worse than with the default strategy. The second approach was simple: keep the rolling average of last, say, sixteen match lengths and if it drops significantly when the dictionary is full then it’s time to clear it. Surprisingly enough it seemed to improve compression for several fractions of a percent in general cases (and sometimes worsened the ratio on hard-to-compress files by about the same amount) so I kept it.

And while talking about GIF and LZW, here are some words about FFmpeg support of GIF: it’s rather laughable. While encoding to GIF has been supported from the early days (from 2002, to be less vague), actual LZW compression support was introduced only in 2009 (using a work of some Summer of Code student from 2007; the notorious patent on LZW compression had expired by that time). A bit later in the same year the encoder was improved to work with paletted input instead of RGB24 (which was maimed to use fixed palette anyway). In the same time the original animated GIF encoder+muxer (not to be confused with now improved single-image GIF encoder) kept existing with all the original deficiencies (RGB24 input, fixed palette, no compression) so by encoding to .gif without specifying image2 muxer you’d get significantly worse results. I heard it’d been improved at last back in 2018 but I have not been caring before so why start now?

The interesting thing there is that LZW compression employed for both GIF and TIFF seems to be based on compress approach with a hash table. Essentially the dictionary is stored in the same form as in the decoder (a reference to the previous code plus suffix character) but indices are hash codes instead of sequential entries and the encoder in a decoder mirror mode: it traverses the table to find the code that corresponds to the current code plus a new character (and if it’s not found, the prefix code is output and new code pair is added to the dictionary). Also it seems to compress data slightly worse than the straightforward implementation in some cases (e.g. in my tests one 320×240 PGM file got expanded to 83159 bytes instead of 83132 bytes and another 112×288 PPM file got compressed to 10371 bytes instead of 10366 bytes; the difference is negligible but it’s there). I suspect this is caused by hash collisions i.e. in some rare cases a shorter code and a longer one have the same hash and the former gets selected even when the latter should be better.

In either case, this might have been not a very useful exercise but it was fun and somewhat educational too. Now only to find a thing to do next…

BTC is bullshit

September 4th, 2023

Don’t get me wrong, the idea behind it is sound but it got overhyped and misapplied. Of course I’m talking about one of an early attempts at lossy image compression called block truncation coding, what did you think about?

For those who forgot about it (and is too lazy to read its description), the method replaces values in a block with two values below/above block mean value using that mean value, standard deviation and that number of values that were above the mean value. This algorithm is often quoted as being the one on which many video codecs (especially the ones used in various games but some standard ones like Micro$oft Video 1 and A**le Graphics(SMC) and Video(RPZA) as well) are built. And that part is a bullshit which many used to believe mostly because nobody who heard it took any time to evaluate that statement (I was no exception).

Actually I should’ve started doubting it much earlier as I’ve tried to apply it to colour quantisation (like in Video 1 encoder) and failed. The method is applicable to scalar values only (and pixels are vectors of three components in our case, you can map them to greyscale but how would you calculate two distinct colours to segment block into?) and its results are worse than using Linde-Buzo-Gray method for vector quantisation (which was presented in the paper the following year). Wikipedia has an article describing a proper image compression algorithm proposed in 1986 called color cell compression that definitely looks like the perfect candidate for all the following codecs: it describes compressing image by splitting it into 4×4 tiles, grouping pixels in those tiles using average luminance as the discriminator and calculating two colours to paint the tile as averages of those two groups. That’s how vector quantisation works and unlike BTC it does not require calculating square roots and can be implemented trivially using integer maths only. So it’s practical, gives better results (in terms of MSE of greyscale images when compared to BTC) and works on actual pixel data.

While BTC was innovative for its time and probably an important stepping stone for further methods, its relevance to the modern compression schemes is minimal (unlike colour cell compression) and calling it the base for the codecs with two-colour vector quantisation is as stupid as calling Cyrano de Bergerac the father of space flight because he mentioned travel to the Moon using gunpowder rockets in a novel of his.

NihAV: adding SGA support

September 2nd, 2023

Since I had nothing better to do this week I decided to finally add Digital Pictures SGA decoding support to NihAV. While there are many different formats described in The Wiki, I’ve decided to play only those not described there (namely $81/$8A, $85, $86 and $89).

In my previous post on this matter I mentioned that the formats I took interest in are using 8×8 tiles that may be subdivided into 8×4 or 4×4 parts and filled with several colours using a predefined pattern (or an arbitrary one for 8×8 tile if requested) plus some bits to select one of two possible colours for each tile pixel. The main difference between $81/$8A scheme and the others is that it codes all data in the same bitstream while the later versions split colours and opcode+pattern bits into two separate partitions (maybe they had plans for compressing it?) plus they store audio data inside the frame.

And here are some notes on the games (I think most of those are PC or Macintosh ports but it’s possible the same files were used in console versions of some of those games as well):

  • Double Switch—this one uses $81 compression (in still images, cutscenes embed them along with $A2 audio in $F1 chunks);
  • Quarterback Attack—this one uses $8A compression in $F9 chunks;
  • Night Trap$85 compression and megafiles (i.e. almost all cutscenes are stored in single NTMOVIE file that require some external index to access them). Also the PC release had a short documentary about the moral panic around that game (in the same format of course; in two resolutions even);
  • Corpse Killer$86 compression and one megafile for all cutscenes;
  • Supreme Warrior$89 compression, one megafile and no frame dimensions given. For most of the cutscenes it’s 256×160 but at the end (logo and maybe something else) it’s different. Additionally there are two audio tracks: some audio chunks contain twice as much data (and have high bit of size set), in that case the first half corresponds to English speech and the second half is Chinese; otherwise it’s the same for both versions (e.g. background music, fighters grunting, sound effects and so on).

Overall, it was an interesting experience even if I don’t care about the games themselves.

Tell me how you format output and I tell you what programming language you are

September 1st, 2023

Since K&R times it’s traditional to make a programming language print “Hello, world!” as the introduction to it. Recently I had a thought that formatting a number in output tells you the essentials of the programming language itself.

So if you want to print not just a constant string but, say, a number expanded to a certain length (e.g. printf("%4d\n", val);), how can you do it?

Essentially there are three major approaches: using WRITE with optional specifiers for each element, using one format string with arguments to follow and having just basic output with utilities to format elements using some pattern and string concatenation. And there’s C++.

The first approach can be found in the old venerable languages like FORTRAN or ALGOL. It is also somewhat funny that in Pascal its writeln() can’t be implemented in the language itself so the compiler has to convert it to a sequence of formatting and writes or string concatenations (exactly the third way). The same is true for Rust as well (which belongs to the second category) but at least println!() outright tells you that it’s not a normal function but rather a macro or a compiler plug-in.

The second approach even if not invented got popularised by C. You have a template string with recognizable percent signs to signal where arguments should be substituted and in what form. Of course this is not the safest approach since you may pass a template with not enough arguments and read garbage from the stack (and that’s not counting the dreaded %n). In the recent years another template format got popular (probably thanks to Python, with an influence from bash or PHP): instead of percent sign and type there are braces with an optional format arguments or variable name inside. Speaking of Python, you can use print with the old printf-style or new f-string formatting and it will put a newline at the end by default (reminding once again that Python is all about whitespaces). As for the implementation, in some languages it’s a built-in function (i.e. you can’t implement it in the language itself but the compiler/interpreter takes care of it), in C it’s implemented by parsing the stack more or less directly, in many modern languages variable-argument function simply passes those arguments as an array of the most generic object types (so you can implement something like printf() yourself). I think I can also mention here the third pattern format mostly used for formatting binary output—the one used in Perl pack() function (and whatever languages borrowed it), its syntax reminds me of the classic languages from the first category.

The third approach is too primitive so usually languages try to use some syntactic sugar to make it easier to use (see Pascal or Rust mentioned above). I should probably mention INTERCAL where outputting a single character might pose a challenge for an uninitiated but that’s part of the charm of the language.

And finally C++. In the original version of the language you were supposed to use cout << setw(4) << val << endl; and while it demonstrates the advantages of the operator overloading, in the same time it suffers from the extreme verbosity and side effects (that width modifier will be applied to all the following argument in all following prints until you change it). In the modern C++ you should use std::cout, std::setw() and std::endl() instead. As the result people mostly use printf() or some third-party formatting library (you may even get it with one of your string class implementations for free).

As you can see, the way you can print a formatted number (both the syntax and the implementation) can tell you a lot about the programming language in a rather short time. Of course there are exceptions but the lack of such functionality tells you at least about the kind of language you're dealing with.

Looking at Tex Murphy video formats

August 30th, 2023

Since I have nothing better to do, I decided to look at the formats used in various Tex Murphy games.

So here they are (the game from this millennium does not count):

  • Mean Streets—it uses raw images in ILBM-inspired format (so that even some chunk names are still the same) grouped with e.g. palette files and packed in ZIP using ancient compression methods (shrink and reduce, I wonder if anything but ancient PKZIP.EXE can unpack them all);
  • Martian Memorandum—the only VID file present looks like mostly raw image data interleaved with some metadata to tell which regions to update (though judging from ScummVM sources it may also employ RLE for sprites);
  • Under a Killing Moon—this one has new format starting with 32-bit size followed by signature “PTF” and after 64-byte header I see the suspiciously familiar chunk structure: 32-bit size, 16-bit signature 0xF1FA that contain some sub-chunks of its own… I refer you to The Multimedia Mike, he should be able to recognize the format;
  • The Pandora Directive—this one uses format that starts with “H2O” and seems to be Huffman-compressed RLE;
  • Tex Murphy: Overseer—this one decided not to be creative and used Smacker files (along with some FLICs).

So, despite its nine-year history (again, only the games from last millennium) those games have changed the format used for video segments yet they have not moved further than RLE.

H.264 decoder postmortem

August 27th, 2023

I mentioned before couple of times that NihAV has its own functioning H.264 decoder. And after my failed attempts to use hardware accelerated decoding instead, I spend some time trying to optimise it but eventually gave up. On one hand it’s fast enough for my needs, on the other hand it’s too tedious to optimise it further (even if I can spare time on it, I’d rather not).

To put it into perspective, initially it was about three times slower than libavcodec one without SIMD optimisations, now it’s only about two times slower (with SIMD turned on it’s about five times as slow, feel free to laugh at me). But at the same time playing 720p content (and I have next to no files with larger resolution) in multi-threading mode takes 20-25% of the core so it’s not that bad.

So how the cycles are wasted and is there a potential for serious optimisation?
Read the rest of this entry »

A moral dilemma

August 22nd, 2023

Disclaimer: the question presented in this post does not affect me in any way but it’s still perplexing enough to make it public. Also please note I don’t mention names (neither of people nor software) as this post is not about shaming them.

As any other area of human activity, multimedia has its own share of, ahem, eccentric people that are obsessed a bit too much about their project. For example, there’s a guy who constantly informs the world about even the slightest advancements of his image codec with his own unique image quality metric. Theoretically it should be interesting but the author keeps ignoring the useful advice he gets (like making his code work with a different image size or explaining how his metric is better than the others) in hope that somebody else will get interested enough to fix that for him, as his lack of free time prevents him from doing anything but minor improvements to the code. Maybe it’s not the wisest approach but it does not harm anybody so good luck to him, maybe one day there will be a breakthrough and it will get at least a limited fame. But there’s another example that came to my attention recently, which is significantly more disturbing.

So there’s a certain codec that has niche popularity for its speed and decent compression ratio. Since it was proprietary and somewhat popular, a certain person (not me) has reverse-engineered it and added decoding support for the format to FFmpeg. The reaction from the creator was rather baffling at the time, it was like he felt the control of that codec was wrestled from him. Oh well, enough time has passed with no other issues arising. But last week the same person who REd the decoder announced about that he’s working on an opensource encoder for it, and that’s when the situation exploded. The format creator in a tone that I think is called passive-aggressive told that it’s essentially a stolen work and that it made him stop on a new version of the work. And what is significantly worse and greatly disturbing, his words sound like he got a depression from it or even suicidal thoughts. Even while I have reasons to believe that the encoder in question is going to be an original work (i.e. not a plagiarism; REing format to ensure compatibility is also permitted by the law in many countries) the possible consequences are still deeply disturbing, to say the least.

Thus several questions arise: what should be the best course of action to resolve situations like this one? Was an opensource implementation even for a decoder a mistake and should it be removed entirely? Should the author better communicate his wishes that there should be no alternative implementations whatsoever in the first place and should the others honour it if the product becomes too popular for its own good? Even if the law permits it, what about the morals?

I can only be happy for the fact that I’m not involved in it at all. In either case it would be nice to know the answers—and even nicer is they will never ever be useful.

P.S. Corporations are not people so do not try to project the situation to them. And if they feel offended their lawyers will tell you that (so far I think only N*llym*ser tried it).

P.P.S. If you think I also suffer from similar psychological issues—maybe you’re right, I’m not be eligible to judge. At least I do not try to force my stuff onto others, I don’t even post anything at public places except in this very easy to ignore blog.