September « 2023 « Kostya's Boring Codec World

Archive for September, 2023

On emerging dictatorships

Saturday, September 16th, 2023

In some cases authoritarian governments come suddenly, usually as the result of a coup or a legal loophole (Article 48 of Weimar Constitution is an example codified in ages), in other cases the power grab goes gradually like in the famous frog-in-the-boiling-water metaphor. It’s easy to act surprised, as the things are almost the same as they were yesterday or last year, but usually there are tell-tale signs of a country going into authoritarianism. Here I’d like to remind about them.

Probably the most important of them is the judicial branch reform that de facto cancels its independency as the government controls who gets appointed as a judge. After you get loyal judges who’s going to stop your unlawful reforms? We had such situation in Ukraine during 2010-2014 (even without any factual reform; but more about Ukraine later), Poland tries to get it (and that’s disturbing), Hungary has done it (of course), Israel is doing it right now (which may end too bad for the country) and so on. But usually at this stage the things are so bad that the actual dictatorship (or demokratur) is merely the next step, are there any earlier signs?

Again, one of the important but rather late signs is unwillingness to part with power. It may happen in any circumstances but usually it’s parliamentary republic with a dominating party and the same prime minister for decades. There may be a president but he or she is a nominal figure while the prime minister remains at power (see Georgia, Germany, Hungary or Israel). Of course it may still not work out like with Merkel, who had to retire after sixteen years, without leaving a good successor candidate in her own party; IMO she should’ve got a prison sentence for serving russian interests but she got awarded a Grand Cross for that instead. I feel the maximum term for any such post should be about twelve years and anybody, who like the current (since 2010) Hungarian prime minister say they want to remain for another twenty years, should be put to prison immediately. South Korea has a good idea with its presidents, a good deal of them were either shot or imprisoned after their term was over. In Fourecks they imprison politicians right after they’re elected (so save time)—but alas it’s a fictional continent.

But usually it’s not one single person but somebody with a support of major party. And, surprisingly enough, those parties usually have something in common: they rely on populism, promoting “traditional values” to the extent of being against anything new, and very often seeing USA and “the West” as their enemies. In other words, promising to return the country to the times of past glory (or at least when the situation was not so bleak) and blaming foreigners and their agents on everything bad that has happened since those times (I’m sorry if that does not reduce the list of possible candidates by much, that’s the state of modern politics). I should probably add another thing to cheap populism: creating show by solving issues nobody had like renaming a country. I can understand why Congo does not want its capital named after the Belgian king who is responsible for mass genocide in the country (and for the same reason Ukraine renames its towns and streets), I can understand why a country renames itself from Rhodesia, I can understand when a country introduces a preferred name while still recognizing the old one (like Czechia or Sakartvelo). But demanding that your country called abroad differently without any apparent reason for renaming sounds like a cheap gesture to show your own people that your country is proud and how others have to comply with it (while drawing attention from more important issues). In case it’s not obvious I’m talking about The Country Formerly Known As Turkey and The Country Most Likely To Be Known As Modi Raj.

The very minor signs would be the corruption potential: if a party serves interests of an oligarch or an oligarch group it is naïve to expect from it to have no attempts to seize power and hold it as long as possible to exercise it for profit and to avoid prosecution. And a thing related to it—friendship with russia (usually paid from the russian side).

In the end I’d like to talk about Ukraine to serve as an example. There were two and a half attempts to grab power in its recent history. First attempt was made by the second president, Leonid Kuchma: back in the day he even traded some presidential privileges for some additional executive functions he could perform. Over the years he increased his power, often with a help of shady people like Pavlo Lazarenko (famous for being a Ukrainian prime minister, a citizen of Panama and building so wide corruption network that he served a prison sentence in the USA for it) or Yuriy Kravchenko (who turned police into his personal mob). Kuchma often solved problems by firing the current prime minister and selecting a new candidate. Still that has not saved him from the consequences of murdering Gongadze and the Cassette Scandal. Another fun fact is that Ukrainian oligarch Pinchuk had a nickname “Kuchma’s-son-in-law” (because he really was one). It is also said that he tried to game the system by trying to organise 2004 elections in such a way that both candidates would lose and he would remain by default. Those elections and rigged counting though caused so much public outrage (called the first Maidan) that they had to repeat the vote.

The second attempt was when viktor yanukovich assumed the post of the president and the Party of Regions (from which he came) got the majority in the parliament. Those were dark times when he used kangaroo courts to deal with his opponents (because no judge could be appointed at that time without approval from the President’s Office) and cancel previous constitutional reform limiting presidential power; people affiliated with him (“the Family”) could raid and steal other businesses and it is said they managed to steal a sixth of the country budget every year in addition to that. As you know, it ended by him selling Ukraine to russia, fleeing the country and starting the war that goes to this day. It is less remembered that during his last days of presidency his party tried to pass the same laws as in russia to limit the freedoms and allow to prosecute the protesters easily. It did not work out.

The half-attempt is when Volodymyr Zelensky became a president and his party got majority in the parliament (mostly thanks to his image). He had not so great decisions in the beginning (and the parliament supported them) so it could end badly. But then a führer decided to take Kyiv in three days and the Ukrainian nation united against the enemy once again. Since then Zelensky (and somewhat his party) act decently because it’s the matter of survival. Who knows how it will go after the victory but for now I have nothing else to say about him.

I hope this brief review of Ukrainian political history served an example of how important it is to stop authoritarianism when you can still do it without human casualties. Revolutions often take a much higher price…

Posted in Useless Rants | Comments Closed

NihAV: now with GIF support

Monday, September 11th, 2023

One would wonder why. The answer is simple: I merely wanted to play with LZW encoding and created a GIF encoder. Then I needed to debug it and thus ended up with a decoder as well. And while I was at it I added animated GIF support for both decoding and encoding.

I’m not going to repeat on how it works, it’s been written in all possible textbooks and tutorials on data compression (and then some), I’m more interested in implementing an efficient encoder for it. I should remind you that LZW compression was patented (at least by Sperry and IBM) and that patent introduced a lot of chaos in multimedia before MP3 or H.264 were a thing.

The algorithm simply tells you to find a match for the input string in the dictionary and add a new string to the dictionary. You can keep the actual strings in the dictionary (simple but not very effective), you can keep the reference to it in the input (simple and saves a lot on dictionary space but not very effective and you can’t feed it input by chunks), you can keep the same dictionary structure as the decoder and try to match input to the code sequence (the most compact way to store the dictionary but not so straightforward to use; more on a practical extension of this approach at the end of this post). Since I do not care about wasting whopping two megabytes of RAM on the dictionary, I implemented a simple trie: each node contains a code ending at that node and 256 pointers to the next next trie nodes (each code/pointer fits into 16-bit integer with zero as “invalid value” pointer). Of course it can be made more compact but I prefer simplicity and clarity.

Now, the question is whether the compression efficiency can be improved somehow. Since the decoder expects encoder to work in a certain manner we can’t change format. The tricks that work good for LZ77 like lazy matching do not work that great for LZ78: you have dictionary state changed at each step and you don’t know which entry will be more useful in the future. Also emitting shorter matches means we also litter the dictionary with duplicates, which does not help compression in the long run either.

Thus the only thing we can use is creative dictionary clearing. E.g. we can emit reset code after every N symbols encoded to keep the code length constant (that’s the trick various libraries used to produce the compliant stream without doing any compression and thus not infringing the patent), or you can keep the dictionary filled with what it was until the very end (and if data properties change your compression ratio hurts), or (as most encoders do) you simply reset the dictionary after it gets full. But can you do better?

I tried two heuristics. First, I tried segmenting data by having a sliding window and comparing the character occurrence in each half, if those differed by a lot then it was a good place to restart. From what I heard, this simple approach allows better compression for LZ77- and BWT-based compression schemes. In this case it helped only when compressing the data that could not be compressed at all (e.g. greyscale continuous-tone images), in other cases the results were worse than with the default strategy. The second approach was simple: keep the rolling average of last, say, sixteen match lengths and if it drops significantly when the dictionary is full then it’s time to clear it. Surprisingly enough it seemed to improve compression for several fractions of a percent in general cases (and sometimes worsened the ratio on hard-to-compress files by about the same amount) so I kept it.

And while talking about GIF and LZW, here are some words about FFmpeg support of GIF: it’s rather laughable. While encoding to GIF has been supported from the early days (from 2002, to be less vague), actual LZW compression support was introduced only in 2009 (using a work of some Summer of Code student from 2007; the notorious patent on LZW compression had expired by that time). A bit later in the same year the encoder was improved to work with paletted input instead of RGB24 (which was maimed to use fixed palette anyway). In the same time the original animated GIF encoder+muxer (not to be confused with now improved single-image GIF encoder) kept existing with all the original deficiencies (RGB24 input, fixed palette, no compression) so by encoding to .gif without specifying image2 muxer you’d get significantly worse results. I heard it’d been improved at last back in 2018 but I have not been caring before so why start now?

The interesting thing there is that LZW compression employed for both GIF and TIFF seems to be based on compress approach with a hash table. Essentially the dictionary is stored in the same form as in the decoder (a reference to the previous code plus suffix character) but indices are hash codes instead of sequential entries and the encoder in a decoder mirror mode: it traverses the table to find the code that corresponds to the current code plus a new character (and if it’s not found, the prefix code is output and new code pair is added to the dictionary). Also it seems to compress data slightly worse than the straightforward implementation in some cases (e.g. in my tests one 320×240 PGM file got expanded to 83159 bytes instead of 83132 bytes and another 112×288 PPM file got compressed to 10371 bytes instead of 10366 bytes; the difference is negligible but it’s there). I suspect this is caused by hash collisions i.e. in some rare cases a shorter code and a longer one have the same hash and the former gets selected even when the latter should be better.

In either case, this might have been not a very useful exercise but it was fun and somewhat educational too. Now only to find a thing to do next…

Posted in Image Formats, NihAV | 2 Comments »

BTC is bullshit

Monday, September 4th, 2023

Don’t get me wrong, the idea behind it is sound but it got overhyped and misapplied. Of course I’m talking about one of an early attempts at lossy image compression called block truncation coding, what did you think about?

For those who forgot about it (and is too lazy to read its description), the method replaces values in a block with two values below/above block mean value using that mean value, standard deviation and that number of values that were above the mean value. This algorithm is often quoted as being the one on which many video codecs (especially the ones used in various games but some standard ones like Micro$oft Video 1 and A**le Graphics(SMC) and Video(RPZA) as well) are built. And that part is a bullshit which many used to believe mostly because nobody who heard it took any time to evaluate that statement (I was no exception).

Actually I should’ve started doubting it much earlier as I’ve tried to apply it to colour quantisation (like in Video 1 encoder) and failed. The method is applicable to scalar values only (and pixels are vectors of three components in our case, you can map them to greyscale but how would you calculate two distinct colours to segment block into?) and its results are worse than using Linde-Buzo-Gray method for vector quantisation (which was presented in the paper the following year). Wikipedia has an article describing a proper image compression algorithm proposed in 1986 called color cell compression that definitely looks like the perfect candidate for all the following codecs: it describes compressing image by splitting it into 4×4 tiles, grouping pixels in those tiles using average luminance as the discriminator and calculating two colours to paint the tile as averages of those two groups. That’s how vector quantisation works and unlike BTC it does not require calculating square roots and can be implemented trivially using integer maths only. So it’s practical, gives better results (in terms of MSE of greyscale images when compared to BTC) and works on actual pixel data.

While BTC was innovative for its time and probably an important stepping stone for further methods, its relevance to the modern compression schemes is minimal (unlike colour cell compression) and calling it the base for the codecs with two-colour vector quantisation is as stupid as calling Cyrano de Bergerac the father of space flight because he mentioned travel to the Moon using gunpowder rockets in a novel of his.

Posted in Useless Rants | 6 Comments »

NihAV: adding SGA support

Saturday, September 2nd, 2023

Since I had nothing better to do this week I decided to finally add Digital Pictures SGA decoding support to NihAV. While there are many different formats described in The Wiki, I’ve decided to play only those not described there (namely $81/$8A, $85, $86 and $89).

In my previous post on this matter I mentioned that the formats I took interest in are using 8×8 tiles that may be subdivided into 8×4 or 4×4 parts and filled with several colours using a predefined pattern (or an arbitrary one for 8×8 tile if requested) plus some bits to select one of two possible colours for each tile pixel. The main difference between $81/$8A scheme and the others is that it codes all data in the same bitstream while the later versions split colours and opcode+pattern bits into two separate partitions (maybe they had plans for compressing it?) plus they store audio data inside the frame.

And here are some notes on the games (I think most of those are PC or Macintosh ports but it’s possible the same files were used in console versions of some of those games as well):

Double Switch—this one uses $81 compression (in still images, cutscenes embed them along with $A2 audio in $F1 chunks);
Quarterback Attack—this one uses $8A compression in $F9 chunks;
Night Trap—$85 compression and megafiles (i.e. almost all cutscenes are stored in single NTMOVIE file that require some external index to access them). Also the PC release had a short documentary about the moral panic around that game (in the same format of course; in two resolutions even);
Corpse Killer—$86 compression and one megafile for all cutscenes;
Supreme Warrior—$89 compression, one megafile and no frame dimensions given. For most of the cutscenes it’s 256×160 but at the end (logo and maybe something else) it’s different. Additionally there are two audio tracks: some audio chunks contain twice as much data (and have high bit of size set), in that case the first half corresponds to English speech and the second half is Chinese; otherwise it’s the same for both versions (e.g. background music, fighters grunting, sound effects and so on).

Overall, it was an interesting experience even if I don’t care about the games themselves.

Posted in Game Video, NihAV | Comments Closed

Tell me how you format output and I tell you what programming language you are

Friday, September 1st, 2023

Since K&R times it’s traditional to make a programming language print “Hello, world!” as the introduction to it. Recently I had a thought that formatting a number in output tells you the essentials of the programming language itself.

So if you want to print not just a constant string but, say, a number expanded to a certain length (e.g. printf("%4d\n", val);), how can you do it?

Essentially there are three major approaches: using WRITE with optional specifiers for each element, using one format string with arguments to follow and having just basic output with utilities to format elements using some pattern and string concatenation. And there’s C++.

The first approach can be found in the old venerable languages like FORTRAN or ALGOL. It is also somewhat funny that in Pascal its writeln() can’t be implemented in the language itself so the compiler has to convert it to a sequence of formatting and writes or string concatenations (exactly the third way). The same is true for Rust as well (which belongs to the second category) but at least println!() outright tells you that it’s not a normal function but rather a macro or a compiler plug-in.

The second approach even if not invented got popularised by C. You have a template string with recognizable percent signs to signal where arguments should be substituted and in what form. Of course this is not the safest approach since you may pass a template with not enough arguments and read garbage from the stack (and that’s not counting the dreaded %n). In the recent years another template format got popular (probably thanks to Python, with an influence from bash or PHP): instead of percent sign and type there are braces with an optional format arguments or variable name inside. Speaking of Python, you can use print with the old printf-style or new f-string formatting and it will put a newline at the end by default (reminding once again that Python is all about whitespaces). As for the implementation, in some languages it’s a built-in function (i.e. you can’t implement it in the language itself but the compiler/interpreter takes care of it), in C it’s implemented by parsing the stack more or less directly, in many modern languages variable-argument function simply passes those arguments as an array of the most generic object types (so you can implement something like printf() yourself). I think I can also mention here the third pattern format mostly used for formatting binary output—the one used in Perl pack() function (and whatever languages borrowed it), its syntax reminds me of the classic languages from the first category.

The third approach is too primitive so usually languages try to use some syntactic sugar to make it easier to use (see Pascal or Rust mentioned above). I should probably mention INTERCAL where outputting a single character might pose a challenge for an uninitiated but that’s part of the charm of the language.

And finally C++. In the original version of the language you were supposed to use cout << setw(4) << val << endl; and while it demonstrates the advantages of the operator overloading, in the same time it suffers from the extreme verbosity and side effects (that width modifier will be applied to all the following argument in all following prints until you change it). In the modern C++ you should use std::cout, std::setw() and std::endl() instead. As the result people mostly use printf() or some third-party formatting library (you may even get it with one of your string class implementations for free).

As you can see, the way you can print a formatted number (both the syntax and the implementation) can tell you a lot about the programming language in a rather short time. Of course there are exceptions but the lack of such functionality tells you at least about the kind of language you're dealing with.

Posted in Useless Rants | Comments Closed

Kostya's Boring Codec World

Archive for September, 2023

On emerging dictatorships

NihAV: now with GIF support

BTC is bullshit

NihAV: adding SGA support

Tell me how you format output and I tell you what programming language you are

Pages

Archives

Categories