On over- and under-engineered codecs

September 10th, 2024

Since my last post got many comments (more than one counts as many here) about various codecs, I feel I need to clarify my views on what I see as an over-engineered codec as well as an under-engineered one.

First of all, let’s define what does not make a codec an over-engineered one. The sheer number of features alone does not qualify: the codec may need those to fulfill its role—e.g. Bink Video had many different coding modes but this was necessary for coding mixed content (e.g. sharp text and smooth 3D models in the same picture); and the features may have been accumulating over time—just look at those H.26x codecs that went a long way, adding features at each revision to improve compression in certain use cases. Similarly it’s not the codec complexity per se either: simple methods can’t always give you the compression you may need. So what is it then?

Engineering in essence is a craft of solving a certain class of problems using practical approaches. Yes, it's a bit vague, but it shows the attitude, like in the famous joke about three professionals in a burning hotel: an engineer sees a fire extinguisher and uses it to put out the fire with minimal effort; a physicist sees a fire extinguisher, isolates a burning patch with it and studies the process of burning for a while; a mathematician sees a fire extinguisher, says that a solution exists and goes back to sleep.

Anyway, I can define an over- or under-engineered codec by its design effectiveness, i.e. the amount of features and complexity introduced in relation to the achieved result and the target goal. Of course there's rarely a perfect codec, so I'll use a simpler metric: a codec with several useless features (i.e. those that can be thrown out without hurting compression ratio or speed) will be called over-engineered, and a codec that can be drastically improved without changing its overall design will be called under-engineered. For example, an RLE scheme that allows a run/copy length of zero can be improved somewhat but is fine per se (and its decoder may even be a bit faster this way). An RLE scheme that uses the zero value as an escape with the real operation length in the following byte is under-engineered—now you can code longer runs, but if you add a constant to that length you can code even longer runs instead of wasting half of the range on lengths that can already be coded without an escape. And an RLE scheme that allows coding the same run or copy in three different ways is over-engineered.
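Here's a minimal sketch of that escape idea in Rust (the language na_game_tool is written in); the function and the bias value are purely illustrative and not taken from any real codec:

    // Illustrative only: read an RLE run length where a zero byte escapes to a
    // longer run stored in the following byte.
    fn read_run_len(src: &mut impl Iterator<Item = u8>) -> Option<usize> {
        let len = src.next()?;
        if len != 0 {
            return Some(len as usize);      // 1..=255, the common case
        }
        let ext = src.next()? as usize;
        // Under-engineered: returning ext as-is means the extended range merely
        // duplicates lengths the first byte could code anyway.
        // Better: add a constant so the escape codes 256..=511 instead.
        Some(ext + 256)
    }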

And that's exactly why XCF is the most over-engineered format I've ever seen. Among other things it has three ways to encode a source offset with two bytes: signed X/Y offsets, a signed 16-bit offset from the current position or an unsigned 16-bit offset from the start. And the videos come in either 320×200 or 320×240 size, so unless you have some weird wrap-around video you don't need all those addressing modes (and in fact no video I've tried used some of those modes). Also, since the data is not compressed further, you can't claim it improves compression. Speaking of which, I suspect that wasting additional bits on coding all those modes for every block in every frame negates any potential compression gains from specific modes. There are other decisions of dubious usefulness there: implicit MV offsets (so short MVs are now in range -4,-4..11,11 for 8×8 blocks and -6,-6..9,9 for 4×4 sub-blocks), randomly chosen data sources for each mode, a dedicated mode 37 that is exactly the same as mode 24 (fill plus masked update) and so on.

Of course there are more over-engineered codecs out there—I pointed at Indeo 4 as a good candidate in the comments, and probably lots of lossless audio codecs qualify too. But my intent was to show what really makes an over-engineered codec and why I consider XCF the worst offender among game video codecs.

As for under-engineered codecs, for the reasons stated above it's not merely a simple codec; it's a codec where a passerby can point out a thing that can be improved without changing the rest of the codec. IMO the most fitting example is Sonic—an experimental lossy/lossless audio codec based on Bonk. Back in the day when we at libav discussed removing it, I actually tried evaluating it and ended up with encoded files larger than the original. And I have a strong suspicion that simply reverting the coding method to the original Bonk one, or selecting some other sane method for residue coding, would improve it greatly—there's a reason why everybody uses Rice codes instead of Elias Gamma codes. Another example would be MP3—there's a rumour that FhG wanted it to be purely DCT-based (as AAC is) but for the patent holders' considerations it had to keep the QMF, making the resulting codec more complex but less effective.
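For the record, here's a rough side-by-side of the two codes (lengths in bits for a non-negative residue value; this is an illustration, not the actual Bonk/Sonic bitstream):

    // Rough illustration: code lengths in bits under Elias Gamma and Rice coding.
    fn elias_gamma_len(v: u32) -> u32 {
        let n = v + 1;                       // gamma codes start at 1
        2 * (31 - n.leading_zeros()) + 1     // unary prefix + binary suffix
    }

    fn rice_len(v: u32, k: u32) -> u32 {
        (v >> k) + 1 + k                     // unary quotient, stop bit, k remainder bits
    }

    fn main() {
        for &v in &[0u32, 3, 10, 40, 100] {
            println!("v={:3}  gamma={:2} bits  rice(k=4)={:2} bits",
                     v, elias_gamma_len(v), rice_len(v, 4));
        }
    }

Gamma wins only on the very smallest values; for typical residue magnitudes a sensibly chosen Rice parameter stays consistently shorter.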

P.S. The same principles are applicable to virtually everything, from e.g. multimedia containers and to the real world devices like cars or computers, but I’ll leave exploring those to the others.

The most overengineered game codec

September 7th, 2024

…as I’ve seen so far, of course, but the chances for it to keep this title are good.

I'm talking about the XCF format (found in ACF files of the Time Commando game developed by Adeline Software). For starters, the format allows various commands related to the engine (so XCF is used not merely for videos but also for stage demo records containing e.g. camera commands and various other things).

But that's nothing compared to the video codec design. The frame is split into three parts: a fixed-size part with per-block 6-bit opcodes and two variable-length parts. I'd like to say that one contains colour values and the other stores masks and motion vectors, but in reality either kind of data may be read from either source and it's not even consistent between modes (e.g. for the four-colour pattern block both masks and colours are read from part 1, while for the two-colour pattern block colours are read from part 2; it's as if they designed it based on whatever registers were available in the decoding function at the moment). And for some reason they did not go further and left those frame parts uncompressed.

The real monstrosity though is the opcodes. As I've mentioned, they take six bits, which means 64 different opcodes. And unlike many other codecs where you'd have half a dozen operations (copy from previous, fill with 1/2/16 colours and such) plus a run count, here you have way more. Also it uses an 8×8 block size, which makes it easier to add new modes. Here's an abridged list:

  • raw block;
  • skip block;
  • motion block with one-byte motion vector (using nibbles for vector components), two-byte absolute offset, two-byte relative offset or two-byte motion vector;
  • motion block with each quarter using the same four modes;
  • single value fill block—and another mode for filling block quarters;
  • 2/4/8/16-colour pattern block;
  • 2/4/8-colour pattern block quarters;
  • special 4-colour pattern subblock coding mode where you can pick only one alternative option and only for half of the pixels;
  • block filled with mostly single colour except where mask tells to read a new value;
  • block with values coded as nibble-size differences to the base colour;
  • block coded as two base colours (high nibble only) plus low nibble of the current value;
  • block using RLE plus one of four scan modes (line scan, top-down scan, zigzag and inverted zigzag);
  • block using the same scan modes but sending only nibbles for the colour values (first nibble is for base colour high nibble, others are used to replace its low nibble).

And that's not all: some modes (e.g. motion or fill) have a refinement mode. E.g. opcodes 1-4 all mean skip block, but while opcode 1 means copying the block from the same place in the previous frame and doing nothing else, opcode 2 means doing that and then replacing four pixels at arbitrary positions with new values, opcode 3 means the same but with eight pixels, and opcode 4 means using a mask telling which pixels should be replaced. It works the same way for other opcodes supporting it—first perform the operation, then optionally update some of the pixels.
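Here's how I'd sketch that skip-with-refinement family in Rust—note that the exact byte layout (how positions and masks are stored) is my assumption purely for illustration, only the overall structure comes from the format:

    // A sketch of opcodes 1-4 for an 8x8 block stored row-major in `blk`;
    // the position and mask encodings below are assumed, not the real layout.
    fn skip_block_with_refinement(opcode: u8, blk: &mut [u8; 64],
                                  src: &mut impl Iterator<Item = u8>) {
        // base operation: the block already holds the copy from the previous frame
        match opcode {
            1 => {} // plain skip, nothing else to do
            2 | 3 => {
                // replace 4 (opcode 2) or 8 (opcode 3) pixels at explicit positions
                let count = if opcode == 2 { 4 } else { 8 };
                for _ in 0..count {
                    let pos = (src.next().unwrap() & 63) as usize;
                    blk[pos] = src.next().unwrap();
                }
            }
            4 => {
                // a 64-bit mask selects which pixels get a new value
                let mut mask = 0u64;
                for _ in 0..8 {
                    mask = (mask << 8) | u64::from(src.next().unwrap());
                }
                for pos in 0..64 {
                    if (mask >> (63 - pos)) & 1 != 0 {
                        blk[pos] = src.next().unwrap();
                    }
                }
            }
            _ => unreachable!("other opcodes are handled elsewhere"),
        }
    }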

If you thought that the peculiarities end there, you'd be wrong. Motion vectors are a displacement from the centre of the block instead of its top left corner as in any other video codec. And it looks like while the format stores video dimensions, the decoder will work with any width as long as it's 320 pixels.
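Incidentally, this centre-relative addressing is presumably where the odd short-MV ranges mentioned above (-4..11 for 8×8 blocks, -6..9 for 4×4 sub-blocks) come from. Here's how I read it—the exact nibble mapping is my guess:

    // A sketch: a signed nibble per component, measured from the block centre.
    fn decode_mv(byte: u8, block_size: i32) -> (i32, i32) {
        // each nibble is a signed 4-bit component in -8..=7 (mapping assumed)
        let dx = ((byte >> 4) as i32) - 8;
        let dy = ((byte & 0xF) as i32) - 8;
        // measuring from the centre shifts the range by block_size / 2, giving
        // -4..=11 for 8x8 blocks and -6..=9 for 4x4 sub-blocks
        let half = block_size / 2;
        (dx + half, dy + half)
    }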

I don’t envy future me who has to document it for The Wiki.

P.S. And what's ironic is that the game was released between Little Big Adventure and its sequel; while the former used a custom format with FLI compression methods (and commands to play external audio), the latter switched to Smacker—a much simpler but apparently more effective codec.

P.P.S. After I reverse engineer a codec or two from games published by Psygnosis I should publish a new version of na_game_tool and switch to actually documenting all the formats it supports; and then do something completely different. At least the end is near.

Self-fulfilling excuses

August 31st, 2024

There's this old but annoying (and sometimes infuriating) thing called self-fulfilling excuses. Essentially it's when somebody uses deferred reasoning: "this thing will not work, because I'll be stonewalling it until it fails, and then I can proudly claim that I was right". Here I'll give three examples of it.

First, there's the tired "masks won't work against spreading COVID-19". From the facts I know, even WHO has finally admitted that the virus is airborne, so using a mask to protect yourself by filtering it out of the air you breathe sounds like a reasonable precaution (especially since the virus size and the filtering capabilities of masks are known). And yet masks indeed turned out not to be an effective measure—because so many opinionated idiots refused to wear them. Just this year alone two random people shouted at me because they saw me wearing a mask (I don't know anybody who loves wearing them, but I keep doing so in confined public places like shops or public transport because my health is not good enough as it is).

Second, there's the recent drama about Rust in the Linux kernel. I can't tell whether Rust really belongs there or if admitting it was a mistake in the first place—let the technical points be discussed by the experts—but the recent resignation of one of the maintainers demonstrated that the main reason why Rust code can't work in the Linux kernel is that certain maintainers are against it. And you know what, if the rest of the Rust kernel developers left and created their own kernel, it might be for the best—I have always supported having an alternative (this would be a good place to insert an advertisement for librempeg, but Paul hasn't provided one).

Third, there's the situation with American help for Ukraine. Just yesterday a russian plane dropped bombs on my home city, killing civilians and destroying residential buildings. The most reasonable action would be to allow Ukraine to use the provided missiles to strike those planes right at their airbases, eliminating or reducing the threat both directly and indirectly (as the russians would have to operate the remaining planes from more distant airbases, making their further strikes less effective). Instead the USA forbids using its missiles for that purpose. Among the quoted reasons were that there are too few missiles the USA can give anyway (right, nobody knows how to allocate scarce resources) and that there's no strategic advantage to it (link provided as proof that I'm not the one making this up). And considering that the russians are taking such a threat into account and have started building plane shelters at the airbases closer to the Ukrainian border, that excuse will indeed turn out to be true.

Apropos, this is an appropriate place to spotlight this saying:

This will probably keep being true for a long time.

If you're looking for solutions, you won't find them here. This is a thing innate to many humans, so I don't believe it can be eliminated without fundamental changes to human nature (and we don't know what to change or how), and while in theory the consequences could be mitigated by reasonable discussions and rational decisions, those are not common things either (and what I said above about human nature applies here as well). In other words, this world is imperfect and sucks a lot. I have more trite things to say but that's enough for now.

SMS is a new AVI

August 28th, 2024

Since I'm still in search of video formats for my na_game_tool, I happened to look at Jagged Alliance: Deadly Games and found an SMS file there (and then more in a demo version of Nemesis: The Wizardry Adventure).

The file starts with a BLAH magic followed by two sizes, some table and audio data. And after the audio data there's a suspiciously familiar SMK2 signature. Indeed, from that point on the file plays fine as Smacker video.

Of course this reminded me of the game Ripper, apparently released in the same 1996 (but by a different developer and publisher), which had an AVI format that encapsulated Smacker or FLIC along with an audio track external to the format.

Why did such a combination need to exist? I feel that the main actor in Ripper expressed the answer best. After all, Smacker supports up to seven audio tracks and other games of that period played Smacker video and audio just fine; it should even allow seeking, even if nothing seems to have ever used that feature. Maybe it really needed more cowbell.

Looking at Nova Logic FMV formats

August 23rd, 2024

So there's this company known mostly for helicopter simulators, and those games use the KDV format for animation. That alone was enough to make me look at the formats, and here's what I found:

  • the original Comanche game used a simple FMV format based on PCX RLE. It was enough to just look at the file in a hex viewer to figure all of it out;
  • Armored Fist used a KDV file with a different format, which I still figured out after looking at it in a hex viewer and correcting my mistakes once image reconstruction somewhat started to work. The main lucky guess was that it uses 2-bit block modes for 4×4 blocks; then came figuring out that the block modes use 1/2/4/16 colours with patterns;
  • Werewolf vs. Comanche has almost the same format, but for some reason Armored Fist reads only 3 colours for a 4-colour block (the first colour is explicitly zero), so this one was more natural to RE first;
  • Comanche Gold tweaked the format again, the main change being that mode 3 means skip/fill/paint a new block the same way as the previous block, using a new pattern (for the patterned modes) but keeping the old colours. For this and the following format I actually had to look into the binary specification;
  • and finally the Ukrainian plane simulators (F-16/MiG-29) use a 24-bit version of the format above (adding cached colours, more about that below).

The main principle remained the same for all but the first KDV format: split the picture into 4×4 blocks, use a two-bit mode for each block, pack four modes into one byte followed by block mode data; the modes being skip, fill, 2-colour pattern, 4-colour pattern and raw, with colour 0 used for pixels that should be left unchanged (or to signal a special mode extension). The details changed though: in the first two flavours raw mode was signalled by the "fill with 0" mode, while in the last two flavours "2-colour pattern with the same/zero colours" signals a raw block and "fill with 0" means skip. An interesting feature of the 24-bit format is that it introduced a colour cache: if the first colour byte has the top bit set, it is really an index into the cache (the low 7 bits of that byte are the index), otherwise it is the first byte of a 24-bit colour value which should also be added to the cache (the cache is organised as a circular buffer, so the 128th value replaces the 0th, the 129th replaces the 1st and so on). And it still uses 0,0,0 as the "leave unchanged" value (or as a special flag for raw mode when a 2-colour pattern uses two zero triplets).
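For the curious, here's a rough sketch in Rust of the two pieces above—the 2-bit mode packing and the 24-bit flavour's colour cache. The packing order within a byte is my assumption; the cache behaviour follows the description:

    // Four 2-bit block modes packed into one byte (packing order assumed).
    fn decode_block_modes(mode_byte: u8) -> [u8; 4] {
        [
            mode_byte & 3,          // mode of the first 4x4 block
            (mode_byte >> 2) & 3,
            (mode_byte >> 4) & 3,
            (mode_byte >> 6) & 3,   // ...and of the fourth one
        ]
    }

    // The 24-bit flavour's colour cache: 128 entries in a circular buffer; a byte
    // with the top bit set is a cache index, anything else starts a literal
    // 24-bit colour that is also appended to the cache.
    struct ColourCache {
        entries: [[u8; 3]; 128],
        pos: usize,
    }

    impl ColourCache {
        fn new() -> Self {
            Self { entries: [[0; 3]; 128], pos: 0 }
        }

        fn read_colour(&mut self, src: &mut impl Iterator<Item = u8>) -> [u8; 3] {
            let b = src.next().unwrap();
            if b & 0x80 != 0 {
                self.entries[(b & 0x7F) as usize]    // cached colour
            } else {
                let clr = [b, src.next().unwrap(), src.next().unwrap()];
                self.entries[self.pos] = clr;
                self.pos = (self.pos + 1) & 127;     // wraps after 128 entries
                clr
            }
        }
    }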

I’ve added support for them all to na_game_tool (a couple more formats and I can make a 0.2 release before switching to a completely different activity) and I want to say some words about format detection.

As mentioned above, the main problem with the format is distinguishing the flavours. It's very simple for the last two—Comanche Gold uses a larger header, and the 24-bit format also uses a larger header, lacks palette chunks and stores frame data in K24i chunks (instead of KDVi chunks like the rest). The problem is distinguishing the first two formats without resorting to the ugly hack of trying to decode a file both ways and seeing which one sticks. That's where a new feature of na_game_tool comes into play—detecting the format by a regular expression. Since I know that most Werewolf vs. Comanche videos start with the prefix "w_", I could simply add a rule for it, just like this:

    // same extension and magic as the Armored Fist flavour, so the file name
    // patterns are what actually tells them apart
    DetectConditions {
        demux_name: "kdv-wc",
        extensions: ".kdv",
        regex:      "bumper.kdv,w_*.kdv",
        conditions: &[CheckItem{offs: 0, cond: &CC::Str(b"xVDK")},           // magic at the start
                      CheckItem{offs: 4, cond: &CC::Eq(Arg::U32LE(0x14))}]   // expected header field
    },

So files with this extension, name pattern and magic will be detected as this flavour of KDV, while files not matching it (but with the same extension and header) will be classified as files from Armored Fist. It's not something useful for a general multimedia converter, but it is useful for game files—another case is the CRH format, which has two flavours for two different games (differing in the hardcoded height and the number of bits for the image offset) where you also have to rely on file names to tell which one is which.

Of course I have some other changes for the upcoming na_game_tool release (maybe in September, provided I can find enough formats to add by that time) but they’re equally unexciting. Keep staying out of the loop.

On some German town names

August 19th, 2024

One of my hobbies is travelling around, and during my travels I see names that I find amusing, like Geilhausen (which can be translated as "Gay/kewl housing") or Waldfischbach (which translates to "forest fish brook" and makes me wonder what special kinds of fish live in the forest). But today I want to talk about more IT-related town names.

Travelling from here to Switzerland first you encounter Vimbuch (“das Buch” is “book” in German and the town is named after St. Find. Really!). The only problem is that they had to wait over seven centuries in order for people to understand that it’s not named after an editor manual.

In twenty kilometres from it you can find Urloffen (“offen” is “open” in German). One can only wonder what its inhabitants did for over eight centuries before hyperlinks were invented.

Another thirty kilometres to the south you can find Rust.

And if you continue following the road you may end up in Switzerland and see Speicherstraße in the canton of St. Gallen ("der Speicher" is "storage", e.g. an HDD, in German; the road is not impressive at all, even compared to old PATA buses). But let's not go there, it's a neutral place. After all, there's Speicher near Bitburg, and you know it's a very safe storage if they built a castle for a single bit.

But speaking of programming languages, there's a place called Perl in the remotest rural corner of Saarland (a federal state often regarded as exactly the remotest rural piece of Germany in the eyes of people not living there). The fun thing is not only that it borders Schengen (yes, that town is right across the river) but also that the rail line continues to Apache in France (since the French famously suck at spelling, they forgot a letter in the web server's name).

And those are all the names I can remember off the top of my head, not counting the Eifel mountain range—but who remembers a programming language named after it anyway?

Twenty years

August 14th, 2024

On August 14th 2004 Mike Melanson committed to FFmpeg a decoder for the TSCC codec that I had sent to him. So that is the day my official story of contributing to open-source multimedia began.

A lot has changed since then, and not always for the best. Back then there were plenty of formats to reverse engineer (with quite a demand for them as well); nowadays it feels like there's one standard video codec (or two, if you see AV1 as different enough) and the Fraunh*fer Alternative-to-Opus Audio Codec (or ***-AAC for short). And the complexity has increased so much that it's increasingly harder to implement a codec alone.

From the software side things have changed as well, I’ve ranted about it enough. To put it simply, I’ve not been participating in FFmpeg (or libav) development for the last decade and I don’t regret it at all.

And yet I still have curiosity about various multimedia formats, so I keep investigating them and sharing the results with the public. I'm not sure I'll find enough material to last another twenty years, but there are still enough obscure games out there. Occasionally my work comes in handy for somebody else and that makes me happy (not as happy as my first contribution did—that happiness lasted for a week—but I was much younger and very inexperienced back then). That's both the curse and the blessing of being an expert on a very narrow topic: hardly anybody needs you, but when they do, they have no other option. So I'll keep providing what I can for as long as possible and let's see how it ends…

Watery format revisited

August 10th, 2024

I'm still picking through unfinished and half-finished formats for na_game_tool, and there were some from Access games. While supporting animations from Amazon: Guardians of Eden is rather pointless (since they simply update pixels of some scene and do not have their own palette), supporting the H2O and PTF formats is more interesting.

I wrote about them back in the day and about H2O specifically. And since it was a quick look and my eyes are not that good, I missed some details. But as I finally implemented a decoder for it, now I can correct that story.

As it turned out, H2O stores video data as Huffman-coded 16-bit values instead of the usual bytes, and those values are used for both opcodes and colours. If the top four bits of an opcode are 0-3, then the top byte codes the number of 4×4 blocks to process and the low byte is the fill value (0 means skip the block instead). If the top four bits are 4, then the low 12 bits are the number of blocks to process minus one, and each block's data consists of a 16-bit pattern and two colours (packed as another 16-bit value). The top four bits being 5 means a partial update for the specified number of blocks, each block having a 16-bit update mask and 0-16 colours to update (packed into 16-bit words).
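A rough sketch of that opcode dispatch (paraphrasing my own description above; the hypothetical read_u16 stands in for the Huffman-coded value reader, so don't take the details as a reference decoder):

    // Sketch only: dispatch on the top four bits of a 16-bit opcode.
    fn handle_opcode(op: u16, read_u16: &mut impl FnMut() -> u16) {
        match op >> 12 {
            0..=3 => {
                // top byte: number of 4x4 blocks, low byte: fill value (0 = skip)
                let count = (op >> 8) as usize;
                let fill = (op & 0xFF) as u8;
                for _ in 0..count {
                    if fill == 0 { /* skip block */ } else { /* fill block with `fill` */ }
                }
            }
            4 => {
                // low 12 bits: number of pattern blocks minus one
                let count = (op & 0xFFF) as usize + 1;
                for _ in 0..count {
                    let _pattern = read_u16();   // 16-bit pattern, one bit per pixel
                    let _colours = read_u16();   // two colours packed into one word
                    // paint the 4x4 block from pattern + colours
                }
            }
            5 => {
                // partial update: per block a 16-bit mask plus 0-16 new colours
                let count = (op & 0xFFF) as usize + 1;   // count location assumed
                for _ in 0..count {
                    let mask = read_u16();
                    let mut left = mask.count_ones() as usize;
                    while left > 0 {
                        let _pair = read_u16();          // colours packed two per word
                        left = left.saturating_sub(2);
                    }
                }
            }
            _ => { /* other values not covered here */ }
        }
    }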

So while the format is no ground-breaker, it still turned out to be more interesting than I expected it to be.

P.S. I guess I’ll do a bunch of FLI-based formats next. There are some curious variations of it like FLI with audio or RNC ProPack compression.

Implementing Jack Orlando animation support

August 5th, 2024

As I'm still working on my game tool, I decided to implement the video format from the Jack Orlando game. So now na_game_tool has AVX support, and I'd like to say something about the format.

The format is a rather weird mix of different things. It starts with a fixed-size audio block whose size depends on whether it's 8-bit PCM or IMA ADPCM (with stereo samples packed into interleaved 32-bit words). It seems to have a block structure where an arbitrary chunk of data is stored as a 32-bit size plus block data, which includes video data with accidental headers. Yes, the file may internally contain several FLX animations with different framerates too, so e.g. a 16-fps piece of animation may be followed by a 10- or 24-fps one. And the audio stream is global even though it's stored in small chunks inside FLX (since you get several seconds of pre-roll audio at the beginning). Oh, and palette data may be stored as RGB565 or RGB24 (which you're supposed to deduce from the palette chunk size). Video compression is simple but somewhat weird too, as RLE is optionally used to compress the data before it gets decoded.
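The palette quirk in a nutshell (a sketch only: the 256-entry palette, the 512/768-byte chunk sizes and the RGB565 byte order are all my assumptions here):

    // Sketch: pick the palette layout from the chunk size alone.
    fn parse_palette(chunk: &[u8]) -> Option<[[u8; 3]; 256]> {
        let mut pal = [[0u8; 3]; 256];
        match chunk.len() {
            512 => {
                // 256 x 16-bit RGB565 words (byte order assumed little-endian)
                for (i, w) in chunk.chunks_exact(2).enumerate() {
                    let v = u16::from_le_bytes([w[0], w[1]]);
                    pal[i][0] = (((v >> 11) & 0x1F) << 3) as u8;
                    pal[i][1] = (((v >> 5) & 0x3F) << 2) as u8;
                    pal[i][2] = ((v & 0x1F) << 3) as u8;
                }
            }
            768 => {
                // 256 x RGB24 triplets
                for (i, c) in chunk.chunks_exact(3).enumerate() {
                    pal[i] = [c[0], c[1], c[2]];
                }
            }
            _ => return None, // unexpected size
        }
        Some(pal)
    }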

In any case weird does not mean bad, and in my line of hobby an interesting codec is worth more than a popular one.

P.S. And there's an even weirder format known as Knowledge Adventure MOV. I gave up on REing it, but apparently the video there is split into tiles (4×2 or 2×1 pixels, it seems) and the frame data consists of tile indices that may be stored uncompressed, RLE-compressed or, apparently, coded with an arithmetic coder using static frequencies stored in the file header. The difficulty comes from self-modifying code—and not the cute kind that simply patches a couple of instruction constants, but the kind that substitutes a function call address with a pointer to a special memory buffer allocated in a context and filled elsewhere. In theory it can be figured out without much hassle with, say, the built-in DosBox debugger, but I'm not inclined to do that (at least not now).

Re-revisiting Actimagine VX

August 2nd, 2024

As Paul said, I should probably do something more useful (and he should do something better than still messing with the jbmpeg fork), but we're all better at giving good advice than at following it.

Recently I was contacted by a guy who asked for my help with the VX format, as he needed the video part extracted from it and there were no tools available. He also provided some samples that helped clarify things I had previously got wrong, for example the seek table and framerate fields as well as stereo audio support.

Also, since I bothered to extract the video part (the only way to determine its length is to parse it, so now I have a stripped-down version of the video decoder doing just that), I decided to finally do something about the audio part. So after fixing some things (e.g. missing filter reconstruction and stereo support), my decoder can now decode VX files with recognizable audio (not perfect, but whoever can do better is encouraged to do so). The sad thing is that I could not use my old setup (a console emulator with a GDB stub) for debugging, so I'm not likely to ever improve it.

At least it was a fun diversion and it helped improve my knowledge of the format a bit.

P.S. Meanwhile I’m still working on adding new formats for na_game_tool, hopefully I’ll have something to write about another game format soon.