Call for a new container format!

January 17th, 2025

Sometimes I remember that Matroska exists and today I also remembered how it came into existence. Its author proudly admits mixing all the buzzwords of the Web 2.0 era (XML, the Semantic Web and so on) and coming up with that format. Since we’re in the Web 3.0 era, we should have something more modern.

That is why I’m calling for a modern multimedia container format to supplant the outdated formats of old. It should encompass all the features that make the modern Web great:

  • binary JSON as the backbone of the format;
  • central repository for the downloadable descriptions of the parts of the format (but not codecs themselves! Think of it as of MXF specification if it helps);
  • blockchain (as well as clusterchain and framechain);
  • metaverse integration;
  • decentralised storage (so that the container may refer to some data in the cloud as well as on the local disk; even MOV could do something like this);
  • and of course AI!

Some of you may ask where AI can be applied in this scenario. The answer is obvious—transforming input data for better compression (let alone generating metadata or enabling better integration with other Web 3.0 products). A good model should be able to achieve the same savings Matroska did by e.g. shaving off common header bytes from each frame, but without a special mapping. An excellent model may generate the content from an embedded description instead of transmitting AV2 video. And of course the central repository will contain the descriptions of models and parameters to be used (in addition to the descriptions of better representations of the container parts layout). The possibilities are limitless!

Proposals should be sent to the Alliance for Open Media; I have worse things to deal with.

Another fun bit about image formats

January 11th, 2025

Since I have nothing better to do (or the pending stuff makes me think I have nothing better to do), I search for games to look at (preferably with a custom video format with an extension starting with ‘o’, ‘w’ or ‘y’, but any undocumented custom video format would do).

One of those games was Blood & Lace (despite the name it’s not about somebody cutting fingers while trying to put shoes on; whatever—I mostly study games, not play them). It had some videos using Indeo 4 or 5 plus some animations apparently done on the game engine. Those were compressed, as were all the TGA files. So out of curiosity I looked at the binary specification, and apparently most of the files are compressed with LZH (the old compression method with dynamic Huffman coding used in LHA) using a custom header starting with JFX1. The TGA files use a JGF5 header which is almost the same but stores image width and height before the LZH-compressed data.

That’s another one for my collection of common LZ77-based methods used in multimedia compression; I wonder what will be next.

A quick review of LZ77 matching techniques

December 31st, 2024

You can blame Paul for getting me interested enough to spend some time on research. He discussed how he implemented deflate support for librempeg and I got curious how other fast LZ77-based encoders do the matching.

The traditional approach is to have a hash chain (i.e. each dictionary position gets a hash, usually calculated from three symbols, and a pointer to the next position with the same hash). It’s simple and works surprisingly well. Almost all compressors I looked at used either this approach or a hash map.
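To illustrate, here’s a minimal hash-chain matcher in Rust—my own toy sketch, not taken from any actual compressor (the hash function, table sizes and search limit are arbitrary choices):

```rust
const HASH_BITS: usize = 15;
const MIN_MATCH: usize = 3;

// Hash the three bytes at `pos` into a table index.
fn hash3(data: &[u8], pos: usize) -> usize {
    let h = (data[pos] as u32) << 16 | (data[pos + 1] as u32) << 8 | data[pos + 2] as u32;
    ((h.wrapping_mul(2654435761u32) >> (32 - HASH_BITS as u32)) as usize) & ((1 << HASH_BITS) - 1)
}

struct HashChain {
    head: Vec<i32>,  // most recent position for each hash (-1 = none)
    chain: Vec<i32>, // for each position, the previous position with the same hash
}

impl HashChain {
    fn new(size: usize) -> Self {
        Self { head: vec![-1; 1 << HASH_BITS], chain: vec![-1; size] }
    }
    // Insert the current position and return the best (length, distance) match.
    fn find_match(&mut self, data: &[u8], pos: usize, max_tries: usize) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > data.len() { return None; }
        let h = hash3(data, pos);
        let mut cand = self.head[h];
        self.chain[pos] = cand;
        self.head[h] = pos as i32;
        let mut best = (0usize, 0usize);
        let mut tries = max_tries;
        while cand >= 0 && tries > 0 {
            let c = cand as usize;
            // overlapping matches (len > distance) are fine in LZ77
            let len = data[c..].iter().zip(&data[pos..]).take_while(|(a, b)| a == b).count();
            if len > best.0 { best = (len, pos - c); }
            cand = self.chain[c];
            tries -= 1;
        }
        if best.0 >= MIN_MATCH { Some(best) } else { None }
    }
}
```

Capping the chain walk with `max_tries` is the usual speed/ratio knob: fast presets give up after a handful of candidates, thorough ones walk much deeper.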

The only exception I saw was GLZA, which describes itself as counting symbol frequencies first and then building a partial suffix tree just for the most common symbols.

And finally I encountered a paper by Sadakane and Imai from 2000 where the authors proposed two improvements: introducing a secondary hash for the collisions in the hash chain or, alternatively, using suffix sorting (in a suffix array). They report that the two-level hash improves performance significantly, especially for large dictionary sizes (for 64kB or less the gain is negligible), while suffix sorting time seems to grow logarithmically with the dictionary size, beating the one-level hash at a dictionary size of 256kB and the two-level hash at dictionary sizes above 1MB.

So, apparently, this is yet another case where keeping it simple is advantageous. Compressors with larger dictionaries will benefit from a different approach, while deflate has a maximum dictionary size of 32kB, where anything but brute-force search will be about equally good.

In either case, I don’t regret spending time on this quick research.

On the sorry state of opensource multimedia

December 25th, 2024

I’ve been wanting to write this post for a long time, with a focus on the difference between a hobby project and a product, and about NihAV only. But a recent FFdrama made me re-think both the structure and the conclusions.

Apparently there’s another surge of developers’ discontent in jbmpeg over being given the mushroom treatment (not for the first time and probably not for the last). IMO they need to realise the project is as free and democratic as the Soviet Union and you simply need to agree to the things proposed by the General Secretary (definitely not the leader)—that would save the time and nerves of everybody involved. As I wrote countless times before, I do not fear for the future of that project, as it can keep up such an existence indefinitely, and here I’ll try to present my reasons why.

First of all, a revolution à la libav won’t work—Michael has learned the lesson and he won’t be kicked out again (not that it really worked in 2011, but now there are no chances of that).

Second, if you split off and form an alternative, it does not have many chances of replacing the original. And if you decide to write anything from scratch, your chances are next to zero. The rest of this post is dedicated to explaining why.

Recently I re-read The Mythical Man-Month, which tells not only about the author’s experience designing IBM OS/360 but also presents more general observations and ideas. Right in the beginning he talks about the difference between a program, a programming product, and a programming systems product. Essentially, a program is something a programmer writes that works for him on his system; a programming product is a program with documentation and support; and a programming systems product is one that works as a component in a larger system. Moving from one stage to the next requires an effort several times larger than the previous one (I’m simplifying a lot and probably misremembering something—so you’d better read the original book, it’s worth reading anyway).

Here we have a similar situation: writing a tool just to do things for you is straightforward—even I have managed to do it with NihAV; making it into a product requires offering much wider support for different platform configurations (for example, my videoplayer has VA-API hardware decoding enabled by default, which is not available, say, on Windows, so you need to switch that feature off there in order to build it) and different features (e.g. nihav-encoder works for testing encoding per se, but lacks the ability to encode input into a good intermediate format supported by other players and encoders). And it gets even worse if you try to make it into a library ready to be used by others—besides the usual things like documentation, you’re expected to guarantee some API stability and a certain level of quality. So while I may not care that my app panics or crashes in certain circumstances, that is significantly less forgivable for a library. And of course achieving such a quality level requires a lot of unexciting work on small details. Debugging is even worse.

Suppose you decide to create a fork and work from that. You’re still in a much worse position—you may have the same codebase, but there are no killer features you can offer and you don’t have the recognition. libav managed to succeed for a while since it was supported by some distribution maintainers—and even then users complained because the de facto brand name was replaced with some unknown thing. And I guesstimate that 40% of the current jbmpeg developers contribute in order to upstream the changes they make while using it in their employer’s product or pipeline. So how can you convince those companies to use your fork instead? And that’s without taking the patent situation into account, which makes substantial support from any large company for your project rather improbable.

Good thing I’ve never intended NihAV to be a competitor, but what about other projects? rust-av died from lack of interest (Luca claims that he started it mostly to learn Rust and see how performant it can get—mission accomplished, no further development required). librempeg fares better, but I doubt that Paul wants to deal with all the demands other parties make for the honour of having your stuff included in their distribution (or being used without even a credit).

Another thing that needs to be mentioned is that multimedia is no longer an attractive field. Back when I started to dabble in it, it was rather exciting: there were many different formats around—in active use, too—and people wanted to play them with something besides the proprietary players. There were libraries and players supporting only a specific subset of formats, like avifile, or libquicktime, or a DVD-only player. Nowadays it’s usually a combination of H.26x+AAC in MP4 or VP9/AV1+Opus in WebMKV, all formats have specifications (unless you lack the Swiss Francs to pay for the ones from ISO), and new formats are not introduced that often either. Of course, we might have H.267 standardised soon, but who uses even H.266? When was the last time you heard AV2 development news? The codec was supposed to be released a couple of years ago; did I miss it along with AV3? Do you remember the Ghost audio codec from Xiph? Of course Fraunhofer will keep extending the AAC patent lifetime by inventing new formats and naming them things like OMGWTFBBQ-AAC, but who really cares?

That is why I believe that no matter how dysfunctional jbmpeg is, it will keep existing in this undead state indefinitely, as it’s good enough for most users and there’s no compelling reason (e.g. new popular formats or radically different ways to process data) to switch to anything else. The only winning move is not to play.

To NIH or not to NIH

December 22nd, 2024

Paul of librempeg fame informs me about his achievements occasionally (and in my turn I try to remind the world from time to time that this project exists and may provide functionality hardly found elsewhere, like various filters or console formats support). His recent work was implementing an inflate routine specifically for multimedia needs. This made me wonder whether it makes sense to have a custom implementation of a deflate decompressor and packer in a multimedia project when zlib exists—and I think it makes perfect sense. Of course in NihAV I NIHed it because the project concept demands it and it was a nice exercise, but it makes sense in more serious projects too, and below I’ll try to explain the reasons.

Zeroth of all, a quick reminder about flexibility. RFC 1951 merely specifies the format, so implementations can output varying bitstreams that all decompress to the same data. Back in the day when I worked on it I mentioned how you can compress data in different ways, dedicating more time to achieve better compression. And there are more tricks not mentioned there, like parallel compression.

Now, the first reason to have your own implementation is that you can adapt it to your custom needs. As Paul demonstrated, inflating data into a custom format might be beneficial. If all you need in 99% of cases is unpacking data from one buffer into another, or into a frame buffer (which has padding, so you have to output, say, 27 bytes, then skip 5 bytes, output 27 bytes and so on), you can do without all the additional functions and not bother with a sequence of calls to partially inflate data.
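To show what such an adaptation buys you, here’s my own rough Rust sketch (not Paul’s actual code) of just the output side of such an inflater: a sink that scatters the linear decompressed stream into a padded frame buffer, with back-references counting decoded bytes rather than buffer bytes:

```rust
// A sink for inflating directly into a frame buffer with padded lines:
// write `width` bytes per line, then skip `stride - width` bytes of padding.
struct PaddedSink<'a> {
    frame: &'a mut [u8],
    width: usize,
    stride: usize,
    x: usize,
    y: usize,
}

impl<'a> PaddedSink<'a> {
    fn new(frame: &'a mut [u8], width: usize, stride: usize) -> Self {
        Self { frame, width, stride, x: 0, y: 0 }
    }
    // Called by the inflate loop for every output byte (literal or copied match byte).
    fn put_byte(&mut self, b: u8) {
        self.frame[self.y * self.stride + self.x] = b;
        self.x += 1;
        if self.x == self.width {
            self.x = 0;
            self.y += 1;
        }
    }
    // Back-references must skip the padding too: `dist` counts decoded bytes,
    // not buffer bytes, so translate through the decoded position.
    fn byte_at_back(&self, dist: usize) -> u8 {
        let decoded = self.y * self.width + self.x;
        let pos = decoded - dist;
        self.frame[(pos / self.width) * self.stride + (pos % self.width)]
    }
}
```

The `byte_at_back()` helper is exactly the part a generic zlib cannot do for you: with a generic API you’d inflate into a temporary linear buffer and copy line by line afterwards.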

Then there’s the question of consistency. You cannot be sure that a new version of zlib will produce the same output as the previous one (I vaguely remember a small scandal when GitHub release archives were re-generated using a different deflate implementation, resulting in a lot of headache because the old archive hashes were no longer valid; there’s also the story of some Linux distros replacing zlib with zlib-ng and ending up with failed tests; even the “no compression” format may change, apparently). The case of liblzma is probably a good demonstration of other reasons why it’s not always wise to rely on third-party components.

And finally, you can not merely adapt the interface to your needs, you can also tune it to handle your data better. There’s a reason compressors targeting e.g. genome data exist. So when you compress image data, it may be beneficial to search for matches around the position right above the current line first, and the presets for the compression levels may be tuned to a different set of trade-offs. After all, deflate is often used in screen capture codecs where real-time performance matters more than compression ratio. But who can imagine people tinkering with an encoder trying to improve its performance in a multimedia project?
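A sketch of that “line above” idea (my guess at an implementation, not taken from any real codec): before doing a full match search, probe the position exactly one image line back—for screen content that is very often the best match anyway:

```rust
const MIN_MATCH: usize = 3;

// Count matching bytes between positions `a` and `b`, up to `limit`.
fn match_len(data: &[u8], a: usize, b: usize, limit: usize) -> usize {
    data[a..].iter().zip(&data[b..]).take(limit).take_while(|(x, y)| x == y).count()
}

// Probe the pixel directly above the current one (one `stride` back);
// returns (length, distance) if the probe yields a usable match.
fn probe_line_above(data: &[u8], pos: usize, stride: usize) -> Option<(usize, usize)> {
    if pos < stride { return None; }
    let len = match_len(data, pos - stride, pos, 258); // deflate caps matches at 258
    if len >= MIN_MATCH { Some((len, stride)) } else { None }
}
```

If the probe already returns a long match, the encoder can skip the hash-chain walk entirely, which is where the real-time win comes from.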

I hope this helped to convince you that there are circumstances where NIHing something may prove worthy. As a rule of thumb, if it’s easier to implement something yourself than to re-use an existing library then maybe you should do so. That is the right way, and down the other way lies left-pad.

Looking at some more exotic animation formats

December 19th, 2024

I keep looking for old games with interesting animation formats and I’ve found another two, from Israeli companies this time.

First is the format used in Armed & Delirious. Since all of its files are in a custom archive (except for a single file with the “please change CD” animation), I’m not going to support it, though I still documented it for posterity. Apparently it has two operating modes: updating some lines, where each line can have its own coding mode (all of them RLE variations), and updating tiles, again with each tile having its own coding mode. The only interesting thing is that tile coding operates on a subset of colours, so it can use 3-6 bits per index. I remember some formats resorting to 16 colours and 4-bit indices but there are more variations here.
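Just to illustrate the sub-palette idea—this is my generic reconstruction of the technique, not the actual Armed & Delirious bitstream: a tile first lists the 2^n global palette entries it uses, then codes each pixel as an n-bit index into that local sub-palette:

```rust
// Minimal MSB-first bit reader.
struct BitReader<'a> { data: &'a [u8], pos: usize, bit: u32 }

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self { Self { data, pos: 0, bit: 0 } }
    fn read(&mut self, nbits: u32) -> u32 {
        let mut val = 0;
        for i in 0..nbits {
            let b = (self.data[self.pos] >> (7 - self.bit)) & 1;
            val |= (b as u32) << (nbits - 1 - i);
            self.bit += 1;
            if self.bit == 8 { self.bit = 0; self.pos += 1; }
        }
        val
    }
}

// Decode one tile: read its 2^index_bits sub-palette (as global palette
// indices), then one index_bits-wide code per pixel.
fn decode_tile(br: &mut BitReader, index_bits: u32, npixels: usize) -> Vec<u8> {
    let subpal: Vec<u8> = (0..1 << index_bits).map(|_| br.read(8) as u8).collect();
    (0..npixels).map(|_| subpal[br.read(index_bits) as usize]).collect()
}
```

With 3-6 index bits the sub-palette overhead pays off quickly once a tile has more pixels than palette entries.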

And then there’s the VID format used in Soul Hunt and, apparently, some other games by the same company. It features DPCM-compressed audio and, depending on the version, RLE- or JPEG-compressed images (and both versions are present in the game). That’s right, they use code (probably borrowed from libjpeg, but I’m not sure about that) to decode data in a JPEG-like format (8×8 blocks, IDCT and such) and then use a large lookup table to convert it from YUV to paletted RGB. Of course there are video codecs with such built-in functionality, like Cinepak or Indeo 3, but it’s still surprising to see such a thing in a random codec.

Update from December 21st: I’ve looked at the JPEG-inspired coding again and I have a better understanding of what it does (so maybe I’ll even be able to support it eventually):

  • there’s intra- and inter-coding: the former operates on 256×16 megablocks (consisting of 32×2 Y blocks plus 16 blocks each for U and V), the latter has a macroblock update mask and codes only the updated macroblocks;
  • there are two sets of quantisation matrices, used in intra and inter mode; the binary specification calls them “good” and “bad” factors;
  • the colour conversion code and IDCT seem to come from libjpeg while coefficient coding is their own invention;
  • coefficient coding seems to employ a rather simple set of codes: xxx0 is used for -4..4 values (codes are LSB first), xxxxxx01 is used for larger values, xxxxxx11 is used to code zero runs, and xxxxxxxx 11111111 is used to code large values;
  • DC prediction is present and works the following way: decode a block, dequantise it, add the last DC value (and update it afterwards), scale the block’s DC value by 32.
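Going by that description alone, the four code classes can be told apart from the low bits of the stream (read LSB first). Here’s my speculative Rust sketch of the dispatch—the value extraction is pure guesswork, only the class selection follows the list above:

```rust
#[derive(Debug, PartialEq)]
enum CoefCode {
    Small(u32),   // xxx0: 4-bit code for -4..4 values
    Large(u32),   // xxxxxx01: 8-bit code for larger values
    ZeroRun(u32), // xxxxxx11: 8-bit code for runs of zeroes
    Escape(u32),  // xxxxxxxx 11111111: 16-bit escape for large values
}

// `bits` holds the next 16 bits of the stream, LSB first.
// Returns the decoded code class and how many bits it consumed.
fn classify(bits: u32) -> (CoefCode, u32) {
    if bits & 1 == 0 {
        (CoefCode::Small((bits >> 1) & 7), 4)
    } else if bits & 3 == 1 {
        (CoefCode::Large((bits >> 2) & 0x3F), 8)
    } else if bits & 0xFF != 0xFF {
        (CoefCode::ZeroRun((bits >> 2) & 0x3F), 8)
    } else {
        // all eight low bits set: the escape case wins over a max-value zero run
        (CoefCode::Escape((bits >> 8) & 0xFF), 16)
    }
}
```

Note how the all-ones low byte would be ambiguous between a zero run and the escape; in this reading the escape takes priority, which would explain why a separate 16-bit pattern exists at all.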

Initially I was discouraged by the coefficient decoding routines being implemented as a state machine with computed labels, which Ghidra cannot decompile properly. I.e. it starts in the “refill bit buffer” state, then moves to “decode new coefficient having 32 bits”, then handles one of the four cases listed above and moves to the “decode new coefficient having 28 bits”, “decode new coefficient having 24 bits” or “decode new coefficient having 16 bits” state—unless it has decoded the full block, in which case it saves the state to continue from it the next time the function is called. Oh well, at least it turned out to be not that complicated in the end.

Update from December 27th: I dug a bit more, and while decoding macroblock data looks feasible, reconstructing it properly is bonkers, as the game reads the quantisation matrices and scale factors for the good/bad cases from the quants.ini text file. So I’m not so sure it’s worth the effort…

Some words about Alien Incident CDA2

December 11th, 2024

Some of you might’ve heard about a Finnish adventure game called Alien Incident. Apparently its CD release had the intro and final cutscenes re-done in a new animation format. Of course that’s enough to draw my attention.

It turned out that it has several features making it one of the most advanced game formats: multiple soft subtitles and per-frame checksums. The file header is followed by a text block with subtitles in several languages, which the player renders over the video (the font itself is RLE-compressed). Then there’s a table of 32-bit checksums for each frame, and only after that comes the frame size table followed by the actual frame data.

Video is coded in 32×40 tiles with one of several coding modes: read a full tile, update marked pixels in the tile, update 2×2 blocks in the tile, or update 4×4 blocks in the tile. That data is further compressed with slightly modified LZSS. Additionally, frames carry audio data and some additional commands such as fading or setting the frame duration. By default the player operates on a 70 or 75Hz clock and each frame is displayed for several ticks (usually 5-8), so a special command in the frame data tells the player to display each frame for N ticks until further notice.
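Since describing the exact modification would amount to publishing the format, here’s plain LZSS as a reference point (my own sketch with the classic 12-bit offset + 4-bit length layout, not necessarily what this format uses): a flag bit per symbol selects between a literal byte and a back-reference copy.

```rust
// Decompress `dst_size` bytes of plain LZSS: each flag bit (LSB first)
// selects a literal byte (1) or a 16-bit back-reference (0) with a
// 12-bit offset and a 4-bit length stored as length - 2.
fn lzss_decompress(src: &[u8], dst_size: usize) -> Vec<u8> {
    let mut dst = Vec::with_capacity(dst_size);
    let mut pos = 0;
    let mut flags = 0u32;
    let mut nbits = 0;
    while dst.len() < dst_size {
        if nbits == 0 { flags = src[pos] as u32; pos += 1; nbits = 8; }
        if flags & 1 != 0 { // literal byte
            dst.push(src[pos]);
            pos += 1;
        } else {            // back-reference: copy `len` bytes from `offset` back
            let op = ((src[pos] as usize) << 8) | src[pos + 1] as usize;
            pos += 2;
            let offset = op >> 4;
            let len = (op & 0xF) + 2;
            for _ in 0..len {
                let b = dst[dst.len() - offset];
                dst.push(b); // byte-by-byte copy allows overlapping matches
            }
        }
        flags >>= 1;
        nbits -= 1;
    }
    dst
}
```

The “slightly modified” variants usually tweak exactly these knobs: the flag polarity, the offset/length bit split, or the minimum match length.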

And now it is time for a rant, about an issue uncovered by decoding these files. Both files are long (the finale is over 11 minutes at 18.75 fps) and have lots of palette changes (because of both scene changes and fading)—and those two things helped uncover a problem with FFmpeg. Its AVI demuxer is the culprit: it scans the index, finds palette change chunks (which I explicitly mark with the AVIIF_NO_TIME flag), adds them to the video stream statistics (despite the flag), concludes that the video stream has significantly more frames than the audio stream and switches to non-interleaved mode; in that mode it disregards the actual index contents and treats palette change chunks as video data. Personally I don’t care, because I have my own set of tools for decoding, transcoding and playing videos that does not have this problem, but considering that virtually every other piece of software uses libavformat for handling input data, it may pose a problem for everybody else (I can afford not to care, but somebody else would have to change perfectly valid output just to work around a third-party bug). This is a fun case demonstrating that monopoly is evil, even if it’s a monopoly of open-source software.

P.S. It’s probably a good occasion to remind you that librempeg exists, and both it and you may benefit from the attention you give it.

Some more game formats

December 8th, 2024

In my recent post about the na_game_tool 0.2.5 release I complained that I still lack formats starting with the letters B, O, W, Y and Z. Well, after an extensive search I’ve managed to find two formats to cover some of the missing letters.

First is the animation format used in some DOS versions of the game Baldies (an RTS released on consoles as well as on Windows and DOS—and there was apparently even an early Amiga demo). The CD with the DOS/Windows version employs ordinary FLIC, while the (floppy?) DOS version and the demos used their own animation format instead. It’s rather ordinary RLE anyway, stored in a BAL(d) file like the rest of the game resources.

The second game, Los Justicieros (or Zorton Brothers, which is a much better title for my purposes), is much more interesting. It has a format with various compression methods as well as audio chunks and palette manipulation commands (like fade to black or fade to a new palette). The compression methods are a mix of RLE and tile-based VQ (in both senses). One method decodes the picture as RLE, another uses RLE on the updates to the previous frame, a third either codes the whole frame in 4×2 two-colour tiles or codes an update mask first and draws only the updated tiles. The fourth method uses RLE on updates like the previously described ones, but instead of pixels it codes two-colour 4×4 tiles. Another fun thing is that (at least in the version I was able to locate) there are five files in this format: four of them code a single still image each and the fifth codes over half an hour of all the video scenes.
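The two-colour tiles are a nice excuse for a tiny sketch. This is my generic reconstruction of the common technique, not the actual Zorton Brothers bitstream: a tile is two palette indices plus a bitmask selecting between them for each pixel.

```rust
// Decode a 4×2 two-colour tile: two palette indices plus an 8-bit mask,
// one bit per pixel selecting which of the two colours to use.
fn decode_tile_4x2(colours: [u8; 2], mask: u8) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, px) in out.iter_mut().enumerate() {
        *px = colours[((mask >> i) & 1) as usize];
    }
    out
}
```

At 3 bytes for 8 pixels this is a fixed 3:8 ratio before any RLE on top, which is why such tiles show up in so many game codecs.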

In either case it was a very curious format—exactly the reason why I like to examine those old games.

Some words about AVSS format

December 5th, 2024

Back in the day I had a cursory glance at AVS. Out of curiosity I decided to add AVSS format support to NihAV. Despite looking like an obscure format nobody has ever heard of, it’s well-documented—under the name of Intel DVI (you know, Digital Video Interactive—mostly remembered for DVI ADPCM, better known as IMA ADPCM).

Anyway, I’ve managed to locate just one sample, containing a single RTV2.1 stream, so I was curious how it fares.

Well, it turned out to be the standard Indeo 2 format, but for some reason the actual image is down-sampled compared to what the headers claim (160×100 instead of 320×200). Nothing much to say, but it was still curious to see that system from the past.

na_game_tool 0.2.5 released

November 30th, 2024

I know that I released version 0.2.0 merely two months ago while saying it’d probably take me a year until the next release. Well, that’s still true, but I wanted to make public some quality-of-life improvements as well as move some formats from NihAV (most notably Actimagine VX) so those who care have a simple way of decoding them. That is why this is a micro release.

The main reasons for the release are the improved AVI writer and the VX decoder. The problem with my AVI writer is that it writes raw video and audio, and such files tend to become large—so I introduced OpenDML support in 0.2.0. But as I didn’t think it through, I simply enabled it for videos with large frames. Now it engages automatically once the file gets large enough, so you should not hit the size limit unless the total file size approaches 4GB.

But even if it’s a micro release I could not release it with just those two things, so it also introduces support for two formats reverse engineered by me recently, two formats reverse engineered by me a long time ago (both are called Paco), two more formats imported from NihAV, and two formats implemented after descriptions or code produced by others.

As always, grab it from the dedicated page if you really care.

P.S. I’ll probably keep doing what I’ve been doing so far (polishing my code, reverse engineering random formats, moving some more formats from NihAV to na_game_tool—in other words, nothing to show). Meanwhile there’s still more new and exciting stuff at librempeg.

P.P.S. Two formats were added for a rather non-technical reason: I want my tool to have decoding support for formats starting with each letter of the alphabet. So far the letters B, O, W, Y and Z are missing. That is why I intend to import Beam Software format support from NihAV, and maybe implement Psygnosis YOP and some Westwood format if I can’t find anything else to RE instead (and I lost my chance to name PMV and RLF after Zork games). In either case 0.3.0 will still probably take half a year to a year to accumulate enough supported formats.