Author Archive

FFpropaganda

Saturday, November 1st, 2025

Recently this tweet was brought to my attention (by somebody else who saw it and found it hilarious). While I agree with the message (I also find those OMGsecurity people annoying and rather counter-productive), the premise is bullshit, namely those two lines:

Arguably the most brilliant engineer in FFmpeg left because of this. He reverse engineered dozens of codecs by hand as a volunteer.

So I’ll try to explain what’s wrong with that claim and with the FFmpeg twitter account (or FFaccount for short) in general.

IFS Fractal codec

Friday, October 31st, 2025

As I mentioned before, apparently this format got popular enough to be licensed and used in three different container formats for three different goals (generic VfW codec, game video codec and interactive video player). Another curious thing is that the codec operates in GBR420 colourspace (yes, that means full-resolution green plane and down-scaled red and blue planes—something between Bayer and YUV420). Oh, and of course the fact that it uses true fractal compression makes it even more interesting.

Even though the codec operates on full 8-bit values internally, the reference decoder outputs either 16-bit RGB or a paletted image (a new palette is transmitted for some frames, usually intra ones). And it’s worth mentioning that the decoder always works on 320×200 frames (probably for simplicity), even though the IFS image format does not have that limitation.

Internally the codec operates on 4×4 blocks grouped into 8×8 super-blocks so that some operations may be performed on whole 8×8 blocks instead. Planes are coded as green first and red plus blue planes next to each other second, with both parts coded independently (i.e. the format always codes source block offsets relative to the beginning of the current plane and switches planes mid-decoding). Overall, decoding is straightforward: read the frame header data, start decoding block opcodes for green, continue with block opcodes for red and blue, do some operations depending on the header flags, output the new frame.

There are several known header flags:

  • 8—repeat last frame;
  • 4—swap frame buffers (and output previous frame after decoding into the new current frame);
  • 2—palette update is present (the first header field is an offset to it);
  • 1—this is a fractal (key)frame, it should be decoded 16 times.

Yes, it’s the hallmark of true fractal compression: it does not matter from which source you start (here the planes are filled with the value 0xAB at the very beginning), after 7-10 iterations you’ll converge to the desired image (sounds perfect for error resilience, doesn’t it?). But since it’s a computationally costly process, inter frames merely do some kind of refinement (including motion compensation).
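
A minimal sketch of how that frame-level logic could look, put together from the description above; the names, the buffer handling and the decode_plane_ops placeholder are my own simplifications (for one, it treats red and blue as separate passes), not the reference decoder’s structure:

    const FLAG_REPEAT: u8   = 8;
    const FLAG_SWAP: u8     = 4;
    const FLAG_PALETTE: u8  = 2;
    const FLAG_KEYFRAME: u8 = 1;

    struct Planes {
        g: Vec<u8>, // full-resolution green
        r: Vec<u8>, // down-scaled red
        b: Vec<u8>, // down-scaled blue
    }

    fn decode_frame(flags: u8, ops: &[u8], cur: &mut Planes, prev: &mut Planes) {
        if flags & FLAG_REPEAT != 0 {
            return; // nothing to decode, the previous frame is shown again
        }
        if flags & FLAG_PALETTE != 0 {
            // the first header field is an offset to the new palette (omitted here)
        }
        // a fractal keyframe is iterated several times so that the affine
        // transforms converge to the intended image; an inter frame is applied
        // just once as a refinement of the previous frame
        let iterations = if flags & FLAG_KEYFRAME != 0 { 16 } else { 1 };
        for _ in 0..iterations {
            decode_plane_ops(ops, &mut cur.g, &prev.g);
            decode_plane_ops(ops, &mut cur.r, &prev.r);
            decode_plane_ops(ops, &mut cur.b, &prev.b);
        }
        if flags & FLAG_SWAP != 0 {
            std::mem::swap(cur, prev); // the previous frame gets displayed
        }
    }

    // placeholder for the block opcode interpretation described below
    fn decode_plane_ops(_ops: &[u8], _cur: &mut [u8], _prev: &[u8]) {}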

Block opcodes are several bit fields packed into bytes LSB first. The first three bits are the main operation ID, with the value 7 signalling an extended operation with an additional 4-bit operation type. Here’s a table with them all (unlisted extended opcodes do not exist and make the reference decoder crash), followed by a small bit-reading sketch:

  • 0-3—affine transform block;
  • 4—skip next 1-32 blocks;
  • 5—motion compensation for the next 1-256 blocks;
  • 6—raw block data follows;
  • extended 0-7—affine transform for 8×8 block with an optional refinement for one of 4×4 blocks (in that case another 3-bit opcode is read and applied; the meanings are the same except that skip and motion compensation should apply only to one block and extended opcodes are not allowed);
  • extended 12—skip 33 or more blocks;
  • extended 15—raw 2×2 block put at absolute 16-bit offset. I doubt it’s ever been used.
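
And here is how reading those opcodes could look; this is my own illustration of the LSB-first packing with invented names, not code lifted from the reference decoder:

    struct BitReader<'a> {
        data: &'a [u8],
        pos: usize, // byte position
        bit: u32,   // bit position inside the current byte, 0 = LSB
    }

    impl<'a> BitReader<'a> {
        fn new(data: &'a [u8]) -> Self { Self { data, pos: 0, bit: 0 } }

        // read `nbits` bits, least significant bit first
        fn read(&mut self, nbits: u32) -> u32 {
            let mut val = 0;
            for i in 0..nbits {
                let bitval = (self.data[self.pos] >> self.bit) & 1;
                val |= u32::from(bitval) << i;
                self.bit += 1;
                if self.bit == 8 {
                    self.bit = 0;
                    self.pos += 1;
                }
            }
            val
        }
    }

    enum BlockOp {
        Affine(u32),   // opcodes 0-3
        Skip,          // opcode 4, block count follows
        MotionComp,    // opcode 5, block count follows
        Raw,           // opcode 6, raw block data follows
        Extended(u32), // opcode 7 plus a 4-bit sub-opcode
    }

    fn read_block_op(br: &mut BitReader<'_>) -> BlockOp {
        match br.read(3) {
            op @ 0..=3 => BlockOp::Affine(op),
            4 => BlockOp::Skip,
            5 => BlockOp::MotionComp,
            6 => BlockOp::Raw,
            _ => BlockOp::Extended(br.read(4)),
        }
    }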

Motion compensation is performed by copying a block from the previous frame using one of up to 16 possible motion offsets transmitted in the frame header. This approach was not unusual back in the day (Indeo 3 immediately comes to mind).
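
Assuming the chosen motion offset has already been converted into an offset within the plane (how the offset is selected per block is not detailed here, so that part is my guess), the copy itself is trivial:

    fn mc_block(cur: &mut [u8], prev: &[u8], stride: usize,
                blk_offset: usize, mv: isize, size: usize) {
        // copy a size x size block from the previous frame, displaced by `mv`
        for y in 0..size {
            for x in 0..size {
                let dst = blk_offset + y * stride + x;
                let src = (dst as isize + mv) as usize;
                cur[dst] = prev[src];
            }
        }
    }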

And now for the juiciest part, namely affine transforms. Fractal coding works by finding a larger block (called the domain block) which (when down-scaled, rotated/flipped and brightness/contrast adjusted) will correspond to the current one. Here domain blocks are always twice as large (with down-scaling performed by taking every even pixel on every even line) and are located at even positions (so a 14-bit index is enough for them). Contrast adjustment is done as pix*¾+bias, with bias being in the -64..63 range (so a 7-bit field is enough for it). The transforms themselves are described by their bit masks: bit 0 means the source block should be mirrored (i.e. left becomes right and vice versa), bit 1 means it should be flipped (top becomes bottom and vice versa) and bit 2 means it should be transposed (i.e. pixel (1,2) becomes pixel (2,1) and so on; this operation is for 8×8 blocks only).
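
Here is a sketch of applying one such transform to a 4×4 block, following the description above (the function layout and names are mine; the 8×8 case with transposition is left out):

    const BLK: usize = 4;

    fn apply_affine(plane: &mut [u8], stride: usize,
                    dst_pos: usize,    // top-left offset of the 4x4 destination block
                    domain_pos: usize, // top-left offset of the 8x8 domain block
                    transform: u8,     // bit 0 = mirror, bit 1 = flip
                    bias: i16)         // -64..63
    {
        // down-scale the domain block by taking every even pixel on every even line
        let mut tmp = [[0u8; BLK]; BLK];
        for y in 0..BLK {
            for x in 0..BLK {
                tmp[y][x] = plane[domain_pos + y * 2 * stride + x * 2];
            }
        }
        // mirror/flip and apply the pix * 3/4 + bias adjustment
        for y in 0..BLK {
            for x in 0..BLK {
                let sx = if transform & 1 != 0 { BLK - 1 - x } else { x };
                let sy = if transform & 2 != 0 { BLK - 1 - y } else { y };
                let pix = i16::from(tmp[sy][sx]);
                let val = (pix * 3 / 4 + bias).clamp(0, 255);
                plane[dst_pos + y * stride + x] = val as u8;
            }
        }
    }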

That’s all! It was good enough to compress video with 4-10x ratio (or twice as much if you treat it as 16-bit video instead of paletted one) without the usual blocky artefacts present in other codecs. And it applied inter-frame compression in order to save both space and decoding time. While that part was not a proper fractal compression, affine transforms were still used there (it reminds me somewhat of certain French game formats that combined motion compensation with mirroring or rotating).

The sad thing is, this is probably the only true fractal video codec in existence. Writing a fractal image compressor is so simple that anybody can do it as an experiment; making a proper video codec apparently is not. Even ClearVideo, while coming from the same company and being believed to be a fractal codec, is not one in reality—key-frames are coded a lot like JPEG and the only things it has in common with fractals are using quadtrees, copying blocks from elsewhere and adjusting brightness when copying them. If not for the company name one would not think it has anything to do with fractals.

And as a bit of philosophy, it looks like this codec was adapted from the original fractal image encoder (as I said before, this kind of compression looks like the first straightforward scheme for fractal compression as it’s usually described in the literature) by adding skip blocks and motion compensation for video. Then they probably started experimenting with better fractal compression—using blocks of different sizes and a quadtree to describe that, better compression of block parameters. Then at some stage they discovered that it was much easier and faster to code DCT blocks for key-frames and that plain motion compensation is good enough—that’s how they ended up with ClearVideo. And then they discovered that their newer products couldn’t compete with other codecs and the company went out of business.

To give credit where credit is due, I must say that the company got one thing right: the majority of future video compression turned out to be about searching extensively for matching blocks, so if they had started a bit later and applied their know-how, they could’ve ended up with a very competitive video encoder by now.

Why wavelets are a bad choice for image coding

Thursday, October 30th, 2025

I’ve been teasing it for too long and finally the time for this post has come. I really have a reason to believe that wavelets are not the best choice for image compression, so here are my arguments.

The most horrifying town in Germany

Sunday, October 26th, 2025

I never had a reason to care about Halloween, and nowadays real-world news is scarier than any imagined horrors. Yet I remembered one of my old trips, so why not mention a curious fact from it.

There’s a town somewhere between Frankfurt and Frankfurt-lowcost airports named Oppenheim. People coming from there are obviously known as Oppenheimer, as are their descendants and people marrying their descendants, like the famous Lillian Oppenheimer (I know her as the person who popularised origami, but apparently she’s the mother of several well-known mathematicians as well) and some guy claiming to have become death, destroyer of worlds.

But the town itself is more sinister, even if you disregard its catacombs. There’s Frankensteiner Hof—or the residence of Frankenstein (maybe a descendant moved to Bavaria and got famous there). As for real monsters, just around the corner from the previous landmark they have a street named simply Zuckerberg—no suffix for street or alley at all, just Zuckerberg.

It’s much better on the other side of Hessen—in Bavarian Miltenberg they have a plainly named Hauptstraße (meaning simply “main street”) and parallel to it runs Mainstraße, especially for the foreigners who don’t understand German.

Slow NihAV week

Saturday, October 25th, 2025

I don’t have much energy to work on stuff, so I spent most of my time doing nothing—and occasionally working on a decoder for the fractal formats.

But first of all I’d like to mention that I’ve added another format and a half to na_game_tool. I’m talking about the full-screen animations in the Hopkins FBI game. The format by itself is very simple: 640×480 frames, the first one stored uncompressed, the rest using extremely simple compression (values 0x00-0xD2 are pixel values, opcodes 0xD3-0xDD are used to code runs, opcode 0xFC signals end of data, the rest are used to code skips). Why did I call it a format and a half? Apparently what I described is true for the .anm files that are used to code FMV cutscenes, but there are also .seq files that have the same structure but no opcodes for runs (those are normal pixel values there). Curiously, the demo version of the game had ANM.EXE which can both decode and encode .anm files and has helpful messages too (if you know French).
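
For the record, here is a skeletal sketch of that opcode split; since I don’t want to misstate how the run and skip lengths are encoded after the opcode byte, those branches are only classified, not implemented:

    enum AnmOp {
        Pixel(u8), // 0x00-0xD2: literal pixel value
        Run(u8),   // 0xD3-0xDD: run of pixels, parameters follow in the stream
        Skip(u8),  // everything else: skip pixels, parameters follow
        End,       // 0xFC: end of frame data
    }

    fn classify_anm_op(op: u8) -> AnmOp {
        match op {
            0x00..=0xD2 => AnmOp::Pixel(op),
            0xD3..=0xDD => AnmOp::Run(op),
            0xFC        => AnmOp::End,
            _           => AnmOp::Skip(op),
        }
    }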

Anyway, back to the fractal compression. I’m still working out wrinkles in the TMM-Frac decoder, but it gives a recognisable picture more often than not. The fun thing is that back in the day Alyssa Milburn decompiled the same decoder for FVF (for a ScummVM engine) and the video decoding is the same, only the container is different. Unfortunately it is a decompilation, so it reconstructs the original code from the binary specification in C form with minimal improvements (see for yourself). Mind you, it’s a great accomplishment by itself, considering that the code in question is tricky even for modern decompilers (mostly because it uses segment registers to access different kinds of data and similar tricks). But since I care more about understanding how it works than about having a working decoder, I’m fine with having a buggy implementation that I can fix eventually.

Here’s a sample from a quickly hacked FVC1 decoder (frame 70 from fernlogo.avi, if anybody cares), made by copying my current TMM-Frac decoder core. As you can see, there’s still a lot to fix, but there is some progress there too. Mostly it serves as proof that it’s the same technology in all three cases (I’ve yet to write an FVF decoder but it’s undoubtedly the same compression).

Of course when I finish it I’ll try to write a nice format description as it is a bit more complex than “apply affine transformation and contrast adjustment to that block” of pure fractal coding.

Meanwhile Paul has shared a byteVC2 decoder with me and I’ll have to look at that codec eventually (big spoiler: it looks like an H.266 rip-off, considering how the binary specification mentions ALF, SAO, WPP and such). So many things to procrastinate looking at!

A pair^Wtrio of exotic formats

Tuesday, October 14th, 2025

If it looks like I’m not doing anything, that’s about right. Nevertheless I’d like to discuss two exotic formats that I’d like to write decoders for.

The first one is unlike most of the video codecs I’ve seen so far. For starters, it uses fractal compression. Not surprising, since it comes from Iterated Systems. And unlike the later ClearVideo, it is really a fractal codec. From what I see, it works exactly like the textbook example of fractal compression: split the frame into small fixed-size blocks, search for a domain block, apply a simple affine transform to a scaled-down version of it plus brightness scaling, and output the result. There are additional possible operations like leaving blocks unchanged or reading raw data for a block. Since this works only on greyscale data, frames are stored in YUV420 format with the planes coded sequentially. Unfortunately, since the binary specification is a mixed 16/32-bit VfW driver that Ghidra can’t decompile properly, the work on it goes at a glacial speed.
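
To give an idea of what that search step means in practice, here is a toy illustration of my own (not anything recovered from the binary specification): for one range block, try every candidate down-scaled domain block and keep the one with the smallest sum of absolute differences after a crude brightness adjustment.

    // `range` is a flattened 4x4 block, `domains` are already down-scaled candidates
    fn best_domain(range: &[u8; 16], domains: &[[u8; 16]]) -> (usize, u32) {
        let r_mean: i32 = range.iter().map(|&p| i32::from(p)).sum::<i32>() / 16;
        let mut best = (0usize, u32::MAX);
        for (idx, dom) in domains.iter().enumerate() {
            let d_mean: i32 = dom.iter().map(|&p| i32::from(p)).sum::<i32>() / 16;
            let bias = r_mean - d_mean; // crude brightness compensation
            let sad: u32 = range.iter().zip(dom.iter())
                .map(|(&r, &d)| (i32::from(r) - (i32::from(d) + bias)).unsigned_abs())
                .sum();
            if sad < best.1 {
                best = (idx, sad);
            }
        }
        best
    }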

The other codec is like the previous one, but it has its own container format and DOS player. It comes from TMM—not The Multimedia Mike but rather the company known for the RLE-based PH Video format. I don’t see mentions of Iterated Systems in the binary specification, but considering how similar this FRAC codec is to theirs (it uses the same bitstream format with the same opcode meanings and the same assembly instructions) I expect they licensed it from Iterated Systems.

So hopefully when I actually finish it I’ll have two decoders for the price of one.

Update: while refreshing my knowledge about fractal compression, I discovered in the Wickedpedia article on it that two companies claimed they got an exclusive license for the fractal compression algorithm from Iterated Systems—TMM and Dimension. The latter licensed it to Spectrum Holobyte to be used for FMV. And what do you know, that explains why FVF is named so and why its video bitstream syntax is the same as in the other two (and the code seems to be the same too). So I guess it means I’ll have almost the same decoder (but with different containers) in NihAV, na_game_tool and na_eofdec.

A small rant about compression

Wednesday, October 8th, 2025

The recent news about OpenZL made me think about some tangential issue.

The approach by itself is nothing new really: a lot of archivers include a pre-processing step for data (I don’t know if there are earlier examples, but de-interleaving or delta-coding floating-point data might be only slightly younger than the geo file in the Calgary Corpus, LZX translates call addresses into absolute offsets for better compression, etc); more advanced archivers implement flexible processing steps (e.g. RAR had its own custom VM for pre-processing data—essentially a cut-down 8086 instruction set and a security nightmare—and ZPAQ allows one to define compression steps for data-specific compression that won’t require a new decoder; in other words, something very similar to OpenZL). There’s nothing wrong with the approach and it’s probably useful outside of, say, genomic data compression; it just raises two questions: what is the currently accepted compression/resources trade-off for general compression, and what would be a good candidate for an open-source archiver?
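
As a tiny example of what such a pre-processing step can look like (a generic illustration, nothing specific to OpenZL or any of the archivers above), byte-wise delta coding often makes slowly varying data much friendlier to a regular LZ+entropy coder, and its inverse is trivial:

    fn delta_encode(data: &mut [u8]) {
        let mut prev = 0u8;
        for b in data.iter_mut() {
            let cur = *b;
            *b = cur.wrapping_sub(prev); // store the difference to the previous byte
            prev = cur;
        }
    }

    fn delta_decode(data: &mut [u8]) {
        let mut prev = 0u8;
        for b in data.iter_mut() {
            *b = b.wrapping_add(prev); // restore the original byte
            prev = *b;
        }
    }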

The first question is obvious: as time goes on, the available CPU power and RAM grow along with the amounts of data to compress. Back in the day gzip was the gold standard and bzip2 was something that ate too much RAM and worked rather slowly. A bit later .tar.bz2 started to replace .tgz for, say, distribution tarballs. Nowadays it’s .tar.xz or .tar.zst, which makes me wonder if that’s really the sweet spot for now or if things will move to adopting a compression scheme that’s slower but offers a better compression ratio.

The second question follows from the first one: what would be a good candidate, specifically for open-source applications. If you look around, there are not so many of those. You can divide existing formats (don’t confuse them with implementations) into several (sometimes overlapping) categories:

  • proprietary formats with an official open-source decoder at best (like RAR) or an unofficial reverse-engineered one (e.g. RAD mythical sea creatures formats and LZNIB);
  • open-source compression libraries targeting fast compression (LZO, LZ4, FLZ, LZF, etc, etc);
  • old open-source compressors (compress, gzip, bzip2, zip);
  • various programs trying to bank on a well-known name while not being related to it (bzip3 and anything with “zip” in its name really);
  • state-of-the-art compressors that require insane amounts of CPU and RAM (anything PAQ-based, NNCP);
  • corporate-controlled open-source formats (brotli, Zstandard).

The question is what would be a good candidate for the next de-facto compression standard. The current widespread formats are good since they’re easy to implement and there are many independent implementations in various languages, but how much can we trust the next generation—the one with flexible input pre-processing? (The third question would be whether that’s really the design approach mainstream compression formats will take.)

For instance, I have nothing against LZMA, but considering that its author is russian, how much can we trust that he won’t be visited by FAPSI representatives and forced to make changes to the LZMA3 design that will make Jia Tan green with envy? As for the formats coming from corporations, are you really going to rely on their goodwill? I think the story with LZW should serve as a warning.

The only reassuring thing is that it is still rather easy to design a new compression scheme and even achieve a decent compression ratio and performance (unlike training a neural network or even designing a video codec to rival H.265), so good candidates are likely to appear sooner or later.

“AI” is not there to help you

Thursday, October 2nd, 2025

I’m not writing this post to convince anybody; I write it mostly to formulate my thoughts and so that I can refer to it later saying “called it”.

First of all, what do I have against AI and why is the first word of the title in quotes? Not much, actually, it’s just that what gets hyped as AI nowadays is far from it—hence the quotes. It can do something, sometimes it can even do it well—but in general it is far from being intelligence.

IMO it’s more accurate to call it artificial managers, since they do what your typical manager does: spew completely meaningless bullshit, take your work and reword it in corporate-speak, plagiarise somebody’s work and take credit for it. Also, maybe it’s acceptable for a typical USian never to learn, but normally a human is expected to keep learning and re-evaluating things throughout their whole life. Of course I’m no AI scientist (and so my opinion does not matter), but I believe that proper AI should have two feedback loops: an inner loop that controls what is being done, and an outer loop that adjusts knowledge based on new experience.

The inner feedback loop means that while executing the task you try to understand what you got, how it relates to the goal, and then adjust what you’re doing if necessary. It’s like in the famous joke about the difference between physicists and mathematicians being asked to boil a kettle that is already full and on the stove: the physicist will simply light a match and light the fire, the mathematician will take the kettle off the stove and pour the water out, thus reducing the task to a well-known one. The outer feedback loop means learning from experience. For example, LLMs apparently still make the same mistake as small children when answering which is larger, 4.9 or 4.71; unlike small children, they don’t learn from it, so next time they will give the same answer or make the same mistake on some other numbers. I reckon implementing both loops is feasible, even if the inner loop will require an order of magnitude more resources (for reverse engineering its own output, calculating some metric for deviation from the goal and re-doing the work if needed); the outer loop is much worse since it would mean going over the knowledge base (model weights, whatever) and adjusting it (by reinforcing some parts and demoting or even deleting others).

So if I believe it can be improved, why do I claim it’s not helpful? What I’m saying is that while in its current state it may still be useful to you, it is not being developed to make your life easier. It should be obvious that developing such a system takes an enormous effort—all the input data to collect and process, let alone R&D and learning control—so it’s something that can be done only by a large community or a large company (often stealing the results of the former). And companies do things not to advance human well-being but rather to get profit, “dishonestly, if we can; honestly if we must” (bonus points for recognising what sketch this quote is from). I consider the current situation to be a kind of arms race: somebody managed to convince somebody that AI will be an ultimate solution, so the company that gets the first practical solution will get an extreme advantage over its competitors—thus the current multi-billion budgets are spent mostly out of fear of missing out.

What follows from the fact that AI is being developed by large companies in pursuit of commercial interests? Only that its goal is not to provide a free service but rather to return the investments and make a profit. And the profit from replacing an expensive workforce is much higher (and more real) compared to what you might get from just offering some service to random users (especially if you do it for free). Hence the apt observation that “AI” takes over creative (i.e. highly-paid) work instead of house chores, while people would rather have it the other way round.

As a result, if things go the way the companies that develop AI want, a lot of people will become rather superfluous. There will be no need for developers, and no need for people doing menial tasks like giving information, performing moderation and such (we can observe that even now to a large extent). There will be no reason for those free-to-play games either, as the non-paying players there exist just to create a background for whales (called so because they spend insane amounts of money on the game). Essentially the whole world will be like the Web of Bullshit, with people being rather a nuisance.

Of course this is just an attempt to model how events will develop based on incomplete data. Yet I remain an optimist and expect humanity to drive itself to an early grave before AI poses any serious threat.

New obscure formats

Saturday, September 27th, 2025

Despite how it looks, I still monitor Discmaster for new additions in the hope that there’s something interesting there. Sometimes there is, and I can either postpone it for later or actually take a look and try to figure out how it works. Here’s a list of stuff I looked at and found at least somewhat interesting:

  • a beta version of the VfW SDK contained a special AVI file that has a different structure and apparently can contain only a single stream. I added support for it to NihAV just for completeness’ sake;
  • ReVoice Studio discs contain some AVD files that are AVI files in reality. The problem is that those files seem to employ the Indeo content-protection feature and require an access key to decrypt the data. For rather obvious reasons it’s not something I’m willing to pursue further;
  • some Licensed Cartoon Property Activity Center discs contain videos that use the ARBC codec. I looked at it a long time ago at Paul’s request, so I remember he wrote a decoder for it. But it turned out that there’s a version of the codec used in MOV—with the 16-bit values now being big-endian. So I implemented a decoder for both codec flavours, also just for completeness’ sake;
  • Video Toaster 1.0 (now for Windows, who cares about the Amiga system-seller?) had some samples in RTV format. It turned out to be uncompressed interlaced video in a packed format. I’ve implemented a decoder for it in na_eofdec;
  • speaking of Amiga, there’s a game called Golem with animations in XFL format (which are raw frames in a per-bitplane format). Those are not too interesting to support, but there’s also a stand-alone video player featuring some game footage, and its XFL is a proper format, with audio and palettes. So I supported it in na_eofdec (since it’s not strictly a game format).

There are at least a dozen other formats that I found by searching for large unknown files, so there’s currently enough work waiting for me (maybe I’ll actually do something eventually too).

na_eofdec finally started

Sunday, September 14th, 2025

While librempeg does its awesome stuff, I’ve finally started working on na_eofdec, a new tool for decoding exotic/obscure formats (hence the name). It is intended to decode fringe formats I find interesting enough to write a decoder for but not useful enough to be included into the main NihAV (maybe I’ll move some formats from there to this tool, as I previously did with some game formats and na_game_tool).

And while it is based on na_game_tool and will keep its interface, there’s one major technical difference under the hood: while game formats are expected to produce constant-rate content (always the same number of frames per second and audio blocks of equal size), these formats are allowed to have variable framerate and audio block length. Currently this affects only the AVI writer (which I modified to handle synchronisation, frame duplication and splitting the audio input into blocks of equal length), but in the future I hope to write a MOV muxer to handle such inputs natively.
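
The frame-duplication part boils down to something like this rough sketch (just the idea, not the actual na_eofdec code; the names are made up): input frames carry their own timestamps while the output is fixed-rate, so each input frame is written as many times as needed to cover the time until the next one.

    // how many copies of the current frame to write so the output stays at out_fps
    fn frames_to_emit(cur_ts: f64, next_ts: f64, out_fps: f64) -> usize {
        let cur_idx = (cur_ts * out_fps).round() as i64;
        let next_idx = (next_ts * out_fps).round() as i64;
        (next_idx - cur_idx).max(0) as usize
    }

    // e.g. frames_to_emit(0.0, 0.5, 25.0) == 13: a frame at 0.0s followed by one
    // at 0.5s in a 25 fps output gets written 13 times (0.5 * 25 = 12.5, rounded)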

Of course such a tool is useless without decoders, so I’ve added a pair of them for the Lantern MOV formats. These are RLE-based animation formats using an IFF or RIFF structure (little-endian in either case). There are more candidates out there, like all those IFF-based formats. As usual, the first release will happen when I implement at least a dozen original decoders, so it will take a while.