FFhistory: first slop

May 27th, 2026

These days I observe the world with its “AI” evangelists suspiciously reminiscent of annoying religious missionaries (yes, I’m pretty sure I’ve heard the news from that newer part of a widely circulated book that’s just under two millennia old, thank you very much) and feats of token-wasting (name changed from “vibe-coding” to keep up with the times) like two FFmpeg rewrite attempts in Rust, probably made just to spite the Nigel (name changed to protect the guilty) formerly responsible for FFaccount, since slop in any other language would be just as smelly and just as secure. Since I don’t use any of those three projects, I’d rather talk about the time when FFmpeg almost got its first organic slop.

People submitting sub-par patches is nothing new (e.g. there was a mediocre H.264 encoder rejected for not being good for anything really, since x264 is tough competition after all; or an MS Video-1 encoder initially rejected for the same reason but later merged because it’s a feature), but this one is special because it had all the signs of modern “AI” slop while being produced organically more than a decade ago: doing something only tangentially related to the original goal (check), being lots of incomprehensible code (check), a lot of effort wasted on it (guess for yourselves).

This happened when a guy from a group called Programmers Doing Awesome Things (name changed to protect the guilty) was taking part in Baidu Summer of Code (name changed to reflect company values), with his project being support for a certain audio format. What we got instead was a large library doing something more generic; in theory it could be used to decode the audio format in question, but I think nobody has ever found out how to do that. The reaction was more along the lines of “uhm, thanks”, and while the student did not fail (at least that’s what a quick search tells me), the library was never merged and has probably been completely lost in time by now. My memory is not as bad as it was back then (yes, it’s even worse), so I can’t remember whether there were actual attempts to make something out of it afterwards or all hope was abandoned outright. At least it gave us all a distinct memory and a short-lived meme of “nicknamePDAT” being used by various developers for a while.


I often think about it when I see these new projects with whatever insane amount of tokens wasted on them. They seem to include everything and then some. For example, one of them (name withheld since I believe they don’t deserve any advertising) supports only a handful of formats and compensates for that by adding a lot of features that would (theoretically) make it do anything, from game streaming to mastering IMF for broadcasting, with only a GUI missing. Another one (name withheld for the reason stated above) does not have those features but compensates for it with the plethora of supported formats. So if you ever thought that FFmpeg definitely needs its own vector font rendering (e.g. for SVG and PDF support, because of course those are at least planned) or that it’s not usable without 3D scene rendering capabilities, then this slop is definitely for you! It’s also fun to watch how it undoes its own progress by trying to make the “AI” developer plagiarise less (so now it’s all based on “AI”-generated specifications that nobody can see).

You know what could really improve those projects? Actually having a point. I know that the main goal there is to make money off it (and it even works for some FFmpeg developers, so it may work here), but in order to achieve that a project needs to offer potential users a solution to their problems (again, FFmpeg started as an open-source implementation of decoding and encoding popular formats based on H.261-H.263 and grew from there into something that most people use to decode or convert their multimedia content). A pile of code that does everything and nothing at the same time is not it. Actually, I encountered one of those projects while searching for a crate with libxvid bindings (and got only that thing in the search results, which doesn’t support even what my decoder does, let alone the stuff I’d rather use libxvid for).

There was a joke about one hardware company (name not given since I forgot it) that its motto was “ready! shoot! aim!”. With modern tools people are so excited that they can shoot a lot with minimal readying time that they forget about aiming entirely. So I’ll stand aside while the rest have fun shooting bystanders and themselves, and keep doing what nobody else cares about.

na_eofdec initial release

May 11th, 2026

Since I got lucky with some formats during the weekend, I now have enough of them to release na_eofdec. This is a tool similar to na_game_tool but oriented at generic exotic and obscure formats (or Amiga ones, put them into whatever category you like). So if you’re familiar with that one (why?!) you should have no trouble with the new one either (or at least it should be the same trouble).

The motivation behind it is about the same as with the other tool: decode whatever formats I find interesting enough to implement decoders for but not interesting enough to have them supported in the main NihAV base. It also serves as a playground for various other things (like the MOV muxer in this case, which served as the base for a more versatile muxer in NihAV).

Anyway, it is released in the hope (but with no expectations and definitely no guarantees) that it will be useful to others for some purpose. The release is available at its own sub-page at nihav.org (and there’s a link to it in the appropriate section of this blog too).

NihAV: QT support enhancements

May 8th, 2026

When I have enough inspiration, I improve NihAV. When I don’t (which is the more common state for me), I RE codecs or write blog posts, so here’s one.

First of all, I’ve started adding non-raw encoders for some common QT formats. It’s not that there are no open-source encoders for them, but I write them mostly to find out how it is done and maybe learn something new in the process. For instance, RLE encoding combines skips, runs and pixel copies; this raises the question of optimal encoding, as sometimes it may be cheaper to encode a whole area as new pixels instead of a mix of copy+skip+copy. So I’ve implemented a greedy approach (i.e. code the longest skip or run and fall back to raw coding if both fail) as well as a slow but optimal one. The latter is a variation of trellis coding: for each position, calculate the cost of reaching every possible next position with each mode (skip/run/raw) and, if it’s lower than the cost recorded there, remember that mode; at the end simply trace back the decisions that gave the least total cost and encode them in the right order.
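Here’s a minimal Rust sketch of that trellis search for one line of 16-bit pixels. The opcode costs, the 127-pixel length limit and all names are hypothetical placeholders, not the actual QuickTime RLE bitstream or NihAV code; it only shows the idea of relaxing costs forward and tracing the decisions back.

```rust
/// One coding decision for a stretch of pixels.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Skip(usize), // pixels unchanged from the previous frame
    Run(usize),  // one pixel value repeated
    Raw(usize),  // literal pixels
}

// Hypothetical bitstream parameters, not taken from the QuickTime RLE format.
const MAX_LEN: usize = 127; // longest length a single opcode can express
const PIX_BYTES: usize = 2; // bytes per pixel (e.g. 16-bit RGB)

fn op_cost(op: Op) -> usize {
    match op {
        Op::Skip(_) => 1,
        Op::Run(_) => 1 + PIX_BYTES,
        Op::Raw(n) => 1 + n * PIX_BYTES,
    }
}

/// Finds the cheapest sequence of skip/run/raw operations for one line.
pub fn optimal_ops(cur: &[u16], prev: &[u16]) -> Vec<Op> {
    assert_eq!(cur.len(), prev.len());
    let n = cur.len();
    // cost[i]: cheapest cost of coding the first i pixels; from[i]: how we got there
    let mut cost = vec![usize::MAX; n + 1];
    let mut from: Vec<Option<(usize, Op)>> = vec![None; n + 1];
    cost[0] = 0;
    for start in 0..n {
        if cost[start] == usize::MAX {
            continue;
        }
        let (mut skip_ok, mut run_ok) = (true, true);
        for len in 1..=(n - start).min(MAX_LEN) {
            let p = start + len - 1;
            skip_ok &= cur[p] == prev[p];
            run_ok &= cur[p] == cur[start];
            let cands = [
                Some(Op::Raw(len)),
                if skip_ok { Some(Op::Skip(len)) } else { None },
                if run_ok { Some(Op::Run(len)) } else { None },
            ];
            for &op in cands.iter().flatten() {
                let c = cost[start] + op_cost(op);
                if c < cost[start + len] {
                    cost[start + len] = c;
                    from[start + len] = Some((start, op));
                }
            }
        }
    }
    // Trace back from the end; raw coding makes every position reachable.
    let mut ops = Vec::new();
    let mut pos = n;
    while pos > 0 {
        let (prev_pos, op) = from[pos].expect("reachable via raw");
        ops.push(op);
        pos = prev_pos;
    }
    ops.reverse();
    ops
}
```

For example, a line of four equal pixels, then two pixels matching the previous frame, then two equal pixels again comes out as Run(4), Skip(2), Run(2) with these costs.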

Then I also added an RPZA encoder. This is essentially the first texture codec, predating GPUs that needed texture compression: its main compression mode encodes a 4×4 block with four colours, where two colours are linearly interpolated from two explicitly transmitted ones. There is no apparent way to do it fast, so I ended up with an extremely simplified scheme: first I calculate the difference between the maximum and minimum of each component and pick the component with the largest difference (or code the block as single-colour if it’s small enough), then I calculate the explicit colours as averages of the input pixels close to the minimum and maximum ends of that range. I also have a refinement step that runs a vector quantisation loop to adjust the ends, but it’s rarely needed in practice.
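A rough Rust sketch of that endpoint selection on one 4×4 block of RGB555 pixels follows. The flatness threshold and the “within a quarter of the range” rule for picking pixels near each end are hypothetical choices; the refinement loop and the per-pixel index assignment are left out.

```rust
/// RGB555 components, each in 0..=31.
pub type Rgb = [u8; 3];

#[derive(Debug, PartialEq)]
pub enum BlockCode {
    Single(Rgb),
    FourColour { lo: Rgb, hi: Rgb },
}

/// Rounded per-component average of a non-empty pixel set.
fn average(pix: &[Rgb]) -> Rgb {
    let mut sum = [0u32; 3];
    for p in pix {
        for c in 0..3 {
            sum[c] += p[c] as u32;
        }
    }
    let n = pix.len() as u32;
    [0usize, 1, 2].map(|c| ((sum[c] + n / 2) / n) as u8)
}

pub fn choose_block(block: &[Rgb; 16], flat_thresh: u8) -> BlockCode {
    // 1. pick the component with the largest min..max spread
    let (mut comp, mut lo_v, mut hi_v) = (0, 0u8, 0u8);
    for c in 0..3 {
        let lo = block.iter().map(|p| p[c]).min().unwrap();
        let hi = block.iter().map(|p| p[c]).max().unwrap();
        if c == 0 || hi - lo > hi_v - lo_v {
            comp = c;
            lo_v = lo;
            hi_v = hi;
        }
    }
    let range = hi_v - lo_v;
    // 2. nearly flat blocks become single-colour blocks
    if range <= flat_thresh {
        return BlockCode::Single(average(block));
    }
    // 3. endpoints: averages of the pixels near each end of that range
    let margin = range / 4; // hypothetical "close to the end" threshold
    let low: Vec<Rgb> = block.iter().copied().filter(|p| p[comp] <= lo_v + margin).collect();
    let high: Vec<Rgb> = block.iter().copied().filter(|p| p[comp] >= hi_v - margin).collect();
    BlockCode::FourColour { lo: average(&low), hi: average(&high) }
}
```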

There are still more encoders to implement (SMC, SVQ1, IMA ADPCM and MACE) but none of them is interesting besides SVQ1, so I’ll probably write about it when/if I ever get to implementing an encoder for it (it does not matter that The Multimedia Mike did that over fifteen years ago: NIH is in the project name for a reason).

Now, surprisingly enough, I’ve improved decoding support as well. The original QuickTime had the SIVQ codec, which is a straightforward 256-entry codebook for 2×2 RGB24 tiles followed by codebook indices. I had read its binary specification some time ago and recently I was able to locate (probably the only existing) sample for it, which is a good reason to write a decoder. It was a well-spent five minutes of my time. Maybe in the future I’ll also do something about Pixar codecs (Ghidra works better with the raw m68k version of the decoder than with the 16-bit Windows 3.x version of the same).
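In case it helps, a SIVQ-style decoder boils down to something like the Rust sketch below. The entry layout (four RGB24 pixels in raster order), one index byte per tile and even frame dimensions are all assumptions, and the codebook size is left as a parameter.

```rust
/// Decodes a frame made of a 2x2 RGB24 codebook followed by one index per tile.
/// Returns packed RGB24 output, or `None` on truncated data or a bad index.
pub fn decode_sivq(data: &[u8], width: usize, height: usize, cb_size: usize) -> Option<Vec<u8>> {
    const ENTRY: usize = 4 * 3; // 2x2 pixels, 3 bytes each
    let cb_len = cb_size * ENTRY;
    let (codebook, indices) = (data.get(..cb_len)?, data.get(cb_len..)?);
    let (tw, th) = (width / 2, height / 2);
    if indices.len() < tw * th {
        return None;
    }
    let stride = width * 3;
    let mut out = vec![0u8; stride * height];
    for ty in 0..th {
        for tx in 0..tw {
            let idx = indices[ty * tw + tx] as usize;
            if idx >= cb_size {
                return None;
            }
            let entry = &codebook[idx * ENTRY..][..ENTRY];
            // k = 0..3: top-left, top-right, bottom-left, bottom-right (assumed order)
            for (k, px) in entry.chunks_exact(3).enumerate() {
                let x = tx * 2 + (k & 1);
                let y = ty * 2 + (k >> 1);
                let off = y * stride + x * 3;
                out[off..off + 3].copy_from_slice(px);
            }
        }
    }
    Some(out)
}
```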

And finally I’ve improved the support for multi-descriptor MOV files. I mentioned it some time ago and I got bitten by it again recently. For example, alice_lo_m.mov from samples.mplayerhq.hu had only its first frame decoded for me, and many QuickTime 1.5 sample videos (with its developers) gave an error on the last frame. For the former it’s because the first frame is JPEG and the rest are SVQ1, while the latter samples are coded with Cinepak but the last frame may be a special one encoded with RPZA. And there was another file fully encoded with RPZA, but with the majority of it being 160×120 while the last dozen frames or so were 320×240. So I finally got annoyed enough to implement multiple streams per track, so at least the frames get marshalled to the correct decoder, even if it leaves the partial streams rather unusable. Maybe one day I’ll write a tool which will walk through a MOV and render all tracks in the correct sequence (taking edit lists into account), scaling and adjusting playback rate as needed, producing a raw MOV file that can be played without special hacks; or maybe I don’t hate myself that much.
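The routing itself can rely on the sample description index stored in the standard stsc (sample-to-chunk) atom. Below is a small Rust sketch that expands stsc into a per-sample description index and groups samples by it; the struct and function names are hypothetical, and chunk offsets and sample sizes are ignored entirely.

```rust
use std::collections::BTreeMap;

/// One entry of the MOV `stsc` (sample-to-chunk) table.
#[derive(Clone, Copy, Debug)]
pub struct StscEntry {
    pub first_chunk: u32, // 1-based
    pub samples_per_chunk: u32,
    pub desc_index: u32, // 1-based index into `stsd`
}

/// Expands `stsc` into a sample description index for every sample.
pub fn sample_descriptions(stsc: &[StscEntry], num_chunks: u32) -> Vec<u32> {
    let mut out = Vec::new();
    for (i, e) in stsc.iter().enumerate() {
        // an entry applies until the next entry's first chunk (or the end of the track)
        let end = stsc.get(i + 1).map_or(num_chunks + 1, |next| next.first_chunk);
        for _chunk in e.first_chunk..end {
            out.extend(std::iter::repeat(e.desc_index).take(e.samples_per_chunk as usize));
        }
    }
    out
}

/// Groups 0-based sample numbers by description: one "stream" per descriptor.
pub fn split_by_description(descs: &[u32]) -> BTreeMap<u32, Vec<usize>> {
    let mut streams: BTreeMap<u32, Vec<usize>> = BTreeMap::new();
    for (sample, &d) in descs.iter().enumerate() {
        streams.entry(d).or_default().push(sample);
    }
    streams
}
```

E.g. if chunk 1 uses description 1 and chunks 2 to 4 use description 2 (one sample per chunk), the samples split into {0} and {1, 2, 3}, each group going to its own decoder.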

That’s it for now, don’t expect anything soon (MVS description may appear but who’s waiting for that?).

Quickly about AC2

May 2nd, 2026

I’ve finally had a look at this codec and it’s not particularly remarkable (and I did it mostly because it beats documenting minute details of MVS).

Anyway, there’s not much interesting about it. Like its successor, it employs parametric bit allocation to decide how many bits to read from the input block. The main difference is that only two channels are allowed and there’s a single block per channel too. A more curious thing is that there are two revisions of this codec, with revision A having simpler bit allocation while revision B has additional tables (dependent on sample rate of course) to adjust how many bits per band will eventually be read. Also, unlike its successor, there are no tricks to allocate a fractional number of bits per band: it’s always an integer number of bits.

On the coding side it seems to be a more or less straightforward MDCT, the only interesting trick being splitting frame data into sums and differences (not of channels but of subsequent samples in the frequency domain) and coding them separately.
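To illustrate what coding sums and differences of neighbouring frequency coefficients means, here’s a generic Rust sketch of a pairwise sum/difference split and its exact inverse. Pairing adjacent bins, the lack of scaling and working on integers are assumptions; AC2 may well pair or scale things differently, and in a real codec both parts would be quantised and bit-allocated separately.

```rust
/// Splits consecutive coefficient pairs (c[2k], c[2k+1]) into sums and differences.
/// A trailing odd coefficient is ignored in this sketch.
pub fn split_sum_diff(coeffs: &[i32]) -> (Vec<i32>, Vec<i32>) {
    coeffs.chunks_exact(2).map(|p| (p[0] + p[1], p[0] - p[1])).unzip()
}

/// Inverse: a = (s + d) / 2, b = (s - d) / 2 (exact, since s + d = 2a and s - d = 2b).
pub fn merge_sum_diff(sums: &[i32], diffs: &[i32]) -> Vec<i32> {
    sums.iter()
        .zip(diffs)
        .flat_map(|(&s, &d)| [(s + d) / 2, (s - d) / 2])
        .collect()
}
```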

Overall, it’s a simple perceptual codec (and a half, considering revision B) that worked not too badly. And considering the claims given here, I suppose it was essentially equivalent to MPEG-1 Layer II. At least it’s more interesting than a bit of audio coding bolted onto the (patented of course) DRC and named AC4.

NihAV: palettisation

April 30th, 2026

While I added palettisation support to NihAV about six years ago, it was limited to per-frame palette generation back then. Since I had only two encoders supporting paletted input and both accepted palette changes, it worked fine. It was only when I decided to implement the MOV muxer that I really needed palettisation with a global palette, so I started experimenting with that.

First and foremost, design. While frame palettisation is a part of NAScale, which handles video frame conversion from and to various formats, this palettisation mode is bolted onto nihav-encoder. It is actually implemented in two parts: an initial pass that decodes the input and generates a palette at the end, and the actual frame palettisation (which works just fine without that pass if you tell it to use some pre-defined palette). I actually started with the second part and added palette generation later (for tests, using the default QuickTime palette was enough). Then I went even further and extended palette generation to support segmented mode (i.e. the palette may change but not for every frame; I made the limit configurable but it should be at least 10 frames, since storing a palette for each frame before processing is too much).

This mode is a bit fragile since palettes are calculated for the decoded frames and not for the processed frames. And for palette segments it’s even worse, since a segment stores the number of frames for which the palette is valid, so framerate conversion will make a mess of it. But the alternative leads to madness (also known as libavfilter) and that’s a hard pass for me. I see filters more as a drug that makes multimedia projects shift attention to them, making it more and more about e.g. filter negotiation and complex graph support and less about playing actual media; consequently, making everything a filter is a sure sign the project is on its way to becoming obsolete (yes, I hold neither DirectShow nor gstreamer in high regard).

Anyway, after the overall design description it’s time to talk about implementation details. Palette generation for video differs from palette generation for a single image by the sheer amount of data it needs to take into account. So while I also have a “let’s waste memory and have a table of 64-bit counters for each possible colour” mode, my first mode puts colours into smaller buckets (the bucket index is calculated from the top bits of the components) and joins similar colours (e.g. differing just in the two low bits of each component) when the number of entries gets too high. It is slower and not as accurate, but it consumes less memory, and performing vector quantisation on hundreds of thousands of entries is much faster than on millions. So it may have its uses.
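A minimal Rust sketch of such a bucketed histogram is below: 4096 buckets indexed by the top four bits of each component, and when there are too many distinct entries, one more low bit gets masked off every component and identical colours are merged. The structure, the limit handling and the names are hypothetical, not the actual NihAV code.

```rust
/// Colour histogram in 4096 buckets keyed by the top 4 bits of R, G and B.
pub struct ColourHist {
    buckets: Vec<Vec<(u32, u64)>>, // (masked 0xRRGGBB colour, count)
    drop_bits: u32,                // low bits currently masked off each component
    entries: usize,
    limit: usize,
}

impl ColourHist {
    pub fn new(limit: usize) -> Self {
        Self { buckets: vec![Vec::new(); 4096], drop_bits: 0, entries: 0, limit }
    }

    fn bucket_of(rgb: u32) -> usize {
        let (r, g, b) = ((rgb >> 20) & 0xF, (rgb >> 12) & 0xF, (rgb >> 4) & 0xF);
        ((r << 8) | (g << 4) | b) as usize
    }

    fn mask(&self) -> u32 {
        let m = (0xFFu32 << self.drop_bits) & 0xFF;
        (m << 16) | (m << 8) | m
    }

    fn insert(&mut self, key: u32, count: u64) {
        let bucket = &mut self.buckets[Self::bucket_of(key)];
        let pos = bucket.iter().position(|e| e.0 == key);
        match pos {
            Some(i) => bucket[i].1 += count,
            None => {
                bucket.push((key, count));
                self.entries += 1;
            }
        }
    }

    pub fn add(&mut self, rgb: u32) {
        let key = rgb & self.mask();
        self.insert(key, 1);
        // coarsen until small enough; 4 dropped bits keep bucket indices stable
        while self.entries > self.limit && self.drop_bits < 4 {
            self.coarsen();
        }
    }

    fn coarsen(&mut self) {
        self.drop_bits += 1;
        let mask = self.mask();
        let old = std::mem::replace(&mut self.buckets, vec![Vec::new(); 4096]);
        self.entries = 0;
        for (key, count) in old.into_iter().flatten() {
            self.insert(key & mask, count);
        }
    }

    pub fn len(&self) -> usize {
        self.entries
    }

    pub fn total(&self) -> u64 {
        self.buckets.iter().flatten().map(|e| e.1).sum()
    }
}
```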

Next comes palette segment generation. I don’t do anything fancy and simply calculate a coarse colour histogram to decide when to start a new segment. For the cut-off criterion I selected the ratio between the correlation of the group and new frame histograms and the autocorrelation of the new frame histogram. If they’re of about the same magnitude then I add this frame to the group, update the group histogram and continue; otherwise I generate a palette for the just finished segment and start a new one. It’s naïve but it seems to work reasonably well.
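The cut-off test might look like this Rust sketch: a 64-bin coarse histogram per frame, and a new segment starts when the correlation of the new frame with the (averaged) group histogram divided by the new frame’s autocorrelation drifts too far from one. Averaging the group histogram, the bin count and the tolerance are assumptions, and the minimum segment length is not enforced here.

```rust
/// Coarse histogram size: 2 bits per component (hypothetical choice).
pub const BINS: usize = 64;

/// Normalised coarse colour histogram of one frame.
pub fn coarse_hist(pixels: &[[u8; 3]]) -> [f64; BINS] {
    let mut h = [0.0; BINS];
    for p in pixels {
        let idx = (((p[0] >> 6) as usize) << 4) | (((p[1] >> 6) as usize) << 2) | ((p[2] >> 6) as usize);
        h[idx] += 1.0;
    }
    let n = pixels.len().max(1) as f64;
    h.iter_mut().for_each(|v| *v /= n);
    h
}

pub struct Segmenter {
    group: [f64; BINS], // sum of the histograms in the current segment
    frames: usize,
    tolerance: f64, // how far from 1.0 the ratio may drift
}

impl Segmenter {
    pub fn new(tolerance: f64) -> Self {
        Self { group: [0.0; BINS], frames: 0, tolerance }
    }

    /// Adds a frame histogram; returns `true` if it starts a new segment.
    pub fn push(&mut self, frame: &[f64; BINS]) -> bool {
        let auto: f64 = frame.iter().map(|v| v * v).sum();
        let new_segment = if self.frames == 0 {
            true
        } else {
            let n = self.frames as f64;
            let cross: f64 = self.group.iter().zip(frame).map(|(g, f)| g / n * f).sum();
            let ratio = if auto > 0.0 { cross / auto } else { 1.0 };
            (ratio - 1.0).abs() > self.tolerance
        };
        if new_segment {
            self.group = [0.0; BINS];
            self.frames = 0;
        }
        for (g, f) in self.group.iter_mut().zip(frame) {
            *g += f;
        }
        self.frames += 1;
        new_segment
    }
}
```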

Palettisation itself consists of finding an appropriate palette entry for each input colour (I’m aware of dithering but haven’t bothered with it yet). I use the same three methods as back then: brute force search, local search and k-d tree search. The latter is faster than the rest but gives horrible quality, so I don’t know whether I should improve it or throw it away. Local search (especially with a small cache for the last 32 results) is a nice trade-off between the other two. And brute force search is implemented by filling a small 16MB table which maps each input colour to a palette index; it is slow to generate but works extremely fast with a global palette and reasonably large video (i.e. more than those sixteen million pixels). For segmented palettes it’s better to use local search though.
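Here is a hedged Rust sketch of the lookup-table approach: a brute-force nearest-colour search fills a table indexed by RGB, so palettising a pixel becomes a single lookup. With bits = 8 it is the full 16MB table; the bits parameter (which truncates components for a smaller table) and the names are hypothetical additions.

```rust
/// Brute-force nearest palette entry by squared RGB distance.
pub fn nearest(pal: &[[u8; 3]], c: [u8; 3]) -> u8 {
    let mut best = (0usize, u32::MAX);
    for (i, p) in pal.iter().enumerate() {
        let d: u32 = (0..3)
            .map(|k| {
                let e = i32::from(p[k]) - i32::from(c[k]);
                (e * e) as u32
            })
            .sum();
        if d < best.1 {
            best = (i, d);
        }
    }
    best.0 as u8
}

/// Lookup table from (truncated) RGB to palette index.
pub struct Lut {
    bits: u32,
    table: Vec<u8>,
}

impl Lut {
    pub fn build(pal: &[[u8; 3]], bits: u32) -> Self {
        assert!((1..=8).contains(&bits) && !pal.is_empty() && pal.len() <= 256);
        let shift = 8 - bits;
        let cmask = (1usize << bits) - 1;
        let mut table = vec![0u8; 1 << (3 * bits)];
        for (idx, slot) in table.iter_mut().enumerate() {
            // recover the (truncated) colour this slot stands for
            let r = ((idx >> (2 * bits)) & cmask) as u8;
            let g = ((idx >> bits) & cmask) as u8;
            let b = (idx & cmask) as u8;
            *slot = nearest(pal, [r << shift, g << shift, b << shift]);
        }
        Self { bits, table }
    }

    pub fn lookup(&self, c: [u8; 3]) -> u8 {
        let s = 8 - self.bits;
        let idx = (((c[0] >> s) as usize) << (2 * self.bits))
            | (((c[1] >> s) as usize) << self.bits)
            | ((c[2] >> s) as usize);
        self.table[idx]
    }
}
```

The build cost is one full palette search per table slot, which is why it only pays off when the frames contain many more pixels than the table has entries.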

That’s it. I realise that I’m the only user of such a feature, but it gave me something to play with and I enjoyed implementing it. After all, who else can claim to have converted a movieCD into an animated GIF without using any 16- or even 32-bit code?

Quickly about Factor 5 VID 1

April 24th, 2026

Yes, calculating that results in VID5 but in reality it’s MPEG-4 part 2 (minus some parts). Paul has asked me to look at it some time ago, so I did (which is much better than implementing a decoder for it).

As I said already, this is essentially MPEG-4 part 2 with some insane parts being cut off (but not enough to turn it back into ITU H.263): there’s no support for special texture shapes, interlacing or even quarterpel motion compensation. There are still B- and S-frames to complicate things though. Bitstream format is cut down as well to remove most of the nonsense (or omgFFeatures if you have that view), so there are just frames containing basic header (or slightly less basic with GMC and S-frames) plus macroblock data. Macroblock data is almost identical to the expected format—they even still have sync pattern handling in MCBPC despite there being no need for that.

So on the one hand, writing a decoder for it is not that hard, as you can simply hack an existing decoder; on the other hand, it is hard enough (because you either need to hack an existing decoder or implement it yourself, and that ISO standard is not easy to comprehend). Personally I decided not to touch S-frames at all, and if the need arises I’m actually considering making a wrapper for xvidcore instead.

Meanwhile I still have MVS to document and lots of encoders to write to make use of my new palettiser (because so far I have just three codecs that can encode paletted formats—two of them are for AVI, one is GIF, and none are for MOV). So hopefully I’ll have something more interesting to write about next time.

Cinepak’s long-lost brother?

April 20th, 2026

While discmaster is stagnating (you know whom to thank for the shortages of HDDs as well as RAM), I still look through stuff there in the hope of finding something interesting. Occasionally I manage to stumble upon something special indeed.

This time it was navigable movies bundled with QuickTime 1.5 or so. Apparently the idea behind them is that all frames are actually tiles of a much larger picture that user can navigate without exhausting all RAM trying to decode it as one contiguous image. If you thought about ISO H.EIC (aka HEIF or AVIf depending on intraframe codec employed) you may be right, but also it got re-branded (and maybe enhanced a bit?) some time later as QT-VR (sometimes I think no matter how stupid modern multimedia idea is, it’s been implemented in QuickTime a couple decades ago).

Anyway, out of four such movies, one was recognised and converted by the discmaster software, two were playable (with my player) after I hacked the file type tag to be MooV instead of APPL, and the last one could not be decoded at all because its codec was of an unknown type.

Luckily for me resource data of that movie contained the decoder in m68k binary format (sometimes I think no matter how stupid modern multimedia idea is, it’s been implemented in QuickTime a couple decades ago—or did I say that already?). And just by looking at the frame contents I knew it was worth REing as I could spot YUV codebook right at the beginning and it was definitely not Cinepak (or Compact Video as it was known back then). The name was “CDROM Video codec” with tag cdvc but that didn’t tell me much. The file was created in 1992 while Wickedpedia claims that SuperMac Compact Video was added to QT around that time as well.

Anyway, let’s move on to the format details. The codec data starts with a 24-byte header followed by a YUV or (theoretically) RGB24 codebook, then another 24-byte header (containing frame dimensions among other things) and finally the data. The frame is split into 4×4 blocks, and first an opcode is sent containing the block type and the number of blocks (minus one) of that type. Blocks are known to have three types: 4 vectors per block, 1 vector per block (scaled 2x), or simply skip.
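Here’s a Rust sketch of the block painting part, luma only. The run expansion follows the “type plus count minus one” description, but the opcode bit layout is not known here, so runs are taken as already decoded; the 2×2 raster-order codebook layout and the mapping of the four vectors to quadrants are assumptions borrowed from Cinepak-style codecs.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BlockType {
    Four, // four 2x2 vectors per 4x4 block
    One,  // one 2x2 vector scaled 2x
    Skip, // leave the block as it is
}

/// Expands (type, count - 1) opcodes into one type per block.
pub fn expand_runs(runs: &[(BlockType, u8)]) -> Vec<BlockType> {
    runs.iter()
        .flat_map(|&(t, n)| std::iter::repeat(t).take(usize::from(n) + 1))
        .collect()
}

/// Paints one 4x4 luma block; `codebook[i]` holds a 2x2 vector in raster order.
pub fn paint_block(
    frame: &mut [u8],
    stride: usize,
    bx: usize,
    by: usize,
    kind: BlockType,
    idx: &[u8],
    codebook: &[[u8; 4]],
) {
    let base = by * 4 * stride + bx * 4;
    match kind {
        BlockType::Skip => {}
        BlockType::One => {
            let v = &codebook[usize::from(idx[0])];
            for y in 0..4 {
                for x in 0..4 {
                    frame[base + y * stride + x] = v[(y / 2) * 2 + x / 2];
                }
            }
        }
        BlockType::Four => {
            // vectors go to the top-left, top-right, bottom-left, bottom-right quadrants
            for (q, &i) in idx.iter().take(4).enumerate() {
                let v = &codebook[usize::from(i)];
                let (qx, qy) = ((q & 1) * 2, (q >> 1) * 2);
                for y in 0..2 {
                    for x in 0..2 {
                        frame[base + (qy + y) * stride + qx + x] = v[y * 2 + x];
                    }
                }
            }
        }
    }
}
```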

The concepts of codebook-based coding are the same as in Cinepak, even the YUV conversion formula is almost the same (with simplified coefficients using multiplication/division by two only). The main differences are using just one codebook for everything and the coding format: while Cinepak uses separate bit masks for block types, this codec uses opcodes (which is common for other fruity codecs). So this makes me wonder where this codec comes from and how it is related to Compact Video. Was it some kind of predecessor? Was it developed by Malus as competition or based on licensed technology? Why was it abandoned?

Even if I ended up with more questions, it was still a fun way to spend part of a Sunday (the rest of Sunday was spent travelling to/from Lower Ulm, which is a differently fun way to spend Sundays; but that’s not the point here). Who knows, with a new search approach I may be able to uncover a couple more ancient codecs to look at.

P.S. Another fun codec for very early QuickTime was SIVQ (that’s how it was called; I’ve failed to find anything but the decoder for it). It was simply a 128-entry 2×2 codebook (in RGB24 format) followed by codebook indices. Probably the name stands for “SImple Vector Quantisation”. That makes it the third proper VQ codec in QT (SMC is a slightly different beast, and RPZA is the first texture codec instead).

NihAV: OS/2 multimedia support

April 17th, 2026

In theory I should be documenting the codecs Paul has shared with me or MVS (did you know that it employs a rather interesting chroma subsampling method: coding three 8×8 blocks in a macroblock, but with fewer than half of the chroma coefficients in zigzag order coded?), but instead I’ll write about something nobody really cares about.

As some of you may know, RedHat had (at least) two multimedia formats developed: RLE-based PhotoMotion for RedHat PC (later licensed to American Laser Games, which seems to have extended it somewhat) and the gradient-based UltiMotion codec for AVI (the format of choice for VfOS/2).

Since the codec is somewhat unique, I decided to write an encoder for it. This way I can re-encode e.g. some movieCD (a format hardly anybody remembers) to another obscure format nobody remembers exactly just because I can. But the main reason is to learn how it’s organised.

It has three distinctive features: shared chroma (i.e. an 8×8 super-block can have just one pair of chroma samples instead of coding each 4×4 block with its own pair), quantised values (6-bit luma and 4-bit chroma) and of course gradients. Actually there are seven block coding modes and only three of them are gradient-based; the rest are the more conventional skip block, scaled-down block (4 luma samples only), BTC (2 samples plus a fill pattern) and raw block.

Gradients here essentially fill the block in one of several directions, a lot like intra prediction works in H.264 and later, the main differences being fewer angles and the fill values being transmitted explicitly. The fun thing is that unlike other codecs there’s no easy way to transmit a flat block: you need to code it in some extended way, as the simplest (“shallow”) mode codes a coarse gradient with two values, the second value being an implicit N+1. The more complicated (“LTC”) mode codes a fine gradient but allows only four-colour combinations present in a 4096-entry codebook. There’s also an extended mode where you can code any values for a gradient.

This of course poses the challenge of finding a good gradient in reasonable time (because trying all 4096 combinations with all 16 directions may get a bit slow). For shallow coding it’s easier: the block is essentially split into two parts, so checking the averages of those parts and seeing whether they fit is enough. For LTC and extended mode I applied a similar trick by finding the averages of the four samples used in each gradient angle and checking whether they fit well enough (for LTC it also meant checking that the samples are monotonically increasing and trying only the more or less close codebook entries; there’s probably more optimisation potential but I’m fine with it as is).
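As an illustration of the shallow-mode check, here is a Rust sketch that tries a few two-part splits of a 4×4 block and measures how well “N in one part, N+1 in the other” fits in the 6-bit luma domain. The four split masks and the 8-bit to 6-bit mapping are hypothetical (the real mode has more directions), and instead of comparing the two averages directly it computes the least-squares N, which with equal 8-pixel halves is the same for every split, and ranks splits by squared error.

```rust
/// Hypothetical split masks for a 4x4 block (bit i = pixel i in raster order);
/// set bits mark the half that gets the implicit N+1 value.
pub const SPLITS: [u16; 4] = [
    0xFF00, // bottom half brighter
    0x00FF, // top half brighter
    0xCCCC, // right half brighter
    0x3333, // left half brighter
];

/// Hypothetical 8-bit to 6-bit luma quantisation.
pub fn quant_luma(v: u8) -> i32 {
    i32::from(v >> 2)
}

/// Returns (split index, N, squared error) of the best two-value fit.
pub fn best_shallow(block: &[u8; 16]) -> (usize, u8, u32) {
    let q: Vec<i32> = block.iter().map(|&v| quant_luma(v)).collect();
    // Least-squares N for "N in one half, N+1 in the other" with 8 pixels per
    // half is round((sum - 8) / 16) = sum / 16; clamp so that N+1 fits 6 bits.
    let n = (q.iter().sum::<i32>() / 16).clamp(0, 62);
    let mut best = (0, n as u8, u32::MAX);
    for (s, &mask) in SPLITS.iter().enumerate() {
        let err: u32 = (0..16)
            .map(|i| {
                let target = if ((mask >> i) & 1) == 1 { n + 1 } else { n };
                let e = q[i] - target;
                (e * e) as u32
            })
            .sum();
        if err < best.2 {
            best = (s, n as u8, err);
        }
    }
    best
}
```

A flat block never gets zero error here, which matches the observation that a flat block cannot be coded exactly with the shallow mode.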

I actually built it up gradually: first by implementing a simple “raw or skip blocks only” mode, then an “any block type that does not introduce additional loss (besides YUV quantisation)” mode, then the lossy mode, and finally a fast-and-shitty mode. The idea behind the last one is to calculate block variance and use that information to constrain block selection (i.e. not try more complex block types on blocks with low variance). As you can guess from the name it did not work out that well (but it was about twice as fast). But overall the lossy mode works rather well, and by introducing distortion thresholds I can vary the output file size significantly (3-5 times smaller and still not a block soup). I’m not going to bother with any rate control and overall I consider this experiment done.
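The fast mode’s pre-selection could be sketched in Rust like this: compute the variance of a 4×4 block and use it to limit which of the seven block modes get tried. The thresholds and the mode lists are hypothetical and not taken from the actual encoder.

```rust
/// The seven UltiMotion block modes named in the post.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Skip,
    Scaled,
    Btc,
    Raw,
    Shallow,
    Ltc,
    Extended,
}

/// Population variance of a 4x4 block: (16 * sum(x^2) - sum(x)^2) / 256.
pub fn block_variance(block: &[u8; 16]) -> u32 {
    let sum: u32 = block.iter().map(|&v| u32::from(v)).sum();
    let sq: u32 = block.iter().map(|&v| u32::from(v) * u32::from(v)).sum();
    (16 * sq - sum * sum) / 256
}

/// Candidate modes to try for a given variance (thresholds are hypothetical).
pub fn candidate_modes(var: u32, low: u32, high: u32) -> &'static [Mode] {
    use Mode::*;
    if var <= low {
        &[Skip, Scaled, Shallow]
    } else if var <= high {
        &[Skip, Scaled, Btc, Shallow, Ltc]
    } else {
        &[Skip, Scaled, Btc, Raw, Shallow, Ltc, Extended]
    }
}
```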

In conclusion I’d like to write something but nothing comes to my mind. Stay tuned for more stuff nobody cares about (like obscure codecs or my experiments with palettisation).

Hollywood and “AI”

April 13th, 2026

“AI” by itself does not interest (or bother) me much, so I find it more interesting to look at how this phenomenon interacts with the world around it. Here’s one rather unexpected parallel.

I’m not trying to claim that current “AI” fills exactly the same niche or follows exactly the same history, but I find the coincidences rather amusing.

Hollywood started when most of the movie studios moved to California in order to be far from Edison’s company, which owned a lot of patents on cinema equipment; there they started to indulge in mild and hard copyright violations. The most prominent example, let’s call it The Big Mouse to protect the guilty, was founded by a person whose original character was essentially stolen from him, so he created that Big Mouse (which looks suspiciously close to his first creation) and started to make money by taking public domain works and creating derivatives (which were then protected by all possible USian laws, to the extent of the nickname a certain copyright term extension act got). And apparently they thrived on the fact that screenplays, being derivative works, could be made without paying any royalties to the original author (unless you explicitly wanted to use that work or the author’s name for better publicity, of course), so the studios saved a lot of money by producing a completely “original” script definitely not based on some book or another script sent to them by some naïve scriptwriter (I’m not sure that Nosferatu was even the first movie to pull such a trick but it definitely had a lot of followers in the following century).

Now look at the “AI” produced content and make the comparison for yourself.

Then there’s a common trend of financial gravitation, where smaller companies grow into larger ones by absorbing everything around them (and occasionally merging), so even if you start with a dust cloud of many independent researchers, it will eventually accrete into just a couple of giant companies (maybe with some satellites) absorbing anything they can reach. So just like Hollywood has half a dozen major studios and a couple of small fish, there are maybe about the same number of large “AI” companies with no serious competition. It may also explain why the heads of those companies are often people who don’t understand anything about the businesses they run, but that’s so common that you’d rather need to list the rare exceptions.

And partly this can be explained (damn! I said I see it as a coincidence, not something following the same trajectory by obeying the same general laws) by their productions being soulless large-budgeted “original” productions (I’ll probably just call it SLOP for short) that use a lot of GPUs to render the final result.

But there’s definitely a difference! Hollywood is known for its accounting system that raises relatively large sums of investments and subsidies and then reports losses on movies no matter how much income they bring. I’ve never heard of “AI” companies being accused of hiding profits, only of using creative accounting to hide losses in order to attract more investment. Definitely nothing in common here!

So there you have it, two completely different categories of enterprise, giving you ephemeral products (in the sense that you watch a movie or run an “AI” agent for some task from them and then you’re left with nothing substantial afterwards, only vibes and feelings) and demonstrating similar behaviour. To repeat it for at least the third time, this is just a coincidence, but I take my fun wherever I can find it.

On fruity MVS codec

April 11th, 2026

I could be writing about RedHat video encoder I just finished or work on REing DiVID1x on Paul’s request, but this was earlier in my queue.

Apparently the iVNC protocol has an option to use a custom iCodec. Since I was asked to look at it, here are my preliminary findings (a more detailed bitstream description will follow eventually).

So packet starts with a byte telling payload type (0 – intra frame, 1 – inter frame, 2 – custom quantisation matrices for luma and chroma, 64 bytes each). After that the rest of data follows.

Intra frames code a series of tiles with tile metadata and actual tile content being separated into different parts. Frame data starts with two DCT quantisers followed by 24-bit big-endian metadata part size, then there’s metadata, and finally it’s tile data.
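For reference, splitting a packet according to the description above might look like the following Rust sketch. The one-byte quantisers, the big-endian 16-bit chroma coefficient count in inter frames and stripping the trailing "mvs\0" before parsing are assumptions; the names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum Packet<'a> {
    Intra { quants: [u8; 2], metadata: &'a [u8], tiles: &'a [u8] },
    Inter { chroma_coeffs: u16, bitstream: &'a [u8] },
    QuantMatrices { luma: &'a [u8], chroma: &'a [u8] },
}

/// Splits one MVS packet into its parts (field widths partly assumed).
pub fn parse_packet(data: &[u8]) -> Option<Packet<'_>> {
    let (&kind, rest) = data.split_first()?;
    // frames are supposed to end with "mvs\0"; drop it if present
    let frame = rest.strip_suffix(b"mvs\0").unwrap_or(rest);
    match kind {
        0 => {
            if frame.len() < 5 {
                return None;
            }
            let quants = [frame[0], frame[1]];
            // 24-bit big-endian metadata size
            let meta_len = (usize::from(frame[2]) << 16) | (usize::from(frame[3]) << 8) | usize::from(frame[4]);
            let body = &frame[5..];
            if meta_len > body.len() {
                return None;
            }
            let (metadata, tiles) = body.split_at(meta_len);
            Some(Packet::Intra { quants, metadata, tiles })
        }
        1 if frame.len() >= 2 => Some(Packet::Inter {
            chroma_coeffs: u16::from_be_bytes([frame[0], frame[1]]),
            bitstream: &frame[2..],
        }),
        // two 64-byte matrices: luma first, then chroma
        2 if rest.len() >= 128 => Some(Packet::QuantMatrices { luma: &rest[..64], chroma: &rest[64..128] }),
        _ => None,
    }
}
```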

Tile metadata codes a 3-bit tile type and the number of tiles having that type (0000-1110 mean 1-15 tiles, 11110 means the next 8-bit value plus sixteen, 111110 means the next 15-bit value plus sixteen, 111111 means the next 22-bit value plus sixteen). Tiles can be of the following types:

  • white tile—tile is completely filled white;
  • last match—previous(?) tile is copied;
  • upper match—tile above(?) is copied;
  • black and white—one bit per pixel (0 – black, 1 – white);
  • two-colour tile—almost the same but with two colours transmitted first (8-bit luma and 6-bit chroma values);
  • DCT—tile data is coded with ProRes-like DCT;
  • match tile—re-paint last recently used DCT tile;
  • cached tile—re-paint DCT tile with 16-bit index from LRU cache.
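Reading a tile run from the metadata could be sketched like this in Rust, with a hypothetical MSB-first bit reader. It assumes the 3-bit type comes before the count and that the count codes form a prefix code: four bits 0000 to 1110 give 1 to 15 tiles, while 1111 followed by 0, 10 or 11 selects an 8-, 15- or 22-bit value plus sixteen. The type is returned as a raw number; mapping it to the list above in that order would be another assumption.

```rust
/// Minimal MSB-first bit reader (hypothetical helper).
pub struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> BitReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Reads `n` bits (n <= 32) as an unsigned value, or `None` past the end.
    pub fn read(&mut self, n: u32) -> Option<u32> {
        let mut v = 0u32;
        for _ in 0..n {
            let byte = *self.data.get(self.pos / 8)?;
            v = (v << 1) | u32::from((byte >> (7 - self.pos % 8)) & 1);
            self.pos += 1;
        }
        Some(v)
    }
}

/// Number of tiles in a run: 0000-1110 -> 1-15, then 1111 + 0/10/11 escapes.
pub fn read_tile_count(br: &mut BitReader<'_>) -> Option<u32> {
    let v = br.read(4)?;
    if v < 15 {
        return Some(v + 1);
    }
    let extra_bits = if br.read(1)? == 0 {
        8
    } else if br.read(1)? == 0 {
        15
    } else {
        22
    };
    Some(br.read(extra_bits)? + 16)
}

/// One metadata entry: 3-bit tile type followed by the run length.
pub fn read_tile_run(br: &mut BitReader<'_>) -> Option<(u8, u32)> {
    let kind = br.read(3)? as u8;
    let count = read_tile_count(br)?;
    Some((kind, count))
}
```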

Inter frames start with two bytes telling the number of coded chroma coefficients, and the rest is a single bitstream with a 2-bit tile type followed by whatever tile content is stored. Tile types are: skip, DCT, match tile, and cached tile. The first type should be obvious; the rest are probably the same as in intra frames.

Also frame data is supposed to end with "mvs\0" but I guess this matters only for people trying to write a compatible encoder (or checking that the data was decoded correctly).

See, it’s a rather simple codec, so hopefully I’ll clarify some things (like cache behaviour, YUV coefficients and actual DCT bitstream format), document it at The Wiki and move to something else.