Archive for April, 2026

NihAV: palettisation

Thursday, April 30th, 2026

While I added palettisation support to NihAV about six years ago, it was limited to per-frame palette generation back then. Since I had only two encoders supporting paletted input and both accepted palette changes, it worked fine. It was only when I decided to implement a MOV muxer that I really needed palettisation with a global palette, so I started experimenting with that.

First and foremost, design. While frame palettisation is a part of NAScale (which handles video frame conversion from and to various formats), this palettisation mode is bolted onto nihav-encoder. It is actually implemented in two parts: an initial pass that decodes the input and generates a palette at the end, and the actual frame palettisation (which can work just fine without that pass if you tell it to use some pre-defined palette). I actually started with the second part and added palette generation later (for tests, using the default QuickTime palette was enough). Then I went even further and extended palette generation to support segmented mode (i.e. the palette may change, but not for every frame; I made the limit configurable but it should be at least 10 frames—storing a palette for each frame before processing is too much).

This mode is a bit fragile since palettes are calculated for the decoded frames and not for the processed frames. And for palette segments it’s even worse, since a segment tells the number of frames for which the palette is valid, so framerate conversion will make a mess of it. But the alternative leads to libavfilter madness, and that’s a hard pass for me. I see filters more as a drug that makes multimedia projects shift attention to them, making everything more and more about e.g. filter negotiation and complex graph support and less about playing actual media; consequently, making everything a filter is a sure sign the project is on its way to becoming obsolete (yes, I hold neither DirectShow nor gstreamer in high regard).

Anyway, after the overall design description it’s time to talk about implementation details. Palette generation for video differs from palette generation for a single image by the sheer amount of data it needs to take into account. So while I still have the “let’s waste memory and have a table of 64-bit counters for each possible colour” mode, my first new mode puts colours into smaller buckets (the bucket index is calculated from the top bits of the components) and joins similar colours (e.g. those differing just by the two low bits in each component) when the number of entries gets too high. It is slower and not as accurate, but it consumes less memory, and performing vector quantisation on hundreds of thousands of entries is much faster than on millions. So it may have its uses.
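To illustrate the bucket idea, here is a rough sketch in Rust (the bit counts, names and merge rule are illustrative guesses, not NihAV’s actual code): the bucket index comes from the top bits of each component, and a coarser key can drive merging of colours that differ only in the low bits.

```rust
// Illustrative only: 5 top bits per component, giving 32768 buckets.
const BUCKET_BITS: u32 = 5;

fn bucket_index(r: u8, g: u8, b: u8) -> usize {
    let shift = 8 - BUCKET_BITS;
    (((r as usize) >> shift) << (BUCKET_BITS * 2))
        | (((g as usize) >> shift) << BUCKET_BITS)
        | ((b as usize) >> shift)
}

// Colours differing only in the two low bits of each component map to the
// same coarse key, so entries sharing a key can be merged when the entry
// count grows too high.
fn merge_key(r: u8, g: u8, b: u8) -> u32 {
    (((r & !3) as u32) << 16) | (((g & !3) as u32) << 8) | ((b & !3) as u32)
}
```

With 32768 buckets instead of 16 million counters, the vector quantisation step has far fewer candidate entries to chew on, which is the whole point of the mode.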

Then there’s palette segment generation. I don’t do anything fancy: I simply calculate a coarse colour histogram to decide when to start a new segment. As the cut-off criterion I selected the ratio between the correlation of the group and new-frame histograms and the auto-correlation of the new frame’s histogram. If they’re of about the same magnitude then I can add this frame to the group, update the group histogram and continue; otherwise I generate a palette for the just-finished segment and start a new one. It’s naïve but it seems to work reasonably well.
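The cut-off criterion could look something like this sketch (my interpretation of the description; the histogram representation, the averaged group histogram and the threshold are made up for illustration):

```rust
// Correlation of two histograms: sum over bins of the product of counts.
fn correlate(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Decide whether a new frame belongs to the current segment: the
/// cross-correlation with the (averaged) group histogram should be of
/// about the same magnitude as the auto-correlation of the new frame.
fn same_segment(group_avg: &[f64], frame: &[f64], threshold: f64) -> bool {
    let auto = correlate(frame, frame);
    auto == 0.0 || correlate(group_avg, frame) / auto >= threshold
}
```

A threshold somewhere below 1.0 then controls how eagerly new segments are started; identical histograms give a ratio of exactly one.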

Palettisation itself consists of finding an appropriate palette entry for each input colour (I’m aware of dithering but haven’t bothered with it yet). I use the same three methods as back then: brute-force search, local search and k-d tree search. The last one is faster than the rest but gives horrible quality, so I don’t know whether I should improve it or throw it away. Local search (especially with a small cache for the last 32 results) is a nice trade-off between the other two. And brute-force search is implemented by filling a 16MB table which maps each input colour to a palette index; it is slow to generate but it works extremely fast with a global palette and a reasonably large video (i.e. one with more than those sixteen million pixels). For segmented palettes it’s better to use local search though.
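The brute-force variant with the lookup table can be sketched like this (function names and the plain squared-RGB distance are assumptions; the real implementation may weigh components differently):

```rust
// Exhaustive nearest-entry search by squared RGB distance.
fn nearest(pal: &[[u8; 3]], r: u8, g: u8, b: u8) -> u8 {
    let mut best = 0;
    let mut best_dist = u32::MAX;
    for (i, c) in pal.iter().enumerate() {
        let dr = c[0] as i32 - r as i32;
        let dg = c[1] as i32 - g as i32;
        let db = c[2] as i32 - b as i32;
        let dist = (dr * dr + dg * dg + db * db) as u32;
        if dist < best_dist {
            best_dist = dist;
            best = i;
        }
    }
    best as u8
}

// Fill a 2^24-entry (16MB) table mapping every RGB colour to its palette
// index: slow to build, but a lookup afterwards is a single memory access.
fn build_lut(pal: &[[u8; 3]]) -> Vec<u8> {
    let mut lut = vec![0u8; 1 << 24];
    for idx in 0..(1usize << 24) {
        let (r, g, b) = ((idx >> 16) as u8, (idx >> 8) as u8, idx as u8);
        lut[idx] = nearest(pal, r, g, b);
    }
    lut
}
```

The break-even point is exactly as described: once the video has more pixels than the sixteen million table entries, building the table once beats searching per pixel.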

That’s it. I realise that I’m the only user of such a feature, but it gave me something to play with and implementing it brought some joy. After all, who else can claim he converted a movieCD into an animated GIF without using any 16- or even 32-bit code?

Quickly about Factor 5 VID 1

Friday, April 24th, 2026

Yes, calculating that results in VID5 but in reality it’s MPEG-4 part 2 (minus some parts). Paul asked me to look at it some time ago, so I did (which is much better than implementing a decoder for it).

As I said already, this is essentially MPEG-4 part 2 with some insane parts cut off (but not enough to turn it back into ITU H.263): there’s no support for special texture shapes, interlacing or even quarter-pel motion compensation. There are still B- and S-frames to complicate things though. The bitstream format is cut down as well to remove most of the nonsense (or omgFFeatures if you have that view), so there are just frames containing a basic header (or a slightly less basic one with GMC and S-frames) plus macroblock data. Macroblock data is almost identical to the expected format—they even still have sync pattern handling in MCBPC despite there being no need for it.

So on the one hand, writing a decoder for it is not that hard, as you can simply hack an existing decoder for the task; on the other hand, it is hard enough at the same time, because you either need to hack an existing decoder or implement one yourself, and that ISO standard is not easy to comprehend. Personally, I decided not to touch S-frames at all, and if the need arises I’m actually considering making a wrapper for xvidcore instead.

Meanwhile I still have MVS to document and lots of encoders to write to make use of my new palettiser (so far I have just three codecs that can encode paletted formats—two of them are for AVI, one is GIF, and none are for MOV). So hopefully I’ll have something more interesting to write about next time.

Cinepak’s long-lost brother?

Monday, April 20th, 2026

While discmaster is stagnating (you know whom to thank for the shortages of HDDs as well as RAM), I still look through stuff there in hopes of finding something interesting. Occasionally I manage to stumble upon something special indeed.

This time it was navigable movies bundled with QuickTime 1.5 or so. Apparently the idea behind them is that all frames are actually tiles of a much larger picture that the user can navigate without exhausting all RAM trying to decode it as one contiguous image. If that made you think of ISO H.EIC (aka HEIF or AVIF, depending on the intra-frame codec employed) you may be right; it also got re-branded (and maybe enhanced a bit?) some time later as QT-VR (sometimes I think no matter how stupid a modern multimedia idea is, it’s been implemented in QuickTime a couple of decades ago).

Anyway, out of four such movies, one was recognised and converted by discmaster software, two were playable (with my player) after I hacked the file type tag to be MooV instead of APPL, and the last one could not be decoded at all because it was of an unknown type.

Luckily for me, the resource data of that movie contained the decoder in m68k binary form (sometimes I think no matter how stupid a modern multimedia idea is, it’s been implemented in QuickTime a couple of decades ago—or did I say that already?). And just by looking at the frame contents I knew it was worth REing, as I could spot a YUV codebook right at the beginning, and it was definitely not Cinepak (or Compact Video, as it was known back then). The name was “CDROM Video codec” with the tag cdvc, but that didn’t tell me much. The file was created in 1992, while Wickedpedia claims that SuperMac’s Compact Video was added to QT around that time as well.

Anyway, let’s move to the format details. The coded frame starts with a 24-byte header followed by a YUV or (theoretically) RGB24 codebook, then another 24-byte header (containing frame dimensions among other things), and finally the data. The frame is split into 4×4 blocks, and the data is sent as opcodes, each containing a block type and the number of blocks (minus one) of that type. Blocks are known to have three types: four vectors per block, one vector per block (scaled 2×), or simply skip.
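Since the actual opcode bit layout isn’t given above, here is a purely illustrative sketch of such an opcode-driven loop (type in the top two bits, count minus one in the rest: an assumption for the example, not the real cdvc layout):

```rust
// Hypothetical opcode layout: bits 7-6 = block type, bits 5-0 = count - 1.
#[derive(Debug, PartialEq)]
enum BlockType {
    Skip,        // block unchanged
    OneVector,   // one codebook vector, scaled 2x to cover the 4x4 block
    FourVectors, // four codebook vectors, one per 2x2 quarter
}

fn decode_opcodes(data: &[u8]) -> Vec<(BlockType, usize)> {
    let mut out = Vec::new();
    for &op in data {
        let count = (op & 0x3F) as usize + 1; // stored as count minus one
        let btype = match op >> 6 {
            0 => BlockType::Skip,
            1 => BlockType::OneVector,
            _ => BlockType::FourVectors,
        };
        out.push((btype, count));
    }
    out
}
```

Run-coding block types this way (instead of Cinepak’s per-type bit masks) is exactly the structural difference discussed below.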

The concepts of codebook-based coding are the same as in Cinepak; even the YUV conversion formula is almost the same (with simplified coefficients using multiplication and division by two only). The main differences are using just one codebook for everything and the coding format—while Cinepak uses separate bit masks for block types, this codec uses opcodes (which is common for other fruity codecs). So this makes me wonder where this codec comes from and how it is related to Compact Video. Was it some kind of predecessor? Was it developed by Malus as a competitor or based on licensed technology? Why was it abandoned?

Even if I ended up with more questions than answers, it was still a fun way to spend a Sunday (the rest of that Sunday was spent travelling to and from Lower Ulm, which is a differently fun way to spend Sundays; but that’s not the point here). Who knows, with a new search approach I may be able to uncover a couple more ancient codecs to look at.

P.S. Another fun codec from very early QuickTime was SIVQ (that’s what it was called; I’ve failed to find anything but the decoder for it). It was simply a 128-entry 2×2 codebook (in RGB24 format) followed by codebook indices. Probably the name stands for “SImple Vector Quantisation”. That makes it the third proper VQ codec in QT (SMC is a slightly different beast, and RPZA is the first texture codec instead).
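Under that description, a SIVQ frame decoder could be sketched like this (details such as the byte layout of the codebook and index masking are my assumptions; error handling is omitted):

```rust
// Assumed layout: 128 codebook entries of 2x2 RGB24 pixels (128 * 4 * 3 =
// 1536 bytes), followed by one index byte per 2x2 block in raster order.
fn decode_sivq(data: &[u8], width: usize, height: usize) -> Vec<u8> {
    const CB_SIZE: usize = 128 * 4 * 3;
    let (codebook, indices) = data.split_at(CB_SIZE);
    let mut frame = vec![0u8; width * height * 3];
    let mut idx_iter = indices.iter();
    for by in (0..height).step_by(2) {
        for bx in (0..width).step_by(2) {
            // 7-bit index selects a 12-byte codebook entry
            let cb_off = (*idx_iter.next().unwrap() as usize & 0x7F) * 12;
            let entry = &codebook[cb_off..cb_off + 12];
            for y in 0..2 {
                for x in 0..2 {
                    let src = (y * 2 + x) * 3;
                    let dst = ((by + y) * width + bx + x) * 3;
                    frame[dst..dst + 3].copy_from_slice(&entry[src..src + 3]);
                }
            }
        }
    }
    frame
}
```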

NihAV: OS/2 multimedia support

Friday, April 17th, 2026

In theory I should be documenting the codecs Paul has shared with me, or MVS (did you know that it employs a rather interesting chroma subsampling method—coding three 8×8 blocks per macroblock, with the chroma block having less than half of its coefficients in zigzag order coded?), but instead I’ll write about something nobody really cares about.

As some of you may know, RedHat had (at least) two multimedia formats developed: the RLE-based PhotoMotion for RedHat PC (later licensed to American Laser Games, which seems to have extended it somewhat) and the gradient-based UltiMotion codec for AVI (the format of choice for VfOS/2).

Since the codec is somewhat unique, I decided to write an encoder for it. This way I can re-encode e.g. some movieCD (a format hardly anybody remembers) into another obscure format nobody remembers, just because I can. But the main reason was to learn how it’s organised.

There are three distinctive features: shared chroma (i.e. an 8×8 super-block can have just one pair of chroma samples instead of each 4×4 block coding its own pair), quantised values (6-bit luma and 4-bit chroma) and of course gradients. Actually there are seven block coding modes and only about half of them are gradient-based—the rest are the more conventional skip block, scaled-down block (4 luma samples only), BTC (2 samples plus a fill pattern) and raw block.
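The quantised values can be illustrated with a trivial sketch (the rounding and the bit-replicating reconstruction here are my assumptions, not taken from the codec):

```rust
// Quantise an 8-bit sample to the given precision (6 bits for luma,
// 4 bits for chroma) by keeping the top bits.
fn quant(v: u8, bits: u32) -> u8 {
    v >> (8 - bits)
}

// Reconstruct by shifting back up and replicating the top bits into the
// low bits, so the full 0..255 range is covered.
fn dequant(q: u8, bits: u32) -> u8 {
    let v = (q as u32) << (8 - bits);
    (v | (v >> bits)) as u8
}
```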

Gradients here essentially fill the block in one of several directions, a lot like intra prediction works in H.264 and later; the main differences are fewer angles and the fill values being transmitted explicitly. The fun thing is that unlike in other codecs there’s no easy way to transmit a flat block: you need to code it in some extended way, as the simplest (“shallow”) mode codes a coarse gradient with two values, the second value being an implicit N+1. The more complicated (“LTC”) mode codes a fine gradient but allows only the four-colour combinations present in a 4096-entry codebook. There’s also an extended mode where you can code any values for a gradient.
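The general idea of a directional fill with explicitly transmitted values can be sketched like this (the angles, interpolation and 4×4 block size are illustrative, not UltiMotion’s actual ones; the direction components are assumed non-negative):

```rust
// Fill a 4x4 block by interpolating between two transmitted values along
// a direction (dx, dy): each pixel's projection onto the direction picks
// its position on the v0..v1 ramp.
fn fill_gradient(v0: f32, v1: f32, dx: f32, dy: f32) -> [[u8; 4]; 4] {
    let mut blk = [[0u8; 4]; 4];
    let maxproj = 3.0 * (dx + dy); // projection of the far corner
    for (y, row) in blk.iter_mut().enumerate() {
        for (x, px) in row.iter_mut().enumerate() {
            let t = if maxproj > 0.0 {
                (x as f32 * dx + y as f32 * dy) / maxproj
            } else {
                0.0
            };
            *px = (v0 + (v1 - v0) * t).round() as u8;
        }
    }
    blk
}
```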

This of course poses the challenge of finding a good gradient in reasonable time (because trying all 4096 combinations with all 16 directions may get a bit slow). For shallow coding it’s easier: you essentially have the block split into two parts, so checking the averages of those parts and seeing if they fit is enough. For the LTC and extended modes I applied a similar trick by finding the averages of the four samples used in each gradient angle and seeing if they fit well enough (for LTC it also meant checking that the samples are monotonically increasing and checking only the more or less close codebook entries; there’s probably more optimisation potential but I’m fine with it as is).
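The shallow-mode fitness check might look like this sketch for one direction (a plain left/right split with a squared-error threshold; both the split and the metric are assumptions for illustration):

```rust
// Average each half of a 4x4 block (here split into left and right halves).
fn halves_avg(blk: &[[u8; 4]; 4]) -> (f32, f32) {
    let (mut s0, mut s1) = (0u32, 0u32);
    for row in blk {
        s0 += row[0] as u32 + row[1] as u32;
        s1 += row[2] as u32 + row[3] as u32;
    }
    (s0 as f32 / 8.0, s1 as f32 / 8.0)
}

// Check whether coding the block with just the two half averages keeps the
// squared error under a threshold, i.e. whether the shallow mode fits.
fn shallow_fits(blk: &[[u8; 4]; 4], max_sq_err: f32) -> bool {
    let (a0, a1) = halves_avg(blk);
    let mut err = 0.0;
    for row in blk {
        for (x, &px) in row.iter().enumerate() {
            let pred = if x < 2 { a0 } else { a1 };
            err += (px as f32 - pred) * (px as f32 - pred);
        }
    }
    err <= max_sq_err
}
```

The same average-then-check idea extends to the LTC and extended modes by averaging the samples each angle actually uses.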

Actually I approached it gradually: first by implementing a simple “raw or skip blocks only” mode, then an “any block type that does not introduce additional loss (besides YUV quantisation)” mode, then a lossy mode, and finally a fast-and-shitty mode. The idea behind the last one is to calculate block variance and use that information to force the block selection process (i.e. not to try the more complex block types on blocks with low variance). As you can guess from the name, it did not work out that nicely (but it was about twice as fast). Overall the lossy mode works rather well, and by introducing distortion thresholds I can vary the output file size significantly (3–5 times smaller while still not being a block soup). I’m not going to bother with any rate control, and overall I consider this experiment done.
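The variance pre-check of the fast mode can be sketched as follows (the threshold and names are illustrative):

```rust
// Plain population variance of a block's samples.
fn block_variance(blk: &[u8]) -> f32 {
    let n = blk.len() as f32;
    let mean = blk.iter().map(|&v| v as f32).sum::<f32>() / n;
    blk.iter()
        .map(|&v| (v as f32 - mean) * (v as f32 - mean))
        .sum::<f32>()
        / n
}

// Low-variance blocks skip the more complex block types entirely,
// trading some quality for roughly twice the speed.
fn try_complex_modes(blk: &[u8], var_threshold: f32) -> bool {
    block_variance(blk) >= var_threshold
}
```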

In conclusion I’d like to write something but nothing comes to my mind. Stay tuned for more stuff nobody cares about (like obscure codecs or my experiments with palettisation).

Hollywood and “AI”

Monday, April 13th, 2026

“AI” as such does not interest (or bother) me much, so I find it more interesting to look at how this phenomenon interacts with the world around it. Here’s one rather unexpected parallel.

I’m not trying to claim that current “AI” fills exactly the same niche or follows exactly the same history, but I find the coincidences rather amusing.

Hollywood started when most of the movie studios moved to California in order to be far from Edison’s company, which owned a lot of patents on cinema equipment; there they started to indulge in mild and hard copyright violations. The most prominent example—let’s call it The Big Mouse to protect the guilty—was founded by a person whose original character was essentially stolen from him, so he created that Big Mouse (which looks suspiciously close to his first creation) and started to make money by taking public domain works and creating derivatives (which now were protected by all possible USian laws, to the extent of the certain nickname the copyright term extension act got). And apparently they thrived on the fact that screenplays, being derivative works, could be used without any royalties to the original author (unless you explicitly want to use that work or the author’s name for better publicity, of course), so the studios saved a lot of money by producing a completely “original” script definitely not based on some book or another script sent to them by some naïve scriptwriter (I’m not sure that Nosferatu was even the first movie to pull such a trick, but it definitely had a lot of followers in the following century).

Now look at the “AI” produced content and make the comparison for yourself.

Then there’s a common trend of financial gravitation, when smaller companies grow into larger ones by absorbing everything around them (and occasionally merging), so even if you start with a dust cloud of many independent researchers, it will eventually accrete into just a couple of giant companies (maybe with some satellites) absorbing anything they can reach. So just as in Hollywood you have half a dozen major studios and a couple of small fish, there are maybe about the same number of large “AI” companies with no serious competition. It may also explain why the heads of these companies are often people who don’t understand anything about the businesses they run, but that’s so common that you’d rather need to list the rare exceptions.

And this can partly be explained (damn! I said I see it as a coincidence, not something following the same trajectory by obeying the same general laws) by their products being soulless large-budgeted “original” productions (I’ll probably just call them SLOP for short) that use a lot of GPUs to render the final result.

But there’s definitely a difference! Hollywood is known for its accounting system that raises relatively large sums in investments and subsidies and then reports losses on movies no matter how much income they bring. I’ve never heard of “AI” companies being accused of hiding profits, only of using creative accounting to hide losses in order to attract more investments. Definitely nothing in common here!

So there you have it: two completely different categories of enterprise, giving you ephemeral products (in the sense that you watch a movie, or run an “AI” agent for some task, and then you’re left with nothing substantial afterwards, only vibes and feelings) and demonstrating similar behaviour. To repeat it for at least the third time, this is just a coincidence—but I get fun wherever I can find it.

On fruity MVS codec

Saturday, April 11th, 2026

I could be writing about the RedHat video encoder I just finished, or about REing DiVID1x at Paul’s request, but this was earlier in my queue.

Apparently the iVNC protocol has an option to use a custom iCodec for video. Since I was asked to look at it, here are my preliminary findings (a more detailed bitstream description will follow eventually).

So a packet starts with a byte telling the payload type (0 – intra frame, 1 – inter frame, 2 – custom quantisation matrices for luma and chroma, 64 bytes each). After that the rest of the data follows.

Intra frames code a series of tiles, with tile metadata and the actual tile content separated into different parts. Frame data starts with two DCT quantisers followed by a 24-bit big-endian metadata part size, then comes the metadata, and finally the tile data.

Tile metadata codes a 3-bit tile type and the number of tiles having that type (0000–1110 mean 1–15 tiles, 11110 means next 8-bit value plus sixteen, 111110 means next 15-bit value plus sixteen, 1111110 means next 22-bit value plus sixteen). Tiles can be of the following types:

  • white tile—tile is completely filled white;
  • last match—previous(?) tile is copied;
  • upper match—tile above(?) is copied;
  • black and white—one bit per pixel (0 – black, 1 – white);
  • two-colour tile—almost the same but with two colours transmitted first (8-bit luma and 6-bit chroma values);
  • DCT—tile data is coded with ProRes-like DCT;
  • match tile—re-paint the most recently used DCT tile;
  • cached tile—re-paint DCT tile with 16-bit index from LRU cache.
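As I read the count coding described above (assuming the escape prefixes grow by one bit at a time: 11110, 111110, 1111110), a reader for it could look like this sketch; treat it as an interpretation, not something verified against the codec:

```rust
// Minimal MSB-first bit reader; no bounds checking for brevity.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // bit position
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }
    fn read_bit(&mut self) -> u32 {
        let bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1;
        self.pos += 1;
        bit as u32
    }
    fn read_bits(&mut self, n: u32) -> u32 {
        (0..n).fold(0, |acc, _| (acc << 1) | self.read_bit())
    }
}

// Tile count: 4-bit codes 0-14 mean 1-15 tiles, then progressively longer
// escape prefixes carry 8-, 15- or 22-bit values plus sixteen.
fn read_tile_count(br: &mut BitReader<'_>) -> u32 {
    let val = br.read_bits(4);
    if val < 15 {
        return val + 1; // 0000..1110 -> 1..15 tiles
    }
    if br.read_bit() == 0 {
        return br.read_bits(8) + 16; // 11110
    }
    if br.read_bit() == 0 {
        return br.read_bits(15) + 16; // 111110
    }
    br.read_bits(22) + 16 // 1111110
}
```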

Inter frames start with two bytes telling the number of coded chroma coefficients, and the rest is a single bitstream with 2-bit tile types and whatever tile content follows. Tile types are: skip, DCT, match tile, and cached tile. The first type should be obvious; the rest are probably the same as in intra frames.

Also, frame data is supposed to end with "mvs\0", but I guess this matters only for people trying to write a compatible encoder (or checking that the data was decoded correctly).

See, it’s a rather simple codec, so hopefully I’ll clarify some things (like the cache behaviour, YUV coefficients and the actual DCT bitstream format), document it at The Wiki and move on to something else.

The age of stupid greed?

Saturday, April 4th, 2026

Of course neither stupidity nor greed (nor a combination of both) has been scarce at any point in human history, but these days it feels like the main driving force behind various decisions at all possible scales, from individuals to entire countries.

Naturally, despite sharing the common mechanics (some entity pursuing short-term gains while destroying the prospect of long-term gains), there are different flavours of it, and I’ll try to review some of them.

First of all there is something that can be called goodwill monetisation. In this case somebody tries to convert existing reputation, connections and suchlike into instant financial gain, even if that means ruining them (and the profit they generated) for rather meagre compensation. The simplest example is any company selling its customer base data to e.g. advertisers (large corporations can get away with it by being monopolists and not having much reputation to lose in the first place). But the biggest, beautifullest example of it is the USA as a state. Back in the day I wrote a post about how the USA got as successful as it was until quite recently because it created a field and rules for everybody to play by. But a certain somebody decided that the USA is so important that it can demand money from other trade participants just because, and they will comply—not understanding that the trade was flowing to the USA because it was an attractive country for that, that his decisions make it a less attractive country, and that in the future others will think twice before doing any business with it (a small personal example: I haven’t bought anything USian in 2025 even though I had done that occasionally before, and I’m not sure if I’ll ever consider it again). There are similar acts regarding other things (like trying to leverage NATO into acting as its personal army, as well as into buying more USian weapons just because; all while demonstrating how unwise it is to rely on them or—as Switzerland has recently learned—even to get what you paid for). I’m pretty sure more examples will keep providing themselves; all I can say is that all trust-based systems behave the same way (e.g. try to cash in 10% of a company’s total stock or of a “crypto”coin emission in a short time and the price will drop even during the transactions, and will probably remain low for a long time after).

Then there’s management. By that I mean not merely the swear-word (equivalent to being stupid and greedy sociopaths) but the act of destroying a company in the long term for short-term benefits (resulting in a bonus for the manager, the only important thing here). You know: outsourcing production overseas, laying off staff just to show a certain level of expense cuts (even if you have to hire those people back next quarter and not all of them will agree to come back), rushing production and cutting corners, and so on and so on. Probably you can point at any large company for that, but the best example IMO is Boeing with its catchy slogan “if it’s Boeing I’m not going”. Usually companies have some robustness, so by the time the aftermath of management starts to show it’s too late to save them (of course there are small miracles like Lou Gerstner and RedHat, but in general don’t bet on it). I’ve heard that vesting and options are supposed to fix that, but I suspect it won’t work as well while there are no repercussions for doing badly (i.e. if you screwed up your current company you’ll still get your precious metal parachute and get employed by another company—unless you need a presidential pardon to remain in business, of course).

There’s also general stupidity, best illustrated by this joke: security arrested a man who tried to smuggle uranium in his underpants; when asked why he did that and whether he realises he won’t have any children now, he replied that he’d get enough money to leave to his grandchildren. This case is nothing new and happens every day to many individuals (you can probably name a couple of examples yourself). And if you wonder how it differs from the previous two cases, it’s the nuances: in goodwill monetisation you kill the hen laying eggs in order to make a pillow (not even a soup!), management usually destroys somebody else’s property for personal gain, and here it is executing a plan without thinking about the consequences, while those consequences backfire and make the original goal unachievable.

Another case can be called “a bull in a china shop”. Here some entity is so envious of money going past it that it makes a desperate grab for it, making things worse for everybody. Since this is a blog (occasionally) about multimedia, the most appropriate example would be codec licensing. The idea was simple: a committee creates a codec from the best technologies submitted by different entities (companies, universities, Fraunhofer-Gesellschaft etc.), then everybody enjoys the best technology for a reasonable fee, which is used to pay the original creators—everybody wins. But then not just the patent holders involved in the codec creation got stupid greedy but completely unrelated entities as well, so now we have at least three patent pools for H.EVC and nobody is willing to use VVC at all (maybe except Brazil). So of course it’s the best time to raise the fees on H.264.

And of course it gets better! The natural reaction to such things is to move away from patented stuff—which the Alliance for Open Media did. So the predators moved after them. Even before AOMedia there was VP8 and Nokia with its famous patent that apparently covered any modern video codec. And there’s D*lby, definitely not a nice company (fun fact: unless I’m mistaken, when this originally British company moved to the USA, D*lby Labs Licensing Corp got incorporated earlier than D*lby Labs Inc.). For a long time it was known for its cinematic equipment and, more importantly, for charging significantly more money for its ATSC A/52B codec than others charged for their codecs combined (I don’t have official numbers of course, but I heard that while H.264+AAC decoding cost ten cents or less per unit, for ETSI 102 366 you had to pay over a dollar per unit; plus a custom license deal—in other words, it’s better not to deal with their stuff at all). With their successor codec not being that popular (despite so many wonderful DRC modes!) and D*lby Vision not being too widespread either, their only winning move was apparently to go after the competition. So first they sued two companies here in Germany for using AOMedia OACv0 (aka Opus), which allegedly violates their patents; later they went after other companies for using AV1, which allegedly violates their patents (AOMedia definitely needs to produce more stuff to get sued over, IAMF might not be enough). I don’t know if D*lby wants a one-time ransom payment, or a constant stream of royalties for free stuff (don’t ask me how that would work, they don’t care either), or to be bought by large AOMedia member(s). All I can say is that this situation definitely makes things worse for everybody, even those not directly involved in it.

If you wonder whether there are other types of stupid greed being practised, there definitely are—idiots can be very inventive, after all. Meanwhile the (dis)honorary mention goes to “AI” companies for blending all of those flavours of stupid greed into one large slop.

What a time to be alive!