Archive for the ‘Various Video Codecs’ Category

A look at more formats

Tuesday, April 16th, 2024

As I mentioned in a recent post, I’ve tried using discmaster.textfiles.com to search for more exotic multimedia formats. Here’s a short report on the formats of some interest that I found.

I mostly looked at the formats listed as video that could not be decoded, as well as audio-only AVIs—some of them really are audio-only, while others feature a video stream that was not recognized.

So, what I’ve found:

  • DK Animation—this turned out to be a simple RLE-based animation+sound format used in some interactive encyclopedias. It was rather easy to figure out the format from the samples, while the executables were rather useless (due to the program design it’s next to impossible to locate the code responsible for animation handling without decompiling all of it);
  • PI-Video (used in a different set of interactive encyclopedias) turned out to be a simple quadtree-based codec: the frame is divided into square tiles, and each tile can be skipped, filled with one colour, subdivided further or, in the case of a 4×4 tile, filled with raw image data (see the sketch after this list). Additionally, pixel values may be further compressed with LZW. That proved to be the most interesting format of the bunch;
  • there were a bunch of RIFF and IFF-based formats, often without a known decoder. Maybe I’ll look at them one day when I feel really desperate, but not today;
  • the ESCP codec is a variation of Escape 130. After I changed the FOURCC to the recognized E130 the file was somewhat decoded: there were countless decoding errors and visual garbage, yet it also produced almost perfect complex parts of the frame. I suspect it may have e.g. an additional field or two or some small bitstream tweaks;
  • and a special mention goes to the tmot FOURCC, which of course turned out to be TrueMotion 1 video.
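
Here is a minimal Rust sketch of how such a quadtree scheme could be decoded. The opcode values, the helper names and the idea of reading plain bytes for the codes are my assumptions for illustration, and the optional LZW layer is ignored; the real PI-Video bitstream may well differ.

  // Hypothetical quadtree tile decoding (not the actual PI-Video bitstream).
  struct Frame {
      width: usize,
      data: Vec<u8>, // one palette index per pixel
  }

  impl Frame {
      fn fill(&mut self, x: usize, y: usize, size: usize, clr: u8) {
          let w = self.width;
          for row in self.data[y * w + x..].chunks_mut(w).take(size) {
              for px in row.iter_mut().take(size) {
                  *px = clr;
              }
          }
      }
  }

  fn decode_tile(
      src: &mut impl Iterator<Item = u8>,
      frame: &mut Frame,
      x: usize, y: usize, size: usize,
  ) {
      match src.next().unwrap_or(0) {
          0 => {} // skip: keep the pixels from the previous frame
          1 => {  // fill the whole tile with a single colour
              let clr = src.next().unwrap_or(0);
              frame.fill(x, y, size, clr);
          }
          2 if size > 4 => { // subdivide the tile into four quadrants
              let half = size / 2;
              decode_tile(src, frame, x, y, half);
              decode_tile(src, frame, x + half, y, half);
              decode_tile(src, frame, x, y + half, half);
              decode_tile(src, frame, x + half, y + half, half);
          }
          _ => { // raw pixels (used for 4x4 tiles in the scheme described above)
              for j in 0..size {
                  for i in 0..size {
                      let clr = src.next().unwrap_or(0);
                      frame.data[(y + j) * frame.width + x + i] = clr;
                  }
              }
          }
      }
  }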

It’s random finds like this that make life a bit less dull.

A quick look at Gold Disk Animation

Wednesday, April 10th, 2024

Since I’m still looking for a thing to reverse engineer, I decided to see if this file service at discmaster.textfiles.com could offer some exotic formats. And indeed it can.

So there’s this AWI or AWM file format (it’s called AWI in the decoder libraries but the files I could find have the extension .awm).

So this is more of a presentation format with a nested structure: chunks with names in capital letters contain other chunks (i.e. everything is contained inside the GDAW chunk, actual assets like PALT or BKGD are stored inside the RSRC chunk, and the presentation scenario is probably stored in the SEEN chunk), while chunks with lowercase names have various specific data attached to them (e.g. psnm is followed by a Pascal-style string with the asset name, tzim contains compressed image data and nndn marks the end of object data).
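
To make that nesting easier to picture, here is a tiny Rust sketch that merely models the hierarchy in memory. The type and field names are mine, and the actual on-disk layout (chunk sizes, alignment and so on) is not covered since I have not figured it all out.

  // Uppercase-named chunks act as containers, lowercase-named ones carry data.
  enum Chunk {
      // e.g. GDAW (the whole file), RSRC (assets) or SEEN (the scenario)
      Container { name: [u8; 4], children: Vec<Chunk> },
      // e.g. psnm (asset name), tzim (compressed image data) or nndn (end marker)
      Data { name: [u8; 4], payload: Vec<u8> },
  }

  fn is_container(name: &[u8; 4]) -> bool {
      name.iter().all(|b| b.is_ascii_uppercase())
  }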

I have not looked too deep into it (no idea how the scenario works or what the various object parameters are) but here’s some information about the resource types:

  • RLE4—a 16-colour RLE-compressed BMP, I presume;
  • RLE8—ditto but with 256 colours;
  • PALT—some global palette (but images still have their own);
  • BKGD—DCL-compressed background BMP;
  • ACTR—DCL-compressed BMP used as sprite;
  • WIPE—transition effect definition;
  • SWND—DCL-compressed WAV.

The most curious thing for me is that it used the PKWARE Data Compression Library to compress data. And while WAV files are compressed in one piece, BMPs are compressed as separate chunks—the 14-byte BMP header, the 40-byte DIB header, the palette, and the image data. I think this was a conscious decision by the format and tool designers (in order to improve the compression ratio a bit).
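
Here is a small Rust sketch of what decoding such a resource boils down to, assuming the pieces really are independent DCL streams. dcl_decompress stands in for a real PKWARE DCL decompressor and next_piece for whatever yields the consecutive compressed pieces from the resource payload; both are placeholders rather than actual NihAV functions.

  // Reassemble a BMP from its four independently compressed pieces.
  fn reassemble_bmp<'a>(
      mut next_piece: impl FnMut() -> &'a [u8],
      dcl_decompress: impl Fn(&[u8]) -> Vec<u8>,
  ) -> Vec<u8> {
      let mut bmp = Vec::new();
      // 14-byte BMP file header, 40-byte DIB header, palette, image data,
      // each stored as its own DCL-compressed chunk
      for _ in 0..4 {
          bmp.extend_from_slice(&dcl_decompress(next_piece()));
      }
      bmp
  }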

I’ll probably try to dig up some more details and document it, but the most interesting part for me (i.e. figuring out its outstanding design features) is done already.

A look at Winnov Pyramid codec

Friday, October 27th, 2023

Since I still have nothing better to do, I decided to take a look at some old codec. Apparently I tried looking at it before and abandoned it because Ghidra cannot disassemble its code properly, let alone decompile it. I think this is a recurring theme with old 16-bit code, especially code that reads data using non-standard segments.

So I located Sourcer, the best disassembler of the era (it seems to be abandonware nowadays but I cannot swear to that), and used it to disassemble the binary, referring to the Ghidra database to locate the functions I should care about. It is not that much fun to translate assembly by hand but at least there was not that much of it.

The codec itself turned out to be a moderately complex DPCM codec compressing 7-bit YUV 4:1:1 data using a per-frame codebook and not so trivial delta compression. Codebooks contain pairs of delta values calculated depending on the number of bits per delta. The data is coded per plane with prediction running continuously for all pixels in the plane:

  // initialise the predictors before decoding the plane data
  (delta0, delta1) = get_code();
  pprev = 64;
  prev = 64 + delta0;
  pdelta = delta1;
  for each pixel pair {
    (delta0, delta1) = get_code();
    // combine the predictor trend with the previously fetched delta
    delta = ((prev + delta0 - pprev) >> 3) + pdelta;
    // 7-bit internal values are doubled to produce 8-bit output pixels
    pix0 = clip_uint8((prev + delta) * 2);
    pix1 = clip_uint8((prev - delta) * 2);
    pprev = prev;
    prev += delta0;
    pdelta = delta1;
  }

Normally such codecs would not bother to generate a codebook for the specific delta size or use anything more complex than pix = prev + delta; so this was a rather interesting codec to look at. Hopefully there will be more interesting formats to study, even if sometimes I get the feeling that all undiscovered formats are either trivial or rip-offs of some standard.

Looking at Motion Pixels

Tuesday, October 24th, 2023

There is this very Sirius (or Sirius Publishing, more precisely) family of video codecs (plus one container format) apparently developed by two guys (who like to spam their names even in junk sections of AVI files). Initially it had its own container format but later they started to target AVI.

Another peculiarity of this format is that initially it targeted games but later was also used as a crappy Video CD alternative.

Back in the day Gregory Montoir REd the original game format for one of the game engine re-implementations he’s famous for and donated the code to FFmpeg as well. Since that time I have been curious whether that code could be adapted to play MVI1 and MVI2 as well, but the codec itself turned me off.

The codec itself is perverted, both in code and interface. Also it’s inherently interlaced. Normally video codecs in AVI can be recognized by their FOURCC and pass additional configuration parameters in the additional header data. Here they decided to use half of the FOURCC to pass configuration flags to the codec and use the stream handler FOURCC (which most applications ignore) to tell that their decoder should be used to handle it. This alone would make me want to never support it, but the binary specification is worse.

Looks like the code consists mostly of handwritten assembly, because I don’t know which compiler could generate this madness. There are many versions of the codec; most of them are 16-bit, and the 32-bit version is no better. For starters, it uses segments.

Not so many people remember DOS times and their memory models, even fewer remember them fondly. And almost nobody remembers that in 32-bit mode you can also use the FS and GS registers to have custom addressing modes. Well, this codec uses them: it sets FS to the context pointer so context fields are accessed as mov EAX, dword ptr FS:[1A8h] while global variables are accessed as mov EAX, dword ptr GS:[SYM], and of course no decompiler likes that. I was able to work around it in Ghidra by creating a new segment starting from zero but it’s still annoying.

Another thing is (ab)using registers to the full extent. Functions pass their parameters implicitly in registers, using the stack only to save those values before a loop or to form a list of rectangles to process. And of course it uses such annoying (for the decompiler) tricks as using the same register for two loop counters (e.g. the top byte for the outer loop and the low byte for the inner loop). As a result Ghidra can’t decompile it properly and even ignores whole blocks of the code because it believes they can’t be invoked—and it’s still better than decompiling the 16-bit version of MVI1, which made the decompiler commit suicide. As a result some functions are easier to hand-translate from the assembly.

In either case, it looks like despite all the improvements it remains about the same as the initial version: data is coded as 5-bit YUV internally and stored using Huffman codes, quantisation and change maps (rectangles that tell which areas to update/fill). MVI2 can use ten different frame decoding modes that differ in how the deltas are coded, but essentially it remains the same. It seems they have not even gotten around to introducing proper motion compensation.

So, now I’ve had a good long look at the codec, found nothing interesting there that was not known before and can forget about it. If only there was something more interesting to look at…

Looking at NUVision

Friday, July 7th, 2023

Since I still have nothing better to do, I decided to look at some obscure video codec I had lying around for a really long time. And it turned out to be simple yet rather original.

Unlike many other codecs, this one codes YUV 4:2:2 (at least it looks like that) line by line in chunks of 24, 16 or 8 elements (essentially 24-pixel chunks plus a shorter tail, with the line width being a multiple of eight). Each chunk can be coded using one of four modes (leave it as is, decode and apply delta, copy the chunk from the previous line with or without delta). Deltas are coded as a per-chunk quantiser plus delta codes (-q, 0, q, escape) plus escape values. And since those mode/delta codes fit into two bits, they’re packed together into 16-bit words (a rough sketch follows below).
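
Here is a rough Rust sketch of that delta decoding. The packing order inside the 16-bit words, the mapping of the two-bit codes and the escape handling are all my guesses for illustration, so the helper names here should not be taken as the real format description.

  // Pull two-bit codes out of a 16-bit word, lowest bits first (assumed order).
  fn unpack_codes(word: u16) -> [u8; 8] {
      let mut codes = [0u8; 8];
      for (i, code) in codes.iter_mut().enumerate() {
          *code = ((word >> (i * 2)) & 3) as u8;
      }
      codes
  }

  // Apply one two-bit delta code to the predicted value.
  fn apply_delta(pred: u8, code: u8, q: i16, escape_value: u8) -> u8 {
      match code {
          0 => (i16::from(pred) - q).clamp(0, 255) as u8, // -q
          1 => pred,                                      //  0
          2 => (i16::from(pred) + q).clamp(0, 255) as u8, // +q
          _ => escape_value,                              // escape: value read separately
      }
  }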

If you look at it, there’s nothing really inventive: short slices are present in many codecs, lossy delta coding is common too. But together they create a combination that I’ve not seen anywhere else. And that’s why looking at older codecs is pleasing: besides seeing rip-offs of the standard codecs you sometimes also encounter original ideas like this.

One last experiment with Cinepak encoder

Saturday, June 17th, 2023

I remembered that back in the day there was an encoder for the RoQ format (the format that uses a codebook with 2×2 YUV vectors, what a coincidence!) called Switchblade, and it used NeuQuant before it was integrated into FFmpeg, where it switched to ELBG. So I decided to give it a try.

If you have forgotten, NeuQuant is an application of a Kohonen neural network to the task of generating a palette for an image. I’ve implemented that kind of quantiser already so I tried my hoof at adapting it to a larger vector size. Good thing: it works and it’s reasonably fast (2-3 times slower than median cut, faster than partitioned ELBG—and that’s code that uses doubles for the majority of its calculations). Bad thing: the result quality is mediocre. The results can obviously be improved by adjusting various factors (wait, am I talking about a neural network or string theory?) and changing the pseudo-random order in which the candidates are sampled, but I don’t feel enthusiastic about tweaking all those parameters and seeing which ones work well for the widest selection of video sequences.
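
For reference, here is a minimal Rust sketch of the Kohonen-style update at the heart of such a quantiser, generalised from palette entries to larger vectors. Only the winning codebook entry is pulled towards the sample here, while NeuQuant proper also nudges the neighbours with a decaying radius and learning rate; those are exactly the tunable factors I did not feel like tweaking. This is not the actual NihAV or Switchblade code.

  const N: usize = 4; // components per vector, e.g. a 2x2 block of one plane

  fn squared_dist(a: &[f32; N], b: &[f32; N]) -> f32 {
      a.iter().zip(b.iter()).map(|(x, y)| (x - y) * (x - y)).sum()
  }

  // One training step: move the best-matching codebook entry towards the sample.
  fn train_step(codebook: &mut [[f32; N]], sample: &[f32; N], alpha: f32) {
      let best = codebook
          .iter()
          .enumerate()
          .min_by(|&(_, a), &(_, b)| {
              squared_dist(a, sample)
                  .partial_cmp(&squared_dist(b, sample))
                  .unwrap()
          })
          .map(|(idx, _)| idx)
          .unwrap();
      for (c, s) in codebook[best].iter_mut().zip(sample.iter()) {
          *c += alpha * (*s - *c); // pull the winner towards the sample
      }
  }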

So I’m drawing a line here. It was a quick and failed experiment, I should find something better to do.

Yet another MOV quirk

Thursday, June 15th, 2023

Since I had nothing better to do I was browsing FMV games at archive.org and in one of them I found a rather peculiar sample: avconv gets the palette wrong for the first half of it and nihav-tool gets the palette wrong for the second half of the clip. And I thought that MOV is not supposed to have palette changes at all.

It turned out they used a multiple sample descriptors trick: it’s possible to provide several codec descriptions for one track and use one or another for different frames. That file has two descriptors for the video track with different palettes. Mystery solved.
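
For the record, this is roughly how a demuxer knows which description applies where: the stsc (sample-to-chunk) atom maps runs of chunks to a sample description index, so different chunks (and thus different frames) can point to different stsd entries. Here is a Rust sketch of that lookup; the structure name is mine, but the three fields per entry follow the QuickTime file format.

  struct StscEntry {
      first_chunk: u32,              // 1-based index of the first chunk in this run
      samples_per_chunk: u32,
      sample_description_index: u32, // 1-based index into the stsd entries
  }

  // Return the sample description index used by the given (1-based) chunk.
  fn desc_for_chunk(stsc: &[StscEntry], chunk_no: u32) -> u32 {
      let mut idx = stsc.first().map_or(1, |e| e.sample_description_index);
      for entry in stsc {
          if entry.first_chunk > chunk_no {
              break;
          }
          idx = entry.sample_description_index;
      }
      idx
  }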

And it also solved another mystery with a different file from that game where some frames are not decoded properly. It turned out that it also has two sample descriptors for the video track: one is A**le Graphics and the other one is Cinepak.

Back in the day I ranted that MOV is too flexible and this proves once again how true that is. Good thing I don’t have to care about supporting such files properly.

Further Cinepak experiments

Monday, June 5th, 2023

Having nothing better to do, I kept experimenting with the Cinepak encoder.

I considered implementing some variant of the codebook decomposition scheme suggested by Tomas in the comments to the previous post, but I’m still not sure whether I should bother even if it looks promising. So I tried the old threshold-based scheme instead.

And what do you know, it speeds things up considerably: my usual test sample gets encoded in 27-35 seconds (depending on the thresholds) instead of 44 seconds in the usual mode. But since I don’t know what good thresholds would be, I did the opposite and added a refinement mode: after deciding which codebook to use for which block, I re-generate the codebook using only those blocks that belong to it. Of course it increases processing time: for example, that file takes 75 seconds to encode with refinement—about 70% more time but still less than double (for comparison, in full ELBG mode it’s an increase from about 160 seconds to 270 seconds).
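
In code the refinement pass is almost trivial. Here is a Rust sketch of the idea, with generate_codebook() standing in for whichever quantiser is used (median cut, ELBG or anything else) and a made-up generic vector type, so it is an illustration rather than the actual encoder code.

  #[derive(Clone, Copy, PartialEq)]
  enum BlockMode { Skip, V1, V4 }

  // Regenerate the codebook for one coding mode from only the blocks assigned to it.
  fn refine_codebook<T: Copy>(
      vectors: &[T],
      assigned: &[BlockMode],
      mode: BlockMode,
      generate_codebook: impl Fn(&[T]) -> Vec<T>,
  ) -> Vec<T> {
      let subset: Vec<T> = vectors
          .iter()
          .zip(assigned.iter())
          .filter_map(|(v, m)| if *m == mode { Some(*v) } else { None })
          .collect();
      generate_codebook(&subset)
  }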

So by a rough estimate, selecting only relevant blocks for codebook generation shaves 20-40% off the encoding time. And splitting the data into partitions and generating a codebook by parts made the process about three times faster. I suspect that with a proper approach to clustering, vector quantisation can be made two to three times faster, but I don’t think I want to experiment with that. I should call it a day and move on to something else instead.

Quick experiments with Cinepak encoder vector quantisation

Saturday, June 3rd, 2023

Out of curiosity I decided to check how partitioning the input before creating a codebook affects encoding speed. So I’ve added a mode to the Cinepak encoder that partitions vectors by luma variance and creates a part of the common codebook just for each partition (a sketch of the idea follows below). The other two modes are median cut (the simplest one but also with mediocre output) and ELBG (which uses median cut to create the initial codebook—also, if that codebook is not full it means we have all possible entries and do not need to perform ELBG at all).
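
Here is a Rust sketch of the partitioning idea. generate_codebook() stands in for whichever quantiser runs on each bin (median cut, ELBG and so on), and the bin thresholds and per-bin entry budget are made-up numbers, so treat this as an illustration of the approach rather than the actual encoder code.

  type Vector = [u8; 6]; // a Cinepak-style vector: 2x2 luma block plus one U and one V

  fn luma_variance(v: &Vector) -> u32 {
      // sum of squared deviations from the mean, good enough as a variance measure
      let luma = &v[..4];
      let sum: u32 = luma.iter().map(|&x| u32::from(x)).sum();
      let mean = i64::from(sum / 4);
      luma.iter()
          .map(|&x| {
              let d = i64::from(x) - mean;
              (d * d) as u32
          })
          .sum()
  }

  fn partitioned_codebook(
      vectors: &[Vector],
      generate_codebook: impl Fn(&[Vector], usize) -> Vec<Vector>,
  ) -> Vec<Vector> {
      // split the vectors into bins by variance
      let thresholds = [16, 64, 256, 1024];
      let mut bins: Vec<Vec<Vector>> = vec![Vec::new(); thresholds.len() + 1];
      for v in vectors {
          let var = luma_variance(v);
          let bin = thresholds.iter().position(|&t| var < t).unwrap_or(thresholds.len());
          bins[bin].push(*v);
      }
      // give every non-empty bin an equal share of the 256-entry codebook
      let used = bins.iter().filter(|b| !b.is_empty()).count().max(1);
      let per_bin = 256 / used;
      bins.iter()
          .filter(|b| !b.is_empty())
          .flat_map(|b| generate_codebook(b, per_bin))
          .collect()
  }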

Here are rough results on encoding several different files (using different numbers of strips): median cut took 11-14 seconds, ELBG took 110-160 seconds, and the new mode (I decided to call it fast) takes 43-62 seconds. I think even such approximate numbers speak for themselves. Also there’s an interesting side effect: because of the partitioning it tends to produce smaller codebooks overall.

And while we’re speaking about quantisation results, here’s the first frame of the waterfall sample encoded in different modes:

[frame comparison images: median cut / fast / full ELBG]

As you can see, median cut produces not so good images, but maybe those artefacts will make people think of the original Cinepak more. Fast mode is much nicer, though it still has some artefacts (just look at the left edge of the waterfall); if you don’t pay too much attention it’s not much worse than full ELBG.

Are there ways to improve it even further? Definitely. For starters, the original encoder exploits the previous codebook to create a new one, while my encoder always generates a new codebook from scratch (in theory I could skip the median cut stage for inter strips, but I suspect that ELBG would take much longer in that case). The second way is to speed up ELBG itself. From what I could see, it spends most of the time determining which cluster each point belongs to. With some smarter structure (something like a k-d tree and some caching to skip recalculating certain clusters altogether) it should be possible to speed it up several times. Unfortunately in this case I value clarity more, so I’ll leave it as is.

P.S. I may also try to see how using thresholds and block variance to decide a block’s coding mode affects the speed and quality (in this case we first decide how to code the blocks and then form codebooks for them, instead of forming codebooks first and then deciding which mode suits the current block better; and in this case we’ll have smaller sets to make codebooks from too). But I may do something different instead. Or nothing at all.

A quick glance at the original Cinepak encoder

Friday, May 26th, 2023

Since I don’t have anything to do with NihAV at the moment (besides two major tasks that always make me think about doing anything else but them), I decided to look at what tricks the original Cinepak encoder has.

Apparently it has essentially three settings: the interval between key frames (with maximum and minimum values), temporal/spatial quality (for deciding which kinds of coding should be used) and neighbour radius (probably for merging close enough values before the actual codebook is calculated).

Skip blocks are decided by the sum of squared differences being smaller than a threshold (calculated from the temporal quality); V1/V4 coding is decided by calculating the sum of 2×2 sub-block variances and comparing it against a threshold (calculated from the spatial quality).
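
Here is a Rust sketch of that decision logic as I understand it. The thresholds are whatever gets derived from the temporal and spatial quality settings (the derivation is not covered), the direction of the V1/V4 comparison is my reading of it, and the function names are mine.

  enum CodingMode { Skip, V1, V4 }

  // Sum of squared differences against the same block in the previous frame.
  // Blocks here are 4x4 luma blocks stored row by row.
  fn sum_sq_diff(cur: &[u8; 16], prev: &[u8; 16]) -> u32 {
      cur.iter()
          .zip(prev.iter())
          .map(|(&a, &b)| {
              let d = i32::from(a) - i32::from(b);
              (d * d) as u32
          })
          .sum()
  }

  // Sum of the squared deviations within each 2x2 sub-block.
  fn subblock_variance_sum(cur: &[u8; 16]) -> u32 {
      let mut total = 0u32;
      for (by, bx) in [(0usize, 0usize), (0, 2), (2, 0), (2, 2)] {
          let vals = [
              cur[by * 4 + bx], cur[by * 4 + bx + 1],
              cur[(by + 1) * 4 + bx], cur[(by + 1) * 4 + bx + 1],
          ];
          let sum: u32 = vals.iter().map(|&x| u32::from(x)).sum();
          let mean = (sum / 4) as i32;
          total += vals.iter().map(|&x| {
              let d = i32::from(x) - mean;
              (d * d) as u32
          }).sum::<u32>();
      }
      total
  }

  fn decide_mode(cur: &[u8; 16], prev: &[u8; 16], skip_thr: u32, v1_thr: u32) -> CodingMode {
      if sum_sq_diff(cur, prev) < skip_thr {
          CodingMode::Skip
      } else if subblock_variance_sum(cur) < v1_thr {
          CodingMode::V1 // flat enough: one upscaled codebook entry covers the block
      } else {
          CodingMode::V4 // detailed: four entries, one per 2x2 sub-block
      }
  }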

Codebook creation is done by grouping all blocks into five bins (by the logarithm of the variance) and trying to calculate a smaller codebook for each bin independently (so together they’ll make up the full 256-entry codebook).

Overall even if I’m not going to copy that approach it was still interesting to look at.