Archive for the ‘Various Video Codecs’ Category

ARMovie: Moving Lines codec

Monday, April 22nd, 2024

While I’m trying to write a decent demuxer for the format, here’s the description of the first codec for it (I’ll postpone the rest until I implement a demuxer and a decoder for this one).

So, Moving Lines (aka codec 1) works on the usual 15-bit pixels and packs data into 16-bit words (sadly Ghidra fails to realise that LDRH loads 16-bit data on ARMv4, but I'll manage). Bits are read LSB first.

Here are the patterns for the opcodes: the first (lowest) bit signals a special code, and when it is unset the rest of the word is a raw pixel. For special codes, though, it is easier to look at the top bits first.

  • 0x0001..0x8FFF—copy a number of pixels given by the next 6 bits, using a displacement table (from -8,-8 to 8,8, excluding 0,0) with the table index stored in the following 9 bits;
  • 0x9001..0xE5FF—copy data from the already decoded part of the current frame; the next 6 bits give the number of pixels to copy, and the following 8 bits code the displacement (from -9,-9 to 9,0);
  • 0xE601—end-of-frame marker;
  • 0xE603..0xEFFF—run series: 10 bits code the run length minus one, and the next codeword is the pixel value;
  • 0xF001..0xF7FF—skip series: 10 bits code the skip length minus one;
  • 0xF801..0xFFFF—raw data values: 10 bits code the number of values, and the following codewords contain the packed 15-bit values. At the end the bitstream is aligned to a 16-bit boundary.

Since video data is usually clumped together in large chunks, you need to keep decoding it until you encounter the end-of-frame marker (after which the data for the following frame starts).
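
To put the above together, here is a minimal sketch in Rust of the outer decoding loop as I currently understand it: it just walks the 16-bit codewords of one frame until it hits the end-of-frame marker. The placement of the sub-fields inside a codeword is my guess from the description, and the actual copy/run/fill operations are omitted.

  fn decode_frame(src: &[u8]) -> Option<usize> {
      let mut pos = 0;
      loop {
          if pos + 2 > src.len() { return None; }
          // codewords are 16-bit little-endian words, bits counted from the LSB
          let word = u16::from_le_bytes([src[pos], src[pos + 1]]);
          pos += 2;
          if (word & 1) == 0 {
              let _pix = word >> 1; // low bit clear: the word is a raw 15-bit pixel
              continue;
          }
          match word {
              0x0001..=0x8FFF => { // copy pixels via the displacement table
                  let _count   = (word >> 1) & 0x3F;  // next 6 bits
                  let _tab_idx = (word >> 7) & 0x1FF; // following 9 bits
              }
              0x9001..=0xE5FF => { // copy from the decoded part of the current frame
                  let _count = (word >> 1) & 0x3F; // next 6 bits
                  let _disp  = (word >> 7) & 0xFF; // 8-bit displacement code
              }
              0xE601 => return Some(pos), // end-of-frame marker
              0xE603..=0xEFFF => { // run: fill with the pixel from the next codeword
                  let _run = ((word >> 1) & 0x3FF) + 1;
                  if pos + 2 > src.len() { return None; }
                  let _pix = u16::from_le_bytes([src[pos], src[pos + 1]]);
                  pos += 2;
              }
              0xF001..=0xF7FF => { // skip pixels
                  let _skip = ((word >> 1) & 0x3FF) + 1;
              }
              0xF801..=0xFFFF => { // raw series: packed 15-bit values follow
                  let count = ((word >> 1) & 0x3FF) as usize;
                  pos += (count * 15 + 15) / 16 * 2; // realigned to 16 bits afterwards
              }
              _ => return None, // even values were already handled above
          }
      }
  }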

That’s all for now, hopefully more will come soon.

Another quick look at two Amiga formats

Wednesday, April 17th, 2024

Considering the comment under my previous post, I had a better incentive to look at more formats. So here are two of them.

Since both are Amiga formats, here’s a summary picture:

First, Zoetrope Animation. The same ADF image that contains BACKFLIP.rif also has the player sources (C plus M68K assembly with comments). I have not studied them too closely, but the main peculiarity of the format is that, like its namesake, it operates on image columns, employing simple RLE (and skip codes for inter frames). Also, while it calls its format RIFF, it has a rather different structure (and may even pre-date the more common RIFF by several years). And it seems to support different bitplane modes as well as HAM.
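
To illustrate what column-oriented coding with RLE and skip looks like in general, here is a rough Rust sketch of decoding a single column; I have not checked the actual Zoetrope opcodes, so the byte layout below is invented purely for illustration.

  fn decode_column(src: &[u8], dst: &mut [u8], height: usize, stride: usize, col: usize) {
      let mut pos = 0;
      let mut row = 0;
      while row < height && pos < src.len() {
          let op = src[pos];
          pos += 1;
          match op {
              0x00..=0x7F => { // literal run: copy op + 1 bytes down the column
                  for _ in 0..=(op as usize) {
                      if row >= height || pos >= src.len() { return; }
                      dst[row * stride + col] = src[pos];
                      pos += 1;
                      row += 1;
                  }
              }
              0x80..=0xBF => { // RLE run: repeat the next byte op - 0x7F times
                  if pos >= src.len() { return; }
                  let val = src[pos];
                  pos += 1;
                  for _ in 0..(op as usize - 0x7F) {
                      if row >= height { return; }
                      dst[row * stride + col] = val;
                      row += 1;
                  }
              }
              0xC0..=0xFF => { // skip (for inter frames): leave op - 0xBF pixels unchanged
                  row += op as usize - 0xBF;
              }
          }
      }
  }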

Then, IFF VAXL. This is a conventional IFF with video properties inside a VXHD chunk, PAD0 for padding, TMCD being always the same (is that the time between frames?), COLS containing something that looks like 12-bit palette entries, BMAP being of a constant size, and SAMP probably containing PCM audio. But what about the image data? Since it has a constant size fit for a 6-bit uncompressed image (and there's a 6 in the properties), I suspect it's just Rohschinken uncompressed HAM (and trying to decode it as such results in a recognisable picture).
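
For reference, IFF itself is trivial to walk (a FORM container with FourCC-tagged chunks, big-endian 32-bit sizes, padded to an even length), so dumping the VAXL structure takes only a few lines. Here is a rough Rust sketch; the per-chunk descriptions are merely my reading from above, not a verified specification.

  fn list_vaxl_chunks(data: &[u8]) {
      // an IFF file is "FORM" + 32-bit big-endian size + form type, then chunks
      if data.len() < 12 || &data[0..4] != b"FORM" { return; }
      let mut pos = 12; // skip the FORM header and the form type (e.g. "VAXL")
      while pos + 8 <= data.len() {
          let id = &data[pos..pos + 4];
          let size = u32::from_be_bytes([data[pos + 4], data[pos + 5], data[pos + 6], data[pos + 7]]) as usize;
          pos += 8;
          match id {
              b"VXHD" => println!("video properties, {} bytes", size),
              b"TMCD" => println!("time code (frame duration?), {} bytes", size),
              b"COLS" => println!("palette-like data, {} bytes", size),
              b"BMAP" => println!("image data, {} bytes", size),
              b"SAMP" => println!("probably PCM audio, {} bytes", size),
              b"PAD0" => {} // padding
              _       => println!("unknown chunk {}, {} bytes", String::from_utf8_lossy(id), size),
          }
          pos += size + (size & 1); // chunks are padded to an even length
      }
  }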

That was surprisingly easy but there are many more formats to look at.

A look at more formats

Tuesday, April 16th, 2024

As I mentioned in a recent post, I've tried using discmaster.textfiles.com to search for more exotic multimedia formats. Here's a short report on the formats I found that are of some interest.

I mostly looked at the formats listed as video that could not be decoded, plus audio-only AVIs—some of them are really audio-only, while others feature a video stream that was not recognised.

So, what I’ve found:

  • DK Animation—this turned out to be a simple RLE-based animation+sound format used in some interactive encyclopedias. It was rather easy to figure out the format from the samples, while the executables were rather useless (due to the program design it's next to impossible to locate the code responsible for animation handling without decompiling all of it);
  • PI-Video (used in a different set of interactive encyclopedias) turned out to be a simple quadtree-based codec (the frame is divided into square tiles, and each tile can be skipped, filled with one colour, subdivided further or, in the case of a 4×4 tile, filled with raw pixels). Additionally, pixel values may be further compressed with LZW. That proved to be the most interesting format of the bunch (see the sketch after this list);
  • there was a bunch of RIFF- and IFF-based formats, often without a known decoder. Maybe I'll look at them one day when I feel really desperate, but not today;
  • the ESCP codec is a variation of Escape 130. After I changed the FOURCC to the recognised E130, the file was partially decoded: there were countless decoding errors and visual garbage, yet it also produced almost perfect complex parts of the frame. I suspect it may have e.g. an additional field or two or some small bitstream tweaks;
  • and a special mention goes to the tmot FOURCC, which of course turned out to be TrueMotion 1 video.
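
Since PI-Video was the fun one, here is a rough Rust sketch of how such a quadtree scheme works in general. Only the overall structure (skip, fill, subdivide, raw data for 4×4 tiles) follows my description above; the opcode values and the bitstream layout are invented for illustration, and the optional LZW layer is ignored.

  fn decode_tile(src: &mut impl Iterator<Item = u8>, frame: &mut [u8], stride: usize,
                 x: usize, y: usize, size: usize) {
      match src.next().unwrap_or(0) {
          0 => {} // skip: keep the content of the previous frame
          1 => {  // fill the whole tile with a single colour
              let clr = src.next().unwrap_or(0);
              for row in 0..size {
                  for col in 0..size {
                      frame[(y + row) * stride + x + col] = clr;
                  }
              }
          }
          2 if size > 4 => { // subdivide into four quadrants and recurse
              let h = size / 2;
              decode_tile(src, frame, stride, x,     y,     h);
              decode_tile(src, frame, stride, x + h, y,     h);
              decode_tile(src, frame, stride, x,     y + h, h);
              decode_tile(src, frame, stride, x + h, y + h, h);
          }
          _ => { // 4x4 tiles (and anything else here) are stored as raw pixels
              for row in 0..size {
                  for col in 0..size {
                      frame[(y + row) * stride + x + col] = src.next().unwrap_or(0);
                  }
              }
          }
      }
  }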

It’s random finds like this that make life a bit less dull.

A quick look at Gold Disk Animation

Wednesday, April 10th, 2024

Since I’m still looking for a thing to reverse engineer, I decided to see if this file service at discmaster.textfiles.com could offer some exotic formats. And indeed it can.

So there’s this AWI or AWM file format (it’s called AWI in the decoder libraries, but the files I could find have the extension .awm).

So this is more of a presentation format with a nested structure: chunks with names in capital letters contain other chunks (i.e. everything is contained inside the GDAW chunk, actual assets like PALT or BKGD are stored inside the RSRC chunk, and the presentation scenario is probably stored in the SEEN chunk), while chunks with lowercase names have various specific data attached to them (e.g. psnm is followed by a Pascal-style string with the asset name, tzim contains compressed image data, and nndn marks the end of object data).

I have not looked too deep into it (no idea how the scenario works or what the various object parameters are), but here’s some information about the resource types:

  • RLE4—a 16-colour RLE-compressed BMP, I presume;
  • RLE8—ditto but with 256 colours;
  • PALT—some global palette (but images still have their own);
  • BKGD—DCL-compressed background BMP;
  • ACTR—DCL-compressed BMP used as sprite;
  • WIPE—transition effect definition;
  • SWND—DCL-compressed WAV.

The most curious thing for me is that it used the PKWARE Data Compression Library to compress data. And while WAV files are compressed in one piece, BMPs are compressed as separate chunks—14-byte BMP header, 40-byte DIB header, palette, and image data. I think this was a conscious decision by the format and tool designers (in order to improve the compression ratio a bit).
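
To illustrate the piecewise compression, here is a Rust sketch of how such a BMP resource would be put back together: decompress each stored piece on its own and concatenate the results. dcl_explode is a hypothetical stand-in for a PKWARE DCL decompressor, and the way the pieces are framed in the file is assumed rather than verified.

  // reassemble a BMP stored as separately DCL-compressed pieces
  fn unpack_bmp(mut pieces: impl Iterator<Item = Vec<u8>>) -> Option<Vec<u8>> {
      let mut bmp = Vec::new();
      bmp.extend(dcl_explode(&pieces.next()?)?); // 14-byte BMP file header
      bmp.extend(dcl_explode(&pieces.next()?)?); // 40-byte DIB header
      bmp.extend(dcl_explode(&pieces.next()?)?); // palette
      bmp.extend(dcl_explode(&pieces.next()?)?); // image data
      Some(bmp)
  }

  // hypothetical helper: decompress one PKWARE DCL ("implode") stream
  fn dcl_explode(_src: &[u8]) -> Option<Vec<u8>> {
      unimplemented!()
  }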

I’ll probably try to dig up some more details and document it, but the most interesting part for me (i.e. figuring out its outstanding design features) is done already.

A look at Winnov Pyramid codec

Friday, October 27th, 2023

Since I still have nothing better to do, I decided to take a look at some old codec. Apparently I tried looking at it before and abandoned it because Ghidra cannot disassemble its code properly, let alone decompile it. I think this is a recurring theme with old 16-bit code, especially code that reads data using non-standard segments.

So I located Sourcer, the best disassembler of the era (it seems to be abandonware nowadays, but I cannot swear to that), and used it to disassemble the binary, referring to the Ghidra database to locate the functions I should care about. It is not that much fun to translate assembly by hand, but at least there was not that much of it.

The codec itself turned out to be a moderately complex DPCM codec compressing 7-bit YUV 4:1:1 data using a per-frame codebook and not-so-trivial delta compression. The codebook entries contain pairs of delta values calculated depending on the number of bits per delta. The data is coded per plane, with prediction running continuously for all pixels in the plane:

  // before decoding data: initialise the predictors
  (delta0, delta1) = get_code();   // look up a pair of deltas from the codebook
  pprev  = 64;                     // prediction from the previous step
  prev   = 64 + delta0;            // current prediction
  pdelta = delta1;                 // delta carried over from the previous code
  for each pixel pair {
    (delta0, delta1) = get_code();
    delta = ((prev + delta0 - pprev) >> 3) + pdelta;
    pix0 = clip_uint8((prev + delta) * 2);  // 7-bit values are doubled on output
    pix1 = clip_uint8((prev - delta) * 2);
    pprev  = prev;
    prev  += delta0;
    pdelta = delta1;
  }

Normally such codecs would not bother to generate a codebook for the specific delta size or to use anything more complex than pix = prev + delta; so this was a rather interesting codec to look at. Hopefully there will be more interesting formats to study, even if sometimes I get the feeling that all undiscovered formats are either trivial or rip-offs of some standard.

Looking at Motion Pixels

Tuesday, October 24th, 2023

There is this very Sirius (or Sirius Publishing, more precisely) family of video codecs (plus one container format) apparently developed by two guys (who like to spam their names even in the junk sections of AVI files). Initially it had its own container format, but later they started to target AVI.

Another peculiarity of this format is that initially it targeted games but later was also used as a crappy Video CD alternative.

Back in the day Gregory Montoir REd the original game format for one of the game engine re-implementations he’s famous for and donated the code to FFmpeg as well. Since then I’ve been curious whether that code could be adapted to play MVI1 and MVI2 as well, but the codec itself turned me off.

The codec itself is perverted, both in code and in interface. Also it’s inherently interlaced. Normally video codecs in AVI can be recognised by their FOURCC and pass additional configuration parameters in the extra header data. Here they decided to use half of the FOURCC to pass configuration flags to the codec and to use the stream handler FOURCC (which most apps ignore) to tell that their decoder should be used to handle it. This alone would make me want to never support it, but the binary specification is worse.

It looks like the code consists mostly of handwritten assembly, because I don’t know which compiler could generate this madness. There are many versions of the codec; most of them are 16-bit, and the 32-bit version is no better. For starters, it uses segments.

Not many people remember DOS times and their memory models, and even fewer remember them fondly. And almost nobody remembers that in 32-bit mode you can also use the FS and GS registers to get custom addressing modes. Well, this codec uses them: it sets FS to the context pointer, so context fields are accessed as mov EAX, dword ptr FS:[1A8h] while global variables are accessed as mov EAX, dword ptr GS:[SYM], and of course no decompiler likes that. I was able to work around it in Ghidra by creating a new segment starting from zero, but it’s still annoying.

Another thing is (ab)using registers to the fullest extent. Functions pass their parameters implicitly in registers, using the stack only to save those values before a loop or to form a list of rectangles to process. And of course it uses such decompiler-unfriendly tricks as keeping two loop counters in the same register (e.g. the top byte for the outer loop and the low byte for the inner loop). As a result Ghidra can’t decompile it properly, or even ignores whole blocks of code because it believes they can’t be invoked—and it’s still better than decompiling the 16-bit version of MVI1, which made the decompiler commit suicide. So some functions are easier to hand-translate from the assembly.

In any case, it looks like despite all the improvements it remains about the same as the initial version: data is coded as 5-bit YUV internally and stored using Huffman codes, quantisation and change maps (rectangles that tell which areas to update/fill). MVI2 can use ten different frame decoding modes that differ in how the deltas are coded, but essentially it remains the same. It seems they have not even gotten around to introducing proper motion compensation.

So, now I’ve had a good long look at the codec, found nothing interesting there that was not known before and can forget about it. If only there was something more interesting to look at…

Looking at NUVision

Friday, July 7th, 2023

Since I still have nothing better to do, I decided to look at some obscure video codec I had lying around for a really long time. And it turned out to be simple yet rather original.

Unlike many other codecs, this one codes YUV 4:2:2 (at least it looks like that) line by line in chunks of 24, 16 or 8 elements (essentially 24-pixel chunks plus a shorter tail, the line width being a multiple of eight). Each chunk can be coded using one of four modes (leave it as is, decode and apply a delta, or copy the chunk from the previous line with or without a delta). Delta data is coded as a chunk quantiser, delta codes (-q, 0, q, escape) plus escape values. And since those mode/delta codes fit into two bits, they’re packed together into 16-bit words.
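
To give a rough idea, here is a Rust sketch of how one delta-coded chunk could be applied; the mode selection is left out, and both the packing order of the two-bit codes and the escape handling are my assumptions rather than the verified bitstream layout.

  // unpack eight two-bit codes from a 16-bit word (assumed lowest-bits-first order)
  fn unpack_codes(word: u16) -> impl Iterator<Item = u8> {
      (0..8).map(move |i| ((word >> (i * 2)) & 3) as u8)
  }

  // apply deltas to one chunk; `pred` holds the prediction for it
  // (previous frame or previous line, depending on the chunk mode)
  fn apply_deltas(pred: &[u8], dst: &mut [u8], q: i16,
                  codes: &mut impl Iterator<Item = u8>,
                  escapes: &mut impl Iterator<Item = u8>) {
      for (d, &p) in dst.iter_mut().zip(pred.iter()) {
          let delta = match codes.next().unwrap_or(1) {
              0 => -q,                                       // subtract the chunk quantiser
              1 => 0,                                        // no change
              2 => q,                                        // add the chunk quantiser
              _ => escapes.next().unwrap_or(0) as i16 - 128, // escape: explicit value (assumed form)
          };
          *d = (p as i16 + delta).clamp(0, 255) as u8;
      }
  }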

If you look at it, there’s nothing really inventive: short slices are present in many codecs, and lossy delta coding is common too. But together they create a combination that I’ve not seen anywhere else. And that’s why looking at older codecs is pleasing: besides rip-offs of the standard codecs, sometimes you also encounter original ideas like this.

One last experiment with Cinepak encoder

Saturday, June 17th, 2023

I remembered that back in the day there was an encoder for the RoQ format (a format that uses a codebook with 2×2 YUV vectors, what a coincidence!) called Switchblade, and it was using NeuQuant before it was integrated into FFmpeg, where it switched to ELBG. So I decided to give it a try.

If you have forgotten, NeuQuant is an application of a Kohonen neural network to the task of generating a palette for an image. I’ve implemented that kind of thing already, so I tried my hoof at adapting it to a larger vector size. The good thing: it works and it’s reasonably fast (2-3 times slower than median cut, faster than partitioned ELBG—and that’s code that uses doubles for the majority of its calculations). The bad thing: the result quality is mediocre. The results can obviously be improved by adjusting various factors (wait, am I talking about a neural network or string theory?) and by changing the pseudo-random order in which the candidates are sampled, but I don’t feel enthusiastic about tweaking all those parameters and seeing which ones work well for the widest selection of video sequences.
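
For the curious, the core of the adaptation is just the usual Kohonen update step generalised to longer vectors (six components for 2×2 YUV blocks); here is a sketch in Rust. The learning-rate and neighbourhood schedules below are placeholders, and those are exactly the knobs I did not feel like tuning.

  // one training step: find the best matching codeword and pull it (and its
  // index neighbours) towards the sample, as the Kohonen/NeuQuant scheme does
  fn train_step(codebook: &mut [[f32; 6]], sample: &[f32; 6], alpha: f32, radius: usize) {
      // best matching unit by squared Euclidean distance
      let mut best = 0;
      let mut best_dist = f32::MAX;
      for (i, cw) in codebook.iter().enumerate() {
          let dist: f32 = cw.iter().zip(sample).map(|(a, b)| (a - b) * (a - b)).sum();
          if dist < best_dist { best_dist = dist; best = i; }
      }
      // influence falls off linearly inside the neighbourhood
      let lo = best.saturating_sub(radius);
      let hi = (best + radius + 1).min(codebook.len());
      for i in lo..hi {
          let d = if i > best { i - best } else { best - i };
          let w = alpha * (1.0 - d as f32 / (radius as f32 + 1.0));
          for (c, s) in codebook[i].iter_mut().zip(sample) {
              *c += w * (*s - *c);
          }
      }
  }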

So I’m drawing a line here. It was a quick and failed experiment, I should find something better to do.

Yet another MOV quirk

Thursday, June 15th, 2023

Since I had nothing better to do, I was browsing FMV games at archive.org and in one of them I found a rather peculiar sample: avconv gets the palette wrong for the first half of it, while nihav-tool gets the palette wrong in the second half of the clip. And here I thought that MOV was not supposed to have palette changes at all.

It turned out they used a multiple sample descriptors trick: it’s possible to provide several codec descriptions for one track and use one or another for different frames. That file has two descriptors for the video track with different palettes. Mystery solved.
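
Mechanically the trick works through the sample-to-chunk table: the stsd atom may hold several sample descriptions, and every stsc entry carries the index of the description that applies to its run of chunks. A simplified Rust sketch of resolving that index for a given chunk:

  struct StscEntry {
      first_chunk: u32,            // 1-based index of the first chunk in this run
      samples_per_chunk: u32,      // how many samples each of these chunks holds
      sample_description_id: u32,  // 1-based index into the stsd entries
  }

  // entries are sorted by first_chunk; the last one not past `chunk` applies
  fn description_for_chunk(stsc: &[StscEntry], chunk: u32) -> u32 {
      let mut id = 1;
      for e in stsc {
          if e.first_chunk > chunk { break; }
          id = e.sample_description_id;
      }
      id
  }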

And it also solved another mystery with a different file from that game where some frames were not decoded properly. It turned out that it also has two sample descriptors for the video track: one is A**le Graphics and the other one is Cinepak.

Back in the day I ranted that MOV is too flexible and this proves once again how true that is. Good thing I don’t have to care about supporting such files properly.

Further Cinepak experiments

Monday, June 5th, 2023

Having nothing better to do, I kept experimenting with the Cinepak encoder.

I considered implementing some variant of the codebook decomposition scheme suggested by Tomas in the comments to the previous post, but I’m still not sure I should bother even if it looks promising. So I tried the old thresholds-based scheme instead.

And what do you know, it speeds things up considerably: my usual test sample gets encoded in 27-35 seconds (depending on the thresholds) instead of 44 seconds in the usual mode. But since I don’t know what good thresholds would be, I did the opposite and added a refinement mode: after deciding which codebook to use for which block, I re-generate each codebook using only those blocks that belong to it. Of course it increases processing time: for example, that file takes 75 seconds to encode with refinement—about 70% more time, but still less than double (for comparison, in full ELBG mode it’s an increase from about 160 seconds to 270 seconds).
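
The refinement pass itself is trivial in code; here is a sketch, where generate_codebook stands for whatever vector quantiser is in use (ELBG, median cut or the partitioned variant) and is not a real function from my encoder. It also glosses over the fact that the V1 and V4 vectors are derived differently from each 4×4 block.

  // regenerate each codebook from only the vectors that were assigned to it
  fn refine_codebooks(vectors: &[[u8; 6]], assignment: &[usize], num_books: usize,
                      codebook_size: usize) -> Vec<Vec<[u8; 6]>> {
      (0..num_books).map(|book| {
          let own: Vec<[u8; 6]> = vectors.iter().zip(assignment)
              .filter(|(_, &a)| a == book)
              .map(|(v, _)| *v)
              .collect();
          generate_codebook(&own, codebook_size)
      }).collect()
  }

  // hypothetical stand-in for the actual vector quantiser
  fn generate_codebook(_vectors: &[[u8; 6]], _size: usize) -> Vec<[u8; 6]> {
      unimplemented!()
  }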

So, by a rough estimate, selecting only the relevant blocks for codebook generation shaves 20-40% off the encoding time, while splitting the data into partitions and generating a codebook by parts made the process about three times faster. I suspect that with a proper approach to clustering, vector quantisation can be made two to three times faster, but I don’t think I want to experiment with that. I should call it a day and move on to something else instead.