MPEG-4 ASP: done for now

October 15th, 2024

In my last post I mentioned I needed to deal with MP3-in-AVI and multi-threaded decoding. The former turned out to be a simple bug (I shouldn’t have trusted the AVI header reporting 12-bit audio), and I gave up on the latter.

The main reason for that is what seems to be the main contribution of MPEG to the world of video coding, namely B-frames. While the idea behind them is reasonable (to code scene transitions or smooth movements as an interpolation between two keyframes), practical implementation brings headaches because those frames are coded in an order different from the display order (after all, you can’t interpolate between two frames if you haven’t decoded both of them). And of course it got worse in H.264 and later codecs where B-frames can reference other B-frames so you need to code information about the frame structure (references and how to update them).
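To make the ordering problem concrete, here is a toy sketch (not NihAV code): with an I B B P group the frames are transmitted as I, P, B, B, since a B-frame needs both of its references decoded first, and a player has to buffer and re-sort by timestamp before display. Real decoders use a bounded reorder buffer rather than a full sort, but the idea is the same.

```rust
// Toy illustration of B-frame reordering: frames arrive in coded order
// carrying display timestamps; output must follow timestamp order.
fn reorder(coded_pts: &[u32]) -> Vec<u32> {
    // a full sort stands in for the small reorder buffer a real decoder keeps
    let mut pts = coded_pts.to_vec();
    pts.sort();
    pts
}

fn main() {
    // coded order: I(pts 0), P(pts 3), B(pts 1), B(pts 2)
    assert_eq!(reorder(&[0, 3, 1, 2]), vec![0, 1, 2, 3]);
    println!("display order: {:?}", reorder(&[0, 3, 1, 2]));
}
```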

And the problem with MPEG-4 ASP is that while it can have B-frames, its popularity is tied more to the AVI container, which lacks the means to signal frame reordering (fun fact: the MPEG-4 ASP video files in MOV that I have would be perfect candidates for B-frames but lack them entirely). Of course later other containers gained popularity, like Matroska or OGM (or even MP4 occasionally), but the gilded age seems to be tied to AVI. And of course that created difficulties.

If you have I- and P-frames only, there’s nothing to care about—but multi-threading won’t be that effective either. Newer implementations (Xvid 1.3.7 is rather fresh, BTW) output B-frames as-is, so good luck knowing that in advance and performing the reorder. In this case I check if the coded timebase is the same as the one reported by the container and simply re-assign timestamps from the bitstream (and if this does not work—well, tough luck). But there was a funnier intermediate solution with one frame containing data for both a P- and a B-frame and the following frame being a skip frame, so a decoder could replace it with the appropriate frame. This reminds me of Indeo 4, which performed the same trick. And making that work with multi-threaded decoding would be a mess requiring either saving the frame data and scheduling it for later decoding, or scheduling both frames and then trying to tie the second one to the upcoming frame decoding request. And playing back a typical video takes about 20% of CPU load anyway…
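Detecting such “packed” frames usually comes down to scanning the packet for a second VOP start code (00 00 01 B6 in MPEG-4 part 2). A hedged sketch of that check, not taken from any particular decoder:

```rust
// Find all MPEG-4 VOP start codes (00 00 01 B6) in a packet; a packet
// containing more than one is the usual hint of a DivX-style packed
// bitstream (P-VOP and B-VOP glued together, followed by a skip frame).
fn find_vop_starts(data: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    if data.len() >= 4 {
        for i in 0..data.len() - 3 {
            if data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1 && data[i + 3] == 0xB6 {
                starts.push(i);
            }
        }
    }
    starts
}

fn is_packed_frame(data: &[u8]) -> bool {
    find_vop_starts(data).len() > 1
}

fn main() {
    let packed = [0, 0, 1, 0xB6, 0x55, 0xAA, 0, 0, 1, 0xB6, 0x12];
    let normal = [0, 0, 1, 0xB6, 0x55, 0xAA];
    assert!(is_packed_frame(&packed));
    assert!(!is_packed_frame(&normal));
}
```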

Thus I’ve committed what I find to be good enough for my needs and I shall forget about it—at least until some decoding artefact annoys me enough. There’s more boring and unremarkable stuff I want to do on NihAV; working on this decoder reminded me that it can always be worse (or uglier).

P.S. For some reason repository cloning or updating from git.nihav.org still does not work (but the web interface is fine). I’ve reported the problem and hopefully it will be resolved soon. I suspect that the provider blocked it because of too many synchronisation requests from other sites trying to mirror the repositories. In either case I’m still grateful for the hosting.

Woes of implementing MPEG-4 ASP decoder

October 11th, 2024

So, for the last month or even more (it feels like an eternity anyway) I was mostly trying to force myself to write an MPEG-4 ASP decoder. Why? Because I still have some content around that I’d like to play with my own player. Of course I have not implemented a lot of the features (nor am I going to) but even what I had to deal with made me write this rant.

Another bunch of formats I’m not looking at

October 5th, 2024

I regularly look at the dexvert list of unsupported video formats to see if something curious comes up. About half of that list are formats supported by na_game_tool, maybe a third are animation systems (i.e. more like a scripting language telling how to compose and change external or internal resources), but the rest are formats that pique my curiosity. I’ve written about some of them (like the Amiga formats I blogged about half a year ago or the rather recent TealMovie) and today I’m going to mention some more.

First of all, AVS. That’s the third AVS format I’ve heard about. First there was the AVS used in the Creature Shock game, then there’s this Chinese MPEG-四 AVS (followed by AVS2 aka HEVS and AVS3 aka “VP9VVC by any other name…”). Apparently there’s another one, from the early PC era. It seems to have been used by some ActionMedia cards with Indeo video compression formats (DVI PLV and DVI RTV; PIC and JPEG are also mentioned in the converter) and audio (8-bit PCM or DVI ADPCM). There’s a special tool for converting AVS to AVI but good luck finding samples (I’ve found one, yulelog.avs, used in a demo). The format seems to be documented (as the DVI format) but the codecs are not (besides RTV 2.0 aka Indeo 2). Maybe I’ll take another look at it one day…

Then, there’s a game called Music Chase. I found it by accident looking for Toon Boom Studio samples (which is an animation system, so not so interesting to look at). So what’s interesting about this game?

It looks like the game assets are divided into rooms, each having its own set of resources—usually some TBP files, some TMV files and a MID file or two. The first format is the standard BMP but with compression method 21 (which is not standard). TMV files are ciuQmiTke MOV (i.e. QuickTime format but with all values now being little-endian) with custom track handlers, so while you can recognise the audio track, the video track is not so easy. Additionally, the helper DLL is 16-bit code, which makes the Ghidra decompiler give up on almost every function. So maybe I’ll return to it when I’m seriously bored, but not today.

Still, it’s nice to encounter such formats from time to time.

Looking at Winamp codebase

October 4th, 2024

Breaking news from the Slowpoke News Channel™: a source code base for Winamp has been released (just last month). So it’s a good occasion to talk about it and what interesting (for me) things can be found in the third-party libraries.

I think I used the software a bit back in the day when MP3 was still all the rage (and there were CDs sold proudly featuring MP3 at 256kbps) and I was still using Windows (so around 1998-1999). I even briefly tried the K-Jofol player (does anybody remember that?) but I didn’t see much point in it. Around that time I finally switched to Linux as my main OS and started using XMMS and XMMS2 (I actually met one of its developers at FOSDEM once—and saw a llama or two when I visited a zoo but that’s beside the point). Also there was a plugin for XMMS2 that added VQF support (again, nowadays hardly anybody remembers the format but it was an interesting alternative; luckily Vitor Sessak reverse engineered it eventually). But with time I switched to MPlayer for playing music, and nowadays I use my own player with my own decoders for the formats I care about (including MP3).

But I wanted to talk about the code, not about how little I care about the program.

The first fun thing is that the source code release looks like somebody was lazy, thinking something along the lines of “let’s just drop what we have around and tell people not to do much with it—it’ll create some hype for us”.

The second fun thing is that it fails to live up to the name. As should be obvious, the name comes from AMP—one of the earliest practical MP3 decoders (the alternatives I can remember from those times were dist10 or the rather slow Fraunh*fer decoder). And of course WinAMP uses mpg123 for decoding instead (I vaguely remember them switching the decoding engine to the disappointment of some users, but I had no reason to care even back then).

But the main thing is that they’ve managed to do what Baidu failed to do—they’ve made a VP5 decoder and VP6 codec open-source. Of course it may be removed later, but for now the repository contains the library with the traditional TrueMotion structure that has a VP5 decoder as well as a VP6 decoder and a VP6 encoder. So those who wanted an open-source VP6 encoder—grab it while it’s still there (and I still have my own implementations of either of those things).

Out of curiosity I looked at the encoder and was not impressed. It has more features (like two-pass encoding) and more refined rate control, but it does not look that much better. I wonder what Peter Ross would say about it, being the developer of a popular and well-respected encoder for a codec with a rather similar structure.

Overall, the code base looks like a mess with no clear structure, with most libraries shoved into one directory without any further attempt to separate them by purpose. But that does not matter much, as it was not intended for large collaborative efforts and two or three programmers could live with it.

Still, let’s see if something good comes from this source release.

A quick look at Death Rally HAF

September 27th, 2024

I tried to look at it before but since the game uses a special 32-bit DOS extender (PMODE/W, if anybody’s curious), I gave up. As I mentioned in one of my previous posts, I need either to find a way to reliably dump the unpacked and properly loaded executable or to write such a decompressor from the binary specification—and I’m not eager to do either of those things.

Luckily, somebody bothered to decompile the engine in order to make the game work on modern OSes without DOS emulation. I use the term “decompile” because a lot of the parts are slightly prettified direct translations of the assembly. Anyway, this can be improved, and it probably works well enough already to make game fans rejoice.

Here’s the decompiled cutscene player and what follows is my understanding of the code given there.

So, HAF files start with a 16-bit frame count followed by two tables, one byte per entry each. The first table contains the sound effect that should be played starting at that frame (or zero), the second table gives the frame duration at 70fps. Then the actual frame data follows: 16-bit frame size, 768-byte palette and then the LZW-compressed image. And it looks like they lifted the LZW compression from GIF, as not only are the basic parameters of the LZW compression the same (dictionary size limited to 4096, a restart code and an arbitrary initial number of bits) but the LZW data is also packed into chunks prefixed with their size. I can’t outright remember any other format doing that—or having a need to do that when the total data size is known beforehand.
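The header layout described above can be sketched like this (my reading of the decompiled player; the little-endian frame count and the field names are assumptions):

```rust
// Parse the assumed HAF header: a 16-bit LE frame count, then one byte of
// sound-effect ID per frame (0 = none) and one byte of duration per frame
// (in 1/70th-of-a-second ticks).
fn parse_haf_header(data: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    let nframes = u16::from_le_bytes([*data.get(0)?, *data.get(1)?]) as usize;
    let sfx_end = 2 + nframes;
    let dur_end = sfx_end + nframes;
    if data.len() < dur_end {
        return None;
    }
    let sfx = data[2..sfx_end].to_vec();             // per-frame sound effect
    let durations = data[sfx_end..dur_end].to_vec(); // per-frame duration at 70fps
    Some((sfx, durations))
}

fn main() {
    // two frames: sound effects {0, 3}, durations {5, 5}
    let hdr = [2u8, 0, 0, 3, 5, 5];
    let (sfx, dur) = parse_haf_header(&hdr).unwrap();
    assert_eq!(sfx, vec![0, 3]);
    assert_eq!(dur, vec![5, 5]);
}
```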

Maybe I’ll add support for this format to na_game_tool eventually but I’ll find something else to do for now.

A cursory glance at TealMovie

September 23rd, 2024

Apparently there’s such a format for Palm, so one could play videos on those devices as well (PocketPC devices got so powerful that they could even play 320×240 MPEG-4 ASP videos—provided you used some good open-source player like TCPMP and not something like VLC). Since there’s a Win32 player for it, I decided to look at the format and was slightly disappointed.

I expected it to be more akin to the GBA formats that use vector quantisation (I looked at them in this post; e.g. a Palm Treo should have comparable or better performance) but instead I got something reminding me of SMUSH of all things.

The format turned out to use a palette and to transmit frames either in raw form or split into 8×8 blocks, using various opcodes to code them. Most of the opcodes are used to signal that a block should be copied from the previous frame with a fixed offset (there are 225 offsets for motion vectors in the (-16,-16)..(13,13) range), fill a block with a single colour (either an arbitrary one or one of four provided in the frame header), skip a block, fill with two colours using one of 256 predefined patterns, or split into smaller sub-blocks (and again down to 2×2 sub-sub-blocks) and update all or some of them. And there’s an opcode to code a run of repeated opcodes. If you don’t immediately think of SMUSH codec 47 then I don’t know what it reminds you of.
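The two-colour pattern mode can be sketched as follows; note this is a hedged guess at the mechanics, since I’m assuming each of the 256 predefined patterns is an 8-byte bitmap with one bit per pixel of an 8×8 block (the real table layout may differ):

```rust
// Fill an 8x8 block with two colours according to a one-bit-per-pixel
// pattern (assumed layout: one byte per row, MSB = leftmost pixel).
fn fill_pattern_block(dst: &mut [u8; 64], pattern: &[u8; 8], clr0: u8, clr1: u8) {
    for (row, &bits) in pattern.iter().enumerate() {
        for col in 0..8 {
            let bit = (bits >> (7 - col)) & 1;
            dst[row * 8 + col] = if bit == 1 { clr1 } else { clr0 };
        }
    }
}

fn main() {
    let mut blk = [0u8; 64];
    let checker = [0xAAu8, 0x55, 0xAA, 0x55, 0xAA, 0x55, 0xAA, 0x55];
    fill_pattern_block(&mut blk, &checker, 10, 20);
    assert_eq!(blk[0], 20); // top-left bit of 0xAA is set -> clr1
    assert_eq!(blk[1], 10); // next bit is clear -> clr0
}
```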

And of course it supported audio, which followed the video part and was either 8-bit PCM or IMA ADPCM.

Overall, I believe it was enough to provide a full-screen 160×160 video experience at 15 FPS using the handheld’s over-8000kHz DragonBall CPU; I still wonder if having a direct 15-bit RGB format would have made more sense.

Announcing na_game_tool 0.2.0

September 14th, 2024

As hinted in my previous posts, I’ve released na_game_tool version 0.2.0. Of course it is available at its dedicated page (and now there’s a link to that page in the “Multimedia Projects” link category). Here I’d like to talk about what’s changed in it and in NihAV.

Probably the most important part is the support for a dozen new video formats—that’s the main task of the tool after all. I’m not going to duplicate the changelog already presented on its page, so those zero people who care may look there. But at the same time it’s worth noting that I’ve moved support for three obscure formats from NihAV (i.e. adapted the code from there and removed it from the nihav-game crate). After all, I added support for some game formats there in the first place because I had no other place for that, and I did it mainly to test whether I understood their workings right. Now that na_game_tool exists, it’s a much more appropriate place for such experiments. Maybe I’ll move more formats in the future, but not those I care about (like Smacker or VMD), and I might still add new ones like game-specific AVI codecs (like IPMA or KMVC).

Now I want to talk about less important technical details. One of the improvements I’m proud of is adding better detection using regular expressions on filenames. It is described in a post about Nova Logic FMV formats and it solves a rather annoying problem: certain formats have different flavours that are impossible to tell apart from the file contents, while the naming helps distinguish them.
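The idea in miniature (with purely hypothetical filename patterns, and plain string matching standing in for the regex engine the tool actually uses): two flavours share the same magic, so the detector consults the filename as a tie-breaker.

```rust
// Illustrative filename-based flavour detection. The ".fmv" extension and
// the "_a" marker are made-up examples, not real na_game_tool rules.
#[derive(Debug, PartialEq)]
enum Flavour { VariantA, VariantB, Unknown }

fn detect_flavour(filename: &str) -> Flavour {
    let name = filename.to_ascii_lowercase();
    if name.ends_with(".fmv") && name.contains("_a") {
        Flavour::VariantA
    } else if name.ends_with(".fmv") {
        Flavour::VariantB
    } else {
        Flavour::Unknown
    }
}

fn main() {
    assert_eq!(detect_flavour("INTRO_A.FMV"), Flavour::VariantA);
    assert_eq!(detect_flavour("CUT01.FMV"), Flavour::VariantB);
    assert_eq!(detect_flavour("readme.txt"), Flavour::Unknown);
}
```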

Also I’ve fixed all the spotted bugs in the AVI output and added OpenDML support for writing large AVIs. This was needed because some Arxel Tribe games have 24-bit videos at 800×450 resolution over a minute long, so the decoded raw AVIs exceed 2GB in certain cases (and the intro and advertisement videos from Ur-Quan Masters went over the 1GB mark, ending up as ~1.4GB AVIs). Additionally, I’ve added BMP output to the image sequence writer (I don’t think anybody will use it but maybe it’ll come in handy one day for analysing frame palettes).

Having said that, I’m putting the project on pause. As mentioned in the post dedicated to the first release of the tool, my intent is to make releases when it accumulates support for at least a dozen new formats (the ones moved from NihAV do not really count)—and I don’t expect to encounter enough games with undiscovered video formats in the near future. Not that I’d refuse to work on one if it’s not too annoying (see my previous post on Psygnosis games for examples), but it may take a year or more to collect enough formats (plus some time to reverse engineer them), so I’d rather concentrate on other things like documenting the already implemented formats.

A look on some of Psygnosis formats

September 12th, 2024

This British company developed quite an amount of good games itself and published even more. As I mentioned in one of my previous posts, I’d like to take a look at some of those (after both seeing their games mentioned in the dexvert unsupported formats list and realising that I’ve unknowingly added support for some of their other games to na_game_tool already). Though I REd the most important format of them all, BMV, a long time ago from the ScummVM code (their code is still a slightly prettified direct conversion of assembly into C++).

Alien Breed

Since this game comes from a wormy developer, the PC version’s ANM files are in reality mere AVIs using KMVC. They don’t play with my old decoder, so I’ll have to look at what’s different in that version.

Chronicles of the Sword

The fun thing is that most of the game resource files have a .PC extension regardless of their type and content. But here I’ll talk about SEQ/*.PC of course.

Those files expose a rudimentary chunk structure, i.e. they have three short chunks with some basic metadata and frame sizes in the beginning, but that’s all. The rest seems to be frame data starting with frame dimensions and image data size (if a VGA palette follows, it seems to be unaccounted for).

I’d like to take a closer look at it but it’s too annoying: the executable uses the CauseWay DOS extender with a packed 32-bit part, so I either need to learn how to dump it or RE the extender code to see how it unpacks the code (the trick I used for the next game didn’t work). In either case I’ll postpone it until I’m sufficiently bored and really out of ideas of what to do.

Microcosm

This game uses the simplest yet still rather weird codec: scalable RLE.

What do I mean by that? It employs several methods of compression, all rather simple variations on RLE, except that some have a “masked update” opcode besides the usual run/copy/skip ones. That opcode carries a bit mask telling which of the following 7 or 15 pixels should be updated with new values and which left intact.
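The masked update can be sketched like this (the bit order is my assumption; the real decoder may scan the mask the other way):

```rust
// Apply a "masked update": for each set bit in the mask, replace the
// corresponding pixel with the next new value; clear bits leave pixels intact.
fn masked_update(dst: &mut [u8], mask: u8, new_pixels: &mut impl Iterator<Item = u8>) {
    for i in 0..7 {
        if (mask >> i) & 1 == 1 {
            if let Some(px) = new_pixels.next() {
                dst[i] = px;
            }
        }
    }
}

fn main() {
    let mut row = [1u8, 2, 3, 4, 5, 6, 7];
    let mut src = vec![9u8, 8].into_iter();
    masked_update(&mut row, 0b0000101, &mut src); // update pixels 0 and 2
    assert_eq!(row, [9, 2, 8, 4, 5, 6, 7]);
}
```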

But you probably wonder more about the “scalable” part of the description. Well, the format actually codes four 80×200 images per frame (each one may use its own compression method) and then interleaves their columns. And if you’re not satisfied with spatial scalability only, it has temporal scalability as well: every even frame is coded independently of the odd ones (i.e. frame 2 updates frame 0 and frame 4 updates frame 2, while frame 3 updates frame 1, and so on).
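The column interleaving is straightforward to sketch (which plane supplies which column is my assumption; the real order may differ):

```rust
// Interleave one row from four narrow planes into one wide row:
// output column x comes from plane (x % 4), column (x / 4).
fn interleave_row(planes: &[Vec<u8>; 4], width: usize) -> Vec<u8> {
    let mut out = vec![0u8; width * 4];
    for x in 0..width * 4 {
        out[x] = planes[x % 4][x / 4];
    }
    out
}

fn main() {
    // tiny 2-pixel-wide planes instead of 80 for demonstration
    let planes = [vec![0u8, 4], vec![1, 5], vec![2, 6], vec![3, 7]];
    assert_eq!(interleave_row(&planes, 2), vec![0, 1, 2, 3, 4, 5, 6, 7]);
}
```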

Also it stores the palette in BRG order for some reason, which I don’t remember seeing in any other codec—unlike 2-pixel RLE (IIRC I was surprised to see such an approach in some ARMovie codecs but it turned out to be not that rare after all).

This format was surprisingly hard to reverse engineer, not merely because of its five different coding methods but also because its binary specification is somewhat inaccessible. The executable uses the Phar Lap DOS extender with a compressed extended part. I did not know how to approach it until I eventually managed to use dosbox-x debugging facilities to dump what turned out to be the unpacked 32-bit part, which I could later load into Ghidra as a raw 32-bit binary and find the proper decompression routines. No such luck with the previous game, because it seems to load the code into XMS and I don’t know if it’s possible to dump that part of memory at all…

In either case, this codec will be supported in na_game_tool 0.2.0 and I can finally start forgetting about it (but not before documenting it in The Wiki).

Novastorm

This game uses a rather curious approach to video coding—it treats all blocks as being coded with 16 colours plus a mask, and during video decoding it updates the current colours (often by reusing them from some previous block) and the mask. The draw routine then uses those per-block colours and mask to reconstruct the block.

Of course you can have fewer colours, but it’s an interesting approach. You want a fill block? Just replicate one colour sixteen times. You want a two-colour block? Just copy/read two colour values and (optionally) update the mask using just 1 bit per cell. And yes, some opcodes tell you to read the full mask for the block while others first transmit a mask telling for which block rows the mask data is actually present.
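My reading of the draw stage, sketched: the mask holds a palette index per pixel, and reconstruction is just expanding indices through the block’s colour table (the 4×4 block size and nibble-packed mask here are assumptions for illustration):

```rust
// Reconstruct a 4x4 block from its 16-entry colour table and a mask
// holding one 4-bit colour index per pixel (two indices per byte,
// high nibble first -- an assumed packing).
fn draw_block(colors: &[u8; 16], mask: &[u8; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, px) in out.iter_mut().enumerate() {
        let byte = mask[i / 2];
        let idx = if i % 2 == 0 { byte >> 4 } else { byte & 0xF };
        *px = colors[idx as usize];
    }
    out
}

fn main() {
    let mut colors = [0u8; 16];
    colors[0] = 100;
    colors[1] = 200;
    // every byte 0x01 alternates indices 0 and 1
    let mask = [0x01u8; 8];
    let blk = draw_block(&colors, &mask);
    assert_eq!(blk[0], 100);
    assert_eq!(blk[1], 200);
    // a fill block is just the same colour replicated sixteen times
    assert!(draw_block(&[7u8; 16], &mask).iter().all(|&p| p == 7));
}
```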

What about inter coding? The skip mode is simple: there’s a fixed-size array of flags telling which blocks are coded and for which the operations should be skipped.

So it’s a bit hairy to implement and I’ve left it for later.

WipEout

This one is somewhat funny—it uses the IF chunked format (i.e. just two bytes per chunk name and a 16-bit chunk size) and seems to pack frame data with an LZ77 algorithm before the actual video decompression kicks in. The scheme seems to operate on nibbles, as some of the opcodes are “fill a 32-bit word of output with 0x44444444”, “XOR a 32-bit word of output with 0x00F00000”, “replace the top/bottom three nibbles of a 32-bit word with 0x777” and so on. Also there’s a post-decoding image permutation step (and in the case of one mode, it seems to use a raw image that gets pre-permuted just so it can be permuted back to its original state in the end; I haven’t looked too closely but it looks suspicious).
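Those nibble-oriented opcodes are easy to sketch as plain word operations (semantics paraphrased from my description above, not from any spec):

```rust
// Replicate one nibble into all eight positions of a 32-bit word,
// e.g. 4 -> 0x44444444.
fn fill_word(nibble: u32) -> u32 {
    nibble * 0x1111_1111
}

// XOR the output word with a fixed mask, e.g. 0x00F00000.
fn xor_word(word: u32, mask: u32) -> u32 {
    word ^ mask
}

// Replace the top three nibbles of a word, e.g. with 0x777.
fn replace_top3_nibbles(word: u32, nibbles: u32) -> u32 {
    (word & 0x000F_FFFF) | (nibbles << 20)
}

fn main() {
    assert_eq!(fill_word(4), 0x4444_4444);
    assert_eq!(xor_word(0x4444_4444, 0x00F0_0000), 0x44B4_4444);
    assert_eq!(replace_top3_nibbles(0x1234_5678, 0x777), 0x7774_5678);
}
```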

Another curious thing is that it has a table of 32-bit values transmitted at the beginning of each frame, with several opcodes to copy one of those values to the output.

Overall, it’s an interesting codec and I’d have liked to have a decoder for it implemented for the upcoming na_game_tool 0.2.0 release, but it’s too hairy for that. So it also goes onto my “what to do when I’m bored and can’t think of anything better to do” list.


As you can see, I have not done much, but even a cursory glance at those formats shows a variety of approaches to video coding that you cannot see in modern codecs. And that’s why I like looking at old formats.

P.S. With the intended amount of decoders implemented, I hope to finally release na_game_tool this weekend. There should be a follow-up post about it soon (and then probably just occasional rants for a long time).

On over- and under-engineered codecs

September 10th, 2024

Since my last post got many comments (more than one counts as many here) about various codecs, I feel I need to clarify my views on what I see as an over-engineered codec as well as an under-engineered one.

First of all, let’s define what does not make a codec over-engineered. The sheer number of features alone does not qualify: the codec may need those to fulfil its role—e.g. Bink Video had many different coding modes, but this was necessary for coding mixed content (e.g. sharp text and smooth 3D models in the same picture); and the features may have been accumulating over time—just look at those H.26x codecs that went a long way, adding features at each revision to improve compression in certain use cases. Similarly, it’s not the codec complexity per se either: simple methods can’t always give you the compression you may need. So what is it then?

Engineering in essence is a craft of solving a certain class of problems using practical approaches. Yes, it’s a bit vague, but it shows the attitude, like in the famous joke about three professionals in a burning hotel: an engineer sees a fire extinguisher and uses it to put out the fire with minimum effort; a physicist sees a fire extinguisher, isolates a burning patch with it and studies the process of burning for a while; a mathematician sees a fire extinguisher, says that a solution exists and goes back to sleep.

Anyway, I define an over- or under-engineered codec by its design effectiveness, i.e. the amount of features and complexity introduced in relation to the achieved result as well as the target goal. Of course there’s rarely a perfect codec, so I’ll use a simpler metric: a codec with several useless features (i.e. those that can be thrown out without hurting compression ratio or speed) will be called over-engineered, and a codec which can be drastically improved without changing its overall design will be called under-engineered. For example, an RLE scheme that allows a run/copy length of zero can be somewhat improved but is fine per se (and the decoder for it may be a bit faster this way); an RLE scheme that uses the zero value as an escape with the real operation length in the following byte is under-engineered—now you can code longer runs, but if you add a constant to that length you can code even longer runs and not waste half of the values on coding what can be coded without an escape already; and an RLE scheme that allows coding the same run or copy in three different ways is over-engineered.
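The under-engineering example made concrete: with a zero escape followed by a length byte, the escaped values 1..255 merely duplicate what one byte already codes, while adding a bias makes every escaped length reach further.

```rust
// Escaped run length without a bias: overlaps the directly codable range.
fn run_len_no_bias(escaped: u8) -> usize {
    escaped as usize // 0..255 -- lengths one byte could code anyway
}

// Escaped run length with a bias: every code is a length unreachable
// without the escape.
fn run_len_biased(escaped: u8) -> usize {
    escaped as usize + 256 // 256..511 -- no wasted codes
}

fn main() {
    assert_eq!(run_len_no_bias(255), 255);
    assert_eq!(run_len_biased(0), 256);
    assert_eq!(run_len_biased(255), 511);
}
```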

And that’s exactly why XCF is the most over-engineered format I’ve ever seen. Among other things it has three ways to encode a source offset with two bytes: signed X/Y offsets, a signed 16-bit offset from the current position, or an unsigned 16-bit offset from the start. And the videos come in either 320×200 or 320×240 size, so unless you have some weird wrap-around video you don’t need all those addressing modes (and in fact no video I’ve tried used some of those modes). Also, since the data is not compressed further, you can’t claim it improves compression. Speaking of which, I suspect that wasting additional bits on coding all those modes for every block in every frame negates any potential compression gains from the specific modes. There are other decisions of dubious usefulness there: implicit MV offsets (so short MVs are in the -4,-4..11,11 range for 8×8 blocks and -6,-6..9,9 for 4×4 sub-blocks), randomly chosen data sources for each mode, a dedicated mode 37 that is absolutely the same as mode 24 (fill plus masked update) and so on.

Of course there are more over-engineered codecs out there; I pointed at Indeo 4 as a good candidate in the comments, and probably lots of lossless audio codecs qualify too. But my intent was to show what an over-engineered codec really is and why I consider XCF to be the worst offender among game video codecs.

As for under-engineered codecs, for the reasons stated above it’s not merely a simple codec, it’s a codec where a passerby can point out a thing that can be improved without changing the rest of the design. IMO the most fitting example is Sonic—an experimental lossy/lossless audio codec based on Bonk. Back in the day when we at libav discussed removing it, I actually tried evaluating it and ended up with encoded files larger than the original. And I have a strong suspicion that simply reverting the coding method to the original Bonk one, or selecting some other sane method for residue coding, would improve it greatly—there’s a reason why everybody uses Rice codes instead of Elias Gamma. Another example would be MP3—there’s a rumour that FhG wanted it to be purely DCT-based (like AAC) but for patent considerations it had to keep the QMF, making the resulting codec more complex but less effective.
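To put some numbers behind the Rice vs Elias Gamma remark, here are the code lengths in bits for a value n: Elias Gamma needs 2·⌊log2(n)⌋+1 bits, while a Rice code with parameter k needs n>>k unary bits, a stop bit and k suffix bits, so for typical residue magnitudes a sanely chosen k wins.

```rust
// Elias Gamma code length for n > 0: unary prefix plus binary suffix.
fn elias_gamma_len(n: u32) -> u32 {
    assert!(n > 0);
    2 * (31 - n.leading_zeros()) + 1
}

// Rice code length for value n with parameter k.
fn rice_len(n: u32, k: u32) -> u32 {
    (n >> k) + 1 + k
}

fn main() {
    // typical residue magnitudes: Rice with a sane k is shorter
    assert_eq!(elias_gamma_len(100), 13);
    assert_eq!(rice_len(100, 4), 11);
    assert_eq!(elias_gamma_len(10), 7);
    assert_eq!(rice_len(10, 2), 5);
}
```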

P.S. The same principles are applicable to virtually everything, from multimedia containers to real-world devices like cars or computers, but I’ll leave exploring that to others.

The most overengineered game codec

September 7th, 2024

…as I’ve seen so far, of course, but the chances for it to keep this title are good.

I’m talking about the XCF format (found in ACF files of the Time Commando game developed by Adeline Software). For starters, the format allows various commands related to the engine (so XCF is used not merely for videos but also for stage demo records containing e.g. commands to the camera and various other things).

But that’s nothing compared to the video codec design. The frame is split into three parts: a fixed-size part with per-block 6-bit opcodes and two variable-length parts. I’d like to say that one contains colour values and the other stores masks and motion vectors, but in reality either kind of data may be read from either source and it’s not consistent even between modes (e.g. for a four-colour pattern block both masks and colours are read from part 1, while for a two-colour pattern block colours are read from part 2; it’s as if they designed it around whatever registers were available in the decoding function at the moment). And for some reason they did not go further and left those frame parts uncompressed.

The real monstrosity though is the opcodes. As I’ve mentioned, they take six bits, which means 64 different opcodes. And unlike many other codecs where you’d have half a dozen operations (copy from previous, fill with 1/2/16 colours and such) plus a run count, here you have way more. Also it uses an 8×8 block size, which helps in adding new modes. Here’s an abridged list:

  • raw block;
  • skip block;
  • motion block with one-byte motion vector (using nibbles for vector components), two-byte absolute offset, two-byte relative offset or two-byte motion vector;
  • motion block with each quarter using the same four modes;
  • single value fill block—and another mode for filling block quarters;
  • 2/4/8/16-colour pattern block;
  • 2/4/8-colour pattern block quarters;
  • special 4-colour pattern subblock coding mode where you can pick only one alternative option and only for half of the pixels;
  • block filled with mostly single colour except where mask tells to read a new value;
  • block with values coded as nibble-size differences to the base colour;
  • block coded as two base colours (high nibble only) plus low nibble of the current value;
  • block using RLE plus one of four scan modes (line scan, top-down scan, zigzag and inverted zigzag);
  • block using the same scan modes but sending only nibbles for the colour values (first nibble is for base colour high nibble, others are used to replace its low nibble).

And that’s not all: some modes (e.g. motion or fill) have a refinement mode. E.g. opcodes 1-4 all mean skip block, but while opcode 1 means copying the block from the same place in the previous frame and doing nothing else, opcode 2 means doing that and then replacing four pixels at arbitrary positions with new values, opcode 3 means the same but with eight pixels, and opcode 4 means using a mask telling which pixels should be replaced. It works the same way for other opcodes supporting it—first perform the operation, then optionally update some of the pixels.

If you thought the peculiarities end there, you’d be wrong. Motion vectors are a displacement from the centre of the block instead of its top-left corner as in any other video codec. And it looks like while the format stores video dimensions, the decoder will work with any width as long as it’s 320 pixels.

I don’t envy future me who has to document it for The Wiki.

P.S. And what’s ironic is that the game was released between Little Big Adventure and its sequel, and while the former used a custom format with FLI compression methods (and commands to play external audio), the latter switched to Smacker—a much simpler codec but apparently a more effective one.

P.P.S. After I reverse engineer a codec or two from games published by Psygnosis I should publish a new version of na_game_tool and switch to actually documenting all the formats it supports; and then do something completely different. At least the end is near.