Archive for the ‘Various Video Codecs’ Category

Looking at AOL ART format

Friday, November 15th, 2024

Since I have nothing better to do (besides some slight NihAV refactoring) and somebody told me about it, I decided to look at the format. Apparently back in the day The Multimedia Mike also attempted to research it, but I don't think anything substantial came of it.

Anyway, here's what everybody knows about it: as apparent from the name, it was developed by the Johnson-Grace company, and it combines a lot of different image compression methods and formats—so apparently you can have a slide show with accompanying MIDI or speech, and it splits an image into tiles and tries to compress each one using whatever method fits best.

So, here are some additional details.

The audio codec is a rather common speech codec (LPC plus quality-improving post-filters) with one peculiarity: internally it decodes 16-bit samples yet outputs 8-bit PCM.
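
Out of curiosity, here's a minimal sketch of what that final narrowing might look like (I'm assuming plain top-byte truncation with a bias to the unsigned range; the codec's actual rounding behaviour is unknown to me):

    // Hypothetical final stage: samples are decoded as i16 internally
    // but the output stream is unsigned 8-bit PCM.
    fn narrow_to_u8(samples: &[i16], out: &mut Vec<u8>) {
        for &s in samples {
            // take the top byte and shift from signed to unsigned range
            out.push(((s >> 8) + 128) as u8);
        }
    }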

Slide show—I've had just a cursory glance, but it seems to mix various kinds of content with the slide show commands (like displaying the next image) in a single file. Of course it's last on my priority list.

Image formats—now that's where the real fun is. The decoder handles about twenty different chunk types (even if most of them are useless and provide some image information at best) and recognises (and skips) about the same amount. I'm still struggling with the code, but there seem to be three types of compression: LZ77-based lossless compression; lossy compression (probably wavelet-based) with the same coding for coefficients; and another lossy compression (for palette-based images?). So far the only things I'm sure about are that it employs an LZ77-based scheme reminiscent of deflate with dynamic codebooks (but different from it or from DCL), which also seems to code signed coefficients while at it; and that there are way too many functions for converting palette formats (usually between 24- and 32-bit RGB, but quite often between 32-bit RGB and the same 32-bit RGB format represented as an integer instead of an array of bytes).
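
This is not the actual ART bitstream of course, but the core LZ77 operation that all those schemes (deflate, DCL and, apparently, this one) share is copying from an earlier position in the output—byte by byte, so that overlapping matches work:

    // Generic LZ77 back-reference copy: the match may overlap the data
    // being produced (offset smaller than length), so no memcpy here.
    fn lz_copy(window: &mut Vec<u8>, offset: usize, len: usize) {
        let start = window.len() - offset;
        for i in 0..len {
            let b = window[start + i];
            window.push(b);
        }
    }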

In either case I’m in no hurry and can keep digging into it at my leisure.

MPEG-4 ASP: done for now

Tuesday, October 15th, 2024

In my last post I mentioned that I still needed to deal with MP3-in-AVI and multi-threaded decoding. The former turned out to be a simple bug (I should not have trusted the AVI header reporting 12-bit audio), and I gave up on the latter.

The main reason for that is what seems to be the main contribution of MPEG to the world of video coding, namely B-frames. While the idea behind them is reasonable (code scene transitions or smooth movements as an interpolation between two key frames), the practical implementation brings headaches because those frames are coded in an order different from the display order (after all, you can't interpolate between two frames if you haven't decoded both of them). And of course it got worse in H.264 and later codecs, where B-frames can reference other B-frames, so you need to code information about the frame structure (references and how to update them).
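
To illustrate the reordering with a generic sketch (not any particular decoder's logic): with display order I0 B1 B2 P3 the coded order is I0 P3 B1 B2, so a player has to hold back each reference frame until the next one arrives:

    // One-reference-delay reorderer: B-frames are displayed as soon as
    // they are decoded, I/P-frames are held until the next reference
    // arrives (the last held frame must be flushed at end of stream).
    struct Reorderer<T> {
        pending_ref: Option<T>,
    }

    impl<T> Reorderer<T> {
        fn push(&mut self, frame: T, is_b: bool) -> Option<T> {
            if is_b {
                Some(frame) // show immediately
            } else {
                // a new reference releases the previous one for display
                self.pending_ref.replace(frame)
            }
        }
    }

Feeding it the coded order I0 P3 B1 B2 yields nothing, then I0, B1, B2, and flushing the pending reference at the end produces P3—exactly the display order.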

And the problem with MPEG-4 ASP is that while it can have B-frames, its popularity is tied mostly to the AVI container, which lacks any means to signal frame reordering (fun fact: the MPEG-4 ASP video files in MOV that I have would be perfect candidates for B-frames but lack them entirely). Of course, later other containers like Matroska or OGM (or even MP4 occasionally) gained popularity, but the gilded age seems to be tied to AVI. And of course that created difficulties.

If you have I- and P-frames only, there's nothing to care about—but multi-threading won't be that effective either. Newer implementations (Xvid 1.3.7 is rather fresh, BTW) output B-frames as-is, so good luck knowing that in advance and performing the reorder. In this case I check if the coded timebase is the same as the one reported by the container and simply re-assign timestamps from the bitstream (and if this does not work—well, tough luck). But there was a funnier intermediate solution with one frame containing data for both a P- and a B-frame and the following frame being a skip frame, so a decoder could replace it with the appropriate frame. This reminds me of Indeo 4, which performed the same trick. Making that work with multi-threaded decoding would be a mess requiring either saving the frame data and scheduling it for later decoding, or scheduling both frames and then trying to tie the second one to the upcoming frame decoding request. And playing back a typical video takes about 20% of CPU load anyway…
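
For reference, here's a sketch of how that packed-bitstream hack can be detected, assuming the layout I described (a P-VOP and a B-VOP concatenated in one chunk); the MPEG-4 VOP start code is 00 00 01 B6:

    // Scan past the first VOP start code for a second one; if found,
    // the chunk carries two frames and the tail must be stashed.
    fn split_packed(chunk: &[u8]) -> (&[u8], Option<&[u8]>) {
        for i in 4..chunk.len().saturating_sub(3) {
            if chunk[i..i + 4] == [0, 0, 1, 0xB6] {
                return (&chunk[..i], Some(&chunk[i..]));
            }
        }
        (chunk, None)
    }

The stashed tail would then be decoded when the dummy skip frame arrives in place of the B-frame—which is exactly the scheduling mess described above.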

Thus I've committed what I find to be good enough for my needs and I shall forget about it—at least until some decoding artefact annoys me enough. There's more boring and unremarkable stuff I want to do on NihAV; working on this decoder reminded me that it can always be worse (or uglier).

P.S. For some reason repository cloning and updating from git.nihav.org still does not work (but the web interface is fine). I've reported the problem and hopefully it will be resolved soon. I suspect that the provider blocked it because of too many synchronisation requests from other sites trying to mirror the repositories. In any case I'm still grateful for the hosting.

Woes of implementing MPEG-4 ASP decoder

Friday, October 11th, 2024

So, for the last month or even more (it feels like an eternity anyway) I was mostly trying to force myself to write an MPEG-4 ASP decoder. Why? Because I still have some content around that I'd like to play with my own player. Of course I have not implemented a lot of the features (nor am I going to) but even what I had to deal with made me write this rant.

Another bunch of formats I’m not looking at

Saturday, October 5th, 2024

I regularly look at the dexvert list of unsupported video formats to see if something curious comes up. About half of that list is formats supported by na_game_tool, maybe a third is animation systems (i.e. more like a script language telling how to compose and change external or internal resources), but the rest are formats that pique my curiosity. I've written about some of them (like the Amiga formats I blogged about half a year ago, or the rather recent TealMovie) and today I'm going to mention some more.

First of all, AVS. That's the third AVS format I've heard about: first there was the AVS used in the Creature Shock game, then there's the Chinese MPEG-四 AVS (followed by AVS2 aka HEVS and AVS3 aka “VVC by any other name…”). Apparently there's another one, from the early PC era. It seems to have been used by some ActionMedia cards with Indeo video compression formats (DVI PLV and DVI RTV; PIC and JPEG are also mentioned in the converter) and audio (8-bit PCM or DVI ADPCM). There's a special tool for converting AVS to AVI, but good luck finding samples (I've found one, yulelog.avs, used in a demo). The format itself seems to be documented (as the DVI format) but the codecs are not (besides RTV 2.0 aka Indeo 2). Maybe I'll take another look at it one day…

Then there's a game called Music Chase. I found it by accident while looking for Toon Boom Studio samples (that's an animation system, so not so interesting to look at). So what's interesting about this game?

It looks like the game assets are divided into rooms, each having its own set of resources—usually some TBP files, some TMV files and a MID file or two. The first format is standard BMP with compression method 21 (which is not standard). TMV files are ciuQmiTke MOV (i.e. the QuickTime format, but with all values now being little-endian) with custom track handlers, so while you can recognise the audio track, the video track is not so easy. Additionally, the helper DLL is 16-bit code that makes the Ghidra decompiler give up on almost every function. So maybe I'll return to it when I'm seriously bored, but not today.

Still, it's nice to encounter such formats from time to time.

A cursory glance at TealMovie

Monday, September 23rd, 2024

Apparently there's such a format for Palm, so one could play videos on those devices as well (PocketPC devices got so powerful that they could even play 320×240 MPEG-4 ASP videos—provided you used some good open-source player like TCPMP and not something like VLC). Since there's a Win32 player for it, I decided to look at the format, and was slightly disappointed.

I expected it to be more akin to the GBA formats that use vector quantisation (I looked at them in this post; a Palm Treo should have comparable or better performance, after all) but instead I got something reminding me of SMUSH of all things.

The format turned out to use a palette and to transmit frames either in raw form or split into 8×8 blocks coded with various opcodes. Most of the opcodes signal that a block should be copied from the previous frame at a fixed offset (there are 225 offsets for motion vectors in the (-16,-16)..(13,13) range); the others fill a block with a single colour (either an arbitrary one or one of four provided in the frame header), skip a block, fill it with two colours using one of 256 predefined patterns, or split it into smaller sub-blocks (and again, down to 2×2 sub-sub-blocks) and update all or some of them. And there's an opcode to code a run of repeated opcodes. If you don't immediately think of SMUSH codec 47, then I don't know what it reminds you of.
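
As a sketch, the two-colour pattern fill might look like this (I'm assuming each of the 256 predefined patterns is stored as an 8×8 one-bit mask, one byte per row; the bit order here is a guess):

    // Fill an 8x8 block with two colours according to a bit pattern:
    // a set bit selects c1, a clear bit selects c0.
    fn fill_pattern(block: &mut [[u8; 8]; 8], pattern: &[u8; 8], c0: u8, c1: u8) {
        for (row, &bits) in block.iter_mut().zip(pattern.iter()) {
            for x in 0..8 {
                row[x] = if (bits >> (7 - x)) & 1 != 0 { c1 } else { c0 };
            }
        }
    }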

And of course it supported audio, which followed the video part and was either 8-bit PCM or IMA ADPCM.

Overall, I believe it was enough to provide a full-screen 160×160 video experience at 15 FPS using the handheld's over-8000 kHz DragonBall CPU; I still wonder if having a direct 15-bit RGB format would have made more sense.

On over- and under-engineered codecs

Tuesday, September 10th, 2024

Since my last post got many comments (more than one counts as many here) about various codecs, I feel I need to clarify my views on what I consider an over-engineered codec as well as an under-engineered one.

First of all, let's define what does not make a codec over-engineered. The sheer number of features alone does not qualify: the codec may need those to fulfil its role—e.g. Bink Video had many different coding modes, but this was necessary for coding mixed content (e.g. sharp text and smooth 3D models in the same picture); and the features may have accumulated over time—just look at the H.26x codecs that went a long way, adding features at each revision to improve compression in certain use cases. Similarly, it's not the codec complexity per se either: simple methods can't always give you the compression you need. So what is it then?

Engineering, in essence, is the craft of solving a certain class of problems using practical approaches. Yes, it's a bit vague, but it shows the attitude, like in the famous joke about three professionals in a burning hotel: an engineer sees a fire extinguisher and uses it to put out the fire with minimum effort; a physicist sees a fire extinguisher, isolates a burning patch with it and studies the process of burning for a while; a mathematician sees a fire extinguisher, says that there's a solution and goes to sleep.

Anyway, I define an over- or under-engineered codec by its design effectiveness, i.e. the amount of features and complexity introduced in relation to the achieved result as well as the target goal. Of course there's rarely a perfect codec, so I'll use a simpler metric: a codec with several useless features (i.e. those that can be thrown out without hurting compression ratio or speed) will be called over-engineered, and a codec which can be drastically improved without changing its overall design will be called under-engineered. For example, an RLE scheme that allows a run/copy length of zero can be somewhat improved, but it's fine per se (and the decoder for it may be a bit faster this way); an RLE scheme that uses the zero value as an escape, with the real operation length in the following byte, is under-engineered—now you can code longer runs, but if you add a constant to that length you can code even longer runs and not waste half of the range on coding what can already be coded without an escape value; and an RLE scheme that allows coding the same run or copy in three different ways is over-engineered.
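
To spell out that middle example with a sketch (the concrete numbers are illustrative, not taken from any particular codec):

    // The under-engineered scheme: zero escapes to an 8-bit length in
    // the following byte, so the escaped values merely duplicate what
    // the opcode can already code directly (1..255).
    // Returns the length and whether the extra byte was consumed.
    fn read_len_naive(op: u8, next: u8) -> (usize, bool) {
        if op == 0 { (next as usize, true) } else { (op as usize, false) }
    }

    // The obvious fix: bias the escaped length by the number of values
    // the opcode covers on its own, so the escape codes runs of 256..511.
    fn read_len_biased(op: u8, next: u8) -> (usize, bool) {
        if op == 0 { (next as usize + 256, true) } else { (op as usize, false) }
    }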

And that's exactly why XCF is the most over-engineered format I've ever seen. Among other things it has three ways to encode a source offset with two bytes: signed X/Y offsets, a signed 16-bit offset from the current position, or an unsigned 16-bit offset from the start. And the videos come in either 320×200 or 320×240 size, so unless you have some weird wrap-around video you don't need all those addressing modes (and indeed, no video I've tried used some of those modes). Also, since the data is not compressed further, you can't claim they improve compression. Speaking of which, I suspect that wasting additional bits on coding all those modes for every block in every frame negates any potential compression gains from the specific modes. There are other decisions of dubious usefulness there: implicit MV offsets (so short MVs are actually in the -4,-4..11,11 range for 8×8 blocks and -6,-6..9,9 for 4×4 sub-blocks), randomly chosen data sources for each mode, a dedicated mode 37 that is exactly the same as mode 24 (fill plus masked update), and so on.
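
Sketched out, those three equivalent addressing modes might look like this (the mode numbers, byte order and exact semantics here are placeholders, not the real XCF values; the point is that all three spend two bytes to say the same thing):

    // Three two-byte ways to locate the copy source for a block.
    fn src_pos(mode: u8, b0: u8, b1: u8, cur: usize, stride: usize) -> usize {
        match mode {
            // signed X/Y displacement from the current position
            0 => (cur as isize + b0 as i8 as isize
                               + b1 as i8 as isize * stride as isize) as usize,
            // signed 16-bit linear offset from the current position
            1 => (cur as isize + i16::from_le_bytes([b0, b1]) as isize) as usize,
            // unsigned 16-bit offset from the start of the frame
            _ => u16::from_le_bytes([b0, b1]) as usize,
        }
    }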

Of course there are more over-engineered codecs out there; I pointed at Indeo 4 as a good candidate in the comments, and probably lots of lossless audio codecs qualify too. But my intent was to show what an over-engineered codec really is and why I consider XCF the worst offender among game video codecs.

As for under-engineered codecs, for the reasons stated above it's not merely a simple codec: it's a codec where a passerby can point out a thing that can be improved without changing the rest of the design. IMO the most fitting example is Sonic, an experimental lossy/lossless audio codec based on Bonk. Back in the day when we at libav discussed removing it, I actually tried evaluating it and ended up with encoded files larger than the originals. And I have a strong suspicion that simply reverting the coding method to the original Bonk one, or selecting some other sane method for residue coding, would improve it greatly—there's a reason why everybody uses Rice codes instead of Elias gamma codes. Another example would be MP3: there's a rumour that FhG wanted it to be purely DCT-based (like AAC) but for the patent holders' considerations it had to keep the QMF, making the resulting codec more complex but less effective.

P.S. The same principles are applicable to virtually everything, from multimedia containers to real-world devices like cars or computers, but I'll leave exploring that to others.

REing another simple codec

Saturday, June 29th, 2024

Since I was bored I tried to (ab)use discmaster.textfiles.com to search for interesting (i.e. unsupported) samples once again. The main problem is that if it cannot decode the contents, it does not recognise the format. So e.g. AVI files without a video track (yes, such files exist) and those using some unrecognised codec will both be marked as aviAudio format, and if the audio stream is absent or unknown as well, the file gets demoted to unknown.

So I searched for AVI and MOV files both by extension and by this audio-only type, and here are the categories of results:

  • actual audio-only files (that's expected);
  • completely different formats (there's an alternative AVI format, and MOV is a very popular extension as well);
  • improperly extracted files (rather common with MOV on hybrid Macintosh/PC CDs, where the resource fork often gets ignored);
  • damaged files (happens with some CDs and is very common with the AOL file library collection—often the AVI data starts somewhere in the middle of the file);
  • too old or poorly mastered files (for example, one AVI file lacks padding to 16 bits between chunks; some MOV files can't be decoded even though they look correct);
  • one Escape 130 file that could've been supported if the libavcodec AVI demuxer did not feed garbage to the decoder (it's not just my demuxer that can handle it; old MPlayer 2 plays it fine with its own demuxer);
  • some TrueMotion 1 files that were not recognised because of the tmot FOURCC;
  • files with some special features of known codecs (I've seen some MOV files containing the QDraw codec with JPEG frames);
  • files with codecs I can decode (like IPMA) but the popular software can't;
  • files with known codecs (some documented by me) that nobody bothered to implement (especially Motion Pixels 1 and 2);
  • and finally, some AVIs with the savi FOURCC and a single file with the DKRT FOURCC.

Those “SuperAVI” files turned out to be rebranded Cinepak, which I managed to recognise right away; the remaining file turned out to be a bit baffling. After extracting the frames I figured out that it is raw YV12 video, but for some reason it had 64 bytes of something before the image data and 440 bytes after. It can be located on the TNG Klingon Language Disc, but it does not look like the software there can decode it anyway.

Overall, nothing hard or interesting (if you don’t count the questions about the origins of that file, that is).

A look at an obscure animation system

Tuesday, June 25th, 2024

Since I have nothing better to do, I looked at a thing I'd encountered. There's a system, developed by some Japanese programmer going by the nickname “Y.SAK”, that consists of compressed bitmaps (in files with whatever extension) and a scripting system using them for displaying animations (that's .bca files) or even complex scripts (that's .bac files, don't confuse the two) that may react to the mouse, set or test variables, and even invoke programs.

Of course the only part I was really interested in was the compressed bitmaps. They have a 48-byte header starting with ‘CS’ and containing the author's copyright, then the header part of a DIB file follows (including the palette), and finally the compressed data. Apparently there are two supported compression methods—RLE and LZSS. The latter is the familiar code used in many compressors for various things, but the RLE is surprisingly interesting. Its opcode contains a copy/run flag in the top bit and either a 7-bit copy value or a 3-bit run length plus a 4-bit run value index. Maximum run length/index values mean you need to read the following byte for the real value of each. But that's not all—note that I wrote “run value index”. There's a table of possible run values sent before the actual compressed data, and that index tells which 4-byte entry from it should be repeated for the run. Nothing revolutionary of course, but still a rather curious scheme that I don't remember seeing mentioned anywhere.
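
Here's how I understand the scheme, as a sketch (the exact bit placement within the opcode is my guess):

    // One Y.SAK RLE opcode: top bit 0 = copy literals, top bit 1 = run.
    // For the run case, bits 4-6 hold the length and bits 0-3 an index
    // into the table of 4-byte run values; maximum field values (7 and
    // 15) mean the real length/index follows in the next byte.
    fn decode_op(src: &mut impl Iterator<Item = u8>,
                 table: &[[u8; 4]],
                 out: &mut Vec<u8>) -> Option<()> {
        let op = src.next()?;
        if op & 0x80 == 0 {
            let mut len = (op & 0x7F) as usize;
            if len == 0x7F { len = src.next()? as usize; }
            for _ in 0..len { out.push(src.next()?); }
        } else {
            let mut len = ((op >> 4) & 7) as usize;
            if len == 7 { len = src.next()? as usize; }
            let mut idx = (op & 0xF) as usize;
            if idx == 15 { idx = src.next()? as usize; }
            for _ in 0..len { out.extend_from_slice(&table[idx]); }
        }
        Some(())
    }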

And that’s why I keep digging for this old stuff.

Some words on IBM PhotoMotion

Thursday, June 6th, 2024

After my recent rant about search systems, I decided to try to find any information about this format (I just happened to recall that it's supposed to exist). I don't know if anybody else was luckier, but for me the search results were mentions in lists of FOURCCs, some passing references in two papers, and that's all. Now it will probably start returning more results from the multimedia.cx domain though 😉

So what should we do when generic search engines fail? Resort to the specialised ones, of course. Thanks to the content search feature of discmaster.textfiles.com I was finally able to locate a CD which uses PhotoMotion technology, with both video files and the official player (aptly named P7.EXE—I couldn't have given it a better name myself). Even better, the video files were encoded as both AVI and MM, so I could check what output to expect.

Of course Peter's decoder can't handle them properly because of the larger header (26 bytes instead of the usual 22 or 24) and the uncompressed intra frames. But it was easy to write a simple stand-alone decoder and validate that both the PhotoMotion and the game samples are decoded fine.

This is no major achievement of course, but at least it answers the question of what that format is all about. So even if there's still no information about the alleged VfW decoder, now we know what to expect from it.

Looking at random PAF

Thursday, May 16th, 2024

So apparently there's a RIFF-inspired PAF format used in the Bi-Fi racing game for DOS (essentially a promotional game for a local sausage-in-a-roll snack brand).

It's nothing to write a Multimedia Wiki article about, but it's somewhat interesting. Apparently it has one single CODE chunk that contains the commands for updating the frame—for all frames. It is pairs of (skip, copy) bytes that tell how many quads of bytes should be skipped on the output or updated from the delta frames.
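
A sketch of how those pairs seem to drive the frame update (one quad = 4 bytes, per the above; whether the delta data is consumed strictly sequentially like this is my assumption):

    // Apply one frame's worth of (skip, copy) pairs from the CODE chunk.
    fn apply_code(code: &[u8], delta: &[u8], frame: &mut [u8]) {
        let mut dpos = 0; // position in the unpacked delta data
        let mut fpos = 0; // position in the output frame
        for pair in code.chunks_exact(2) {
            fpos += pair[0] as usize * 4; // skip this many quads on output
            let n = pair[1] as usize * 4; // update this many quads
            frame[fpos..fpos + n].copy_from_slice(&delta[dpos..dpos + n]);
            fpos += n;
            dpos += n;
        }
    }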

Delta frames are RLE-packed data in DLTA chunks: the first 32 bits tell the unpacked data size, then traditional RLE follows (top bit clear—copy data, top bit set—repeat the next byte low-seven-bits times).
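
And a sketch of the DLTA unpacking as described (I'm assuming a little-endian size field and that the top-bit-clear opcode carries the literal count itself):

    // Unpack one DLTA chunk: 32-bit unpacked size, then RLE data.
    fn unpack_dlta(mut src: &[u8]) -> Vec<u8> {
        let size = u32::from_le_bytes(src[..4].try_into().unwrap()) as usize;
        src = &src[4..];
        let mut out = Vec::with_capacity(size);
        while out.len() < size {
            let op = src[0];
            if op & 0x80 == 0 {
                // top bit clear: copy `op` literal bytes
                out.extend_from_slice(&src[1..1 + op as usize]);
                src = &src[1 + op as usize];
            } else {
                // top bit set: repeat the next byte (op & 0x7F) times
                out.extend(std::iter::repeat(src[1]).take((op & 0x7F) as usize));
                src = &src[2..];
            }
        }
        out
    }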

Apparently those files were intended to be used as an animated TV-screen overlay on the jukebox background (maybe also for an intro, but the CD rip of the game didn't have one). So on one hand it's a mediocre format, on the other it's somewhat interesting anyway.

Next I should explore the usage of a Duck codec on “iamaduck” console…