REing non-Duck VP X1

June 13th, 2024

While I’m still looking for a solution on encoding video files with large differences with TrueMotion, I distract myself with other things.

Occasionally I look at dexvert unsupported formats to see if there’s any new discovery documented there in video formats. This time it was something called VPX1.

I managed to locate the sample files (multi-megabytes ones starting with “VPX1 video interflow packing exalter video/audio codec written by…” so there’s no doubt about it) and an accompanying program for playing them (fittingly named encode.exe). The executable turned out to be rather unusable since it invokes DPMI to switch to 32-bit mode and I could not make Ghidra decompile parts of the file in 386 assembly instead of 16-bit one (and I did not want to bother to decompile it as a raw binary either). Luckily the format was easy to figure out even without the binary specification.

Essentially the format is a plain chunk format complicated by the fact that half of the chunks do not have a size field (for palette chunk it’s always 768 bytes, for tile type chunk it’s width*height/128 bytes). The header seems to contain video dimensions (always 320×240?), FPS and audio sampling rate. Then various chunks follow: COLS (palette), SOUN (PCM audio), CODE (tile types) and VIDE (tile colours). Since CODE is always followed by a VIDE chunk and there seems to be a correlation between the number of non-zero entries in the former and the size of the latter, I decided that it’s most likely a tile map and colours for it, and it turned out to be so.

Initially I thought it was a simple bit map (600 bytes for 320×240 image can describe a bit map for 4×4 tiles) but there was no correlation between the number of bits set and bytes in tile colours chunk. I looked harder at the tile types and noticed that it forms a sane 20×30 picture so it must be 16×8 tiles. After some more studying the data I noticed that nibbles make more sense, and indeed only nibbles 0, 1, 2 and 4 were encountered in the tile types. So it’s most likely 8×8 tiles. After gathering statistics on nibbles and comparing it to tile colours chunk size I concluded that type 2 corresponds to 32 colours, type 4 corresponds to 1 colour and type 1 corresponds to 16 colours. Then it was easy to presume that type 4 is single-colour tile, type 1 is downscaled tile and type 2 is a tile type with doubling in one dimension. It turned out that type 2 tile repeats each pixel twice and also uses interlacing (probably so video can be decoded downscaled on really slow machines). And that was it.
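The nibble statistics above can be turned into a small consistency check. This is only a sketch: the function names are made up for illustration, and treating type 0 as a tile that consumes no colour bytes is an assumption (the text only says that the VIDE size correlates with the non-zero entries).

```python
# Colour bytes consumed per tile type, as inferred in the text.
# Type 0 is ASSUMED to consume nothing (e.g. an unchanged tile).
COLOURS_PER_TYPE = {0: 0, 1: 16, 2: 32, 4: 1}


def code_chunk_size(width: int, height: int) -> int:
    """Size of the CODE chunk: one nibble per 8x8 tile, two tiles per byte."""
    return width * height // 128


def expected_vide_size(code_chunk: bytes) -> int:
    """Sum the colour bytes that the following VIDE chunk should contain."""
    total = 0
    for byte in code_chunk:
        # Nibble order does not matter for counting.
        for nibble in (byte >> 4, byte & 0x0F):
            total += COLOURS_PER_TYPE[nibble]
    return total
```

For a 320×240 frame `code_chunk_size` gives the 600 bytes mentioned above, and comparing `expected_vide_size` against the real VIDE chunk length is one way to test the guessed tile sizes on actual samples.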

Overall, it is a simple format but it’s somewhat curious too.

P.S. There’s also DLT format in the same game which has a similarly lengthy text header, some table (probably with line offsets for the next image start) and paletted data in copy/skip format (palette is not present in the file). It’s a 16-bit number of 32-bit words to skip/zero followed by a 16-bit number of 32-bit words to copy followed by the 32-bit words to be copied, repeat until the end. Width is presumed to be 640 pixels.
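A minimal sketch of that copy/skip loop, under several assumptions: the counts are little-endian, "skip" leaves the output untouched rather than zeroing it, and the text header and table have already been stripped by the caller. The function name is hypothetical.

```python
import struct


def decode_dlt_ops(data: bytes, frame: bytearray) -> None:
    """Apply (skip, copy) runs of 32-bit words to a paletted frame buffer."""
    pos = 0  # read position in the packed data
    out = 0  # write position in the frame, in bytes
    while pos + 4 <= len(data):
        # ASSUMPTION: both 16-bit counts are little-endian.
        skip, copy = struct.unpack_from("<HH", data, pos)
        pos += 4
        out += skip * 4  # skipped words are left as they were
        nbytes = copy * 4
        if pos + nbytes > len(data) or out + nbytes > len(frame):
            raise ValueError("copy run exceeds input or frame size")
        frame[out:out + nbytes] = data[pos:pos + nbytes]
        pos += nbytes
        out += nbytes
```

With the presumed width of 640 pixels the frame buffer would be `bytearray(640 * height)`, with the height taken from wherever the file actually stores it.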

P.P.S. I wonder if it deserves a support via stand-alone library named libvpx1 or libvpx and if this name is acceptable for Linux distributions.

Duck Control 1: update

June 10th, 2024

I’ve been working on the TM encoder now and then and finally I have some things to say about it.

First of all, general state of the things: the encoder works and produces valid output for both methods 1 and 3 (the encoding is still not perfect but hopefully it can be fixed), it still lacks audio encoding (I need to add WAV reading support to the encoder and extend my decoder to test the output).

Second, I also decided to add an auto-selection option which allows the encoder to decide whether to use method 1 or method 3 for the frame. It simply decides which one to use depending on the percentage of the most common pair and the total number of unique pairs present. It does not seem to have any practical use but it may be handy to test decoders that expect only one coding method to be present in the stream.

And now let’s move to the most interesting thing in all this format (at least to me): codebook generation. TrueMotion (1 and 2X) is a rare example of a codec using Tunstall coding (the only other known codec is CRI P256), essentially an inverse Huffman coding where a fixed-length code corresponds to a sequence of symbols.

The original codebook construction goes something like this: add all symbols to the codebook, while the space allows replace most probable entry with new strings using this old entry as a prefix. E.g. for {0 1 2} alphabet (with 0 being the most probable symbol) and size 8 codebook initially you’ll have just the same {0 1 2}, then {00 01 02 1 2} and finally {000 001 002 01 02 1 2} (and you can add another code there to make it full).
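The textbook construction can be sketched in a few lines. The symbol probabilities below are invented for the example (the text only says that 0 is the most probable symbol), and symbols are assumed to be single-character strings so that sequences can be built by concatenation.

```python
from typing import Dict, List


def tunstall_codebook(probs: Dict[str, float], size: int) -> List[str]:
    """Build a Tunstall codebook with at most `size` entries."""
    # Start with every single symbol as its own entry.
    book = dict(probs)
    # Each expansion removes one entry and adds len(probs) new ones.
    while len(book) + len(probs) - 1 <= size:
        best = max(book, key=book.get)  # most probable entry so far
        p = book.pop(best)
        for sym, q in probs.items():
            book[best + sym] = p * q
    return sorted(book)


# Made-up probabilities with "0" as the most probable symbol.
print(tunstall_codebook({"0": 0.7, "1": 0.2, "2": 0.1}, 8))
# prints ['000', '001', '002', '01', '02', '1', '2']
```

The result has seven entries, matching the example above: one more expansion would need nine slots, so the eighth code stays free.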

Of course it’s rather impractical in this form as not all sequences will be encountered in the data and you still need to code smaller sequences (e.g. how would you code exactly four zeroes with the above codebook?). Thus I decided to do it a bit differently: I only add new sequences without deleting old ones and I also keep a (limited) statistics on the sequences encountered (from two to twelve symbols) so first I add all encountered pairs of symbols, then select most commonly occurring sequence and add all known children of it (i.e. those with an additional pair of symbols at the end), mark it as ineligible candidate for the following search and repeat the process again until the codebook is full. If somebody cares about implementation details, I used a trie for holding such information as it’s easy to implement and understand; and during update process I keep a list of trie nodes for the previously encountered sequences up to maximum depth so I can update all those sub-sequence statistics in one pass over input.
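A rough sketch of this statistics-driven variant follows. It is not the encoder's actual code: a plain `Counter` keyed by tuples stands in for the trie, sequences are assumed to start at every pair boundary, and trimming the last batch of children to fit the codebook is a guess. All names are hypothetical.

```python
from collections import Counter

MAX_PAIRS = 6  # sequences of two to twelve symbols, i.e. one to six pairs


def count_sequences(symbols):
    """Count every pair-aligned sequence of 1..MAX_PAIRS pairs in one pass."""
    pairs = [tuple(symbols[i:i + 2]) for i in range(0, len(symbols) - 1, 2)]
    stats = Counter()
    for start in range(len(pairs)):
        seq = ()
        for pair in pairs[start:start + MAX_PAIRS]:
            seq += pair
            stats[seq] += 1
    return stats


def build_codebook(stats, size):
    """Grow a codebook of up to `size` sequences without deleting old entries."""
    # First, every pair seen in the data goes in
    # (ASSUMPTION: all encountered pairs fit into the codebook).
    book = [seq for seq in stats if len(seq) == 2]
    expanded = set()  # entries already used as a prefix, now ineligible
    while len(book) < size:
        candidates = [s for s in book
                      if s not in expanded and len(s) < 2 * MAX_PAIRS]
        if not candidates:
            break
        best = max(candidates, key=lambda s: stats[s])
        expanded.add(best)
        # Known children: the same sequence with one more pair at the end.
        children = [s for s in stats
                    if len(s) == len(best) + 2 and s[:len(best)] == best]
        children.sort(key=lambda s: -stats[s])
        # ASSUMPTION: when space runs out, keep the most frequent children.
        book.extend(children[:size - len(book)])
    return book
```

A trie makes the child lookup and the one-pass update much cheaper than the linear scans used here; the dictionary version only shows the selection order.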

Does it make a difference? Indeed it does. I took the original LOGO.DUK (the only video with a different codebook), decoded it and re-compressed it using the default codebook all other videos are using as well as the one generated specifically for it. Here are the results:

  • original .duk size—2818868 bytes;
  • re-compressed file size—2838062 bytes;
  • re-compressed with file-specific codebook—2578010 bytes.

That’s using the same method 3 as the original file. With method 1 file sizes with the standard or custom codebook are 2622758 and 2490058 bytes respectively.

As you can see, the difference is noticeable. Of course it requires two passes over input and many megabytes of memory to store the sequence statistics, but the results may be worth it. In theory the compression may be improved even further if you know how to generate a codebook that allows splitting frame data into unique chunks but that sounds a lot like an NP-hard problem to me.

Anyway, I got what I wanted from it so it just requires some bugfixing, audio encoding support, polishing and documenting. After that I can dump its source code for all zero users and forget about Duck codecs until something even more exotic manages to re-surface.

Some words on IBM PhotoMotion

June 6th, 2024

After a recent rant about search systems I decided to try to find any information about the format (I just happened to recollect that it’s supposed to exist). I don’t know if anybody was lucky but for me the search results were mentions in the list of FOURCCs, some passing references in two papers and that’s all. Now it will probably start returning more results from multimedia.cx domain though 😉

So what should we do when generic search engines fail? Resort to the specialised ones of course. Thanks to the content search feature of discmaster.textfiles.com I was finally able to locate a CD which uses PhotoMotion technology with both video files and the official player (aptly named P7.EXE, I couldn’t have given it a better name myself). Even better, video files were encoded as both AVI and MM so I could check what output to expect.

Of course Peter’s decoder can’t handle them properly because of the larger header (26 bytes instead of the usual 22 or 24 bytes) and uncompressed intra frames. But it was easy to write a simple stand-alone decoder for it to validate that both PhotoMotion and game samples are decoded fine.

This is no major achievement of course but at least it answers a question what that format is all about. So even if there’s still no information about an alleged VfW decoder, now we know what to expect from it.

The freedom* of choice

June 4th, 2024

Since the topic is no longer hot, I can rant on it as well.

Sometimes I get asked why I name the search company with the name starting with G (and part of Alphabet) Baidu consistently throughout my blog. There are several reasons for that, mostly it’s because since they use my work without acknowledging it I don’t see a reason to promote their name either, but more importantly, I feel the company would fit well into a totalitarian regime (on the top of course, they do not want to be mere servants). And recently they’ve proved that once again.

You should be aware of the theory of enshittification by now: at first a company caters to the users, then it shifts its focus to the suppliers and finally it starts to serve its own interests. I believe it is just a natural manifestation of shifting power balance but not the intents: companies want to have all money (control, whatever) without doing much work, users prefer to have everything as cheap as possible instead; so in order to get a hold on the market a company needs to build a user-base first, then it still has to submit to the suppliers’ wishes (since it still depends on them) until it finally gets an effective monopoly so neither the users nor the suppliers have any other option. Of course in reality there are many factors that still limit companies (sometimes EU regulations can be useful!) so it’s not as bad as it could be otherwise. But who knows, maybe we’ll see the cyberpunk future with large corporations becoming de facto states.

Anyway, back to the Internet search. Previously there was such thing as Internet (a gathering of different web sites and personal pages) and there was a need to find a piece of information or a web site of certain interest. Thus search services came into existence. Some were merely a catalogue of links for certain topics submitted by people, others crawled the Web in order to find new information (IMO AltaVista was the best one).

And then Internet matured and companies discovered that money can be made there. And that’s when we started to get annoying ads—large Flash banners, pop-ups, pop-unders and so on (I vaguely remember time before ads became that annoying but I hardly can believe in that myself). But the process has not stopped there, ad revenue meant that now the sites have a reason to attract users not merely to increase the visitors counter (yes, many sites had such widgets back in the day). That’s how we got another pillar of modern Web—SEO spam. Also with the technological progress we got large sites dedicated to organising user content (previously there were such things as GeoCities or Tripod but they were rather disorganised hosting services for random user homepages), including the worst of them—social networks. Eventually those sites tried to replace the whole Web—and it worked fine for most users who get their daily dose of news, recreation and social interaction from one or two of those sites.

So we have these megasites full with ads and generated nonsense or plagiarised content and Baidu had a reasonable idea of cutting the middle man—if you stay on one site to browse mostly generated nonsense why can’t we provide it all instead of referring you to an ad revenue for a different site? And if you think this idea is bad, there’s not much you can do about it—the very limited competition acts the same. Starting your own search service would require an insane amount of bandwidth and storage to do it right (even the large companies had their search quality declining for years because the content has exponential growth while storage space for even indexing it is limited, so you have to sacrifice something less popular). Mind you, if you limit the scope severely it may work just fine, it’s scaling to all Web content and for general audience that is rather impossible.

Now where does freedom* (yes, with marketing asterisk) of choice come into this picture?

I remember reading back in the day how The Party solved the problem of lacking resources to fulfil needs of people. They declared that the needs of the people are determined by the party (so if you think you should have other food beside bread, mashed eggplants and tinned sprats—well, that’s your own delusion that has nothing to do with your real needs). It feels that Web N.0 companies decided the same—for now mostly in the form of recommendations/suggestions but considering the growing limitations (like avoiding seeing ads on Baidu hosting using Baidu browser—at least they have not introduced mandatory quiz after the ads like reportedly one russian video hosting does) it may soon be about as good as in China (i.e. when you try to deviate from the prescribed path you’ll be gently returned to it and if you persist you’ll be punished—banning your Baidu account seems to be as bad as losing social credit score already). That’s the real freedom of choice—they’re free to choose an option for you and you’re free to choose to accept it (also known as Soviet choice).

Good thing is that most people don’t care and I can manage without. Bad thing is that it spreads elsewhere.

I’m talking mostly about various freedesktop.org projects, especially systemd and GNOME. In both cases the projects offered a certain merit (otherwise they would not stand out of their competition and not get support of IBM) but with the time they became too large in their domain and force their choices on Linux users. For example, systemd may be a conceptually good init system but in reality it can work only with the components designed specifically for it (or do you have a better explanation for existence of things like systemd-timesyncd?). Similarly GNOME is very hostile to attempts to change GUI appearance, so when third-party developers failed to take a hint with plugins and themes breaking every minor release, GNOME developers had to explicitly introduce libadwaita and forbid any deviations from the light and dark themes hardcoded there. At least finding an alternative there is still possible.

Well, there you have it. I’m not the first to highlight the problems and I’m not proposing a universal solution to them either. But if you ever wondered why I restrict myself on many modern technologies and NIH my own multimedia framework, here’s your answer.

All of legendary animation formats

May 31st, 2024

Since I’m still not in the mood to do something serious, I decided to play some adventure games from Legend Entertainment (you know, parser-based, mostly with EGA graphics). And some of their VGA games like Companions of Xanth or Eric the Unready contain full-screen cutscenes in unknown format. The earlier released Gateway II used the standard FLIC for many of those, SVGA games switched to .Q format—but nothing obvious for these ones. Blackstone Chronicles used QuickTime and thus is not interesting.

So I decided to look what ScummVM source code has to say about it (unrelated fun fact: it’s the only open-source project I donated some money to). Of course there’s a fork with some halfway done support of later Legend Entertainment adventure games, including its picture format support (no such luck for EGA-only games, it seems).

Apparently .PIC files are more a collection of sprites and even full backgrounds. Depending on the game one file may contain all room backgrounds, or just single area backgrounds plus related animations (e.g. flowing water or burning torches), or it may be character sprites, or—as one should expect—a cutscene.

Frames in .PIC may be full frames or delta frames. In the latter case they only update a part of the screen (either by replacing an area or XORing it). The more interesting thing is how frame data is compressed. The reference code is reverse-engineered and not so informative, so it took some time to understand it. First apparently there are tables and code used to generate new tables, which turned out to be exactly what I suspected them to be: Huffman codebooks. After a bit more messing with the algorithm, it turned out to be yet another LZ77-based compressor with static codebooks and rather complicated coding that reminds me of deflate… Of course I got suspicious at this stage (it looked a bit more complex than in-house developed compression schemes for game engines usually are) and indeed, it turned out to be Pkware Data Compression Library. And I could’ve found that out simply by looking into strings in one of the overlay files. Oh well…

At least it’s yet another puzzle with formats solved. Also it’s the second time I recently encounter animation format using DCL for compression (previously it was Gold Disk animation). Which makes me wonder what other common LZ77 flavours were used in the animation and video formats. deflate (along with newer contenders like LZO, FLZ and such) is very common in screen-recording codecs. LZS was used in Sierra games .RBT, I vaguely remember RNC ProPack being used by some video format. Nightlong on Amiga used PowerPacker. Did anything use LZX (it came from Amiga before being bought by M$ so maybe it had a chance there)? LZRW? Homebrew schemes are dime a dozen (like all those LZSS variations), I wonder more about the (de facto) standard compression libraries being used in video compression. Anyway, I’ll document them as I encounter them 😉

Duck Control 1

May 18th, 2024

Back in the day there was no Bob but he had toys nevertheless. And one of those toys for Bob was Star Control II. It is not a game for me to play (since my reaction time is not good enough for space battles), but its 3DO version mentions “TrueMotion “S” Video Compression by The Duck Corporation” in the credits. Now that’s more my thing! I looked at it once, found it to be a lot like TrueMotion 1 but split into different files and looked no further.

But recently I got a curious request: apparently some people want to reuse the player code out of the game for playing videos on the console. And since the console is too old to support even micro-transactions (despite coming from the EA founder) let alone FullHD AV1, they look for something more fitting, which is mostly TrueMotion and Cinepak. As I remember, this format offers a bit more flexibility than the usual TM1 in some aspects (since the tables are transmitted in the video data instead of being stored in the decoder), so writing an encoder for it may be more interesting than it was for plain TM1.

Anyway, here I’d like to talk about this technology and how it differs from the conventional TrueMotion 1.

Looking at random PAF

May 16th, 2024

So apparently there’s RIFF-inspired PAF format used in Bi-Fi racing game for DOS (essentially a promotional game for a local brand for sausage-in-a-roll snack).

It’s nothing to write a Multimedia Wiki article about but it’s somewhat interesting. Apparently it has one single CODE chunk that contains commands for updating frame—for all frames. It is pairs of (skip,copy) bytes that tell how many quads of bytes should be skipped on output or updated from the delta frames.

Delta frames are RLE-packed data in DLTA chunks, first 32 bits tell the unpacked data size, then it’s traditional RLE (top bit clear—copy data, top bit set—repeat next byte low seven bits times).
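A minimal sketch of that unpacking, assuming the size field is little-endian and the counts are used as-is (no +1 bias, and the literal run length is simply the control byte value). The function name is hypothetical.

```python
import struct


def unpack_dlta(chunk: bytes) -> bytes:
    """Unpack the payload of one DLTA chunk (chunk header already removed)."""
    # ASSUMPTION: the unpacked size is a little-endian 32-bit value.
    (size,) = struct.unpack_from("<I", chunk, 0)
    out = bytearray()
    pos = 4
    while len(out) < size and pos < len(chunk):
        ctrl = chunk[pos]
        pos += 1
        count = ctrl & 0x7F
        if ctrl & 0x80:
            # Top bit set: repeat the next byte `count` times.
            if pos >= len(chunk):
                raise ValueError("truncated run")
            out += bytes([chunk[pos]]) * count
            pos += 1
        else:
            # Top bit clear: copy `count` literal bytes.
            if pos + count > len(chunk):
                raise ValueError("truncated literal run")
            out += chunk[pos:pos + count]
            pos += count
    return bytes(out[:size])
```

The unpacked bytes would then be consumed by the (skip,copy) commands from the CODE chunk, four bytes per quad.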

Apparently those files were intended to be used as an animated TV screen overlay on the jukebox background (maybe also for an intro but the CD-rip of the game didn’t have it). So on one hand it’s a mediocre format, on the other it’s somewhat interesting anyway.

Next I should explore the usage of a Duck codec on “iamaduck” console…

Oh No! More Amiga Formats

May 12th, 2024

I keep looking at those, no significant progress yet but here’s what I have got so far:

  • ClariSSA—I’ve managed to locate two routines in ssa.library that look like video decompression routines (one is more advanced version of the other SSA with over a hundred of opcodes, another one is a simple RLE) but I still don’t know how they relate to e.g. BEST or COST chunks inside IFF. Update: apparently it separates data like opcodes or plane values into those chunks, I’ll write in more details about it later;
  • DeluxeVideo—haven’t looked at it that closely but it resembles more of an animation system (e.g. Macromedia Flash) than compressed video frames. Update: that’s because it is, the file essentially contains references to the actual sprites and backgrounds plus the commands and effects to apply on them;
  • TheDirector Film—the player for the only known sample uses some LZ-based cruncher but I’m too lazy to spend time on reconstructing the unpacked executable (or mess with UAE let alone unpacking and powering on my expensive door-stopper with AmigaOS 4.1);
  • Magic Lantern Diff—nothing much to say, looks doable. Update: it turned out to be RLE-based and I’m not sure but it might be coding data vertically like Zoetrope did;
  • NTitler—this is essentially a binary script for the program, referring to external resources for everything and such. Not something particularly interesting or fun to look at.

And there you are. After I do what I can with these formats (though as you can see for two of them it’s “not worth looking further” resolution already) I’ll move to something non-Amiga entirely.

Looking at Adorage SSA

May 8th, 2024

So I’m still looking at the list of unsupported video formats from dexvert in hope something curious comes by.

First, out of curiosity, I looked at the unsupported Delphine Software CIN files. Apparently they merely don’t have audio streams and contain garbage in an audio part of the header (plus they have 1-2 video frames). Ignoring the audio header makes them decode. Next.

Second, EA MAD file—apparently it comes from a game rip with audio and video files being dummied out so it’s easier to share. Nothing to look here in any sense. Next.

So, one of SSA formats (no, not that one SSA animation format for Amiga, another one). It turned out to be rather complicated RLE for bit-plane coding. I.e. frames are coded as several bit-planes with the codec commands essentially being something like “skip to this offset in the current bit-plane, then update following 16-bit values with these new values, then skip a bit more and update some more values then skip…”. There are different opcodes for run/copy of certain sizes (or ranges) plus some additional operations for repeating a pair of values (and I thought LinePack was rather unique at that).

I actually ended up hacking together a decoder. It does not seem to handle 8-bit palettes well and there are still some glitches here (just look at the letter ‘D’ in the following image) but it works in principle.

From a technical point of view it was not that hard. Ghidra failed to decompile the decoder function properly (mostly because of the opcode jumptable) and I don’t know M68k assembly, but expressions like move.l (A0)+,(A1)+ are rather intuitive so I could figure out which piece of code, say, copied 18 bytes and which one replicated the same 16-bit value to those 18 bytes.

I’ll try to find something else to look next, there’s still half a dozen of potentially interesting formats in the list.

ARMovie: trying codec wrappers

May 7th, 2024

I’ve managed to locate two ARMovie samples with video formats 600 and 602 (which are M$ Video 1 and Cinepak respectively), so I got curious if I can decode them. The problem is that ARMovie stores video data clumped together in large(r) chunks so you need to parse each frame somehow in order to determine its length.

Luckily in this case the wrapped codec data has 16-byte header (first 32-bit word is either frame size or a special code for e.g. palette or a skipped frame, followed by some unknown value and real codec width and height) so packetising data was not that hard. The only problem was that Video 1 stream was 156×128 while the container declared it as 160×128 but after editing the header it worked fine.

Supporting such wrappers actually poses more of a question of design for NihAV—how to link those rather container-specific packetisers to a demuxer. Making demuxer parse all possible formats is rather stupid, my current solution of simply registering global packetiser and hoping there’s no need for another is a bit hacky. I should probably make it namespaced so that the code first probes e.g. “armovie/cinepak” packetiser before “cinepak” one but it’s an additional effort for no apparent gain. Speaking of which, I should probably change code to feed the stream provided by the packetiser to a decoder (instead of always using the one from demuxer) but since I’m lazy I’m not going to do that either.
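The namespaced lookup could look roughly like the sketch below. It is written in Python purely for illustration and is not NihAV code: the registry, the function names and the string keys are all hypothetical.

```python
from typing import Callable, Dict, Optional

# A packetiser here is just something that splits a chunk into frames.
Packetiser = Callable[[bytes], list]

_registry: Dict[str, Packetiser] = {}


def register_packetiser(name: str, packetiser: Packetiser) -> None:
    """Register under either a plain codec name or 'container/codec'."""
    _registry[name] = packetiser


def find_packetiser(container: str, codec: str) -> Optional[Packetiser]:
    """Probe the container-specific packetiser first, then the generic one."""
    return _registry.get(f"{container}/{codec}") or _registry.get(codec)
```

With this layout an "armovie/cinepak" entry shadows the plain "cinepak" one only for ARMovie files, while every other container still gets the generic packetiser.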

Anyway, I’m not going to spend more time on ARMovie unless new samples for the formats I don’t support show up (beside newer Eidos Escape codecs which are supported elsewhere already). There are other formats left to look at. For example, I’ve made a good progress with Adorage animation format.