Archive for June, 2024

REing another simple codec

Saturday, June 29th, 2024

Since I was bored, I tried once again to (ab)use a certain file search site to look for interesting (i.e. unsupported) samples. The main problem is that if it cannot decode the contents, it does not recognise the format. So e.g. AVI files without a video track (yes, such files exist) and those using some unrecognised codec will both be marked as aviAudio format, and if the audio stream is absent or unknown as well, the file gets demoted to unknown.

So I tried to search for AVI and MOV files both by extension and by this audio-only type, and here are the categories of the results:

  • actual audio-only files (that’s expected);
  • a completely different format (there’s an alternative AVI format, and MOV is a very popular extension as well);
  • improperly extracted files (rather common with MOV on hybrid Macintosh/PC CDs where resource fork often gets ignored);
  • damaged files (happens with some CDs and very common with AOL file library collection—often AVI data starts somewhere in the middle of the file);
  • too old or poorly mastered files (for example, one AVI file lacks padding to 16 bits between chunks; some MOV files can’t be decoded while they look correct);
  • one Escape 130 file that could’ve been supported if the libavcodec AVI demuxer did not feed garbage to the decoder (it’s not just my demuxer that can handle it; old MPlayer 2 plays it fine with its own demuxer);
  • some TrueMotion 1 files that were not recognised because of their tmot FOURCC;
  • files with some special features of the known codecs (I’ve seen some MOV files containing QDraw codec with JPEG frames);
  • files with codecs I can decode (like IPMA) but which the popular software can’t;
  • files with the known codecs (some documented by me) that nobody bothered to implement (especially Motion Pixels 1 and 2);
  • and finally some AVIs with savi FOURCC and a single file with DKRT FOURCC.

Those “SuperAVI” files turned out to be rebranded Cinepak, which I managed to recognise right away; the remaining file turned out to be a bit baffling. After extracting the frames I figured out that it is raw YV12 video, but for some reason it had 64 bytes of something before the image data and 440 bytes after. It can be found on the TNG Klingon Language Disc, but it does not look like the software there can decode it anyway.

Overall, nothing hard or interesting (if you don’t count the questions about the origins of that file, that is).

Just a coincidence

Tuesday, June 25th, 2024

A couple of days ago I remember seeing a post saying that BaidUTube has started sending ads inside the video stream instead of requesting them separately. I immediately thought that re-encoding full videos would be costly and that they would probably pull the same trick as Tw!tch (another company whose name shan’t be taken in vain) by inserting ad fragments into the HLS or DASH playlist among the ones with (questionably) useful content.

Also a couple of days ago yt-dlp stopped downloading videos from BaidUTube in 720p for me, resorting to 360p. I don’t mind much, but I got curious why. Apparently BaidUTube has stopped providing full encoded videos except in format 18 (that’s H.264 in 360p), even for old videos. The rest are audio- or video-only HLS or DASH streams.

Probably they’re just optimising the storage by getting rid of those unpopular formats and improving user experience while at it. In other words, see the post title.

P.S. I wonder if they’ll accidentally forget to mark ad segments in the playlist as such but I’ll probably see it when that happens.

P.P.S. I guess I should find another time-wasting, undemanding hobby. That reminds me I haven’t played OpenTTD in a long time…

A look at an obscure animation system

Tuesday, June 25th, 2024

Since I have nothing better to do, I looked at a thing I’ve encountered. There’s a system developed by some Japanese programmer going by the nickname “Y.SAK” that consists of compressed bitmaps (in whatever files) and a scripting system that uses them for displaying animations (that’s .bca files) or even running complex scripts (that’s .bac files, don’t confuse the two) that may react to the mouse, set or test variables and even invoke programs.

Of course the only part I was really interested in was those compressed bitmaps. They have a 48-byte header starting with ‘CS’ and containing the author’s copyright, then the header part of a DIB file follows (including the palette) and finally the compressed data. Apparently there are two supported compression methods—RLE and LZSS. The latter is the familiar code used in many compressors for various things, but the RLE is surprisingly interesting. Its opcode contains a copy/run flag in the top bit and either a 7-bit copy value or a 3-bit run length plus a 4-bit run value index. The maximum run length/index values mean you need to read the following byte for the real value of each. But that’s not all; note that I wrote “run value index”. There’s a table of possible run values sent before the actual compressed data, and that index tells which 4-byte entry from it should be repeated for the run. Nothing revolutionary of course, but still a rather curious scheme I don’t remember seeing mentioned anywhere.
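The scheme can be sketched in Python like this. It is a hypothetical reconstruction, assuming the 7-bit copy value is a literal byte count and guessing the exact bit layout of the run opcode (flag, then run length in the next three bits, then index):

```python
def decode_rle(data: bytes, run_table: list) -> bytes:
    # run_table: the 4-byte entries transmitted before the compressed data
    out = bytearray()
    pos = 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op & 0x80 == 0:
            # copy: low 7 bits give the number of literal bytes to copy
            count = op & 0x7F
            out += data[pos:pos + count]
            pos += count
        else:
            # run: 3-bit length plus 4-bit index into the run value table
            run_len = (op >> 4) & 0x7
            idx = op & 0xF
            if run_len == 7:            # maximum value: real length follows
                run_len = data[pos]
                pos += 1
            if idx == 15:               # maximum value: real index follows
                idx = data[pos]
                pos += 1
            out += run_table[idx] * run_len
    return bytes(out)
```

The bit order within the opcode is my guess; the real codec may pack the fields differently.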

And that’s why I keep digging for this old stuff.

REing non-Duck VP X1

Thursday, June 13th, 2024

While I’m still looking for a solution for encoding video files with large inter-frame differences with TrueMotion, I distract myself with other things.

Occasionally I look at the dexvert unsupported formats list to see if there’s any new discovery among the video formats documented there. This time it was something called VPX1.

I managed to locate the sample files (multi-megabyte ones starting with “VPX1 video interflow packing exalter video/audio codec written by…”, so there’s no doubt about it) and an accompanying program for playing them (fittingly named encode.exe). The executable turned out to be rather unusable since it invokes DPMI to switch to 32-bit mode, and I could not make Ghidra decompile parts of the file as 386 code instead of 16-bit code (and I did not want to bother decompiling it as a raw binary either). Luckily the format was easy to figure out even without the binary specification.

Essentially it is a plain chunk format, complicated by the fact that half of the chunks do not have a size field (for the palette chunk it’s always 768 bytes, for the tile type chunk it’s width*height/128 bytes). The header seems to contain the video dimensions (always 320×240?), FPS and audio sampling rate. Then various chunks follow: COLS (palette), SOUN (PCM audio), CODE (tile types) and VIDE (tile colours). Since CODE is always followed by a VIDE chunk and there seems to be a correlation between the number of non-zero entries in the former and the size of the latter, I decided that it’s most likely a tile map plus the colours for it—and it turned out to be so.
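A parser for such a chunk layout could look like the following Python sketch. The size-field width and endianness for the sized chunks are my assumptions, as is the exact set of size-less chunks:

```python
import struct

def parse_vpx1_chunks(data: bytes, width=320, height=240):
    chunks = []
    pos = 0
    while pos + 4 <= len(data):
        tag = data[pos:pos + 4]
        pos += 4
        if tag == b'COLS':                 # palette: always 768 bytes
            size = 768
        elif tag == b'CODE':               # tile types: width*height/128 bytes
            size = width * height // 128
        else:                              # assume SOUN and VIDE carry a size field
            size = struct.unpack_from('<I', data, pos)[0]
            pos += 4
        chunks.append((tag, data[pos:pos + size]))
        pos += size
    return chunks
```

For the default 320×240 dimensions the CODE chunk comes out at 600 bytes, matching the figure discussed below.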

Initially I thought it was a simple bit map (600 bytes for a 320×240 image can describe a bit map for 4×4 tiles), but there was no correlation between the number of bits set and the size of the tile colours chunk. I looked harder at the tile types and noticed that they form a sane 20×30 picture, so the tiles must be 16×8. After studying the data some more I noticed that nibbles make more sense, and indeed only the nibbles 0, 1, 2 and 4 were encountered in the tile types, so the tiles are most likely 8×8. After gathering statistics on the nibbles and comparing them to the tile colours chunk size I concluded that type 2 corresponds to 32 colours, type 4 to 1 colour and type 1 to 16 colours. Then it was easy to presume that type 4 is a single-colour tile, type 1 is a downscaled tile and type 2 is a tile with doubling in one dimension. It turned out that a type 2 tile repeats each pixel twice and also uses interlacing (probably so the video can be decoded downscaled on really slow machines). And that was it.
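Under those conclusions, reconstructing one 8×8 tile might look like this. This is only a sketch: the scan order of the colours within a tile and the exact interlacing pattern are my guesses:

```python
def expand_tile(tile_type: int, colours: list) -> list:
    # returns an 8x8 tile as a list of rows of palette indices
    tile = [[0] * 8 for _ in range(8)]
    if tile_type == 4:
        # single-colour tile: one colour fills all 64 pixels
        tile = [[colours[0]] * 8 for _ in range(8)]
    elif tile_type == 1:
        # downscaled tile: 16 colours, each covering a 2x2 block
        for i, c in enumerate(colours[:16]):
            y, x = (i // 4) * 2, (i % 4) * 2
            for dy in range(2):
                tile[y + dy][x] = tile[y + dy][x + 1] = c
    elif tile_type == 2:
        # 32 colours: each pixel doubled horizontally, rows interlaced
        # (guessed here as all even rows first, then the odd ones)
        order = list(range(0, 8, 2)) + list(range(1, 8, 2))
        for i, c in enumerate(colours[:32]):
            y, x = order[i // 4], (i % 4) * 2
            tile[y][x] = tile[y][x + 1] = c
    return tile
```

The colour counts per type (1, 16 and 32) match the correlation with the VIDE chunk sizes described above.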

Overall, it is a simple format but it’s somewhat curious too.

P.S. There’s also a DLT format in the same game which has a similarly lengthy text header, some table (probably with line offsets for the next image start) and paletted data in copy/skip format (the palette is not present in the file). It’s a 16-bit number of 32-bit words to skip/zero, followed by a 16-bit number of 32-bit words to copy, followed by the 32-bit words to be copied; repeat until the end. The width is presumed to be 640 pixels.
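The copy/skip scheme can be sketched like this (assuming little-endian counts and that skipped words are simply left as zero):

```python
import struct

def decode_dlt_frame(data: bytes, total_words: int) -> bytes:
    out = bytearray(total_words * 4)   # skipped words stay zero
    pos = 0
    word = 0
    while pos + 4 <= len(data):
        # 16-bit skip count, then 16-bit copy count
        skip, copy = struct.unpack_from('<HH', data, pos)
        pos += 4
        word += skip
        # copy the given number of 32-bit words verbatim
        out[word * 4:(word + copy) * 4] = data[pos:pos + copy * 4]
        pos += copy * 4
        word += copy
    return bytes(out)
```

For a 640-pixel-wide image `total_words` would be 160 words per line times the height.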

P.P.S. I wonder if it deserves a support via stand-alone library named libvpx1 or libvpx and if this name is acceptable for Linux distributions.

Duck Control 1: update

Monday, June 10th, 2024

I’ve been working on the TM encoder now and then, and finally I have some things to say about it.

First of all, the general state of things: the encoder works and produces valid output for both methods 1 and 3 (the encoding is still not perfect but hopefully that can be fixed); it still lacks audio encoding (I need to add WAV reading support to the encoder and extend my decoder to test the output).

Second, I also decided to add an auto-selection option which allows the encoder to decide whether to use method 1 or method 3 for each frame. It simply picks one depending on the percentage of the most common pair and the total number of unique pairs present. It does not seem to have any practical use, but it may be handy for testing decoders that expect only one coding method to be present in the stream.

And now let’s move to the most interesting thing in this whole format (at least to me): codebook generation. TrueMotion (1 and 2X) is a rare example of a codec using Tunstall coding (the only other codec known to use it is CRI P256), which is essentially an inverse Huffman coding where a fixed-length code corresponds to a variable-length sequence of symbols.

The original codebook construction goes something like this: add all symbols to the codebook, then, while the space allows, replace the most probable entry with new strings using this old entry as a prefix. E.g. for a {0 1 2} alphabet (with 0 being the most probable symbol) and a size-8 codebook you’ll initially have just the same {0 1 2}, then {00 01 02 1 2} and finally {000 001 002 01 02 1 2} (and you can add another code there to make it full).
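The classic construction can be sketched in a few lines of Python (the probabilities here are hypothetical; the codebook is kept as a map from symbol sequence to its probability):

```python
def build_tunstall(symbols, probs, size):
    # start with all single symbols, then repeatedly replace the most
    # probable entry with its one-symbol extensions while space allows
    book = {(s,): p for s, p in zip(symbols, probs)}
    while len(book) + len(symbols) - 1 <= size:
        best = max(book, key=book.get)
        p = book.pop(best)
        for s, pr in zip(symbols, probs):
            book[best + (s,)] = p * pr
    return sorted(book)
```

For the {0 1 2} alphabet with symbol 0 dominant and a size-8 codebook this reproduces the {000 001 002 01 02 1 2} example above (seven entries, leaving room for one more code).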

Of course it’s rather impractical in this form, as not all sequences will be encountered in the data and you still need to code shorter sequences (e.g. how would you code exactly four zeroes with the above codebook?). Thus I decided to do it a bit differently: I only add new sequences without deleting old ones, and I also keep (limited) statistics on the sequences encountered (from two to twelve symbols long). First I add all encountered pairs of symbols, then I select the most commonly occurring sequence, add all its known children (i.e. those with an additional pair of symbols at the end), mark it as an ineligible candidate for the following searches, and repeat the process until the codebook is full. If somebody cares about implementation details: I used a trie for holding this information since it’s easy to implement and understand, and during the update process I keep a list of trie nodes for the previously encountered sequences up to the maximum depth, so I can update all those sub-sequence statistics in one pass over the input.
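A simplified sketch of the statistics-gathering part (extending by single symbols rather than by pairs, which is a deviation from the description above): each input position starts a new sequence at the trie root, and a sliding list of active nodes lets all sub-sequences ending at the current symbol be updated in one pass:

```python
MAX_DEPTH = 12   # longest sequence to keep statistics for

class TrieNode:
    def __init__(self):
        self.count = 0
        self.kids = {}

def gather_stats(data):
    root = TrieNode()
    active = []   # trie nodes for sequences ending at the previous symbol
    for sym in data:
        active.append(root)               # a new sequence starts here
        nxt = []
        for node in active[-MAX_DEPTH:]:  # cap the sequence length
            child = node.kids.setdefault(sym, TrieNode())
            child.count += 1
            nxt.append(child)
        active = nxt
    return root
```

Codebook growth would then pick the highest-count eligible node and add its children, as described above.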

Does it make a difference? Indeed it does. I took the original LOGO.DUK (the only video with a different codebook), decoded it and re-compressed it both with the default codebook all the other videos use and with the one generated specifically for it. Here are the results:

  • original .duk size—2818868 bytes;
  • re-compressed file size—2838062 bytes;
  • re-compressed with file-specific codebook—2578010 bytes.

That’s using the same method 3 as the original file. With method 1, the file sizes with the standard and custom codebooks are 2622758 and 2490058 bytes respectively.

As you can see, the difference is noticeable. Of course it requires two passes over the input and many megabytes of memory to store the sequence statistics, but the results may be worth it. In theory the compression could be improved even further if you knew how to generate a codebook that allows splitting the frame data into unique chunks, but that sounds a lot like an NP-hard problem to me.

Anyway, I got what I wanted from it, so it just requires some bugfixing, audio encoding support, polishing and documenting. After that I can dump its source code for all zero users and forget about Duck codecs until something even more exotic manages to resurface.

Some words on IBM PhotoMotion

Thursday, June 6th, 2024

After a recent rant about search systems I decided to try to find any information about the format (I just happened to recollect that it’s supposed to exist). I don’t know if anybody else was luckier, but for me the search results were mentions in lists of FOURCCs, some passing references in two papers and that’s all. Now it will probably start returning more results from this domain though 😉

So what should we do when generic search engines fail? Resort to the specialised ones, of course. Thanks to the content search feature of one of them I was finally able to locate a CD which uses PhotoMotion technology, with both video files and the official player (aptly named P7.EXE; I couldn’t have given it a better name myself). Even better, the video files were encoded as both AVI and MM so I could check what output to expect.

Of course Peter’s decoder can’t handle them properly because of the larger header (26 bytes instead of the usual 22 or 24) and the uncompressed intra frames. But it was easy to write a simple stand-alone decoder for it to validate that both the PhotoMotion and game samples are decoded fine.

This is no major achievement of course, but at least it answers the question of what that format is all about. So even if there’s still no information about the alleged VfW decoder, now we know what to expect from it.

The freedom* of choice

Tuesday, June 4th, 2024

Since the topic is no longer hot, I can rant on it as well.

Sometimes I get asked why, throughout my blog, I consistently call the search company whose name starts with G (and which is part of Alphabet) Baidu. There are several reasons for that. Mostly it’s because they use my work without acknowledging it, so I don’t see a reason to promote their name either; but more importantly, I feel the company would fit well into a totalitarian regime (at the top of course, they do not want to be mere servants). And recently they’ve proved that once again.

You should be aware of the theory of enshittification by now: at first a company caters to the users, then it shifts its focus to the suppliers, and finally it starts to serve its own interests. I believe it is just a natural manifestation of the shifting power balance rather than of intent: companies want to have all the money (control, whatever) without doing much work, while users prefer to have everything as cheap as possible; so in order to get a hold on the market a company needs to build a user base first, then it still has to submit to the suppliers’ wishes (since it still depends on them) until it finally gets an effective monopoly, so that neither the users nor the suppliers have any other option. Of course in reality there are many factors that still limit companies (sometimes EU regulations can be useful!) so it’s not as bad as it could be otherwise. But who knows, maybe we’ll see the cyberpunk future with large corporations becoming de facto states.

Anyway, back to Internet search. Previously there was such a thing as the Internet—a gathering of different web sites and personal pages—and there was a need to find a piece of information or a web site of certain interest. Thus search services came into existence. Some were merely catalogues of links on certain topics submitted by people, others crawled the Web in order to find new information (IMO AltaVista was the best one).

And then the Internet matured and companies discovered that money could be made there. And that’s when we started to get annoying ads—large Flash banners, pop-ups, pop-unders and so on (I vaguely remember a time before ads became that annoying, but I can hardly believe in it myself). But the process did not stop there: ad revenue meant that sites now had a reason to attract users, not merely to increase their visitor counters (yes, many sites had such widgets back in the day). That’s how we got another pillar of the modern Web—SEO spam. Also, with technological progress we got large sites dedicated to organising user content (previously there were such things as GeoCities or Tripod, but those were rather disorganised hosting services for random user homepages), including the worst of them—social networks. Eventually those sites tried to replace the whole Web—and it worked fine for most users, who get their daily dose of news, recreation and social interaction from one or two of those sites.

So we have these megasites full of ads and generated nonsense or plagiarised content, and Baidu had the reasonable idea of cutting out the middleman: if you stay on one site to browse mostly generated nonsense, why can’t they provide it all themselves instead of referring you (and the ad revenue) to a different site? And if you think this idea is bad, there’s not much you can do about it—the very limited competition acts the same. Starting your own search service would require an insane amount of bandwidth and storage to do it right (even the large companies have had their search quality declining for years, because content grows exponentially while the storage space for even indexing it is limited, so you have to sacrifice something less popular). Mind you, if you limit the scope severely it may work just fine; it’s scaling to all Web content and to a general audience that is rather impossible.

Now where does freedom* (yes, with marketing asterisk) of choice come into this picture?

I remember reading back in the day how The Party solved the problem of lacking the resources to fulfil the needs of the people. It declared that the needs of the people are determined by the Party (so if you think you should have other food besides bread, mashed eggplants and tinned sprats—well, that’s your own delusion that has nothing to do with your real needs). It feels like Web N.0 companies have decided the same—for now mostly in the form of recommendations/suggestions, but considering the growing limitations (like not being able to avoid ads on Baidu hosting even when using the Baidu browser—at least they have not introduced a mandatory quiz after the ads, like one russian video hosting reportedly does) it may soon be about as good as in China (i.e. when you try to deviate from the prescribed path you’ll be gently returned to it, and if you persist you’ll be punished—having your Baidu account banned seems to be about as bad as losing social credit score already). That’s the real freedom of choice—they’re free to choose an option for you and you’re free to choose to accept it (also known as the Soviet choice).

The good thing is that most people don’t care and I can manage without it. The bad thing is that it spreads elsewhere.

I’m talking mostly about various projects, especially systemd and GNOME. In both cases the projects offered certain merits (otherwise they would not have stood out from their competition and got the support of IBM), but with time they became too large in their domains and now force their choices on Linux users. For example, systemd may be a conceptually good init system, but in reality it can work only with components designed specifically for it (or do you have a better explanation for the existence of things like systemd-timesyncd?). Similarly, GNOME is very hostile to attempts to change its GUI appearance, so when third-party developers failed to take the hint of plugins and themes breaking every minor release, the GNOME developers had to explicitly introduce libadwaita and forbid any deviations from the light and dark themes hardcoded there. At least finding an alternative there is still possible.

Well, there you have it. I’m not the first to highlight these problems and I’m not proposing a universal solution to them either. But if you ever wondered why I restrict myself regarding many modern technologies and NIH my own multimedia framework, here’s your answer.