A Quick Review of Actimagine Video Codecs

August 23rd, 2020

Now that (as I believe) I’ve fixed remaining reconstruction bugs in VX decoder, why not do a quick comparison of various video codecs developed by Actimagine and see how they differ (if at all).

There seem to be the following codecs:

  • Actimagine (VX)
  • Mobiclip (Mods)
  • Mobiclip (Moflex for 3DS also there’s a version of it for PC known as Mobiclip HD)

And while they all are based on H.264 with finer block partitioning, there are some differences as well.

Proper structure. The original VX codec used quantiser derived from FPS and all frames were encoded in the same way, while the latter codecs have I-frames and quantisers are transmitted for each frame (as delta for non-keyframes).

References and motion compensation. VX had three previous frames as reference ones, later codecs increased that number to five. VX had fullpel motion compensation, later codecs use halfpel MC.

Data coding. VX relied on Elias Gamma’ codes for all codes except coefficient coding, later codecs use codebooks for most coded values. Also while VX coded residue in 4×4 blocks in H.264 way (starting from the end and with tail of ones coded explicitly), newer codecs use separable transforms and the usual (zero run, coefficient level) coding. Additionally only nine coding modes out of twenty four have survived after VX (intra prediction, MC with motion vectors coded and splits).

Overall, while all those codecs are related, there are large differences between VX and later Mobiclip variants and the only differenced between Mobiclip variants are colourspace (Mods uses YCoCg model, HD uses the proper YUV model), quantiser being clipped to 12-52 range, and block mode codebooks being different.

As I mentioned before, somebody has reverse-engineered decoders for Mobiclip (and a quick check on codebooks used tells me that Mobiclip HD and 3DS versions are the same) so if somebody needs them it should not be that hard to write a decoder.

A look at some old game

August 19th, 2020

Sometimes I like to play old strategy games from my youth: Civilization II, Settlers II, WarCraft II and Reunion. You probably have never heard about it since it’s not from some famous studio but from some Hungarians and published by rather obscure publisher too.

The idea is about the same as in Settlers II but IN SPACE! In some near future an experimental spaceship somehow gets into an unknown star system, most of technologies are lost and now you have to colonise planets, fight with aliens and find your way back home. This game combines some planet-building with space exploration and ground battles (there are also battles in space but they’re fought without your involvement). And since it has a story you have events like getting a chance to get some technology or break the alliance between your enemies. So it’s an interesting mix overall and it explains why I still return to it time from time. Sadly the game was programmed in traditional Hungarian manner (remember, Hungarians are responsible for such popular software as Windows 95 or MPlayer) and its intro (a separate program) sometimes crashes and sometimes it even makes DosBox segfault. The main game is also prone to corruptions and crashes (yet I still play it sometimes).

Anyway, today I’ve stumbled upon a page of one guy who reverse-engineered image format used in this game just by fiddling with it. It turned out to be compressed with RLE similar to the one used in PCX (0x00-0xBF – normal pixel, 0xC0-0xFF – run of next byte value 0-63 times). Since the game had some animations as well I decided to look at them.

So intro uses mostly still images split into 640×100 strips (so they can fit into one segment if you remember those) that are scrolled and faded in and out. And there’s a special animation format for some in-game animations similar to the picture format (as expected). Animation file is a series of frames (without palette) that are coded with similar RLE but there are some quirks not encountered in still images. First of all, frames are coded as differences and codes in range 0x80-0xBF are used to signal how many pixels to skip. Second, it turns out that codes 0x80 and 0xC0 are actually escape codes and are followed by 16-bit value of actual skip or run length (and in case of 0xC0 code a pixel value after that). Again, since the format is so simple it could be found just by looking inside the animation files and messing with a decoder.

As for the other games mentioned in the beginning, Civ2 has GIF files mostly hidden inside resource .dlls plus Indeo 4 video (with transparency even!) and Settlers II and WarCraft II have videos in Smacker format.

Having said that, my pointless diversion to looking at game formats is over, back to doing nothing!

NihAV relicensed code registry

August 17th, 2020

Since I’ve got the second request for a decoder relicensing I’ve decided to keep an open list of the project that requested relicensing. This way it may satisfy somebody’s curiosity about which parts of NihAV piqued some interest and also keep a proof for a project that I granted them a new license for the code.

The page is right here.

Actimagine VX: another imperfect decoder

August 13th, 2020

So I’ve released my decoder for Actimagine VX and it’s far from perfect.

First problem is audio. While the codec itself it not that tricky (it turned out to be some LPC codec that takes 5-10 16-bit words per frame to code pulses and filter for 128-sample frame), but its data is stored right after video frame data so in order to decode audio first you need to decode video frame and feed the remains of input buffer to the audio decoder. Since I can’t do that in a sane way I could not test the decoder either and it’s there just for the informative purposes only.

The second problem is obviously video. I’ve managed to decode bitstream fine but reconstructed images are not bit-exact and in case of plane prediction this leads to ugly artefacts (essentially the target value wraps around and you have gradients from white to black or vice versa instead of almost flat dark or white regions). I’ve introduced a clipping which seems to help but this is not right and maybe I’ll fix it one day. Maybe even before Bink2.

And finally there are some problems with the demuxer. In theory VX files may have multiple tracks but my demuxer might not handle them at all and if it does then it’ll simply ignore anything but the first video stream.

So VX support is far from perfect but it serves its goal of proving that the format works as expected. And if it’s useful to anybody then it’s even better.

Some words about Bink2

August 9th, 2020

As you may know (but definitely not care), NihAV has some limited support for Bink2 video. The problem in fixing it is that known samples are usually 720p video or mode which makes it hard to debug decoding past few initial frames (okay, older versions have smaller known videos so they’re likely to be fixed sooner). And of course the encoder is available only to the RAD customers to which I don’t belong. So in result I’ve decided to look at Actimagine VX codec once again.

I’ve looked at it four years ago but I could just study it but not write a decoder because of the binary. Essentially this codec happens on BigN DS consoles so you have to deal with raw ARM7 or ARM9 binary that (as it turns out) sets up its own segments (and the problems arise when you see absolute addresses to the areas not present there). So you load binary at addresses e.g. 0x2000000-0x20e1030 but in reality it contains also segments 0x1ffe800-0x1fff000 and 0x27e0000-0x27e4000. Thankfully Ghidra can not just load raw ARM binary but also add aliases to data as new segments. This allowed me to work on the decoder again and now I have more or less complete understanding of it and semi-working decoder for it as well, here’s an example:

Sample decoded frame.

Essentially it’s a simplified variant of H.264 with the following features: frames are split into 16×16 macroblocks that can be further recursively divided horizontally or vertically down to 2×2 blocks. Block can be coded in 24 different modes that boil down to full-pel motion compensation from one of three previous frames (without a motion vector, with motion vector, or with motion vector and an offset value that should be added to each pixel), intra prediction on whole block or intra prediction in 4×4 blocks. Also whether you have residue coded is also part of the mode (e.g. mode 11 is intra prediction without residue and mode 22 is intra prediction with residue). Residue is coded in 8×8 blocks comprising six 4×4 coefficient blocks, each block is coded in a way reminding of H.264: there are numbers for total number of non-zero coefficients, number of last non-zero coefficients being plus-minus one and number of zeroes dispersed between non-zero coefficients. Those being coded with variable-length codes that I could not access earlier was the blocker but not any more.

And there’s one curious feature of this codec that made it worth REing: instead of using plane prediction like H.264, this codec fills block in a recursive way. It interpolates bottom-right corner as an average of top-right and borrom-left neighbour pixels (e.g. [15,-1] and [-1,15] for 16×16 block; it also adds a delta to it in certain decoding modes), then it calculates halfway-bottom right and halfway-right bottom pixels (e.g. [15,7] and [7,15] for 16×16 block), then a centre pixel, and then repeats the process for each quarter (or half for some rectangular blocks). This is less computationally intensive than ordinary plane prediction and it seems to give nice results too.

I mentioned before that my decoder is far from perfect (and you can see it for yourself on that picture) but I know how to debug and improve it. I’m not trying to say that piracy is okay, but being able to find some .nds image with a game that has VX videos and using it with DeSmuME with GDB stub would help to debug the decoder but piracy is bad and so it’s not a proper way to do things.

As for audio counterpart, I should mention this: curiously enough there’s an opensource decoder for later MobiClip formats that seems to contain working Sx decoder for an audio used in VX files (it’s a pity the person who did it could not finish VX as well—why should I do the work myself instead of letting other people do my work for me?!). Unfortunately it’s mostly translated assembly so while it should work it’s mostly sub_XXX() doing various accesses to various positions of large byte array of decoder state. I’ll probably add it as well for completeness sake and document the formats properly after I fix the decoder (which should happen during this year too).

A Quality Video Hosting

July 31st, 2020

A brief context: I watch videos from BaidUTube (name slightly altered just because) and my preferable way to do that is to grab video files with youtube-dl in 720p quality so I can watch them later at my leisure, in the way I like (i.e. without a browser), and re-watch later even if it’s taken down. It works fine but in recent weeks I’ve noticed that some of the downloaded videos are unplayable. Of course this can be fixed by downloading it again in slightly different form (separate video stream and separate audio streams muxed locally, youtube-dl can do that) but today I was annoyed enough to look at the problem.

In case it’s not obvious I’m talking about mp4 filed encoded and muxed at BaidUTube without any modifications by youtube-dl which merely downloaded it. So, what’s the problem?

Essentially MP4 file contains header with metadata telling at which offset and which size are frames for each codec and the actual data is stored in mdat atom. Not here. First you have lots of 12-byte sequenced 90 00 00 00 00 0X XX XX 00 02 XX XX, then moof atom (used in fragmented MP4) and then another mdat. And another. I’ve tried to avoid streaming stuff but even to me it looks like somebody put all fragments prepared for HLS streaming into single MP4 file making an unplayable mess.

Overall this happens only on few random videos and probably most of the browsers would not pick it (since VP9 or VP10 in WebMKV is the suggested format) so I don’t expect it to be fixed. My theory is that they decided to roll a new version of encoding software with a broken muxer library or muxing mode. And if you ask “What were they thinking? You should run at least some tests to see if it encodes properly.”, one wise guy has an answer to you: they weren’t thinking about that, they were thinking when how long until the lunch break and then when it’s time to go home. This is the state of enterprise software and I have no reasons to believe the situation will ever improve.

And there’s a fact maybe related to it. Random files starting from 2019 maybe also show the marker “x264 – core 155 r2901 7d0ff22” in the encoded frames while most of the files have no markers at all. While I don’t think they violate the license it still looks strange that a company known for not admitting that it uses open-source projects (“for their own protection” as it was explained once) lets such marker slip through.

Well, that was an even more useless rant than usual.

A Quick Look on LCEVC

July 29th, 2020

As you might’ve heard, MPEG is essentially no more. And the last noticeable thing related to video coding it did the last was MPEG-5 (and synthesising actors and issuing commands to them with MPEG-G and MPEG-4 standards unholy unity). In result we have an abuse of letter ‘e’—in HEVC, EVC and LCEVC it means three different things. I’ll talk about VVC probably when AV2 specification is available, EVC is slightly enhanced AVC and LCEVC is interesting. And since I was able to locate DIS for it why not give a review of it?

LCEVC is based on Perseus and as such it’s still an interesting concept. For starters, it is not an independent codec but an enhancement layer to add scalability to other video codecs, somewhat like video SBR but hopefully it will remain more independent.

A good deal of specification is copied from H.264 probably because nobody in the industry can take a codec without NALs, SEIs and HRD seriously (I know their importance but here it still feels excessive). Regardless, here is what I understood from the description while suffering from thermal throttling.

The underlying idea is quite simple and hasn’t changed since Perseus: you take a base frame, upscale it, add the high-frequency differences and display the result. The differences are first grouped into 4×4 or 8×8 blocks, transformed with Walsh-Hadamard matrix or modified Walsh-Hadamard matrix (with some coefficients being zeroed out), quantised and coded. Coding is done in two phases: first there is a compaction state where coefficients are transformed into byte stream with flags for zero runs and large values (or RLE just for zeroes and ones) and then it can be packed further with Huffman codes. I guess that there are essentially two modes: a faster one where coefficient data is stored as bytes (with or without RLE) and slightly better compressed mode with those values are further packed with Huffman codes generated per tile.

Overall this looks like a neat scheme and I hope it will have at least some success. No, not to prove Chiariglione’s new approach for introducing new codecs an industry can use without complex patent licensing, but rather because it might be the only recent major video codec built on principles different from H.26x line and its success may introduce more radically different codecs and my codec world will get less boring.

NihAV: released!

July 27th, 2020

NihAV was a fine joke that had been running for far too long. But today, on no particulate date at all, I release it for public to ignore or to briefly look and forget immediately. Some decoders (Bink2, ClearVideo and Vivo 2) are still far from perfect, some features have simple or sketchy implementations, but despite all of that here it is.

The official website is here, source code is here.

Many thanks to people from former Libav project for hosting.

Some words about NihAV tools

July 11th, 2020

Since the work on NihAV is nearing the point when I can release it to public without that much shame (main features I wanted to implement are there and I’ve even documentation for all public interfaces plus some overview, you can’t ask for more than that) I want to give $title.

nihav-tool

This is the oldest tool oriented mostly to test decoders functionality. By default it will try to decode every stream in a file and output it either into a wave file or a sequence of images (PPM for RGB video, PGMYUV for YUV). Beside that it can also not decode a stream (and if you choose to decode neither then it tests demuxer or dumps raw frames).

Here is the list of switches it understands:

  • -noout makes it decode data but not produce any output (good for testing decoding process if you don’t currently care about decoder output);
  • -an/-vn makes it ignore audio or video streams correspondingly;
  • -nm=count/pktpts/frmpts make nihav-tool write frame numbers as a sequence or using PTS from input packet or decoded frame correspondingly;
  • -skip=key/inter tells video codec (if it is willing to listen) to skip less significant frames and decode only keyframes or intra- and interframes but no B-frames;
  • -seek time tells the tool to seek to the given position before decoding;
  • -apfx/-vpfx prefix specify the prefix for output filename(s) which comes useful when decoding files in a batch;
  • -ignerr tells nihav-tool to keep decoding ignoring errors the decoders report;
  • -dumpfrm tells nihav-tool to dump raw frames. This is both useful for obtaining raw audio frames (I could not make avconv do that) and because of the way it is implemented (it dumps packet contents first and then tries to decode it) if you use it along with the decoder and it errors out you’ll have raw frame on which it errored out.

Additionally you can specify end time after giving input name if you don’t need to decode the whole file.

As you can see this is not the most feature-rich tool but it works good enough for the declared goal (hence I use it mostly a debug build of it).

nihav-player

This is another quick and dirty tool that appeared when I decided that looking at long sequences of images is not the best way to ensure that decoding goes right. So I wrote something that can pass in a bad light for a player since it can show moving pictures and play sound that sometimes even goes in sync instead of deadlocking audio playback thread.

Currently it’s written using patched SDL1 crate (removing dependencies on num and rand and adding YUV overlay support and audio interface that you can actually use from Rust; patches will be available in the same repository) because my primary development system is too old and I don’t want to mess with various libraries or finding which version of sdl2 crate would compile using my current version of Rust (1.31 or 1.33.

In either case it’s a temporary solution used mostly for visual debugging and I want to write a proper media player based on SDL2 that would play audio-only files just as fine (so I can move to dogfooding). After all, can you really call yourself a multimedia developer if haven’t written a single player?

nihav-encoder

And finally the tool that appeared out of need to debug encoders instead of decoders. Hopefully it will become more useful than that one day but at least its interface should give you the idea what it does and what it will do in the future.

I still consider one of the main problems with ffmpeg (the tool) and later avconv the positional order of arguments. Except when the order does not matter. If you’ve never been annoyed by the fact you should put some arguments before -i infile in order for them to take effect on input and the rest of arguments should be put before output file name—well, in this case you’re luckier than me. So I’ve decided to have it in a more free-form format.

nihav-encoder command line looks a list of options in no particular order and some of them take complex arguments and then you provide a comma-separated list in form --options-list option1,option2=value,option3=.... Here is the list of recognised options:

  • --list-{decoders,encoders,demuxers,muxers} obviously lists the corresponding category and quits after listing all requested lists and options (see the next item);
  • --query-{decoder,encoder,demuxer,muxer}-options name prints the list of options supported by the corresponding codec or (de)muxer. Of course you can request options for several different things to be listed by adding this option several times;
  • --input inputfile and --output outputfile;
  • --input-format format and --output-format format force (de)muxer to use the provided format when autodetection fails;
  • --demuxer-options options takes a comma-separated list of options for demuxer (BTW you can also force input format with e.g. --demuxer-options format=avi);
  • --muxer-options options takes a comma-separated list of options for muxer (BTW you can also force output format with e.g. --muxer-options format=avi);
  • --no-audio and --no-video tell nihav-encoder to ignore all audio or video streams correspondingly;
  • --start time and --end time tell nihav-encoder to start decoding at given time and end at given time. The times are absolute so --start 1:10:00 --end 1:11:00 will process just a second of data;
  • --istreamX options and --ostreamX options set options for input and output streams with given numbers (starting with zero of course). More about them below.

nihav-encoder has two modes of operation: query mode, in which you specify which e.g. demuxers or codec options you want listed, and the program quits after listing them; and transcode mode, in which you specify input and output file and what you want to do with them. Maybe I’ll add a probe mode but I’ve never cared much about it before.

So what happens when you specify input and output? nihav-encoder will try to see which streams can be output (e.g. when transcoding from AVI to WAV there’s no point to even attempt to do anything with video stream), then it will try to copy input streams to the output unless anything else is specified. Of course you can specify that you want to discard some input stream with e.g. --istream0 drop. And for output streams you can also specify encoder and its parameters. For example my command line for testing Cinepak encoding looks like this:

./nihav-encoder –input laser05.avi –output cinepak.avi –no-audio –ostream0 encoder=cinepak,quant_mode=mediancut,nstrips=4

It takes input file laser05.avi, discards audio stream, encodes remaining video stream with Cinepak encoder that has options quant_mode and nstrips set explicitly, and writes the result to cinepak.avi.

As you can see, this tool has enough features to serve as a daily base transcoder but no complex features like taking input from several files, arbitrary mapping input streams from them to output streams and maybe applying some effects while at it. In my opinion that’s the task for some more complex application that builds a complex processing graph probably using a domain-specific language to specify inputs and outputs and what to do with them (and it should be a proper command file instead of command line that is impossible to type correctly even from the eighth try). Since I never had interest in GStreamer I’m definitely not going even to play with that. But a simple transcoder should serve my needs just fine.

Another reason for NihAV

July 4th, 2020

So instead of doing something productive like adding missing functionality bits and writing documentation I wasted my time on adding some QuickTime decoders. And while wasting time on adding SVQ1, SVQ3, QDMC and QDM2 decoders it became apparent why NihAV is a good thing to exist.

Implementing two of them was not a very big deal but implementing SVQ3 and QDM2 decoders took more than a week each because there are only two specifications available for them and both are equally hard to comprehend: the first one is the official binary specification, the second one is source code in libavcodec which is derived from the former.

The problem arises when somebody wants to understand how it works and/or reimplement the code and both SVQ3 and QDM2 decoder demonstrate two different aspects of that problem.

SVQ3 decoder is based on some draft of H.264 (or ex-MPEG/AVC if you’re from Piedmont) with certain extensions mostly related to motion compensation. Documentation for it was scarce and because of optimisations and integration with common H.264 decoder bits it’s hard to understand some of the things. One of those is intra prediction with two modes having SVQ3-specific hacks hidden in libavcodec/h264pred.c (those are 16×16 plane prediction mode giving transposed result and 4×4 diagonal down prediction being simplified and not relating on pixels not immediately top/left from the block) and another one is block coefficients decoding function. It took me quite a while to realize that it actually decodes three different kinds of blocks: single 4×4 block with zigzag scan, 4×4 block divided into two parts with interlaced scan, and 2×2 block. I’ve documented most of that in The Wiki (before that nobody has touched that page for almost ten years; sometimes I feel like I’m the only person contributing there).

QDM2 is horrible in different way. It is slightly improved translation of the original binary specification with hardly any idea how it works (there are still names like local_int_8 in the code). Don’t get me wrong, back in 2003-2005 when reverse engineering was done the only tools you had were debugger, disassembler (you’re lucky if it’s not the one provided by debugger) and no decompilers at all (IIRC rec appeared much later and was of limited usefulness, especially on multi-megabyte QT monolith—and that’s assuming you’re not doing that on Mac with even less tools available). I did some of such work back then as well so I understand how hard it is and how you’re happy that it works somehow and you can ship it and forget about it.

Another thing is that now it’s clear that QDMC and QDM2 are predecessors of DT$ LBR (aka Express) and use the same principles (QDMC simply coded noise and tones, QDM2 is almost like LBR but without some features like LPC or multichannel audio and with different chunk structure), but back in the day there was no documentation on LBR (or LBR itself for that matter).

But the main problem is that nobody has tried to understand the code since. It became a so-called category killer i.e. its existence prevents others from doing something similar. At least until some idiot tried to do another implementation in NihAV.

And here we have the reason for NihAV to exist: it advances the understanding of codecs for me (and I document results in The Wiki) resulting in different implementations that are (hopefully) easier to understand and sometimes even fix long-standing bugs. I hope this shall convince you that sometimes it’s good to have reimplementation of the decoder even if an existing implementation is good enough (as far as I remember the only time a decoder was rewritten in FFmpeg was when a reverse-engineered Indeo 3 decoder that crashed on damaged content almost every time was replaced with a reverse-engineered Indeo 3 decoder where a guy had the idea how it works).

But back to QDM2: while my decoder is not finished yet and I probably won’t bother with inter-frames in it (I’ve never seen any samples with those), it still decodes sweeps much better. That’s mostly because of the various bugs I’ve uncovered (also while discovering that Ghidra effectively does not allow to edit about a megabyte large decoder context). Since I have no incentive to produce a patch and people who created the decoder are long gone from the project, here are some spotted bugs: wrong coarse quantiser band selection (resulting in noise generated in wrong frequency range), reading bits past the chunk end (because is some cases checks are missing), ignoring group 4 tones because of the wrong conditions, some initial variables are set in the wrong way too. Nevertheless it mostly works and it was very useful for mapping the functions in the binary specification (fun fact: QDM2 decoder is located in QuickTime.qts while QDMC is located in QuickTimeInternetExtras.qtx).