Archive for the ‘Various Video Codecs’ Category

A quick look on movies for handhelds

Sunday, March 21st, 2021

In not-exactly-recent news there was a piece about some guy who decided not to listen to the advice of a director of some blockbuster and instead of going to cinema to watch it he encoded it to watch on Game Boy cartridges instead. While people doing stupid things is hardly news, it sparked a mild interest in me so I looked what are the options on underpowered hardware for storing video.

It turned out there are at least three formats for coding not just cutscenes but whole movies (or at least episodes of various series) to fit into 32MB GBA cartridge. And those three formats seem to be built on vector quantisation and they all embed video into the player program (well, the cartridge in this case does not have segments or filesystem for different resources).

  • GBA Video is probably the most famous and the most official one (there were official releases of couple dozens animated movies and cartoon series that used the format). It’s been developed by Majesco and it seems to use vector quantisation and deflate and since it checks codebook size to be 256*6, it’s most likely to be something like Cinepak using 2×2 YUV 420 codebook entries for compression. Additionally it seems to use left prediction (i.e. code pixel as a difference to the left one);
  • Caiman video codec seemed to come in two flavours, the original one coding 8×8 blocks using either four 4×4 pixel codebooks or just one scaled (that reminds me of Cinepak again for some reason, maybe because it did the same albeit using 2×2 vectors), next version of the codec introduced codebooks of different sizes and 8×8 block could be split recursively for that (also that version got motion compensation);
  • METEO is some Japanese format that seems to be the choice for the GBA enthusiasts since there’s a free encoder for it. I actually looked into it to see what it does (it’s a standalone binary about two hundred kilobytes large) and it turns out to decode input videos using standard Windows interfaces and encode frames with Cinepak encoder and write them into their own container.

All these formats make me think that if I look at other gaming consoles I can find Cinepak there as well. Let’s look what those FMV games used

Curiosity satisfied, I should move to something else.

Fixing SVQ1 decoding bug

Saturday, March 6th, 2021

In the comments to the previous post a certain Paul B. pointed out that SVQ1 decoder (the one in libavcodec or mine) decodes certain files with visual artefacts. So I opened the old dreary QuickTime.qts with Ghidra to look at its contents once again (last time it was for QDesign Music details but luckily I’ve marked SVQ1 decoder functions as well).

The official binary specification turned out to have slightly different design with just one block decoding function that gets intra or inter codebooks passed to it (so intra block is essentially adding residue to zero block using intra codebooks). And, more curiously, the codec uses 16-bit values for pixels up to the very end of decoding.

As you can guess, the artefacts looking like white blocks are caused by the pixel value going out of 8-bit range. I actually hooked GDB script to mplayer2 that loads QuickTime decoder (and presents some garbage instead of proper decoded frame) to see what happens with the block showing such artefact. It turned out that pixel with the original value 0xCF got increased to 0x14F during codebook additions and the reference decoder had output it as 0x4F. So I changed clamping to discarding top bits and it works much better.

Considering that codebooks are stored as single .dll resource and block decoding function works (for performance reasons) as a chain of block modifying functions with stackless calling convention I call the results good enough and let those who want more dig there instead of me.

Done with VGM/XVD

Thursday, February 18th, 2021

Since the time I first looked at XVD-related codecs I dug deeper and at one point considered implementing it all for NihAV. But every time I look at Muzip or some logic inside video decoders I lose all interest. So finally I’ve documented my finds on The Wiki and now I can forget about it and move to something else.

Some of it was easy to investigate since VGM demuxer along with Muzip CTP06/CTP07, Domen and VT2k decoders are present in Java applets that can be easily decompiled. Some like V2K-II or XVD can be easily decompiled with Ghidra and produce mostly understandable code (except for wavelet decoding part in XVD). Muzip4 and VT on the other hand have hard to follow logic. And VGM2 demuxer is available only as DirectShow splitter which is a pain to search for the COM object responsible for the demuxing itself.

Funny enough now Alaris VGPixel looks more related since VT codec has similar mode of compression. Additionally both the official player and demo programs from VGM-XVD developer site use the same trick—they put all .dlls in compressed form (the standard SZDD compression) at the end of executable, which decompresses and loads them at start.

Also it’s worth mentioning that all decoders (except for VGPixel) have the same interface via the functions UCF_InitCodec, UCF_ProcessFrame and such. Anybody interested enough can write his own program that demuxes VGM or VGM2, loads the proper decoder libraries and does something with the result. At least I’ve documented it as much as I could (or cared) so there’s some foundation to start from.

Alaris VGPixel

Sunday, January 31st, 2021

As I mentioned in the previous post, I wanted to look at this codec because it might be related to the whole XVD family. It turned out to be completely different even if it’s from the same company and bit reading is done in the same way (which is a very minor thing).

So the codec itself is a delta codec that reminds me of BMV from Discworld Noir a lot. Essentially it just reads opcodes using static codebook and performs some action depending on them:

  • repeat previous pixel value 1/2/4 times or the fixed amount transmitted in the frame header;
  • skip 1/2/4 pixels or the fixed amount of pixels transmitted in the frame header;
  • copy top/bottom/right/left neighbour pixel;
  • set pixel to the average of top/bottom and right/left neighbour pixels;
  • or decode pixel difference (for all three components at once) and add it to the previous value.

The main problem REing it was figuring out the decoding loop since for performance reasons decoder tries to group opcodes and handle them at once thus creating 200-something cases to handle. Plus those opcode handlers are just pieces of code that work on data in registers and jump to the function that dispatches next opcode. Of course it does not decompile properly but the amount of instructions is small anyway and it’s the same code repeated over and over again.

Maybe I’ll even write a decoder for it sometime later.

Looking at XVD

Saturday, January 30th, 2021

A week ago a certain XviD developer made a request to look at something more compressed called XVD and so I did.

A look at various video codecs from the 90s

Monday, January 18th, 2021

Since I had nothing better to do during Christmas “vacation” (it is the first time I’m in Germany at this time of year so of course I had nothing better to do) I looked at various codecs, mostly from last century, and wrote some notes about more interesting ones. Here I’d like to give some information about the rest lest I forget it completely.

  • Affinity Video—JPEG rip-off;
  • Lsvx—H.263 rip-off with possible raw frames;
  • Morgan TVMJ—I should’ve noticed “MJPEG” in the description sooner;
  • VDOWave 2—an unholy mix of H.263 and wavelets. It uses the coding scheme from H.263 (8×8 blocks, loop filter, halfpel motion compensation and even something suspiciously resembling OBMC) but blocks are coded as three 4×4 blocks that should be recombined using Haar transform into one 8×8 block. Plus there might be an additional enhancement layer for the whole frame based on the same wavelet as well.

And I should mention VSS Codec Light. While it is hard to get through all those levels of C++ abstractions, looks like it has arithmetic coding with static models, 4×4/8×8/16×16 blocks, 8×8 DCT, and five different wavelets variants to boot. At least it’s not another JPEG or H.263 rip-off.

Overall it feels that back in those days you had mostly JPEG rip-offs, H.263 rip-offs and wavelet-based codecs. I tried to look at more of the latter but one of the codecs turned out to be an impenetrable mess of deeply nested calls that seem to add stuff to the lists to be processed later somehow and another codec demonstrated that Ghidra disassembler has bugs in handling certain kinds of instructions involving FS register IIRC. In result it thinks the instruction should be a byte or two longer than it really is. So unless this is fixed I can’t look at it. There are still plenty old codecs I’ve not looked at.

A look at ACT-L2

Saturday, January 9th, 2021

This is yet another video codec from the 90s used for streaming and completely forgotten now. But since I had nothing better to do I decided to look at it as well.

Essentially it is another H.263 rip-off with a twist. From H.263 it took overall codec design (I/P-frames, 8×8 DCT, DC prediction, OBMC) but the data coding is special. For starters, they don’t use any codebooks but rather rely on fixed-width bitfields. And those bit values are not written as they occur but rather packed together into separate arrays. There’s a way to improve compression though: those chunks can be further compressed using binary arithmetic coder with an adaptive model to code bytes (i.e. you have 256 states and you select state depending on which bits you have already decoded).

Additionally it has somewhat different method of coding block coefficients. Instead of usual (zero-run, level, end-of-block) triplets assigned to a single code it uses bit flags to signal that certain block areas (coefficients 0-3, 4-7, 8-11 and 12-63) are coded and for the first three areas it also transmits bit flags to signal that the coefficient is coded. And only the last area uses zero-run + level coding (using explicit bitfields for each).

Overall it’s an interesting idea and reminds me of TM2 or TM2X since those codecs also used data partitioning (and in case of TM2 data compression as well).

ClearVideo briefly revisited

Thursday, December 31st, 2020

Since I had nothing better to do for the rest of this year (I expect the next year to begin in the same fashion) I decided to take a look at the problem when some files were decoded with inter-frames becoming distorted like there’s some sharpening filter constantly applied. And what do you know, there’s some smoothing involved in certain cases.

A quick look on Rududu

Sunday, December 27th, 2020

Since I had nothing better to do I decided to look at Rududu codec. It is one of old more exotic codecs that nobody remembers.

I did not want to look that deep into its details (hence it’s just a quick look) so here are the principles it seems to employ:

  • it seems to employ some integer approximation of wavelet transform (instead of e.g. LeGall 5/3 transform employed by lossless JPEG-2000);
  • it probably has intra- and interframes but it does not employ motion compensation, just coefficients updating;
  • DWT coefficients are quantised (and common bias is removed) with scale and bias calculated for the whole frame;
  • coefficients are coded using quadtree (i.e. some parts of the bands can be left uncoded in addition to skipping the whole DWT subbands);
  • and finally, data is coded using adaptive models for absolute values and bits for both signs and “region coded” flags and the probabilities from these models are fed to the range coder.

So while this codec is nothing outstanding it’s still a nice change from the mainstream video coding approach defined by ITU H.26x codecs.

Vivo2 revisited

Tuesday, December 22nd, 2020

Since I have nothing better to do (after a quick glance at H.264 decoder—yup, nothing) I decided to look at Vivo 2 again to see if I can improve it from being “decoding and somewhat recognizable” to “mostly okay” stage.

To put a long story short, Vivo 2 turned out to be an unholy mix of H.263 and MPEG-4 ASP. On one hoof you have H.263 codec structure, H.263 codebooks and even the unique feature of H.263 called PB-frames. On the other hoof you have coefficient quantisation like in MPEG-4 ASP and coefficient prediction done on unquantised coefficients (H.263 performs DC/AC prediction on already dequantised coefficients while MPEG-4 ASP re-quantises them for the prediction).

And the main weirdness is IDCT. While the older standards give just ideal transform formula, multiplying by matrix is slow and thus most implementations use some (usually fixed-point integer) approximation that also exploits internal symmetry for faster calculation (and hence one of the main problems with various H.263 and DivX-based codecs: if you don’t use the exactly the same transform implementation as the reference you’ll get artefacts because those small differences will accumulate). Actually ITU H.263 Annex W specifies bit-exact transform but nobody cares by this point. And Vivo Video has a different approach altogether: it generates a set of matrices for each coefficient and thus instead of performing IDCT directly it simply sums one or two matrices for each non-zero coefficient (one matrix is for coefficient value modulo 32, another one is for coefficient value which is multiple of 32). Of course it takes account for it being too coarse by multiplying matrices by 64 before converting to integers (and so the resulting block should be scaled down by 64 as well).

In either case it seems to work good enough so I’ve finally enabled nihav-vivo in the list of default crates and can finally forget about it as did the rest of the world.