Archive for the ‘Various Video Codecs’ Category

Looking at Aware MotionWavelets

Sunday, December 26th, 2021

I wanted to reverse-engineer and implement some wavelet codec just for the sake of it. And finally I’ve managed to do that.

Initially I wanted to finish Rududu Video codec (I’ve looked at it briefly and one of the funny things is that the opensource release of Rududu Image codec does not match the actual binary specification, even arithmetic coder is different), but it turns out there’re no samples in the usual place so I just picked something that has some samples already.

The codec turned out to employ some tricks so I had to resort to collecting debug information in order to understand band structure (all band dimensions are implicit, you need to know them and the order to decode it all successfully). Then it turned out that band data is coded in boustrophedon order instead of the usual raster scan. And finally there’s fun with scaling: vertical transform is the same as horizontal one but the output is scaled by 128. Besides that it’s rather unremarkable.
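To make the scan order a bit more concrete, here is a minimal sketch of placing values read in boustrophedon order into a band. This is not the codec's actual code: the function name is made up, and I am assuming that even rows run left to right and odd rows right to left.

```python
def place_boustrophedon(values, width, height):
    """Arrange a flat list of values into rows, reversing every odd row.

    Assumption: row 0 runs left to right, row 1 right to left, and so on.
    """
    if len(values) != width * height:
        raise ValueError("band size does not match the number of values")
    band = []
    for y in range(height):
        row = values[y * width:(y + 1) * width]
        if y % 2 == 1:
            # odd rows were written backwards, so flip them back
            row = row[::-1]
        band.append(row)
    return band


band = place_boustrophedon(list(range(12)), 4, 3)
for row in band:
    print(row)
```

With a plain raster scan the second row would come out as 4, 5, 6, 7; here it comes out reversed.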

Anyway, I got slightly deeper knowledge about the inner workings of wavelet codecs and it should not bother me any longer. It’s time to slack off before doing something else.

Some words on QT Animation (SMC) codec

Tuesday, August 10th, 2021

A recent question about buggy SMC decoding led me deep into the QuickTime specification to look at the codec’s missing opcode. And there are some noteworthy things here as well.

Back in the day there was the multimedia player for Unix called XAnim. Its last release was in 1999, before other opensource multimedia player projects had started! It was both feature-rich (e.g. it could step frames forward and backwards, something that not all current media players can do) and had excellent codec support for the time.

Somehow its author reverse engineered (long before the era of decompilers too) a lot of codecs and somehow managed to obtain the sources for e.g. Indeo, and while he could not provide the sources themselves, he offered binary decoders for a wide variety of architectures: Alpha, MIPS, Sparc, PowerPC, x86. It was a treasure trove for formats and lots of the decoders were ported to other projects (even I did that for one or two codecs) and binary codecs were a great help in reverse-engineering efforts as well.

Now to SMC itself. Formally it’s QuickTime Animation codec but people call it after its FOURCC which is “smc “, probably after the author’s initials.

Opensource SMC decoders come from the same source (I based mine on the description in The Wiki but you can guess what that description is based on; and yes, back in the day e.g. MPlayer and Xine had their own decoders for various codecs before relying on libavcodec for everything). After looking at the binary specification I can say it looks exactly like it was reverse engineered from it directly (it has the same logic and data types but lacks sensible names). Anyway, the thing is that it does not handle opcode 0xF0 and I finally had an occasion to look at it.

I took QuickTime 6.3 binary specification for Windows (somehow the decoder ended in QuickTimeInternetExtras.qtx) and looked inside. It turns out that there are several decoding functions there (for different output formats) but they all do the same: handle 0xF0 opcode in exactly the same way as 0xE0 opcode (raw blocks), there are no differences there whatsoever.
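In decoder terms the fix amounts to pointing 0xF0 at the same handler as 0xE0. Below is a minimal sketch of that idea, not a real SMC decoder. It assumes (based on how the opensource decoders treat 0xE0, as far as I remember them) that the upper nibble selects the operation and the lower nibble plus one gives the number of raw 4×4 blocks, each stored as 16 bytes. All names are hypothetical.

```python
def read_raw_blocks(opcode, data, pos):
    """Read (low nibble + 1) raw 4x4 blocks, 16 bytes each (assumed layout)."""
    n_blocks = (opcode & 0x0F) + 1
    blocks = []
    for _ in range(n_blocks):
        if pos + 16 > len(data):
            raise ValueError("truncated raw block")
        blocks.append(list(data[pos:pos + 16]))
        pos += 16
    return blocks, pos


# 0xF0 is simply an alias for the raw block handler
HANDLERS = {0xE0: read_raw_blocks, 0xF0: read_raw_blocks}


def dispatch(data, pos=0):
    """Pick the handler from the upper nibble of the opcode byte."""
    opcode = data[pos]
    handler = HANDLERS[opcode & 0xF0]
    return handler(opcode, data, pos + 1)


payload = bytes(range(32))
blocks_e, end_e = dispatch(bytes([0xE1]) + payload)
blocks_f, end_f = dispatch(bytes([0xF1]) + payload)
print(blocks_e == blocks_f, end_e, end_f)
```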

That’s one mystery less, even if the answer is a bit disappointing. At least I could reminisce about good old times hardly anybody else remembers.

About upcoming AV2…

Friday, August 6th, 2021

So today I’ve seen an article titled AV2 Video Codec — Early Performance Evaluation of the Research which of course has drawn my attention.

Fun things are that it is a sponsored article and that it’s written by three engineers from ViCueSoft. This is strange, but so far it still looks more promising than the original AV1 feature review article with over 20 authors and too much marketing in it (my review of it is here; and to be fair it was followed by more serious paper with fewer authors but this one exists as well). Anyway, let’s see what is presented here.

I don’t care about the performance much so I just quote the phrase from the conclusion: “…rough approximation shows only 1.2x times encoding complexity increase and 1.4x time decoding”. I find the increase in decoding complexity being larger than the increase of encoding complexity a bit strange, normally you’d expect encoding difficulty rising faster because of the nature of the coding approach in modern codecs (normally an encoder needs to search for the best combination of encoding tools and their parameters and then apply the same steps as decoder does in order to have a coded frame in the same state as decoder would have it). Let’s look at the features then, it’s the most interesting part to me anyway.

  • distant weighted compound mode and dual interpolation filter are removed;
  • semi-decoupled partitioning is introduced—this feature allows splitting luma and chroma blocks and coding their contents independently under a certain level. The paper also says there’s Dual Tree feature in VVC that does the same;
  • quantiser step overhaul—instead of six tables in AV1 now you have just one simple formula for all quantiser steps;
  • extending motion sample selection to work with compound blocks as well;
  • more partitioning modes to be more like HEVC;
  • multiple reference line selection for intra prediction—allows you to select not just neighbouring row/column for directional intra prediction. The same tool exists in VVC. And it also reminds me of X8 frames in WMV2/WMV9, that is the first case of intra prediction using more than one line known to me;
  • offset-based intra prediction refinement—adding some offset to the top/left intra predicted edge of the block to make it even smoother (the offset is calculated from the neighbouring blocks as well);
  • intra secondary transform—this tool tries to improve compression by applying a special secondary transform to the low-frequency coefficients. VVC has low-frequency non separable transform doing the same;
  • simplifications in intra mode signalling;
  • some improvements in motion prediction coding;
  • cross-component sample offset—another chroma-from-luma tool: for the whole CTU between deblocking and CDEF stages a DC offset is calculated from the luma values and applied to chroma values.

Essentially there are three kinds of improvements: simplification or generalisation of the existing feature (including complete removal of it—I approve either), picking the tool used by VVC/H.266 (that approach works but lacks originality) and an occasional improvement of an existing tool (too few and not too original). Of course nobody knows when AV2 will be declared finished and some things will surely have changed by then, but I don’t expect radical changes.

Once I said that I’ll review H.266 when AV2 is released but these guys have essentially done my work for me. Thanks!

A quick look on movies for handhelds

Sunday, March 21st, 2021

In not-exactly-recent news there was a piece about some guy who decided not to listen to the advice of a director of some blockbuster and instead of going to cinema to watch it he encoded it to watch on Game Boy cartridges instead. While people doing stupid things is hardly news, it sparked a mild interest in me so I looked at what the options are for storing video on underpowered hardware.

It turned out there are at least three formats for coding not just cutscenes but whole movies (or at least episodes of various series) to fit into 32MB GBA cartridge. And those three formats seem to be built on vector quantisation and they all embed video into the player program (well, the cartridge in this case does not have segments or filesystem for different resources).

  • GBA Video is probably the most famous and the most official one (there were official releases of a couple dozen animated movies and cartoon series that used the format). It’s been developed by Majesco and it seems to use vector quantisation and deflate and since it checks codebook size to be 256*6, it’s most likely to be something like Cinepak using 2×2 YUV 420 codebook entries for compression. Additionally it seems to use left prediction (i.e. code pixel as a difference to the left one);
  • Caiman video codec seemed to come in two flavours, the original one coding 8×8 blocks using either four 4×4 pixel codebooks or just one scaled (that reminds me of Cinepak again for some reason, maybe because it did the same albeit using 2×2 vectors), next version of the codec introduced codebooks of different sizes and 8×8 block could be split recursively for that (also that version got motion compensation);
  • METEO is some Japanese format that seems to be the choice for the GBA enthusiasts since there’s a free encoder for it. I actually looked into it to see what it does (it’s a standalone binary about two hundred kilobytes large) and it turns out to decode input videos using standard Windows interfaces and encode frames with Cinepak encoder and write them into their own container.
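The left prediction mentioned for GBA Video is simple enough to sketch. This is only an illustration of the general technique, not what the format actually does: I am assuming that each row starts from zero and that values wrap modulo 256, and the function name is made up.

```python
def undo_left_prediction(deltas, bits=8):
    """Rebuild a row from differences to the left neighbour.

    Assumptions: the predictor starts at 0 and sums wrap at 2**bits.
    """
    mask = (1 << bits) - 1
    row = []
    prev = 0
    for delta in deltas:
        prev = (prev + delta) & mask
        row.append(prev)
    return row


row = undo_left_prediction([10, 5, 251, 0])
print(row)
```

Here 251 acts as -5 because of the wraparound, so the third pixel goes back to 10.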

All these formats make me think that if I look at other gaming consoles I can find Cinepak there as well. Let’s look at what those FMV games used.

Curiosity satisfied, I should move to something else.

Fixing SVQ1 decoding bug

Saturday, March 6th, 2021

In the comments to the previous post a certain Paul B. pointed out that SVQ1 decoder (the one in libavcodec or mine) decodes certain files with visual artefacts. So I opened the old dreary QuickTime.qts with Ghidra to look at its contents once again (last time it was for QDesign Music details but luckily I’ve marked SVQ1 decoder functions as well).

The official binary specification turned out to have slightly different design with just one block decoding function that gets intra or inter codebooks passed to it (so intra block is essentially adding residue to zero block using intra codebooks). And, more curiously, the codec uses 16-bit values for pixels up to the very end of decoding.

As you can guess, the artefacts looking like white blocks are caused by the pixel value going out of 8-bit range. I actually hooked GDB script to mplayer2 that loads QuickTime decoder (and presents some garbage instead of proper decoded frame) to see what happens with the block showing such artefact. It turned out that pixel with the original value 0xCF got increased to 0x14F during codebook additions and the reference decoder had output it as 0x4F. So I changed clamping to discarding top bits and it works much better.
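The difference between the two behaviours is easy to show with the values above. The residue value here is made up so that the sum matches the observed 0x14F; the real decoder adds several codebook entries.

```python
def clamp_u8(value):
    """Saturate to the 0..255 range (what the opensource decoders did)."""
    return max(0, min(255, value))


def wrap_u8(value):
    """Keep only the low 8 bits (what the reference decoder does)."""
    return value & 0xFF


pixel = 0xCF
residue = 0x80  # hypothetical total of codebook additions
total = pixel + residue  # 0x14F, still fine in a 16-bit intermediate

print(hex(total), hex(clamp_u8(total)), hex(wrap_u8(total)))
```

Clamping turns the pixel into 0xFF, which is exactly the white block artefact, while discarding the top bits gives the 0x4F that the reference decoder outputs.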

Considering that codebooks are stored as single .dll resource and block decoding function works (for performance reasons) as a chain of block modifying functions with stackless calling convention I call the results good enough and let those who want more dig there instead of me.

Done with VGM/XVD

Thursday, February 18th, 2021

Since the time I first looked at XVD-related codecs I dug deeper and at one point considered implementing it all for NihAV. But every time I look at Muzip or some logic inside video decoders I lose all interest. So finally I’ve documented my finds on The Wiki and now I can forget about it and move to something else.

Some of it was easy to investigate since VGM demuxer along with Muzip CTP06/CTP07, Domen and VT2k decoders are present in Java applets that can be easily decompiled. Some like V2K-II or XVD can be easily decompiled with Ghidra and produce mostly understandable code (except for wavelet decoding part in XVD). Muzip4 and VT on the other hand have hard to follow logic. And VGM2 demuxer is available only as DirectShow splitter, where searching for the COM object responsible for the demuxing itself is a pain.

Funny enough now Alaris VGPixel looks more related since VT codec has similar mode of compression. Additionally both the official player and demo programs from VGM-XVD developer site use the same trick—they put all .dlls in compressed form (the standard SZDD compression) at the end of executable, which decompresses and loads them at start.
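For those who want to poke at such executables, here is a small sketch that looks for SZDD streams inside a blob of bytes. The SZDD header layout (8-byte signature, mode byte, missing character byte, 32-bit little-endian unpacked size) is the standard one; how the executables actually locate their embedded libraries is unknown to me, so scanning for the signature is purely my assumption and the function name is hypothetical.

```python
import struct

SZDD_MAGIC = b"SZDD\x88\xf0\x27\x33"
SZDD_HEADER_SIZE = 14  # magic + mode + missing char + unpacked size


def find_szdd_streams(blob):
    """Return (offset, unpacked_size) for every SZDD header found in blob."""
    found = []
    pos = blob.find(SZDD_MAGIC)
    while pos != -1:
        if pos + SZDD_HEADER_SIZE <= len(blob):
            # mode is normally ord('A'); it is not validated in this sketch
            _mode, _missing, size = struct.unpack_from("<BBI", blob, pos + 8)
            found.append((pos, size))
        pos = blob.find(SZDD_MAGIC, pos + 1)
    return found


# fake "executable" with one compressed library appended
fake_exe = b"MZ" + b"\x00" * 10
fake_exe += SZDD_MAGIC + b"A" + b"l" + struct.pack("<I", 1234) + b"payload"
streams = find_szdd_streams(fake_exe)
print(streams)
```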

Also it’s worth mentioning that all decoders (except for VGPixel) have the same interface via the functions UCF_InitCodec, UCF_ProcessFrame and such. Anybody interested enough can write his own program that demuxes VGM or VGM2, loads the proper decoder libraries and does something with the result. At least I’ve documented it as much as I could (or cared) so there’s some foundation to start from.

Alaris VGPixel

Sunday, January 31st, 2021

As I mentioned in the previous post, I wanted to look at this codec because it might be related to the whole XVD family. It turned out to be completely different even if it’s from the same company and bit reading is done in the same way (which is a very minor thing).

So the codec itself is a delta codec that reminds me of BMV from Discworld Noir a lot. Essentially it just reads opcodes using static codebook and performs some action depending on them:

  • repeat previous pixel value 1/2/4 times or the fixed amount transmitted in the frame header;
  • skip 1/2/4 pixels or the fixed amount of pixels transmitted in the frame header;
  • copy top/bottom/right/left neighbour pixel;
  • set pixel to the average of top/bottom and right/left neighbour pixels;
  • or decode pixel difference (for all three components at once) and add it to the previous value.

The main problem REing it was figuring out the decoding loop since for performance reasons decoder tries to group opcodes and handle them at once thus creating 200-something cases to handle. Plus those opcode handlers are just pieces of code that work on data in registers and jump to the function that dispatches next opcode. Of course it does not decompile properly but the amount of instructions is small anyway and it’s the same code repeated over and over again.

Maybe I’ll even write a decoder for it sometime later.

Looking at XVD

Saturday, January 30th, 2021

A week ago a certain XviD developer made a request to look at something more compressed called XVD and so I did.

A look at various video codecs from the 90s

Monday, January 18th, 2021

Since I had nothing better to do during Christmas “vacation” (it is the first time I’m in Germany at this time of year so of course I had nothing better to do) I looked at various codecs, mostly from last century, and wrote some notes about more interesting ones. Here I’d like to give some information about the rest lest I forget it completely.

  • Affinity Video—JPEG rip-off;
  • Lsvx—H.263 rip-off with possible raw frames;
  • Morgan TVMJ—I should’ve noticed “MJPEG” in the description sooner;
  • VDOWave 2—an unholy mix of H.263 and wavelets. It uses the coding scheme from H.263 (8×8 blocks, loop filter, halfpel motion compensation and even something suspiciously resembling OBMC) but blocks are coded as three 4×4 blocks that should be recombined using Haar transform into one 8×8 block. Plus there might be an additional enhancement layer for the whole frame based on the same wavelet as well.

And I should mention VSS Codec Light. While it is hard to get through all those levels of C++ abstractions, looks like it has arithmetic coding with static models, 4×4/8×8/16×16 blocks, 8×8 DCT, and five different wavelet variants to boot. At least it’s not another JPEG or H.263 rip-off.

Overall it feels that back in those days you had mostly JPEG rip-offs, H.263 rip-offs and wavelet-based codecs. I tried to look at more of the latter but one of the codecs turned out to be an impenetrable mess of deeply nested calls that seem to add stuff to the lists to be processed later somehow and another codec demonstrated that Ghidra disassembler has bugs in handling certain kinds of instructions involving FS register IIRC. As a result it thinks the instruction should be a byte or two longer than it really is. So unless this is fixed I can’t look at it. There are still plenty of old codecs I’ve not looked at.

A look at ACT-L2

Saturday, January 9th, 2021

This is yet another video codec from the 90s used for streaming and completely forgotten now. But since I had nothing better to do I decided to look at it as well.

Essentially it is another H.263 rip-off with a twist. From H.263 it took overall codec design (I/P-frames, 8×8 DCT, DC prediction, OBMC) but the data coding is special. For starters, they don’t use any codebooks but rather rely on fixed-width bitfields. And those bit values are not written as they occur but rather packed together into separate arrays. There’s a way to improve compression though: those chunks can be further compressed using binary arithmetic coder with an adaptive model to code bytes (i.e. you have 256 states and you select state depending on which bits you have already decoded).
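The "select state depending on which bits you have already decoded" part is a common way to code bytes with a binary coder, and it can be sketched without the arithmetic coder itself. Below the bit decoder is replaced with a stub that just hands out prepared bits; the MSB-first order and the context numbering starting from 1 are my assumptions, not something taken from the codec.

```python
def decode_byte(decode_bit):
    """Decode 8 bits MSB-first, choosing the model by the bits seen so far.

    decode_bit(ctx) stands in for a binary arithmetic decoder call that
    uses (and adapts) the probability model number ctx.
    """
    ctx = 1  # the leading 1 marks how many bits have been decoded
    for _ in range(8):
        bit = decode_bit(ctx)  # ctx stays in the 1..255 range here
        ctx = (ctx << 1) | bit
    return ctx & 0xFF


bits = iter([1, 0, 1, 0, 0, 1, 0, 1])  # 0xA5
used_contexts = []


def stub_decoder(ctx):
    # records which model would have been used, then returns the next bit
    used_contexts.append(ctx)
    return next(bits)


value = decode_byte(stub_decoder)
print(hex(value), used_contexts)
```

Every partially decoded prefix gets its own adaptive model, so with index 0 unused the table has 256 entries.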

Additionally it has somewhat different method of coding block coefficients. Instead of usual (zero-run, level, end-of-block) triplets assigned to a single code it uses bit flags to signal that certain block areas (coefficients 0-3, 4-7, 8-11 and 12-63) are coded and for the first three areas it also transmits bit flags to signal that the coefficient is coded. And only the last area uses zero-run + level coding (using explicit bitfields for each).
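The flag scheme can be sketched like this. It only works out which coefficient positions are signalled as coded and leaves the actual values and the zero-run + level part aside; the order in which the flags arrive is my guess and all the names are made up.

```python
# coefficient ranges (start inclusive, end exclusive) taken from the text
AREAS = [(0, 4), (4, 8), (8, 12), (12, 64)]


def coded_positions(area_flags, coef_flags):
    """Return flagged positions in the first three areas and whether the
    last area (coded with zero-run + level) is present.

    area_flags: four 0/1 values, one per area.
    coef_flags: iterator of 0/1 values, one per coefficient of each coded
                area among the first three (assumed order).
    """
    positions = []
    last_area_coded = False
    for idx, (start, end) in enumerate(AREAS):
        if not area_flags[idx]:
            continue
        if idx < 3:
            for pos in range(start, end):
                if next(coef_flags):
                    positions.append(pos)
        else:
            last_area_coded = True
    return positions, last_area_coded


positions, has_tail = coded_positions(
    [1, 0, 1, 1],
    iter([1, 0, 0, 1, 0, 1, 0, 0]),
)
print(positions, has_tail)
```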

Overall it’s an interesting idea and reminds me of TM2 or TM2X since those codecs also used data partitioning (and in case of TM2 data compression as well).