Archive for the ‘Various Video Codecs’ Category

Looking at true IVF

Thursday, October 13th, 2022

So while russia demonstrates once again what a hysterical führer and his similarly minded generals would do, I’m trying to do something to distract myself from thoughts about the losses Ukraine has suffered in the last couple of days (and the previous couple hundred days too).

I’ve accidentally managed to find a sample in IVF format—the original streaming format for Indeo codecs. Contrary to expectations, it was not a renamed AVI file, which made it even more interesting. So I took Ghidra and the binary specification and started implementing support for it.

The format itself is rather simple and the only interesting thing is that it transmits video data in passes: first just the bare minimum of keyframes, with all other frames sent as drop frames, then some inter frames are sent, then more and more data. Unfortunately there’s no obvious field to tell where each part starts, so I simply assemble the frames in memory (which is not efficient but still better than the original approach of creating temporary files for assembling the fragments).
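
Here’s a minimal sketch in Rust of that in-memory assembly (the Fragment layout and field names are my own invention for illustration, not the actual IVF chunk structure):

    use std::collections::HashMap;

    // Hypothetical fragment: one piece of frame data as sent in some pass.
    struct Fragment {
        frame_no: usize, // which frame this fragment refines
        data: Vec<u8>,   // payload to append to that frame
    }

    // Accumulate fragments per frame; each pass simply appends more data
    // to the frames it refines, so complete frames grow over time.
    fn assemble(fragments: &[Fragment]) -> HashMap<usize, Vec<u8>> {
        let mut frames: HashMap<usize, Vec<u8>> = HashMap::new();
        for frag in fragments {
            frames.entry(frag.frame_no)
                  .or_insert_with(Vec::new)
                  .extend_from_slice(&frag.data);
        }
        frames
    }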

Then there was a problem with decoding them: my existing decoder expected a frame with a sequential structure, while in this case fragments are transmitted out of order. Normally you’d get all four bands of the luma plane and then a band for each chroma plane, but here it may be one or two bands of the luma plane, then the chroma plane bands, then some zero padding, and then the rest of the luma bands (if they were transmitted at all). In the reference implementation the demuxer seems to parse and re-assemble the frame, while I ended up writing a slightly different decoder for this scalable mode. It seems to work fine, so now NihAV’s support for the Indeo family of formats is even more complete than ever.

P.S. The Wiki had only an entry for the Duck test samples format (which uses the same extension), so somebody had to correct that. That also makes me think about things like Duck AVC, AVS being either an FMV game format or a Chinese non-Duck AVC rip-off, IEEE AAC and so on. This confusion deserves to be written about, but don’t expect me to do that.

German PEGs

Thursday, September 29th, 2022

Having said everything I could on the current political situation, I returned to looking at random codecs and I found one with a curious name.

The name is DPEG and it’s used in at least one random game I’ve never heard of. It turned out to be a rather simple tile-based codec with raw intra frames and inter frames that employ RLE and motion compensation.
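
For illustration, here’s what a run-length decoder of that general kind looks like (a PackBits-style sketch in Rust; the actual DPEG opcodes are not something I’m reproducing here):

    // Generic RLE: a signed count byte selects between a literal run
    // and a repeated byte. This is the common scheme, not DPEG's exact one.
    fn rle_decode(src: &[u8], dst: &mut Vec<u8>) {
        let mut pos = 0;
        while pos < src.len() {
            let op = src[pos] as i8;
            pos += 1;
            if op >= 0 {
                // copy op + 1 literal bytes
                let len = op as usize + 1;
                dst.extend_from_slice(&src[pos..pos + len]);
                pos += len;
            } else {
                // repeat the next byte 1 - op times
                let len = (1 - op as isize) as usize;
                dst.extend(std::iter::repeat(src[pos]).take(len));
                pos += 1;
            }
        }
    }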

So nothing interesting, but then it struck me: wait a bit, I remember REing a codec with almost the same name and a similar block-based RLE coding approach, also from a German company (a different one though).

And there’s this Fraunhofer society involved in MPEG Video and MPEG Audio (different branches though)…

So what’s the German fascination with naming codecs [A-Z]PEG and how many out of 22 possible codecs are really implemented?

A Quick Look at AVS3

Monday, August 15th, 2022

The war has shifted to a terrorist operation against Ukrainian civilians (with no change for my home city; it gets several strikes from russian territory every day regardless), and instead of threatening the world with nuclear war russia now threatens the world with nuclear terrorism using the captured Ukrainian nuclear power plant. So here’s yet another attempt to distract myself from thoughts about it.

It seems that AVS3 has already been standardised (and nobody cares). So out of idle curiosity I’ve downloaded the spec from avs.org.cn (in Chinese of course, and it requires you to fill in some information but accepts any garbage). So let’s look at this completely original format that has not borrowed anything from either H.266 or AV1.

A cursory glance at Daniel codecs

Sunday, July 10th, 2022

Recently Paul B. Mahol turned my attention to the fact that there are codecs with human names, so why not take a look at them?

The first disappointing thing is that you need to register in order to receive a link for downloading the software. Oh well, throwaway mail services exist, and it’s not the first codec I’ve seen doing that (though I’ve forgotten its name already, just as I’ll forget about this codec). Then there’s the binary size. I remember thinking that Go2Somewhere’s single 15MB .dll was too much, but this one is even larger (because they’ve decided to bundle a bunch of other decoders and demuxers instead of just one codec).

In any case, what can I say about the codec? Nothing much really. Both seem to be DCT-based intermediate codecs that group blocks into slices. Daniel2 uses larger tiles in slices, probably to accommodate the wider variety of supported chroma formats (unlike its predecessor it supports different colourspaces, chroma subsamplings and bitdepths). The claimed high (de)coding speed comes from the same approach as in VBLE (does anybody remember it? I still remember how Derek introduced its author to me at one of the VDDs). Yes, they simply store coefficients with a fixed number of bits and transmit the bit length to use for each tile.
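
A sketch of that VBLE-like idea in Rust (not the actual Daniel2 bitstream; the 4-bit length field and MSB-first bit order are my assumptions):

    // Minimal MSB-first bit reader, just enough for the example.
    struct BitReader<'a> { src: &'a [u8], pos: usize }

    impl<'a> BitReader<'a> {
        fn new(src: &'a [u8]) -> Self { Self { src, pos: 0 } }
        fn read(&mut self, nbits: u8) -> u32 {
            let mut val = 0;
            for _ in 0..nbits {
                let bit = (self.src[self.pos / 8] >> (7 - self.pos % 8)) & 1;
                val = (val << 1) | u32::from(bit);
                self.pos += 1;
            }
            val
        }
    }

    // One bit length per tile, then every coefficient in the tile
    // is stored with exactly that many bits.
    fn read_tile_coeffs(br: &mut BitReader, ncoeffs: usize) -> Vec<u32> {
        let bits = br.read(4) as u8; // hypothetical 4-bit length field
        (0..ncoeffs).map(|_| br.read(bits)).collect()
    }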

The only really curious thing I’ve found is some combinatorial coding approach I’ve never seen anywhere else. Essentially it stores something like sums in a table, and for each value only the number of table entries is transmitted. The actual value is decoded as (max(table[0], ..., table[len - 1]) + min(table[0], ..., table[len - 1]) + 1) / 2, and then the decoded value is subtracted from all the table elements used in the calculation. I have no idea why it’s there or what it’s good for, but it exists.
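
In Rust terms the decoding step as described would look something like this (a sketch built purely from the description above; how the table gets initialised is not covered):

    // Each transmitted symbol is just a count `len`; the value is
    // reconstructed from the first `len` table entries and then
    // subtracted back from them. Assumes len >= 1.
    fn decode_value(table: &mut [i32], len: usize) -> i32 {
        let used = &table[..len];
        let max = *used.iter().max().unwrap();
        let min = *used.iter().min().unwrap();
        let val = (max + min + 1) / 2;
        for entry in table[..len].iter_mut() {
            *entry -= val;
        }
        val
    }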

Overall, it was not a complete waste of time.

Looking at Aware MotionWavelets

Sunday, December 26th, 2021

I wanted to reverse-engineer and implement some wavelet codec just for the sake of it. And finally I’ve managed to do that.

Initially I wanted to finish the Rududu Video codec (I’ve looked at it briefly, and one of the funny things is that the opensource release of the Rududu Image codec does not match the actual binary specification; even the arithmetic coder is different), but it turns out there are no samples in the usual place, so I just picked something that has some samples already.

The codec turned out to employ some tricks, so I had to resort to collecting debug information in order to understand the band structure (all band dimensions are implicit; you need to know them and their order to decode it all successfully). Then it turned out that band data is coded in boustrophedon order instead of the usual raster scan. And finally there’s fun with scaling: the vertical transform is the same as the horizontal one but its output is scaled by 128. Besides that it’s rather unremarkable.
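
For reference, boustrophedon (“as the ox ploughs”) order simply alternates the scan direction on every row; here’s a minimal sketch (which direction the even rows take in this particular codec is not something I’ll vouch for):

    // Produce coefficient positions row by row, alternating direction,
    // instead of the usual left-to-right raster scan.
    fn boustrophedon_indices(width: usize, height: usize) -> Vec<(usize, usize)> {
        let mut order = Vec::with_capacity(width * height);
        for y in 0..height {
            if y % 2 == 0 {
                for x in 0..width { order.push((x, y)); }
            } else {
                for x in (0..width).rev() { order.push((x, y)); }
            }
        }
        order
    }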

Anyway, I got slightly deeper knowledge about the inner workings of wavelet codecs and it should not bother me any longer. It’s time to slack off before doing something else.

Some words on QT Animation (SMC) codec

Tuesday, August 10th, 2021

A recent question about buggy SMC decoding led me deep into the QuickTime specification to look at the codec’s missing opcode. And there are some noteworthy things here as well.

Back in the day there was a multimedia player for Unix called XAnim. Its last release was in 1999—before the other opensource multimedia player projects had even started! It was both feature-rich (e.g. it could step frames forward and backwards, something that not all current media players can do) and had excellent codec support for the time.

Somehow its author reverse engineered a lot of codecs (long before the era of decompilers too) and somehow managed to obtain the sources for e.g. Indeo; and while he could not share those, he offered binary decoders for a wide variety of architectures—Alpha, MIPS, Sparc, PowerPC, x86. It was a treasure trove of formats: lots of the decoders were ported to other projects (even I did that for one or two codecs), and the binary codecs were a great help in reverse-engineering efforts as well.

Now to SMC itself. Formally it’s the QuickTime Animation codec, but people call it after its FOURCC, which is “smc “, probably after the author’s initials.

Opensource SMC decoders come from the same source (I based mine on the description in The Wiki, but you can guess what that description is based on; and yes, back in the day e.g. MPlayer and Xine had their own decoders for various codecs before relying on libavcodec for everything). After looking at the binary specification I can say it looks exactly like it was reverse engineered directly from it (it has the same logic and data types but lacks sensible names). Anyway, the thing is that it does not handle opcode 0xF0, and I finally had an occasion to look at it.

I took the QuickTime 6.3 binary specification for Windows (somehow the decoder ended up in QuickTimeInternetExtras.qtx) and looked inside. It turns out that there are several decoding functions there (for different output formats) but they all do the same thing: handle the 0xF0 opcode in exactly the same way as the 0xE0 opcode (raw blocks); there are no differences there whatsoever.
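
In Rust-flavoured pseudocode the relevant dispatch boils down to this (the structure and names are mine, not QuickTime’s):

    // Both opcode ranges decode identically: raw pixel data for the block.
    fn handle_opcode(op: u8) {
        match op & 0xF0 {
            0xE0 | 0xF0 => { /* copy raw pixel data into the block */ }
            _           => { /* skip, repeat, single-colour and codebook modes */ }
        }
    }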

That’s one mystery less, even if the answer is a bit disappointing. At least I could reminisce about good old times hardly anybody else remembers.

About upcoming AV2…

Friday, August 6th, 2021

So today I’ve seen an article titled AV2 Video Codec — Early Performance Evaluation of the Research, which of course has drawn my attention.

The fun things are that it is a sponsored article and that it’s written by three engineers from ViCueSoft. This is strange, but so far it still looks more promising than the original AV1 feature review article with its over 20 authors and too much marketing in it (my review of it is here; and to be fair, it was followed by a more serious paper with fewer authors, but this one exists as well). Anyway, let’s see what is presented here.

I don’t care much about the performance, so I’ll just quote a phrase from the conclusion: “…rough approximation shows only 1.2x times encoding complexity increase and 1.4x time decoding”. I find it a bit strange that the increase in decoding complexity is larger than the increase in encoding complexity; normally you’d expect encoding difficulty to rise faster because of the nature of the coding approach in modern codecs (an encoder normally needs to search for the best combination of coding tools and their parameters and then apply the same steps as the decoder in order to have the coded frame in the same state the decoder would have it in). Let’s look at the features then; that’s the most interesting part for me anyway.

  • distant weighted compound mode and dual interpolation filter are removed;
  • semi-decoupled partitioning is introduced—this feature allows splitting luma and chroma blocks and coding their contents independently below a certain level. The paper also says there’s a Dual Tree feature in VVC that does the same;
  • quantiser step overhaul—instead of six tables in AV1 you now have just one simple formula for all quantiser steps;
  • extending motion sample selection to work with compound blocks as well;
  • more partitioning modes to be more like HEVC;
  • multiple reference line selection for intra prediction—this allows you to select not just the neighbouring row/column for directional intra prediction. The same tool exists in VVC. It also reminds me of X8 frames in WMV2/WMV9, which are the first case known to me of intra prediction using more than one line;
  • offset-based intra prediction refinement—adding some offset to the top/left intra predicted edge of the block to make it even smoother (the offset is calculated from the neighbouring blocks as well);
  • intra secondary transform—this tool tries to improve compression by applying a special secondary transform to the low-frequency coefficients. VVC has the low-frequency non-separable transform doing the same;
  • simplifications in intra mode signalling;
  • some improvements in motion prediction coding;
  • cross-component sample offset—another chroma-from-luma tool: for the whole CTU, between the deblocking and CDEF stages, a DC offset is calculated from the luma values and applied to the chroma values (see the sketch right after this list).
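
As a very rough sketch of that last tool’s idea (my own illustration with a made-up offset derivation, definitely not the actual AV2 algorithm):

    // Derive a single DC offset for the CTU from its luma samples and
    // add it to every chroma sample. The derivation below is invented
    // for illustration; only the overall shape matches the description.
    fn ccso_apply(luma: &[i32], chroma: &mut [i32]) {
        assert!(!luma.is_empty());
        let avg = luma.iter().sum::<i32>() / luma.len() as i32;
        let offset = (avg - 128) >> 4; // made-up scaling
        for c in chroma.iter_mut() {
            *c = (*c + offset).clamp(0, 255);
        }
    }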

Essentially there are three kinds of improvements: simplification or generalisation of an existing feature (including its complete removal—I approve of either), picking a tool used by VVC/H.266 (that approach works but lacks originality), and the occasional improvement of an existing tool (too few of those and not too original). Of course nobody knows when AV2 will be declared finished and some things will surely have changed by then, but I don’t expect radical changes.

Once I said that I’d review H.266 when AV2 is released, but these guys have essentially done my work for me. Thanks!

A quick look at movies for handhelds

Sunday, March 21st, 2021

In not-exactly-recent news there was a piece about some guy who decided not to listen to the advice of the director of some blockbuster: instead of going to the cinema to watch it, he encoded it to watch from Game Boy cartridges. While people doing stupid things is hardly news, it sparked a mild interest in me, so I looked at what the options are for storing video on underpowered hardware.

It turned out there are at least three formats for coding not just cutscenes but whole movies (or at least episodes of various series) to fit into a 32MB GBA cartridge. All three formats seem to be built on vector quantisation, and they all embed the video into the player program (well, the cartridge in this case has no segments or filesystem for different resources).

  • GBA Video is probably the most famous and the most official one (there were official releases of a couple dozen animated movies and cartoon series using the format). It was developed by Majesco and seems to use vector quantisation plus deflate, and since it checks that the codebook size is 256*6, it’s most likely something like Cinepak using 2×2 YUV 4:2:0 codebook entries for compression (see the sketch after this list). Additionally it seems to use left prediction (i.e. coding a pixel as the difference to the left one);
  • the Caiman video codec seems to come in two flavours: the original one codes 8×8 blocks using either four 4×4-pixel codebook entries or just one scaled up (that reminds me of Cinepak again for some reason, maybe because it did the same, albeit with 2×2 vectors); the next version of the codec introduced codebooks of different sizes, with 8×8 blocks recursively splittable for that purpose (that version also got motion compensation);
  • METEO is some Japanese format that seems to be the choice of GBA enthusiasts, since there’s a free encoder for it. I actually looked into it to see what it does (it’s a standalone binary about two hundred kilobytes large), and it turns out to decode input videos using standard Windows interfaces, encode the frames with the Cinepak encoder and write them into its own container.
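
Here’s what painting one Cinepak-style 2×2 vector looks like (a sketch under the assumption above that each 6-byte codebook entry holds four luma samples plus one U and one V for the block):

    // One codebook entry covers a 2x2 block: four Y samples plus a single
    // U and V pair at quarter resolution (4:2:0), i.e. 6 bytes -> 256*6 total.
    struct Entry { y: [u8; 4], u: u8, v: u8 }

    fn draw_block(e: &Entry,
                  luma: &mut [u8], lstride: usize,
                  up: &mut [u8], vp: &mut [u8], cstride: usize,
                  x: usize, y: usize) {
        luma[y * lstride + x]           = e.y[0];
        luma[y * lstride + x + 1]       = e.y[1];
        luma[(y + 1) * lstride + x]     = e.y[2];
        luma[(y + 1) * lstride + x + 1] = e.y[3];
        // one chroma sample covers the whole 2x2 block
        up[(y / 2) * cstride + x / 2] = e.u;
        vp[(y / 2) * cstride + x / 2] = e.v;
    }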

All these formats make me think that if I look at other gaming consoles I can find Cinepak there as well. Let’s look at what those FMV games used.

Curiosity satisfied, I should move to something else.

Fixing SVQ1 decoding bug

Saturday, March 6th, 2021

In the comments to the previous post a certain Paul B. pointed out that the SVQ1 decoders (the one in libavcodec and mine) decode certain files with visual artefacts. So I opened the old dreary QuickTime.qts with Ghidra to look at its contents once again (last time it was for QDesign Music details, but luckily I had marked the SVQ1 decoder functions as well).

The official binary specification turned out to have a slightly different design, with just one block decoding function that gets either intra or inter codebooks passed to it (so an intra block is essentially a residue added to a zero block using the intra codebooks). And, more curiously, the codec uses 16-bit values for pixels up to the very end of decoding.

As you can guess, the artefacts that look like white blocks are caused by pixel values going out of the 8-bit range. I actually hooked a GDB script to mplayer2 loading the QuickTime decoder (and presenting some garbage instead of a properly decoded frame) to see what happens in a block showing such an artefact. It turned out that a pixel with the original value 0xCF got increased to 0x14F during the codebook additions, and the reference decoder output it as 0x4F. So I changed clamping to discarding the top bits and it works much better.
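
In code the fix is a one-liner (this shows the idea rather than quoting my decoder verbatim):

    // Pixels are kept as 16-bit values during decoding; the reference
    // decoder wraps instead of saturating, so 0x14F comes out as 0x4F.
    fn output_pixel(val: u16) -> u8 {
        (val & 0xFF) as u8 // discard the top bits, do not clamp to 255
    }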

Considering that the codebooks are stored as a single .dll resource and the block decoding function works (for performance reasons) as a chain of block-modifying functions with a stackless calling convention, I call the results good enough and let those who want more dig there instead of me.

Done with VGM/XVD

Thursday, February 18th, 2021

Since the time I first looked at the XVD-related codecs I’ve dug deeper, and at one point I considered implementing it all for NihAV. But every time I look at Muzip or some of the logic inside the video decoders I lose all interest. So I’ve finally documented my findings on The Wiki, and now I can forget about it and move on to something else.

Some of it was easy to investigate, since the VGM demuxer, along with the Muzip CTP06/CTP07, Domen and VT2k decoders, is present in Java applets that can be easily decompiled. Others, like V2K-II or XVD, can be easily decompiled with Ghidra and produce mostly understandable code (except for the wavelet decoding part in XVD). Muzip4 and VT, on the other hand, have hard-to-follow logic. And the VGM2 demuxer is available only as a DirectShow splitter, where it’s a pain to locate the COM object responsible for the demuxing itself.

Funnily enough, Alaris VGPixel now looks more related, since the VT codec has a similar compression mode. Additionally, both the official player and the demo programs from the VGM-XVD developer’s site use the same trick—they put all the .dlls in compressed form (the standard SZDD compression) at the end of the executable, which decompresses and loads them at start.

Also it’s worth mentioning that all the decoders (except VGPixel) share the same interface via functions like UCF_InitCodec, UCF_ProcessFrame and such. Anybody interested enough can write their own program that demuxes VGM or VGM2, loads the proper decoder libraries and does something with the result. At least I’ve documented it as much as I could (or cared to), so there’s some foundation to start from.
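
A sketch of how such a program could start, in Rust with the libloading crate (the UCF_InitCodec signature here is purely hypothetical since the real prototypes aren’t given above; this only illustrates the loading mechanics):

    use libloading::{Library, Symbol};

    // Hypothetical prototype; the real UCF_* signatures would have to be
    // recovered from the decoder libraries themselves.
    type UcfInitCodec = unsafe extern "C" fn() -> i32;

    fn main() {
        unsafe {
            let lib = Library::new("xvd.dll").expect("cannot load decoder");
            let init: Symbol<UcfInitCodec> =
                lib.get(b"UCF_InitCodec\0").expect("export not found");
            println!("UCF_InitCodec returned {}", init());
            // next step: feed demuxed VGM/VGM2 frames to UCF_ProcessFrame
        }
    }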