Archive for the ‘Various Video Codecs’ Category

Fixing SVQ1 decoding bug

Saturday, March 6th, 2021

In the comments to the previous post a certain Paul B. pointed out that the SVQ1 decoder (both the one in libavcodec and mine) decodes certain files with visual artefacts. So I opened the old dreary QuickTime.qts with Ghidra to look at its contents once again (last time it was for QDesign Music details, but luckily I had marked the SVQ1 decoder functions as well).

The official binary specification turned out to have a slightly different design, with just one block decoding function that gets intra or inter codebooks passed to it (so an intra block is essentially residue added to a zero block using intra codebooks). And, more curiously, the codec keeps pixels as 16-bit values until the very end of decoding.

As you can guess, the artefacts looking like white blocks are caused by pixel values going out of the 8-bit range. I actually hooked a GDB script to mplayer2, which loads the QuickTime decoder (and presents some garbage instead of a properly decoded frame), to see what happens with a block showing such an artefact. It turned out that a pixel with the original value 0xCF got increased to 0x14F during codebook additions and the reference decoder output it as 0x4F. So I changed clamping to discarding the top bits and it works much better.
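To illustrate the difference, here is a small Python sketch of block reconstruction with 16-bit intermediates, comparing the old clamping with keeping only the low 8 bits. The function names and the split of the increase into two +0x40 codebook stages are made up for the example; only the 0xCF → 0x14F → 0x4F behaviour comes from the trace above, and the real decoder's stage count and codebook layout are not modelled.

```python
def to_int16(v: int) -> int:
    """Wrap an integer to a signed 16-bit value, like a C int16_t."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def reconstruct_block(base, stages, wrap=True):
    """Add codebook stage vectors to a block using 16-bit intermediates.

    base   -- starting pixels (zeros for an intra block, the motion-compensated
              prediction for an inter block)
    stages -- list of codebook vectors (same length as base) added in turn
    wrap   -- True: keep only the low 8 bits (what the reference decoder
              appears to do); False: clamp to 0..255 (the old behaviour)
    """
    acc = [to_int16(p) for p in base]
    for vec in stages:
        acc = [to_int16(a + d) for a, d in zip(acc, vec)]
    if wrap:
        return [a & 0xFF for a in acc]
    return [max(0, min(255, a)) for a in acc]
```

With wrapping, `reconstruct_block([0xCF], [[0x40], [0x40]])` yields `[0x4F]`, matching the reference output, while the clamping variant produces the white `0xFF` pixel.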

Considering that the codebooks are stored as a single .dll resource and that the block decoding function works (for performance reasons) as a chain of block-modifying functions with a stackless calling convention, I call the results good enough and leave further digging to those who want more.

Done with VGM/XVD

Thursday, February 18th, 2021

Since I first looked at XVD-related codecs I have dug deeper and at one point considered implementing it all for NihAV. But every time I look at Muzip or some logic inside the video decoders I lose all interest. So finally I’ve documented my findings on The Wiki and now I can forget about it and move on to something else.

Some of it was easy to investigate since the VGM demuxer along with the Muzip CTP06/CTP07, Domen and VT2k decoders are present in Java applets that can be easily decompiled. Others like V2K-II or XVD decompile easily with Ghidra into mostly understandable code (except for the wavelet decoding part in XVD). Muzip4 and VT, on the other hand, have hard-to-follow logic. And the VGM2 demuxer is available only as a DirectShow splitter, where it is a pain to find the COM object responsible for the demuxing itself.

Funnily enough, Alaris VGPixel now looks more related since the VT codec has a similar compression mode. Additionally, both the official player and the demo programs from the VGM-XVD developer site use the same trick: they put all .dlls in compressed form (the standard SZDD compression) at the end of the executable, which decompresses and loads them at start.
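For reference, a decompressor for the SZDD format as used by Microsoft's COMPRESS/EXPAND tools can be sketched as below: a 14-byte header, then LZSS with a 4 KiB window pre-filled with spaces, flag bytes read LSB first (set bit for a literal) and matches stored as a 12-bit window position plus a 4-bit length minus 3. This is a generic sketch from my understanding of the format, not code from the VGM player; locating the compressed blobs at the end of the executable is not covered.

```python
import struct

SZDD_MAGIC = b"SZDD\x88\xf0\x27\x33"
WINDOW_SIZE = 4096


def szdd_decompress(data: bytes) -> bytes:
    """Decompress an SZDD stream (the LZSS variant used by COMPRESS/EXPAND)."""
    if data[:8] != SZDD_MAGIC:
        raise ValueError("not an SZDD stream")
    if data[8] != ord("A"):
        raise ValueError("unsupported compression mode")
    # data[9] is the replaced last character of the original file name
    (out_size,) = struct.unpack_from("<I", data, 10)

    window = bytearray(b" " * WINDOW_SIZE)  # window starts filled with spaces
    pos = WINDOW_SIZE - 16                  # first write position
    out = bytearray()
    i = 14                                  # compressed data follows the header

    def put(byte: int) -> None:
        nonlocal pos
        out.append(byte)
        window[pos] = byte
        pos = (pos + 1) & (WINDOW_SIZE - 1)

    while i < len(data) and len(out) < out_size:
        flags = data[i]
        i += 1
        for bit in range(8):                # flag bits are consumed LSB first
            if i >= len(data) or len(out) >= out_size:
                break
            if flags & (1 << bit):          # set bit: literal byte
                put(data[i])
                i += 1
            else:                           # clear bit: (position, length) match
                if i + 1 >= len(data):
                    break
                lo, hi = data[i], data[i + 1]
                i += 2
                src = lo | ((hi & 0xF0) << 4)      # 12-bit window position
                for _ in range((hi & 0x0F) + 3):   # match length 3..18
                    put(window[src])
                    src = (src + 1) & (WINDOW_SIZE - 1)
    return bytes(out[:out_size])
```

Matches are copied byte by byte through the window, so overlapping references (a match pointing at bytes it is itself producing) work the same way as in the original tools.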

Also it’s worth mentioning that all decoders (except for VGPixel) share the same interface via the functions UCF_InitCodec, UCF_ProcessFrame and such. Anybody interested enough can write their own program that demuxes VGM or VGM2, loads the proper decoder libraries and does something with the result. At least I’ve documented it as much as I could (or cared to), so there’s some foundation to start from.

Alaris VGPixel

Sunday, January 31st, 2021

As I mentioned in the previous post, I wanted to look at this codec because it might be related to the whole XVD family. It turned out to be completely different, even though it’s from the same company and bit reading is done the same way (which is a very minor thing).

So the codec itself is a delta codec that reminds me a lot of BMV from Discworld Noir. Essentially it just reads opcodes using a static codebook and performs some action depending on them:

  • repeat previous pixel value 1/2/4 times or a fixed number of times transmitted in the frame header;
  • skip 1/2/4 pixels or a fixed number of pixels transmitted in the frame header;
  • copy top/bottom/right/left neighbour pixel;
  • set pixel to the average of its top and bottom or its left and right neighbours;
  • or decode pixel difference (for all three components at once) and add it to the previous value.
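A rough sketch of how such opcodes could be applied once they have been read from the bitstream. The `Op` names, the floor rounding in averages, the 8-bit wrap for deltas and treating the pixel before the frame start as black are all my assumptions; the static codebook and the grouped-opcode handling of the real decoder are not modelled.

```python
from enum import Enum, auto


class Op(Enum):
    # Hypothetical names for the operation kinds listed above.
    REPEAT = auto()      # arg: count (1/2/4 or the frame-header amount)
    SKIP = auto()        # arg: count; skipped pixels keep the old values
    COPY_TOP = auto()
    COPY_BOTTOM = auto()
    COPY_LEFT = auto()
    COPY_RIGHT = auto()
    AVG_VERT = auto()    # average of top and bottom neighbours
    AVG_HORIZ = auto()   # average of left and right neighbours
    DELTA = auto()       # arg: (dr, dg, db) added to the previous pixel


def average(a, b):
    # Rounding is an assumption (floor of the mean per component).
    return tuple((x + y) // 2 for x, y in zip(a, b))


def apply_ops(pixels, width, ops):
    """Apply (op, arg) pairs in raster order to a flat list of RGB tuples.

    pixels initially holds the previous frame and is updated in place, so
    bottom/right neighbours still hold old values. Neighbours outside the
    frame are assumed never to be referenced.
    """
    idx = 0
    for op, arg in ops:
        prev = pixels[idx - 1] if idx > 0 else (0, 0, 0)
        if op is Op.REPEAT:
            for _ in range(arg):
                pixels[idx] = prev
                idx += 1
            continue
        if op is Op.SKIP:
            idx += arg
            continue
        if op is Op.COPY_TOP:
            new = pixels[idx - width]
        elif op is Op.COPY_BOTTOM:
            new = pixels[idx + width]
        elif op is Op.COPY_LEFT:
            new = pixels[idx - 1]
        elif op is Op.COPY_RIGHT:
            new = pixels[idx + 1]
        elif op is Op.AVG_VERT:
            new = average(pixels[idx - width], pixels[idx + width])
        elif op is Op.AVG_HORIZ:
            new = average(pixels[idx - 1], pixels[idx + 1])
        else:  # Op.DELTA; wrapping to 8 bits is an assumption
            new = tuple((p + d) & 0xFF for p, d in zip(prev, arg))
        pixels[idx] = new
        idx += 1
    return pixels
```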

The main problem in REing it was figuring out the decoding loop: for performance reasons the decoder groups opcodes and handles them at once, which creates 200-something cases to handle. Plus those opcode handlers are just pieces of code that work on data in registers and jump to the function that dispatches the next opcode. Of course it does not decompile properly, but the number of instructions is small anyway and it’s the same code repeated over and over again.

Maybe I’ll even write a decoder for it sometime later.

Looking at XVD

Saturday, January 30th, 2021

A week ago a certain XviD developer made a request to look at something more compressed called XVD and so I did.

A look at various video codecs from the 90s

Monday, January 18th, 2021

Since I had nothing better to do during Christmas “vacation” (it is the first time I’m in Germany at this time of year so of course I had nothing better to do) I looked at various codecs, mostly from last century, and wrote some notes about more interesting ones. Here I’d like to give some information about the rest lest I forget it completely.

  • Affinity Video—JPEG rip-off;
  • Lsvx—H.263 rip-off with possible raw frames;
  • Morgan TVMJ—I should’ve noticed “MJPEG” in the description sooner;
  • VDOWave 2—an unholy mix of H.263 and wavelets. It uses the coding scheme from H.263 (8×8 blocks, loop filter, halfpel motion compensation and even something suspiciously resembling OBMC) but blocks are coded as three 4×4 blocks that should be recombined using Haar transform into one 8×8 block. Plus there might be an additional enhancement layer for the whole frame based on the same wavelet as well.
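The recombination step can be sketched with a one-level 2-D Haar transform in its unnormalised sum/difference form, which reconstructs integers exactly. Note the assumption: a one-level split of an 8×8 block gives four 4×4 bands, while the text mentions three coded 4×4 blocks, so here the low-pass band is simply passed in (where VDOWave 2 takes it from is not described), and the real codec's scaling may differ.

```python
def haar_forward_8x8(block):
    """One-level 2-D Haar split of an 8x8 block into four 4x4 bands."""
    ll = [[0] * 4 for _ in range(4)]
    lh = [[0] * 4 for _ in range(4)]
    hl = [[0] * 4 for _ in range(4)]
    hh = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            a, b = block[2 * y][2 * x], block[2 * y][2 * x + 1]
            c, d = block[2 * y + 1][2 * x], block[2 * y + 1][2 * x + 1]
            ll[y][x] = a + b + c + d
            hl[y][x] = a - b + c - d   # horizontal detail
            lh[y][x] = a + b - c - d   # vertical detail
            hh[y][x] = a - b - c + d   # diagonal detail
    return ll, lh, hl, hh


def haar_inverse_8x8(ll, lh, hl, hh):
    """Recombine four 4x4 bands into one 8x8 block (exact for integer input)."""
    out = [[0] * 8 for _ in range(8)]
    for y in range(4):
        for x in range(4):
            s, h, v, g = ll[y][x], hl[y][x], lh[y][x], hh[y][x]
            # each sum is exactly four times the original pixel
            out[2 * y][2 * x] = (s + h + v + g) // 4
            out[2 * y][2 * x + 1] = (s - h + v - g) // 4
            out[2 * y + 1][2 * x] = (s + h - v - g) // 4
            out[2 * y + 1][2 * x + 1] = (s - h - v + g) // 4
    return out
```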

And I should mention VSS Codec Light. While it is hard to get through all those levels of C++ abstractions, it looks like it has arithmetic coding with static models, 4×4/8×8/16×16 blocks, 8×8 DCT, and five different wavelet variants to boot. At least it’s not another JPEG or H.263 rip-off.

Overall it feels like back in those days you had mostly JPEG rip-offs, H.263 rip-offs and wavelet-based codecs. I tried to look at more of the latter, but one of the codecs turned out to be an impenetrable mess of deeply nested calls that seem to add stuff to lists to be processed later somehow, and another codec demonstrated that the Ghidra disassembler has bugs in handling certain kinds of instructions involving the FS register IIRC. As a result it thinks the instruction is a byte or two longer than it really is, so unless this is fixed I can’t look at it. There are still plenty of old codecs I’ve not looked at.

A look at ACT-L2

Saturday, January 9th, 2021

This is yet another video codec from the 90s used for streaming and completely forgotten now. But since I had nothing better to do I decided to look at it as well.

Essentially it is another H.263 rip-off with a twist. From H.263 it took the overall codec design (I/P-frames, 8×8 DCT, DC prediction, OBMC) but the data coding is special. For starters, it doesn’t use any codebooks but rather relies on fixed-width bitfields. And those bit values are not written as they occur but rather packed together into separate arrays. There’s a way to improve compression though: those chunks can be further compressed using a binary arithmetic coder with an adaptive model to code bytes (i.e. you have 256 states and you select the state depending on which bits you have already decoded).
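The adaptive byte model described above (256 states, with the state picked by the bits already decoded) is the classic binary-tree context scheme, sketched below in Python. The count-based probability estimate and its 1/1 initialisation are assumptions, and instead of a real binary arithmetic coder the sketch only accumulates the ideal code length to show the adaptation.

```python
import math


class AdaptiveByteModel:
    """Binary-tree context model for coding a byte one bit at a time.

    Context 1 is the root; after each bit the context becomes 2*ctx + bit,
    so the eight bits of a byte use contexts 1..255 and each context
    corresponds to the prefix of bits coded so far.
    """

    def __init__(self):
        # counts[ctx] = [zeros seen, ones seen], starting from 1/1
        self.counts = [[1, 1] for _ in range(256)]

    @staticmethod
    def contexts(value):
        """Return the (context, bit) pairs used for an 8-bit value, MSB first."""
        ctx, pairs = 1, []
        for shift in range(7, -1, -1):
            bit = (value >> shift) & 1
            pairs.append((ctx, bit))
            ctx = 2 * ctx + bit
        return pairs

    def code_byte(self, value):
        """Update the model with one byte and return its ideal cost in bits.

        A real coder would feed each probability to a binary arithmetic
        coder; here only -log2(p) is accumulated.
        """
        cost = 0.0
        for ctx, bit in self.contexts(value):
            zeros, ones = self.counts[ctx]
            p = (ones if bit else zeros) / (zeros + ones)
            cost -= math.log2(p)
            self.counts[ctx][bit] += 1
        return cost
```

Coding the same byte twice shows the adaptation: the first time every bit costs exactly one bit (8 in total), the second time each context has seen the bit once already and the byte gets cheaper.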

Additionally it has a somewhat different method of coding block coefficients. Instead of the usual (zero-run, level, end-of-block) triplets assigned to a single code it uses bit flags to signal that certain block areas (coefficients 0-3, 4-7, 8-11 and 12-63) are coded, and for the first three areas it also transmits bit flags to signal which coefficients are coded. Only the last area uses zero-run + level coding (with explicit bitfields for each).
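A sketch of reassembling a block from such partitioned fields. The field widths, how the end of the run/level list in the last area is signalled, and the scan order are not covered here, so the hypothetical `decode_coeffs()` takes the values as already extracted from their separate arrays.

```python
AREAS = [(0, 4), (4, 8), (8, 12)]  # areas that use per-coefficient flags


def decode_coeffs(area_coded, coef_flags, levels, run_level_pairs):
    """Rebuild 64 coefficients (in scan order) from already-extracted fields.

    area_coded      -- four booleans for areas 0-3, 4-7, 8-11 and 12-63
    coef_flags      -- iterator of per-coefficient flags for the first three areas
    levels          -- iterator of levels for coefficients flagged as coded
    run_level_pairs -- (zero_run, level) pairs for the 12-63 area
    """
    coeffs = [0] * 64
    for coded, (start, end) in zip(area_coded[:3], AREAS):
        if not coded:
            continue
        for pos in range(start, end):
            if next(coef_flags):
                coeffs[pos] = next(levels)
    if area_coded[3]:
        pos = 12
        for run, level in run_level_pairs:
            pos += run          # skip zero coefficients
            coeffs[pos] = level
            pos += 1
    return coeffs
```

Flags are only consumed for areas marked as coded, which is where the savings come from: an uncoded area costs a single bit.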

Overall it’s an interesting idea and reminds me of TM2 or TM2X since those codecs also used data partitioning (and in case of TM2 data compression as well).

ClearVideo briefly revisited

Thursday, December 31st, 2020

Since I had nothing better to do for the rest of this year (I expect the next year to begin in the same fashion) I decided to take a look at the problem where in some files inter-frames became distorted as if some sharpening filter were constantly applied. And what do you know, there’s some smoothing involved in certain cases.

A quick look on Rududu

Sunday, December 27th, 2020

Since I had nothing better to do I decided to look at the Rududu codec. It is one of the old, more exotic codecs that nobody remembers.

I did not want to look that deep into its details (hence it’s just a quick look) so here are the principles it seems to employ:

  • it seems to employ some integer approximation of a wavelet transform (instead of e.g. the LeGall 5/3 transform employed by lossless JPEG 2000);
  • it probably has intra- and interframes but it does not employ motion compensation, just coefficient updating;
  • DWT coefficients are quantised (and a common bias is removed) with scale and bias calculated for the whole frame;
  • coefficients are coded using a quadtree (i.e. some parts of the bands can be left uncoded in addition to skipping whole DWT subbands);
  • and finally, data is coded using adaptive models for absolute values and for the bits of both signs and “region coded” flags, and the probabilities from these models are fed to the range coder.
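Quadtree coding of a subband can be illustrated with a generic significance quadtree like the one below. It is a sketch of the general technique, not Rududu’s actual scheme: the split order, the minimum region size and how the flags are modelled before reaching the range coder are assumptions, and the region is assumed to be a power-of-two square.

```python
def quadtree_code(band, x, y, size, flags, coeffs):
    """Emit significance flags for a size x size region of band (row-major).

    A region whose coefficients are all zero costs a single 0 flag;
    otherwise a 1 flag is emitted and the region is split into quadrants
    (top-left, top-right, bottom-left, bottom-right) down to single
    coefficients, whose values are collected as (x, y, value).
    """
    nonzero = any(band[yy][xx] != 0
                  for yy in range(y, y + size)
                  for xx in range(x, x + size))
    flags.append(1 if nonzero else 0)
    if not nonzero:
        return
    if size == 1:
        coeffs.append((x, y, band[y][x]))
        return
    half = size // 2
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        quadtree_code(band, x + dx, y + dy, half, flags, coeffs)
```

For a 4×4 band with a single non-zero coefficient, this emits nine flags instead of sixteen, and the saving grows quickly for larger, mostly empty bands.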

So while this codec is nothing outstanding it’s still a nice change from the mainstream video coding approach defined by ITU H.26x codecs.

Vivo2 revisited

Tuesday, December 22nd, 2020

Since I have nothing better to do (after a quick glance at H.264 decoder—yup, nothing) I decided to look at Vivo 2 again to see if I can improve it from being “decoding and somewhat recognizable” to “mostly okay” stage.

To put a long story short, Vivo 2 turned out to be an unholy mix of H.263 and MPEG-4 ASP. On one hoof you have H.263 codec structure, H.263 codebooks and even the unique feature of H.263 called PB-frames. On the other hoof you have coefficient quantisation like in MPEG-4 ASP and coefficient prediction done on unquantised coefficients (H.263 performs DC/AC prediction on already dequantised coefficients while MPEG-4 ASP re-quantises them for the prediction).

And the main weirdness is the IDCT. While the older standards give just the ideal transform formula, multiplying by a matrix is slow and thus most implementations use some (usually fixed-point integer) approximation that also exploits internal symmetry for faster calculation (hence one of the main problems with various H.263 and DivX-based codecs: if you don’t use exactly the same transform implementation as the reference you’ll get artefacts because those small differences accumulate). Actually ITU H.263 Annex W specifies a bit-exact transform but nobody cares by this point. And Vivo Video has a different approach altogether: it generates a set of matrices for each coefficient, and instead of performing the IDCT directly it simply sums one or two matrices for each non-zero coefficient (one matrix for the coefficient value modulo 32, another one for the part of the value that is a multiple of 32). Of course it compensates for integer matrices being too coarse by multiplying them by 64 before converting to integers (and so the resulting block should be scaled down by 64 as well).
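The table-summing approach can be sketched in Python as follows, next to a direct floating-point IDCT for comparison. This assumes the standard 8×8 IDCT basis and round-to-nearest everywhere; the tables are built lazily per (position, value) here, while Vivo’s actual table layout, coefficient range and rounding are not described here.

```python
import math
from functools import lru_cache

SCALE = 64  # matrices are pre-multiplied by 64 before rounding to integers


def _c(k: int) -> float:
    return math.sqrt(0.5) if k == 0 else 1.0


@lru_cache(maxsize=None)
def basis(u: int, v: int):
    """Float IDCT output of a unit coefficient at row u, column v."""
    return tuple(tuple(_c(u) * _c(v) / 4
                       * math.cos((2 * y + 1) * u * math.pi / 16)
                       * math.cos((2 * x + 1) * v * math.pi / 16)
                       for x in range(8)) for y in range(8))


@lru_cache(maxsize=None)
def table(u: int, v: int, value: int):
    """Integer matrix for coefficient (u, v) with the given value, times SCALE."""
    b = basis(u, v)
    return tuple(tuple(round(value * SCALE * b[y][x]) for x in range(8))
                 for y in range(8))


def idct_by_tables(coefs):
    acc = [[0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            c = coefs[u][v]
            if c == 0:
                continue
            high, low = divmod(c, 32)       # c == 32 * high + low, 0 <= low < 32
            for part in (32 * high, low):   # at most two matrices per coefficient
                if part == 0:
                    continue
                m = table(u, v, part)
                for y in range(8):
                    for x in range(8):
                        acc[y][x] += m[y][x]
    # scale the accumulated sum back down by 64 with rounding
    return [[(a + SCALE // 2) // SCALE for a in row] for row in acc]


def idct_direct(coefs):
    """Reference floating-point 8x8 IDCT, rounded to integers."""
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            if coefs[u][v]:
                b = basis(u, v)
                for y in range(8):
                    for x in range(8):
                        out[y][x] += coefs[u][v] * b[y][x]
    return [[round(val) for val in row] for row in out]
```

Each table entry is off by at most half a unit of 1/64, so with only a few non-zero coefficients the result stays within one of the direct transform; the splitting at 32 keeps the number of distinct tables per position small.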

Either way it seems to work well enough, so I’ve finally enabled nihav-vivo in the list of default crates and can forget about it like the rest of the world did.

H.264 specification sucks

Saturday, November 14th, 2020

It has come to a stage where I have nothing better to do, so I tried to write an H.264 decoder for NihAV (so I can test the future nihav-player with content besides just sample files and cutscenes from various games). And while I’ve managed to decode at least something (more about that at the end), the specification for H.264 sucks. Don’t get me wrong, the format by itself is not that bad in design, but the way it’s documented is far from good (though it’s still serviceable: it’s not an audio codec after all).

First, a word to those who want to cry “but it’s GNU/Linux, err, MPEG/AVC”: ITU H.264 was standardised in May 2003 while MPEG-4 Part 10 came in December 2003. Second, I can download the ITU specification (and its various editions) freely while the MPEG standard still costs money I’m not going to pay.

I guess the main problems of H.264 come from two things: its dual coding nature (i.e. slice data can be coded using variable-length codes or a binary arithmetic coder) and extensions (not as bad as H.263 but approaching it; here’s a simple fact to demonstrate it: the 2003 edition had 282 pages, the 2019 edition has 836 pages). Plus the fact that it codified the wrong name for Elias gamma’ codes, which I ranted about before.
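For reference, the codes the spec calls exp-Golomb (`ue(v)` and `se(v)`) can be read like this; the `BitReader` class is a minimal helper written for the example.

```python
class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        val = 0
        for _ in range(n):
            val = (val << 1) | self.read_bit()
        return val

    def read_ue(self) -> int:
        """ue(v): count leading zeros, then read that many more bits."""
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.read_bits(zeros)

    def read_se(self) -> int:
        """se(v): map codeNum 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ..."""
        k = self.read_ue()
        return (k + 1) // 2 if k & 1 else -(k // 2)
```

The bit strings `1`, `010`, `011`, `00100`, `00101` decode to 0, 1, 2, 3, 4 as `ue(v)` and to 0, 1, -1, 2, -2 as `se(v)`.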

Let’s start with the extensions part since most of them can be ignored and I don’t have much to say about them except for one thing: profiles. By itself the idea is good: you have a certain set of constraints and features associated with an ID, so you know in advance whether you should be able to handle the stream or not. The initial 2003 edition had three profiles (baseline/main/extended) with IDs associated with them (66, 77 and 88 correspondingly). By 2019 there were a dozen various profiles and even more profile IDs, and they’re not actually mapped one to one (e.g. constrained baseline profile is baseline profile with an additional constraint_set1_flag set to one). As a result you have lots of random profile IDs (can you guess what profile_idc 44 means? and 86? or 128?) and they did not bother to make a table listing all known profile IDs, so you need to search the whole specification in order to find out what they mean. I’d not care much but they affect bitstream parsing, especially the sequence parameter set, where they decided to insert some additional fields in the middle for certain high profiles.
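Since the specification has no such table, here is one compiled from the profile annexes, along with the check for the extra SPS fields (the `profile_idc` list in the `seq_parameter_set_data()` syntax, 7.3.2.1.1). Some IDs cover several profiles distinguished by constraint_set flags, which the sketch only handles for Constrained Baseline; double-check it against the edition you target.

```python
# profile_idc values and the main profile each one identifies.
PROFILE_NAMES = {
    44: "CAVLC 4:4:4 Intra",
    66: "Baseline",
    77: "Main",
    83: "Scalable Baseline",
    86: "Scalable High",
    88: "Extended",
    100: "High",
    110: "High 10",
    118: "Multiview High",
    122: "High 4:2:2",
    128: "Stereo High",
    134: "MFC High",
    135: "MFC Depth High",
    138: "Multiview Depth High",
    139: "Enhanced Multiview Depth High",
    244: "High 4:4:4 Predictive",
}

# profile_idc values for which the SPS carries chroma_format_idc, bit depths,
# the lossless flag and scaling matrices before log2_max_frame_num_minus4.
HIGH_PROFILE_IDCS = {100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135}


def sps_has_chroma_fields(profile_idc: int) -> bool:
    return profile_idc in HIGH_PROFILE_IDCS


def profile_name(profile_idc: int, constraint_set1_flag: int = 0) -> str:
    if profile_idc == 66 and constraint_set1_flag:
        return "Constrained Baseline"
    return PROFILE_NAMES.get(profile_idc, "unknown")
```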

Now the more exciting part: coding. While I understand the rationale (you have a simpler and faster or a slower but more effective (de)coding mode while using the same ways to transform data), it created some problems for describing it. Because of that decision you have to look at three different places in order to understand what and how to decode: syntax tables in 7.3 which present in which order and under which conditions elements are coded, semantics in 7.4 telling you what an element actually means and what limitations or values it has, and 9.2 or 9.3 for explanations on how a certain element should actually be decoded from the bitstream. And confusingly enough the coded block pattern is put into 9.1.2 while it would be more logical to join it with 9.2, as 9.1 is for parsing generic codes used not just in slice data but in various headers as well and 9.2 deals with parsing custom codes for non-CABAC slice data.

And it gets even worse for CABAC parsing. For those who don’t know what it is, that abbreviation means context-adaptive binary arithmetic coding. In other words it represents various values as sequences of bits and codes each bit using its own context. And if you ask yourself how the values are represented and which contexts are used for each bit, then you point right at the problem. In the standard you have it all spread over three or four places: one table to tell you which range of contexts to use for a certain element, some description or a separate table for the possible bit strings, another table or two to tell you which contexts should be used for each bit in various cases (e.g. for ctxIdxOffset=36 you have these context offsets for the following bits: 0, 1, (2 or 3), 3, 3, 3), and finally an entry that tells you how to select a context for the first bit if it depends on already decoded data (usually by checking whether the top and left (macro)blocks have the same thing coded or not). Of course it’s especially fun when different bit contexts are reused for different bit positions or the same bit positions can have different contexts depending on the previously decoded bit string (this happens mostly for macroblock types in P/SP/B-slices but it’s still confusing). My guess is that they tried to optimise the total number of contexts and thus merged the least used ones. As a result you have about 20 pages of context data initialisation in the 2019 edition (in the initial editions of both H.264 and HEVC it’s just eight pages); compare that to almost a hundred pages of default CDFs in the AV1 specification. And the CABAC part in H.265 is somehow much easier to comprehend (probably because they made the format less dependent on special bit strings and put some of the simpler conditions straight into the binarisation table).
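One way to keep such context selection readable is to write the per-bin increments as data. The sketch below encodes the ctxIdxOffset=36 example from above (sub_mb_type in B slices); the rule that the third bin uses 2 when b1 is non-zero and 3 otherwise is my reading of clause 9.3.3.1.2, and the helper names are made up.

```python
CTX_IDX_OFFSET_SUB_MB_TYPE_B = 36

# Per-bin context increments for sub_mb_type in B slices, written as data.
# An int is a fixed increment; a callable picks the increment from the bins
# already decoded for this syntax element (b[0], b[1], ...).
SUB_MB_TYPE_B_INCS = [0, 1, lambda b: 2 if b[1] != 0 else 3, 3, 3, 3]


def ctx_idx(offset, incs, bin_idx, prev_bins):
    """Return ctxIdx = ctxIdxOffset + ctxIdxInc for bin number bin_idx."""
    inc = incs[min(bin_idx, len(incs) - 1)]
    if callable(inc):
        inc = inc(prev_bins)
    return offset + inc


def neighbour_inc(cond_a, cond_b):
    """Typical first-bin rule: count how many of the left (A) and top (B)
    neighbours satisfy the element-specific condition (0, 1 or 2)."""
    return int(cond_a) + int(cond_b)
```

For example, `ctx_idx(CTX_IDX_OFFSET_SUB_MB_TYPE_B, SUB_MB_TYPE_B_INCS, 2, [1, 0])` gives 39, while the same bin after `[1, 1]` uses context 38.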

To me it seems that the people describing CABAC coding (not the coder itself but rather how it’s used to code data) did not understand it well themselves (or at least could not convey the meaning clearly). And despite the principle of documenting the format from the decoder point of view (i.e. what bits it should read and how to act on them in order to decode the bitstream), a lot of CABAC coding is documented from the encoder point of view (i.e. what bits you should write for a syntax element instead of what reading certain bits would produce). An egregious example of that is the so-called UEGk binarisation. In addition to the things mentioned above it also has a rather meaningless parameter name, uCoff (which normally would be called something like escape length). Here is how I would describe decoding it: read a truncated unary sequence up to escape_len; if the read value is equal to escape_len then read an additional escape value as an exp-Golomb code shifted by k plus a trailing k-bit value, otherwise the escape value is set to zero. Add the escape value to the initial one and, if the value is non-zero and should be signed, read the sign. Section 9.3.2.3 spends a whole page on it, with a third of it being C code for writing the value.
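The decoder-side description above translates to a short function. Bins are taken from an iterator, so the arithmetic decoding itself (context-coded prefix, bypass-coded suffix and sign) is not modelled; the test values use the UEG3/uCoff=9 signed setup that H.264 specifies for mvd components.

```python
def decode_uegk(bins, k, ucoff, signed):
    """Decode a UEGk-binarised value from an iterator of already-decoded bins.

    bins   -- iterator yielding 0/1
    k      -- exp-Golomb order of the escape part
    ucoff  -- escape length (the spec's uCoff): maximum truncated unary prefix
    signed -- whether a sign bin follows a non-zero value
    """
    # Truncated unary prefix: count ones up to ucoff; a 0 ends it early.
    value = 0
    while value < ucoff and next(bins) == 1:
        value += 1
    # Escape: only present when the prefix reached ucoff.
    if value == ucoff:
        escape = 0
        while next(bins) == 1:       # unary part of the k-th order exp-Golomb code
            escape += 1 << k
            k += 1
        tail = 0
        for _ in range(k):           # trailing k-bit value
            tail = (tail << 1) | next(bins)
        value += escape + tail
    if signed and value != 0 and next(bins) == 1:
        value = -value
    return value
```

For example, with k=3 and uCoff=9 the value -20 is nine ones for the prefix, `1 0` for the exp-Golomb part (8 with k growing to 4), `0011` for the 4-bit tail (3) and a final 1 for the sign.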

I hope I made it clear why H.264 specification sucks in my opinion. Again, the format itself is logical but comprehending certain parts of the specification describing it takes significantly more time than it should and I wanted to point out why. It was still possible to write a decoder using mostly the specification and referring to other decoders source code only when it was completely unclear or worked against expectations (and JM is still not the best codebase to look at either, HM got much better in that aspect).

P.S. For those zero people who care about the NihAV decoder, I’ve managed to decode two random videos downloaded from BaidUTube (funnily enough, one of them turned out to be a simple CAVLC-coded video with no B-frames) without B-frames and without apparent artefacts in the first hundred frames. There’s still a lot of work to make it decode data correctly (currently it lacks even the loop filter and probably still has bugs), plus besides the dreaded B-frames with their co-located MVs there are still some features like 8×8 DCTs or high-bitdepth support I’d like to have (but definitely no interlaced support or scalable/multiview shit). It should be good enough to play the content I care about and that’s all; I do not want to waste an inordinate amount of time making it perfect software that supports all possible H.264/AVC features and is the fastest one too.