Archive for the ‘NihAV’ Category

Fixing SVQ1 decoding bug

Saturday, March 6th, 2021

In the comments to the previous post a certain Paul B. pointed out that SVQ1 decoder (the one in libavcodec or mine) decodes certain files with visual artefacts. So I opened the old dreary QuickTime.qts with Ghidra to look at its contents once again (last time it was for QDesign Music details but luckily I’ve marked SVQ1 decoder functions as well).

The official binary specification turned out to have slightly different design with just one block decoding function that gets intra or inter codebooks passed to it (so intra block is essentially adding residue to zero block using intra codebooks). And, more curiously, the codec uses 16-bit values for pixels up to the very end of decoding.

As you can guess, the artefacts looking like white blocks are caused by the pixel value going out of 8-bit range. I actually hooked GDB script to mplayer2 that loads QuickTime decoder (and presents some garbage instead of proper decoded frame) to see what happens with the block showing such artefact. It turned out that pixel with the original value 0xCF got increased to 0x14F during codebook additions and the reference decoder had output it as 0x4F. So I changed clamping to discarding top bits and it works much better.

Considering that codebooks are stored as single .dll resource and block decoding function works (for performance reasons) as a chain of block modifying functions with stackless calling convention I call the results good enough and let those who want more dig there instead of me.

ClearVideo briefly revisited

Thursday, December 31st, 2020

Since I had nothing better to do for the rest of this year (I expect the next year to begin in the same fashion) I decided to take a look at the problem when some files were decoded with inter-frames becoming distorted like there’s some sharpening filter constantly applied. And what do you know, there’s some smoothing involved in certain cases.
(more…)

Vivo2 revisited

Tuesday, December 22nd, 2020

Since I have nothing better to do (after a quick glance at H.264 decoder—yup, nothing) I decided to look at Vivo 2 again to see if I can improve it from being “decoding and somewhat recognizable” to “mostly okay” stage.

To put a long story short, Vivo 2 turned out to be an unholy mix of H.263 and MPEG-4 ASP. On one hoof you have H.263 codec structure, H.263 codebooks and even the unique feature of H.263 called PB-frames. On the other hoof you have coefficient quantisation like in MPEG-4 ASP and coefficient prediction done on unquantised coefficients (H.263 performs DC/AC prediction on already dequantised coefficients while MPEG-4 ASP re-quantises them for the prediction).

And the main weirdness is IDCT. While the older standards give just ideal transform formula, multiplying by matrix is slow and thus most implementations use some (usually fixed-point integer) approximation that also exploits internal symmetry for faster calculation (and hence one of the main problems with various H.263 and DivX-based codecs: if you don’t use the exactly the same transform implementation as the reference you’ll get artefacts because those small differences will accumulate). Actually ITU H.263 Annex W specifies bit-exact transform but nobody cares by this point. And Vivo Video has a different approach altogether: it generates a set of matrices for each coefficient and thus instead of performing IDCT directly it simply sums one or two matrices for each non-zero coefficient (one matrix is for coefficient value modulo 32, another one is for coefficient value which is multiple of 32). Of course it takes account for it being too coarse by multiplying matrices by 64 before converting to integers (and so the resulting block should be scaled down by 64 as well).

In either case it seems to work good enough so I’ve finally enabled nihav-vivo in the list of default crates and can finally forget about it as did the rest of the world.

NihAV: frame reordering

Friday, December 18th, 2020

Since I have nothing better to do I’d like to talk about how NihAV handles output frames.

As you might remember I decided to make decoders output frames on synchronous basis, i.e. if a frame comes to the decoder it should be decoded and output and in case when the codec support B-frames a reordering might happen later in a special frame reorderer. And the reorderer for the concrete decoder was selected based on codec capabilities (if you don’t have frame reordering in format then don’t do it).

Previously I had just two of them, NoReorderer (it should be obvious for which cases it is intended) and IPBReorderer for codecs with I/P/B-frames. The latter simply holds last seen reference frame (I- or P-frame) and outputs B-frames until the next reference frame comes. This worked as expected until I decided to implement H.264 decoder and hit the famous B-pyramid (i.e. when B-frames serve as a reference for another B-frames or even P-frames). To illustrate that imagine an input sequence of frames I0 P4 B2 B1 B3 which should be output as I0 B1 B2 B3 P4. The approach from IPBReorderer would output it as I0 B2 B1 B3 P4 which is not quite correct. So I had to add so-called ComplexReorderer which keeps an array of frames sorted by display timestamp and marks the frames up to a reference I- or P-frame available for output when the next reference frame comes. Here’s a step-by-step example:

  • I0 comes and is stored in the queue;
  • P4 comes and is stored in the queue, I0 is marked as being ready for output;
  • B2 comes and is stored in the queue right before P4;
  • B1 comes and is stored in the queue right before B2 so the queue now is B1 B2 P4;
  • B3 comes and is stored in the queue between B2 and P4;
  • then a next reference frame should come and we should store it and mark B1 B2 B3 P4 ready for output.

Of course one can argue that this waits for more than needed and we should be able to output B1 and B2 even before B3 arrives (or even better we can output B1 immediately as it appears). That is true but it is rather hard to do in the general case. Real-world DTS values depend on container timebase so how do you know there are no additional frames in sequence 0 1000 333 667 (plus the decoder can be told to stop outputting unreferenced frames). Relying on frame IDs generated by the decoder? H.264 has three different modes of generating picture IDs with one of them assigning even numbers to frames (and odd numbers to the second frame field if those are present). While it can be resolved, that will complicate the code for no good reason. So as usual I picked the simplest working solution trading theoretically lower latency for clarity and simplicity.

NihAV: optimisation potential

Sunday, December 13th, 2020

Today I can say what I’ve wasted about two months on: it was H.264 decoder. For now it’s the only entry in nihav-itu crate but I might add G.7xx decoders there or even the standard H.263 decoder in addition to all those decoders based on it.

Performance-wise it is not very good, about 2.5-3x times slower than libavcodec one without SIMD optimisations on random BaidUTube 720p videos but I’ve not tried to make it the fastest one and prefer clarity over micro-optimisations. But this still has a lot of optimisation potential as the title says. I suspect that even simply making motion interpolation functions work on constant-size blocks would make it significantly faster let alone adding SIMD. In either case it is fast enough to decode 720p in 2x realtime on my laptop so if I ever finish a proper video player I can use it to watch content beside game cutscenes and few exotic files.

As for the features it’s limited but it should be able to play the conventional files just fine plus some limited subset of High profile (just 8-bit 4:2:0 YUV without custom scaling lists). A lot of features that I don’t care about were ignored (proper loop filtering across the slice edges—nope, weighted prediction—maybe later, high-bitdepth or different chroma subsampling format support—quite unlikely, interlaced formats—no in principle).

While developing that decoder I also got better knowledge of H.264 internals for which I’m not that grateful but that’s to be expected from a codec designed by a committee with features being added to it afterwards.

In either case hopefully I’ll not be that bored to do optimisations unless I have to, so the potential will remain the potential and I’ll do some more interesting stuff instead. And there’s always Settlers II as the ultimate time consumer 😉

NihAV: audio player done

Wednesday, October 7th, 2020

As I wrote in my previous post, I had functioning audio player nearing completion. And now I’ve finally added all features I wanted to add and can call it done.

While previously I mostly ranted on the bloat introduced by the components authors, here I’d like to describe the design and the reasoning behind it.
(more…)

NihAV: towards an audio player

Sunday, October 4th, 2020

So after weeks of doing nothing and looking at lossless audio codecs (in no particular order) I’m going back to developing NihAV and more particularly an audio player.
(more…)

Revisiting lossless codecs…

Sunday, September 6th, 2020

I’ve decided to add a couple of lossless audio formats in a preparation for a long-term goal of having a NihAV-based player (the debug tool nihav-player that I currently have can’t really count for one especially considering how it does not play pure audio files and tends to deadlock in SDL audio thread).

So I’ve added nihav-llaudio crate with four most common formats for music I have, namely FLAC, Monkey’s Audio, TTA and WavPack. And I guess it’s time to revisit my opinion about various lossless audio formats now that I’ve (re)implemented support for some of them (I tried to summarise my views about them almost ten years ago). Let’s see what has changed since then:

  • I had a closer look at MPEG-4 ALS and it turned out to be rather interesting (and probably the only lossless audio codec with P-frames) but it also has somewhat insane options (like maximum prediction order of 1023 for LPC; or coding the whole file with just one I-frame and the rest being P-frames so no seeking is possible) and RLSLMS being broken (the reference decoder can’t decode the official reference samples) and it got no popularity at all;
  • TTA turned out to be very simple with a baffling rationale

    The sample count in a TTA1 frame is a multiple to 576 (sound buffer granule). Based on this, the “frame time” is defined as a constant 1.04489795918367346939. Thus, the sample count in a regular TTA1 frame determined as: regular TTA1 frame length = frame time * sample rate.

    I’m no mathematician so this does not form a coherent logical chain for me, I’d use something like “frame length in samples is sample rate rounded up to multiple of 576” instead of “sample rate multiplied by 256/245”. The main irritating point is that last frame contains less samples and you need to signal that it’s last frame (or merely check if you have enough bits left to decode a full frame after you decoded enough samples for the last frame). Oh, and TTA2 seems to be still in development.

  • And speaking about codecs in development, I don’t see new lossless audio codecs appearing after 2010. Either I got too old and don’t spot them or the interest has finally faded out. This might be because most people don’t buy music any more but rather rent it in some online store or use some streaming service. And those who still do probably use one of the old established codecs like FLAC.
  • And since I’ve mentioned it, my opinion on it has not changed and only got a bit more refined not that I have a decoder for it as well. Previously I thought FLAC is a simple format with a bad bitstream format that makes seeking hard. Now I know that FLAC is a simple format (fixed predictor or LPC up to order 32 and fixed Rice codes; the only thing that improves compression is splitting residues into partitions where optimal k for coding them can be selected) with horribly designed bitstream format.

    Normally lossless audio formats either store offsets for each frame or have an easily recognizable header, but FLAC is different. It’s obvious that the author was inspired by MPEG audio header design but those actually had frame sizes coded. Here in order to find where the frame ends you need either to decode it or calculate CRC for the data you read (and in the likely case of false positives also check that the data is followed by a valid header). One could argues that there’s often a seek table in FLAC file but for e.g. in luckynight.flac those entries are for multiples of ten seconds positions, making seeking to a more precise position a task of skipping frames (which is fun—see above).

  • WavPack is still the best designed format in my opinion though it would be nicer to have some initial header with various metadata instead of having it stored in the first block. Other than that still no objections.
  • And it turns out there’s lossless AAC compression that employs wavelet transform before LPC (it’s Chinese AAC though so who cares).

I remember reading somewhere (on Hydrogenaudio most likely) a brief story about development of several popular lossless audio codecs (even told by the author of one but I might be wrong). Essentially it’s not a NIH syndrome but very close: somebody develops a format, another guy finds a minor flaw the original developer refuses to address (my memory is hazy but I think there were such things mentioned as no plugin for some player or not supporting some tags) and develops another format. The amount of formats that came to existence because somebody wanted to create a format and could not keep it to himself is pretty large too.

But those days seem to be over and maybe I’ll reverse engineer some of those old codecs for documenting reasons as there’s very little risk that somebody would pick them up and make widespread now. Alternatively I can rant on newer formats sucking as well. Though why wait, let’s do it now:

  • AAC sucks because of the countless extensions and attempts to bundle various coding approaches under the same name (fun fact – “xHE-AAC” is actually pronounced as “MPEG-D you-suck”);
  • AV1 sucks because of the organisational structure and their decisions during (and after) the design stage;
  • AV2 is not here yet but it sucks for the same reason;
  • BlueTooth audio codecs suck in various ways (except SBC, it’s okay for the purpose), especially because of marketing them as high-definition and robust while in reality they rarely are;
  • Chinese codecs suck for being rip-offs of better-known codecs. It’s especially gross that one of them got standardised as IEEE 1857.2 AAC;
  • H.264 sucks because of countless extensions;
  • H.265 inherited some from H.264 and added the licensing situation on top of that;
  • MPEG-5 EVC sucks because it’s a Frankenstein monster constructed from bits from H.263-H.265;
  • Opus sucks for being designed for streaming case and used elsewhere;
  • Vector-based codecs suck because current tools are still not good enough to autovectorise complex shapes and recognize gradients.

Now back to doing nothing.

NihAV relicensed code registry

Monday, August 17th, 2020

Since I’ve got the second request for a decoder relicensing I’ve decided to keep an open list of the project that requested relicensing. This way it may satisfy somebody’s curiosity about which parts of NihAV piqued some interest and also keep a proof for a project that I granted them a new license for the code.

The page is right here.

NihAV: released!

Monday, July 27th, 2020

NihAV was a fine joke that had been running for far too long. But today, on no particulate date at all, I release it for public to ignore or to briefly look and forget immediately. Some decoders (Bink2, ClearVideo and Vivo 2) are still far from perfect, some features have simple or sketchy implementations, but despite all of that here it is.

The official website is here, source code is here.

Many thanks to people from former Libav project for hosting.