Archive for December, 2020

ClearVideo briefly revisited
Thursday, December 31st, 2020

Since I had nothing better to do for the rest of this year (I expect the next year to begin in the same fashion) I decided to take a look at the problem of some files being decoded with distorted inter-frames, as if some sharpening filter were constantly applied. And what do you know, there is some smoothing involved in certain cases.

(more…)

A quick look on Rududu
Sunday, December 27th, 2020

Since I had nothing better to do I decided to look at the Rududu codec. It is one of those old, more exotic codecs that nobody remembers.
I did not want to look that deep into its details (hence it’s just a quick look) so here are the principles it seems to employ:
- it seems to employ some integer approximation of a wavelet transform (instead of e.g. the LeGall 5/3 transform employed by lossless JPEG-2000);
- it probably has intra- and inter-frames but it does not employ motion compensation, just coefficient updating;
- DWT coefficients are quantised (and the common bias is removed) with scale and bias calculated for the whole frame;
- coefficients are coded using a quadtree (i.e. some parts of the bands can be left uncoded in addition to skipping whole DWT subbands);
- and finally, data is coded using adaptive models for absolute values and bits for both signs and “region coded” flags, with the probabilities from these models fed to the range coder.
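Since I only took a quick look, the details are guesswork, but the frame-wide quantisation step described above can be sketched roughly like this in Rust (the function names, parameters and rounding rules are my own illustrative guesses, not taken from the codec):

```rust
// Hypothetical sketch of quantisation with a frame-global scale and a
// common bias; magnitudes and signs are kept separately, matching the
// separate sign bits mentioned above. Not the actual Rududu code.

// quantise a band of DWT coefficients with frame-wide scale and bias
fn quantise(coeffs: &[f32], scale: f32, bias: f32) -> Vec<(u32, bool)> {
    coeffs
        .iter()
        .map(|&c| {
            let level = ((c.abs() - bias).max(0.0) / scale) as u32;
            (level, c < 0.0)
        })
        .collect()
}

// the matching reconstruction: non-zero levels get the common bias back
fn dequantise(level: u32, negative: bool, scale: f32, bias: f32) -> f32 {
    if level == 0 {
        return 0.0;
    }
    let val = level as f32 * scale + bias;
    if negative { -val } else { val }
}
```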
So while this codec is nothing outstanding it’s still a nice change from the mainstream video coding approach defined by ITU H.26x codecs.
Vivo2 revisited
Tuesday, December 22nd, 2020

Since I have nothing better to do (after a quick glance at the H.264 decoder—yup, nothing) I decided to look at Vivo 2 again to see if I can improve it from the “decoding and somewhat recognizable” stage to “mostly okay”.

To cut a long story short, Vivo 2 turned out to be an unholy mix of H.263 and MPEG-4 ASP. On one hoof you have the H.263 codec structure, H.263 codebooks and even the unique H.263 feature called PB-frames. On the other hoof you have coefficient quantisation like in MPEG-4 ASP and coefficient prediction done on unquantised coefficients (H.263 performs DC/AC prediction on already dequantised coefficients while MPEG-4 ASP re-quantises them for the prediction).
And the main weirdness is the IDCT. While the older standards give just the ideal transform formula, multiplying by a matrix is slow, so most implementations use some (usually fixed-point integer) approximation that also exploits internal symmetry for faster calculation. Hence one of the main problems with various H.263- and DivX-based codecs: if you don't use exactly the same transform implementation as the reference, you'll get artefacts because those small differences accumulate. Actually ITU H.263 Annex W specifies a bit-exact transform but nobody cares by this point. And Vivo Video has a different approach altogether: it generates a set of matrices for each coefficient position, so instead of performing the IDCT directly it simply sums one or two matrices for each non-zero coefficient (one matrix for the coefficient value modulo 32, the other for the part of the value that is a multiple of 32). Of course this would be too coarse on its own, so it compensates by multiplying the matrices by 64 before converting them to integers (and the resulting block should be scaled down by 64 as well).
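The trick relies on the linearity of the IDCT. Here is a rough Rust sketch of the idea, with the matrices generated on the fly from a naive floating-point IDCT instead of the hardcoded tables the codec actually uses (all names here are mine, not Vivo's):

```rust
use std::f64::consts::PI;

// DCT normalisation factor for an 8-point transform
fn cnorm(k: usize) -> f64 {
    if k == 0 { 0.5f64.sqrt() * 0.5 } else { 0.5 }
}

// basis matrix: the 8x8 IDCT of a unit impulse at coefficient `pos`
fn idct_basis(pos: usize) -> [f64; 64] {
    let (by, bx) = (pos / 8, pos % 8);
    let mut out = [0.0; 64];
    for y in 0..8 {
        for x in 0..8 {
            out[y * 8 + x] = cnorm(by) * cnorm(bx)
                * (((2 * y + 1) * by) as f64 * PI / 16.0).cos()
                * (((2 * x + 1) * bx) as f64 * PI / 16.0).cos();
        }
    }
    out
}

// integer matrix for a given coefficient value at `pos`, scaled up by 64
// (the real codec has these precomputed; generating them keeps this short)
fn scaled_matrix(pos: usize, value: i32) -> [i32; 64] {
    let basis = idct_basis(pos);
    let mut out = [0; 64];
    for i in 0..64 {
        out[i] = (basis[i] * value as f64 * 64.0).round() as i32;
    }
    out
}

// reconstruct a block by summing one or two matrices per non-zero
// coefficient: one for the value modulo 32, one for the multiple of 32
fn idct_by_summing(coeffs: &[i32; 64]) -> [i32; 64] {
    let mut acc = [0i64; 64];
    for (pos, &c) in coeffs.iter().enumerate() {
        if c == 0 {
            continue;
        }
        let (sign, v) = (c.signum(), c.abs());
        for part in [v % 32, (v / 32) * 32] {
            if part != 0 {
                let m = scaled_matrix(pos, part * sign);
                for i in 0..64 {
                    acc[i] += m[i] as i64;
                }
            }
        }
    }
    let mut out = [0; 64];
    for i in 0..64 {
        // the matrices were scaled up by 64, so scale the sum back down
        out[i] = ((acc[i] + 32) >> 6) as i32;
    }
    out
}
```

The upscaling by 64 means each stored matrix only loses at most half a unit of 1/64 per entry, which keeps the accumulated rounding error small even when many matrices are summed.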
In either case it seems to work well enough, so I've finally enabled nihav-vivo in the list of default crates and can finally forget about it as did the rest of the world.
NihAV: frame reordering
Friday, December 18th, 2020

Since I have nothing better to do I'd like to talk about how NihAV handles output frames.

As you might remember, I decided to make decoders output frames on a synchronous basis, i.e. if a frame comes to the decoder it should be decoded and output, and in case the codec supports B-frames the reordering might happen later in a special frame reorderer. The reorderer for a concrete decoder is selected based on codec capabilities (if the format has no frame reordering then don't do it).
Previously I had just two of them, NoReorderer (it should be obvious for which cases it is intended) and IPBReorderer for codecs with I/P/B-frames. The latter simply holds the last seen reference frame (I- or P-frame) and outputs B-frames until the next reference frame comes. This worked as expected until I decided to implement an H.264 decoder and hit the famous B-pyramid (i.e. when B-frames serve as a reference for other B-frames or even P-frames). To illustrate that, imagine an input sequence of frames I0 P4 B2 B1 B3 which should be output as I0 B1 B2 B3 P4. The approach from IPBReorderer would output it as I0 B2 B1 B3 P4, which is not quite correct. So I had to add the so-called ComplexReorderer which keeps an array of frames sorted by display timestamp and marks the frames up to a reference I- or P-frame as available for output when the next reference frame comes. Here's a step-by-step example:
- I0 comes and is stored in the queue;
- P4 comes and is stored in the queue, I0 is marked as being ready for output;
- B2 comes and is stored in the queue right before P4;
- B1 comes and is stored in the queue right before B2 so the queue now is B1 B2 P4;
- B3 comes and is stored in the queue between B2 and P4;
- then the next reference frame should come and we should store it and mark B1 B2 B3 P4 ready for output.
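The steps above can be sketched in Rust like this (a simplified illustration with made-up types and methods, not the actual NihAV code):

```rust
// Simplified sketch of the reorderer described above: frames are kept
// sorted by display timestamp, and a new reference frame releases
// everything queued before it for output.

#[derive(Debug)]
struct Frame {
    pts: u64,           // display timestamp
    is_ref: bool,       // I- or P-frame
    name: &'static str, // just for the demonstration
}

#[derive(Default)]
struct ComplexReorderer {
    queue: Vec<Frame>, // sorted by display timestamp
    ready: usize,      // frames [0..ready) may be output
}

impl ComplexReorderer {
    fn push(&mut self, frm: Frame) {
        if frm.is_ref {
            // a new reference frame marks everything queued so far
            self.ready = self.queue.len();
        }
        // keep the queue sorted by display timestamp
        let pos = self.queue.partition_point(|f| f.pts < frm.pts);
        self.queue.insert(pos, frm);
    }

    fn get_frame(&mut self) -> Option<Frame> {
        if self.ready > 0 {
            self.ready -= 1;
            Some(self.queue.remove(0))
        } else {
            None
        }
    }
}
```

Feeding it I0 P4 B2 B1 B3 followed by the next reference frame produces I0 B1 B2 B3 P4, matching the walkthrough above.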
Of course one can argue that this waits for longer than needed and we should be able to output B1 and B2 even before B3 arrives (or, even better, output B1 immediately as it appears). That is true, but it is rather hard to do in the general case. Real-world DTS values depend on the container timebase, so how do you know there are no additional frames in the sequence 0 1000 333 667 (plus the decoder can be told to stop outputting unreferenced frames)? Relying on frame IDs generated by the decoder? H.264 has three different modes of generating picture IDs, one of them assigning even numbers to frames (and odd numbers to the second frame field if those are present). While it can be resolved, that would complicate the code for no good reason. So as usual I picked the simplest working solution, trading theoretically lower latency for clarity and simplicity.
NihAV: optimisation potential
Sunday, December 13th, 2020

Today I can say what I've wasted about two months on: it was an H.264 decoder. For now it's the only entry in the nihav-itu crate but I might add G.7xx decoders there, or even the standard H.263 decoder in addition to all those decoders based on it.
Performance-wise it is not very good, about 2.5–3 times slower than the libavcodec one without SIMD optimisations on random BaidUTube 720p videos, but I've not tried to make it the fastest one and prefer clarity over micro-optimisations. Still, it has a lot of optimisation potential, as the title says. I suspect that even simply making the motion interpolation functions work on constant-size blocks would make it significantly faster, let alone adding SIMD. In either case it is fast enough to decode 720p in 2x realtime on my laptop, so if I ever finish a proper video player I can use it to watch content besides game cutscenes and a few exotic files.
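To illustrate what the constant-size remark means (a sketch with made-up function names, not the actual NihAV code): if the block dimensions are compile-time constants, the compiler can fully unroll and vectorise the inner loops, while a runtime-sized version has to branch on width and height on every call.

```rust
// Half-pel horizontal averaging, a typical motion interpolation
// primitive. The const-generic version is monomorphised per block size,
// so the H.264-style 16x16/8x8/4x4 call sites each compile into
// fixed-size loops.
fn avg_halfpel_h<const W: usize, const H: usize>(
    dst: &mut [u8], dstride: usize,
    src: &[u8], sstride: usize,
) {
    for y in 0..H {
        for x in 0..W {
            let a = src[y * sstride + x] as u16;
            let b = src[y * sstride + x + 1] as u16;
            // round-to-nearest average of two horizontal neighbours
            dst[y * dstride + x] = ((a + b + 1) >> 1) as u8;
        }
    }
}
```

A small dispatch table indexed by block size can then pick the right instantiation once per block instead of testing the dimensions inside the loops.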
As for the features, the set is limited but it should be able to play conventional files just fine, plus some limited subset of High profile (just 8-bit 4:2:0 YUV without custom scaling lists). A lot of features that I don't care about were ignored (proper loop filtering across slice edges—nope; weighted prediction—maybe later; high bit-depth or different chroma subsampling support—quite unlikely; interlaced formats—no, on principle).
While developing that decoder I also gained a better knowledge of H.264 internals, for which I'm not that grateful, but that's to be expected from a codec designed by a committee with features added to it afterwards.
In either case, hopefully I won't be bored enough to do the optimisations unless I have to, so the potential will remain just potential and I'll do some more interesting stuff instead. And there's always Settlers II as the ultimate time consumer 😉