Since I have nothing better to do I’d like to talk about how NihAV
handles output frames.
As you might remember, I decided to make decoders output frames on a synchronous basis, i.e. when a frame comes to the decoder it should be decoded and output, and in case the codec supports B-frames the reordering may happen later in a special frame reorderer. And the reorderer for the concrete decoder is selected based on codec capabilities (if you don’t have frame reordering in the format then don’t do it).
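To make the selection idea concrete, here is a rough sketch of how a reorderer could be picked from capability flags. This is illustrative only: the trait, flag names and constructor below are made up for this example and are not the actual NihAV API.

```rust
// Sketch: choosing a frame reorderer from (hypothetical) codec capability
// flags. None of these names are the real NihAV ones.

trait FrameReorderer {
    fn name(&self) -> &'static str;
}

struct NoReorderer;
struct IPBReorderer;
struct ComplexReorderer;

impl FrameReorderer for NoReorderer {
    fn name(&self) -> &'static str { "none" }
}
impl FrameReorderer for IPBReorderer {
    fn name(&self) -> &'static str { "ipb" }
}
impl FrameReorderer for ComplexReorderer {
    fn name(&self) -> &'static str { "complex" }
}

// hypothetical capability flags
const CODEC_CAP_REORDER: u32 = 1 << 0;         // codec has B-frames
const CODEC_CAP_COMPLEX_REORDER: u32 = 1 << 1; // B-frames may be references

fn create_reorderer(caps: u32) -> Box<dyn FrameReorderer> {
    if caps & CODEC_CAP_COMPLEX_REORDER != 0 {
        Box::new(ComplexReorderer)
    } else if caps & CODEC_CAP_REORDER != 0 {
        Box::new(IPBReorderer)
    } else {
        // no frame reordering in the format -- don't do it
        Box::new(NoReorderer)
    }
}

fn main() {
    assert_eq!(create_reorderer(0).name(), "none");
    assert_eq!(create_reorderer(CODEC_CAP_REORDER).name(), "ipb");
    let caps = CODEC_CAP_REORDER | CODEC_CAP_COMPLEX_REORDER;
    assert_eq!(create_reorderer(caps).name(), "complex");
    println!("reorderer selection ok");
}
```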
Previously I had just two of them: NoReorderer (it should be obvious for which cases it is intended) and IPBReorderer for codecs with I/P/B-frames. The latter simply holds the last seen reference frame (an I- or P-frame) and outputs B-frames until the next reference frame comes. This worked as expected until I decided to implement an H.264 decoder and hit the famous B-pyramid (i.e. when B-frames serve as a reference for other B-frames or even P-frames). To illustrate that, imagine an input sequence of frames I0 P4 B2 B1 B3 which should be output as I0 B1 B2 B3 P4. The approach from IPBReorderer would output it as I0 B2 B1 B3 P4, which is not quite correct. So I had to add the so-called ComplexReorderer
which keeps an array of frames sorted by display timestamp and marks the frames up to a reference I- or P-frame available for output when the next reference frame comes. Here’s a step-by-step example:
- I0 comes and is stored in the queue;
- P4 comes and is stored in the queue, I0 is marked as being ready for output;
- B2 comes and is stored in the queue right before P4;
- B1 comes and is stored in the queue right before B2 so the queue is now B1 B2 P4;
- B3 comes and is stored in the queue between B2 and P4;
- then the next reference frame should come; we store it and mark B1 B2 B3 P4 ready for output.
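The steps above can be sketched roughly like this. This is a simplified illustration of the described queue, not the actual NihAV code; the type and field names are made up, and a real implementation would also need a flush step for the end of the stream.

```rust
// Sketch of a B-pyramid-aware reorderer: frames are kept sorted by
// display timestamp, and everything queued before a new reference frame
// becomes ready for output when that reference frame arrives.

#[derive(Clone, Copy, PartialEq, Debug)]
enum FrameType { I, P, B }

#[derive(Clone, Copy, Debug)]
struct Frame {
    pts:   u64,       // display timestamp
    ftype: FrameType,
}

#[derive(Default)]
struct ComplexReorderer {
    queue: Vec<Frame>, // kept sorted by display timestamp
    ready: usize,      // how many frames at the head are ready for output
}

impl ComplexReorderer {
    fn add_frame(&mut self, frm: Frame) {
        if frm.ftype != FrameType::B {
            // a new reference frame makes everything queued so far
            // (all preceding it in display order) ready for output
            self.ready = self.queue.len();
        }
        // insert while keeping the queue sorted by display timestamp
        let pos = self.queue.iter().position(|f| f.pts > frm.pts)
                      .unwrap_or(self.queue.len());
        self.queue.insert(pos, frm);
    }
    fn get_frame(&mut self) -> Option<Frame> {
        if self.ready > 0 {
            self.ready -= 1;
            Some(self.queue.remove(0))
        } else {
            None
        }
    }
}

fn main() {
    // the I0 P4 B2 B1 B3 example, followed by the next reference frame
    let input = [
        Frame { pts: 0, ftype: FrameType::I },
        Frame { pts: 4, ftype: FrameType::P },
        Frame { pts: 2, ftype: FrameType::B },
        Frame { pts: 1, ftype: FrameType::B },
        Frame { pts: 3, ftype: FrameType::B },
        Frame { pts: 8, ftype: FrameType::P },
    ];
    let mut reord = ComplexReorderer::default();
    let mut output = Vec::new();
    for frm in input {
        reord.add_frame(frm);
        while let Some(out) = reord.get_frame() {
            output.push(out.pts);
        }
    }
    // I0 is released when P4 arrives; B1 B2 B3 P4 when the next
    // reference frame arrives
    assert_eq!(output, vec![0, 1, 2, 3, 4]);
    println!("output order: {:?}", output);
}
```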
Of course one can argue that this waits longer than needed and we should be able to output B1 and B2 even before B3 arrives (or, even better, output B1 immediately as it appears). That is true, but it is rather hard to do in the general case. Real-world DTS values depend on the container timebase, so how do you know there are no additional frames in the sequence 0 1000 333 667 (plus the decoder can be told to stop outputting unreferenced frames)? Relying on frame IDs generated by the decoder? H.264 has three different modes of generating picture IDs, one of them assigning even numbers to frames (and odd numbers to the second field of a frame if fields are present). While it can be resolved, that would complicate the code for no good reason. So as usual I picked the simplest working solution, trading theoretically lower latency for clarity and simplicity.