Disclaimer: this should’ve been written by a certain Italian (and discussed during VDD too), but since he is even lazier than me (and stopped after writing only this), I ended up writing it myself.
One of the main concerns in framework design is how data will be passed from sources to destinations. There are two ways that are mistakenly called synchronous and asynchronous (despite the lack of any CLK signal), the main difference being how the producing function passes data further: either on return, when it’s done with whatever it wanted to do, or by handing control to some external procedure immediately when data is available.
In the worst case of synchronous data passing you have a fixed pipeline: one unit of input goes in, one unit of output is expected to be produced (and maybe one more on the final flush); repeat for the next stage. In the worst case of asynchronous data passing you have one thread per stage, each waiting for another thread to signal that it has some data ready to process. Or, in single-threaded mode, you simply have nested callbacks, again one per stage, each calling the next callback in the chain when it has something to deliver.
Anyway, how should this all apply to NihAV?
In the simplest case we have one chain:
[Demuxer] --> (optional stream splitter) --> Packets --> [Decoder] --> Frames --> [output]
In a real-world scenario all the complexity starts at the stage marked [output]: there you usually feed frames to some filter chain, then to an encoder, and combine several streams to push all that stuff to some muxer(s).
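In other words, the full chain grows into something like this:

[Demuxer] --> Packets --> [Decoder] --> Frames --> [Filter chain] --> [Encoder] --> Packets --> [Muxer(s)]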
As for data passing, you can have it in several modes: main processing loop, synchronous processing graph, asynchronous processing graph or an intricate maze of callbacks all alike.
With the main processing loop you have something like this:
while (has_input()) {
    pkt = get_input();
    send_output(pkt);
    /* drain whatever output is ready so far */
    while (has_output()) {
        write_output();
    }
}
/* final flush of output still buffered after the input ran out */
while (has_output()) {
    write_output();
}
It’s ideal if you like micromanagement and want to save on memory, but with many stages it might get rather hairy.
The synchronous processing graph (if you think it’s not called that, look at the project name again) adds queues and operates on stages:
while (can_process(graph)) {
    for (i = 0; i < num_stages; i++) {
        graph->stages[i]->process(graph->inqueues[i], graph->outqueues[i]);
    }
}
In this situation you have not a single [input] -> [output] connection but rather [input] -> (queue) -> [output]; every stage is connected to its neighbours by queues and can consume/produce several elements (or none) in any of them.
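To make that concrete, a single stage could look roughly like this. This is just a sketch in the same pseudocode style as the rest of this post, and every naqueue_* name in it is my assumption, not a settled API:

/* Sketch of one stage in the synchronous graph: consume whatever is
 * available in the input queue, push zero or more results into the
 * output queue. NAStage and all naqueue_* names are assumptions. */
int stage_process(NAStage *stage, NAQueue *in, NAQueue *out)
{
    while (!naqueue_is_empty(in) && !naqueue_is_full(out)) {
        void *data = naqueue_pop(in);
        void *result;
        /* a stage may buffer (producing nothing) or emit several items */
        while (stage->process(stage->context, data, &result) > 0)
            naqueue_push(out, result);
    }
    return 0;
}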
The asynchronous processing graph relies either on the output stages pulling data from all previous stages or on the input stages pushing data to the following stages. The push variant looks roughly like this:
int filter_process(Filter *f, void *data)
{
    do_something(f->context, data);
    /* hand the processed data to the next filter in the chain */
    return f->next->filter(f->next, data);
}
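The pull-mode counterpart could look roughly like this; again a sketch, and names like filter_get and f->prev are made up for illustration:

/* Pull mode: the consumer asks its predecessor for data when it
 * needs some (all names here are assumptions). */
int filter_get(Filter *f, void **data)
{
    void *input;
    if (f->prev->get(f->prev, &input) < 0)
        return -1; /* nothing available upstream */
    do_something(f->context, input);
    *data = input;
    return 0;
}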
And the callback approach looks like this:
int demux(DemuxerContext *ctx, void (*consumer)(int id, Packet *pkt))
{
    while (!eof) {
        read_something();
        if (has_packet)
            consumer(ctx->consumer_id, packet);
    }
    return 0;
}
Callbacks are nice but they move a lot of the burden onto the callback writer, and they are hard to get right, especially if you do something nontrivial.
So far I’m inclined to go with the second approach (the synchronous processing graph). E.g. the Bink demuxer has packets for several streams stored together in one block, so the demuxer will create them all and put them into its output queue. A hardware-accelerated codec may send several frames for decoding at once, so it will read as many as possible (or as needed) from the input queue and add all decoded frames to the output queue when they’re done. Also, all the shitty tasks like proper synchronisation can be made into filters too.
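A sketch of the Bink case, just to show one processing call emitting several packets into the output queue (read_block, make_packet, BinkBlock and NA_EOF are all made-up names):

/* One demuxing step: read a block and queue a packet for every
 * stream whose data it carries. */
int bink_demux_process(void *ctx, NAQueue *in_unused, NAQueue *out)
{
    BinkBlock blk;
    if (read_block(ctx, &blk) < 0)
        return NA_EOF;
    /* one Bink block contains data for several streams at once */
    for (int i = 0; i < blk.num_streams; i++)
        naqueue_push(out, make_packet(&blk, i));
    return 0;
}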
Of course that will create several new problems, especially with resource management, but they should be solvable.
For the NihAV implementation that means adding NAQueue and several new datatypes (packet, frame, whatever) plus a mechanism in NAClass to recognize them, because you would not want the wrong thing going into the wrong pipeline (e.g. a decoder expecting NAPacket and getting NAFrame instead).
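One possible shape for such a check, purely as a sketch (na_class_get_type, expected_type and the error code are all assumptions on top of the NAQueue/NAClass names above):

/* Hypothetical type check on queue insertion: reject e.g. an NAFrame
 * where an NAPacket is expected. */
int naqueue_push_checked(NAQueue *q, NAClass *item)
{
    if (na_class_get_type(item) != q->expected_type)
        return NA_ERR_WRONG_TYPE; /* wrong thing for this pipeline */
    return naqueue_push(q, item);
}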