Okay, I’ve finally picked all the low-hanging fruit and now the progress is blocked by the need for reverse engineering (namely, Bink 2, Discworld Noir BMV and TrueMotion 2X), so I don’t expect anything major happening to the project in the near future.
Meanwhile it’s a good opportunity to talk about how NihAV is (mis)designed and how it should work in principle.
In principle there are three scenarios: playing a file, encoding raw input and transcoding. The first two can be considered transcoding with known raw inputs or outputs plus specific requirements on them.
So, what needs to be done if we want to transcode something into something else:
- detect the input format and open it;
- select input streams and open decoders for them;
- create encoders for desired output;
- negotiate formats between inputs and outputs (minding the possible other filters inside the pipeline);
- read input, process, synchronise if required, feed to output.
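In code the whole flow condenses into something like the following sketch. Everything here is a placeholder meant to show the shape of the pipeline rather than the actual NihAV API:

```rust
// Placeholder types and traits; real interfaces are richer (errors,
// multiple frames per packet, stream selection and so on).
struct Packet;
struct Frame;

trait Demuxer { fn get_packet(&mut self) -> Option<(usize, Packet)>; }
trait Decoder { fn decode(&mut self, pkt: Packet) -> Frame; }
trait Encoder { fn encode(&mut self, frm: Frame) -> Packet; }
trait Muxer   { fn put_packet(&mut self, stream_id: usize, pkt: Packet); }

fn transcode(dmx: &mut dyn Demuxer,
             decoders: &mut [Box<dyn Decoder>],
             encoders: &mut [Box<dyn Encoder>],
             mux: &mut dyn Muxer) {
    // read input, decode, (filters and format negotiation would sit
    // here), encode and feed to the output
    while let Some((stream_id, pkt)) = dmx.get_packet() {
        let frm = decoders[stream_id].decode(pkt);
        let out = encoders[stream_id].encode(frm);
        mux.put_packet(stream_id, out);
    }
}
```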
If this sounds obvious, that’s because it is. If it sounds trivial, then you’re missing the wonderful complications that may occur at every stage.
Detecting the input format may be non-trivial because not all containers have proper markers, so you can either rely on the file extension or try every possible demuxer to see which one works. NihAV allows flexibility in this choice: there is a standalone nihav_core::detect module that probes a file for known container formats and returns the most probable container name along with a score (extension match or marker match). This way you can find out the format even if you don’t have a demuxer registered for it. And if you’re desperate you can try opening the file with all registered demuxers to see which one fits (my tools probably won’t do that, but it’s easy to do even with the current API).
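For illustration, probing a file could look roughly like this; the exact names (detect_format, DetectionScore, the byteio helpers) are my reading of the code and may not match the current API precisely:

```rust
use std::fs::File;
use nihav_core::detect::detect_format;
use nihav_core::io::byteio::{ByteReader, FileReader};

fn probe(name: &str) {
    let mut file = File::open(name).unwrap();
    let mut fr = FileReader::new_read(&mut file);
    let mut br = ByteReader::new(&mut fr);
    // returns the most probable container name plus a score telling
    // whether it was an extension match or a marker match
    if let Some((container, score)) = detect_format(name, &mut br) {
        println!("looks like {} (score {:?})", container, score);
    } else {
        println!("no idea what this is");
    }
}
```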
Then you have input streams. In some cases you get proper streams with packets; in other cases you get a raw stream that you have to split into packets yourself. One solution would be to mark such streams as requiring parsing and insert a parser automatically. The NihAV design (not yet implemented since I don’t need it yet) is to report an elementary stream as its own type and return data that should be parsed elsewhere. The same applies to files that are raw elementary streams: if you want to be able to decode /dev/random as MPEG Audio Layer I, you have to force the proper parser and decoder yourself.
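Since this part is explicitly not implemented, here is merely a guess at what a parser interface could look like (every name below is hypothetical):

```rust
// Hypothetical interface for splitting a raw elementary stream into
// packets; nothing like this exists in NihAV yet.
pub trait PacketParser {
    /// Feed a chunk of bytes from the raw elementary stream.
    fn feed(&mut self, data: &[u8]);
    /// Return the next complete packet if one has been assembled.
    fn get_packet(&mut self) -> Option<Vec<u8>>;
}
```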
Actual decoding should be done in the traditional way: a packet goes in, a frame comes out (a decoder can report DecoderError::TryAgain, but that is really for exceptional cases; previously I thought about inserting queues of packets/frames between processing units but finally decided against it). Decoders should do no reordering. If you wonder how the caller is supposed to reconstruct the proper order, it’s simple enough. There is a global nihav_core::registry (like the format detection module, it may be moved into a separate crate in the future) that contains information about all known codecs. Knowing the codec information you can create a proper frame reshuffler (none for many formats, one that reorders by frame type for I/P/B-frame codecs, and something more complex for codecs with a pyramid referencing scheme like H.264 or HEVC).
While I don’t have this functionality yet it’s not that bad, since currently I simply output frames as individual images and sleepsort works fine (i.e. if frame00004.ppm is created before frame00002.ppm it’s no big deal for an image viewer).
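For a plain I/P/B-frame codec such a reshuffler can be quite small. Here is an illustrative sketch (not actual NihAV code): B-frames pass through immediately while every reference frame is held back until the next reference frame arrives:

```rust
// Minimal I/P/B reorderer sketch: decode order I0 P3 B1 B2 becomes
// display order I0 B1 B2 P3.
enum FrameKind { Intra, Inter, BFrame }

struct IPBReorderer<T> {
    held_ref: Option<T>, // the last reference frame not yet displayed
}

impl<T> IPBReorderer<T> {
    fn new() -> Self { Self { held_ref: None } }
    // Feed a frame in decode order, get frames back in display order.
    fn push(&mut self, frame: T, kind: FrameKind) -> Vec<T> {
        match kind {
            // a B-frame is displayed before the pending reference frame
            FrameKind::BFrame => vec![frame],
            // a new reference frame releases the previously held one
            FrameKind::Intra | FrameKind::Inter => {
                let mut out = Vec::new();
                if let Some(prev) = self.held_ref.take() {
                    out.push(prev);
                }
                self.held_ref = Some(frame);
                out
            }
        }
    }
    // Output the last held frame at the end of the stream.
    fn flush(&mut self) -> Option<T> { self.held_ref.take() }
}
```

So pushing I0 outputs nothing, pushing P3 releases I0, B1 and B2 come out immediately, and flushing at the end releases P3.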
Encoders are something I have not considered yet.
Format negotiation is currently implemented only for video frames in the nihav_core::scale module. It works as I described three years ago: the converter adds the necessary stages for frame unpacking, rescaling, format conversion and packing. Of course, if some stage is not needed it is not added. The only annoying thing is that internally I use typed frames, so frames with an array of bytes, an array of 16-bit ints and an array of 32-bit ints have to be handled separately. At least that means I always work with native-endian data internally, and the actual conversion is done only when the data is read or written. And of course this is more a proof of concept than a fully working solution, but it can convert e.g. RGB565 into YUV410 while rescaling it too.
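Conceptually the converter just builds a chain of stages and skips the unnecessary ones. Here is a toy illustration of that idea; the types are invented for this example and have nothing to do with the actual nihav_core::scale internals:

```rust
// A stand-in frame type; the real code works on typed picture buffers.
type Frame = Vec<u8>;

trait Stage { fn process(&mut self, frm: Frame) -> Frame; }

// Dummy stages standing in for the real operations.
struct Unpack;  impl Stage for Unpack  { fn process(&mut self, f: Frame) -> Frame { f } }
struct Rescale; impl Stage for Rescale { fn process(&mut self, f: Frame) -> Frame { f } }
struct Convert; impl Stage for Convert { fn process(&mut self, f: Frame) -> Frame { f } }
struct Pack;    impl Stage for Pack    { fn process(&mut self, f: Frame) -> Frame { f } }

struct Pipeline { stages: Vec<Box<dyn Stage>> }

impl Pipeline {
    // Only the stages that are actually needed get added to the chain.
    fn build(unpack: bool, rescale: bool, convert: bool, pack: bool) -> Self {
        let mut stages: Vec<Box<dyn Stage>> = Vec::new();
        if unpack  { stages.push(Box::new(Unpack)); }
        if rescale { stages.push(Box::new(Rescale)); }
        if convert { stages.push(Box::new(Convert)); }
        if pack    { stages.push(Box::new(Pack)); }
        Self { stages }
    }
    fn run(&mut self, mut frm: Frame) -> Frame {
        for stage in self.stages.iter_mut() {
            frm = stage.process(frm);
        }
        frm
    }
}
```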
Another fun issue worth mentioning is timestamps and how to interpolate them when they are missing. Since there’s no good universal solution I won’t try to invent something complex either and will probably stick to the laziest approach.
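To give one concrete example of a lazy scheme (my illustration; not necessarily what will end up in the code): when a timestamp is missing, extrapolate from the last known one using a constant frame duration:

```rust
// Illustrative only: derive a missing timestamp from the previous one.
fn fill_ts(last_ts: &mut Option<u64>, ts: Option<u64>, frame_duration: u64) -> u64 {
    let cur = match ts {
        Some(t) => t, // trust the demuxer when it provides a timestamp
        None => match *last_ts {
            Some(prev) => prev + frame_duration, // extrapolate lazily
            None => 0, // no information at all, start from zero
        },
    };
    *last_ts = Some(cur);
    cur
}
```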
Overall, I intend to create a nihav-pipeline crate (that’s a tentative name) for handling some of the middle-level stuff like frame reordering and timestamp-related issues, which can serve as a base for both a transcoder (definitely not coming this year) and a player (this one might actually appear this year). But for now nihav-tool, which decodes files into sequences of images and wave files, is enough for experiments. I care mostly about decoding, after all.