NihAV — Concept and Principles

Looks like I’m going to repeat the same things over and over in every NihAV-related post so I’d better sum them up and whenif people ask why some decision was made like that I can point them here.

So, let’s start with what NihAV IS. NihAV is the project started by me and me alone with the following goals:

  • design multimedia framework from the ground in the way I see fit (hence the NIH in the name);
  • do that without any burden of legacy (should be obvious why);
  • implement real working code to both test the concepts and to keep me interested in continuing the project (it gets boring pretty quickly when you design, write code and it still does not do anything visible at all);
  • ignore bullshit cases like interlaced H.264 (the project is written by me and for myself and I’ll do fine without it, thank you very much);
  • let me understand Rust better (it’s not that important but a nice bonus nevertheless).

Now what NihAV is NOT and is NOT going to be:

  • a full-stack multimedia framework (i.e. which lacks only handling user input and audio/video output to become a media player too, more about it below);
  • transcoder for all your needs (first, I hardly care about my own needs; second, transcoder belongs elsewhere);
  • supporting things just because they’re standard (you can leave your broadcasting shit to yourself, including but not limited to MXF, interlacing and private streams in MPEG-TS);
  • designed with the most convenient way of usage for the end user (e.g. in frame management I already output dummy frames that merely signal there was no change from the previous frame; also frame reordering will be implemented outside decoders);
  • have other FFeatures just because some other project has them;
  • depend on many other crates (that’s the way of NIH!);
  • have hacks to support some very special cases (I’m not going to be paid for e.g. fixing AVI demuxer to support some file produced by a broken AVI writer anyway).

What it might become is a foundation for higher level multimedia data management which in turn can be either a library for building transcoder/player or just used directly in such tools. IMO libav* has suffered exactly from the features that should be kept in transcoder creeping into the libraries, the whole libavdevice is an example of that. Obviously it takes some burden off library users (including transcoding tool developers) but IMO library should be rather finished piece with clearly defined functionality, not a collection of code snippets developers decided to reuse or share with the world. Just build another layer (not wrapper, functional layer!) on top of it.

For similar reasons I’m not going to hide serious functionality in utility code or duplicate it in codecs. In NihAV frames will be output in the same order as received and reordering for the display will be done in specific frame reorderer (if needed), same for filling missing timestamps; dummy frame that tells just to repeat the previous frame is used there in GDV decoder already:

    let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), NABufferType::None);
    frm.set_keyframe(false);
    frm.set_frame_type(FrameType::Skip);

Some things do not belong to NihAV because they are either too low-level (like protocols) or too high-level (subtitles rendering, stream handling for e.g. transcoding or playback, playlist support). Some of them deserve to be made into separate library(ies) later, others should be implemented by the end user. Again, IMO libav* suffers from exactly this mix of low- and medium-level stuff that feels too low-level and not low-level enough at the same time (just look how much code those ffmpeg or avconv tools have). Same goes for hardware-accelerated decoding where the library should just demux frame data and parse its headers, the rest is up to hwaccel chain in the end application, but instead lazy users prefer libavcodec to try all possible hwaccels on the frame and fall back to multithreaded software decoding automatically if required. And preferably all other processing in e.g. libavfilter should be done using custom hwaccel format too. Since I’m all for this approach (…NOT), NihAV will recognize that the frame uses some hwaccel format and that’s all. It’s up to the upper layer to build custom processing chain.

I hope the domain for NihAV is clear: it will take ByteIO input, demux data using it (packets or elementary stream chunks—if you want them in packet format then use a parser), optionally fill timestamp information, decode frames, reorder them in display order if requested, similar approach for writing data. Anything else will belong to other crates (and they might appear in the future too). But for now this is enough for me.

P.S. If I wanted to have multimedia player I’d write one that can take MP4/FLAC/WV for input and decode AAC/FLAC/WavPack plus feed H.264 to VAAPI. I know my hardware and my content, others can write their own players.

P.P.S. If you want multimedia framework written in Rust for wide userbase just wait until rust-av is ready.

One Response to “NihAV — Concept and Principles”

  1. Luca Barbato says:

    Thank you for the PPS 🙂

    Rust zero-cost-abstraction is sort of suitable to have the layering you mention.

    I already started to wrap cuvid/nvenc and the intel mediasdk so I can use it by exposing it as a normal Decoder and Encoder.

    Supporting opaque frames will be interesting but that’s a frame filtering concern.