NihAV: Boring Details

As I mentioned in the previous post, I’m polishing NihAV and improving some bits here and there. In this post I’d like to describe what has been done and what should be done (but will not necessarily be done).

For about a month I’ve been working on adding various functionality, and functionality to support that functionality, and resolving the recursive dependency on that functionality as well. I wanted a better means to test decoder output, so I tried to write a simple player. That required implementing the missing bits in nihav_core::scale and implementing sound conversion. And then implementing support for floating point numbers in nihav_core::io::byteio (at least Rust offers functionality to convert a float into its integer representation and back). And then optimising the scaler, since it took two thirds of playback CPU time. And also implementing seeking in demuxers (so that nihav-tool can start decoding from some position and skip the uninteresting first frames).
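To illustrate the float-to-integer round trip mentioned above, here is a minimal sketch built only on the standard `f32::to_bits` and `f32::from_bits`. The helper names `read_f32be` and `write_f32be` are made up for this example and are not claimed to match the actual nihav_core::io::byteio interface.

```rust
/// Reads a big-endian IEEE 754 single-precision float from the start of `src`.
/// Returns `None` when fewer than four bytes are available.
/// (Hypothetical helper, not the real NihAV function.)
pub fn read_f32be(src: &[u8]) -> Option<f32> {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(src.get(..4)?);
    // Assemble the 32-bit pattern first, then reinterpret it as a float.
    Some(f32::from_bits(u32::from_be_bytes(bytes)))
}

/// Appends `val` to `dst` as four big-endian bytes.
/// (Hypothetical helper, not the real NihAV function.)
pub fn write_f32be(dst: &mut Vec<u8>, val: f32) {
    // `to_bits` exposes the raw bit pattern, so no numeric conversion happens.
    dst.extend_from_slice(&val.to_bits().to_be_bytes());
}
```

Since only the bit pattern is moved around, the same approach would work for `f64` with `u64` and for little-endian data with `from_le_bytes`/`to_le_bytes`.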

Let’s start with the least significant feature: seeking. I’ve introduced a SeekIndex structure that can be filled by the demuxer on opening the format and later used to calculate the seek position. I wonder whether it’s worth adding automatic index building. At least I expect to have significantly fewer demuxers than decoders, so it can be changed later with relative ease. Currently the AVI and RM demuxers support seeking.
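A seek index of this kind could look roughly like the sketch below. Only the name SeekIndex comes from the text; the fields, the `SeekEntry` type and both methods are assumptions chosen for illustration, and the real structure may well be organised differently (per stream, for instance).

```rust
/// One seek point: a timestamp and the file offset where decoding can start.
/// (Hypothetical layout.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SeekEntry {
    pub time_ms: u64,
    pub pos: u64,
}

/// A list of seek points kept sorted by time.
#[derive(Default)]
pub struct SeekIndex {
    entries: Vec<SeekEntry>,
}

impl SeekIndex {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called by the demuxer while it parses the container index.
    /// Entries may arrive in any order; the list stays sorted.
    pub fn add_entry(&mut self, entry: SeekEntry) {
        let idx = self.entries.partition_point(|e| e.time_ms <= entry.time_ms);
        self.entries.insert(idx, entry);
    }

    /// Returns the last seek point at or before `time_ms`, if there is one.
    pub fn find(&self, time_ms: u64) -> Option<SeekEntry> {
        let idx = self.entries.partition_point(|e| e.time_ms <= time_ms);
        if idx == 0 {
            None
        } else {
            Some(self.entries[idx - 1])
        }
    }
}
```

Picking the entry at or before the requested time means decoding starts slightly early and the caller skips frames until the target is reached, which fits the "skip uninteresting first frames" use case.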

Now to testing. There are two different things: internal crate testing and manual testing with nihav-player. First, let’s talk about internal testing.

Rust allows modules to have separate test functionality, so a normally built crate won’t contain it, but cargo test will build a special version with tests enabled, run it and report which tests failed along with the output they produced. Usually I abuse this functionality to debug a decoder (since I don’t have to build the full nihav-tool but rather just the two or three crates required for the decoder and its family). So previously a test function just decoded the payload and optionally output a sequence of images or a wave file. That’s good enough for manual checking or to ensure that it works, but it’s not good enough for regression testing. So I’ve finally added MD5 hash support to the testing function, and now I can also compare decoded output against an MD5 hash, either for the whole output or per frame (I still need a better solution for floating point audio, but it can wait). Also, because I always operate either on packed buffers or in native endianness and calculate MD5 directly on those buffers in a way defined by me, it should produce the same hash on big-endian systems as well (it’s just that I still remember the times when I forgot to add output colourspace conversion for high-bitdepth or RGBA output for FATE tests in order to fix them on PowerPC).
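The per-frame comparison can be sketched as follows. Two things here are assumptions and not the actual NihAV test code: FNV-1a is used as a tiny stand-in for MD5 so the example needs no external crate, and the fixed little-endian serialisation is just one possible way to make the hash independent of host endianness.

```rust
/// FNV-1a 64-bit hash, a stand-in for MD5 (NOT what the text describes,
/// used only to keep this sketch dependency-free).
pub struct Fnv1a(pub u64);

impl Fnv1a {
    pub fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV offset basis
    }

    pub fn update(&mut self, data: &[u8]) {
        for &b in data {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
}

/// Hashes 16-bit samples in a fixed byte order, so the result does not
/// depend on whether the host is little- or big-endian.
pub fn hash_frame(samples: &[u16]) -> u64 {
    let mut h = Fnv1a::new();
    for &s in samples {
        h.update(&s.to_le_bytes());
    }
    h.0
}

/// Compares per-frame hashes with reference values and returns the index
/// of the first frame that differs (or is missing on either side).
pub fn first_mismatch(frames: &[Vec<u16>], expected: &[u64]) -> Option<usize> {
    let common = frames.len().min(expected.len());
    for i in 0..common {
        if hash_frame(&frames[i]) != expected[i] {
            return Some(i);
        }
    }
    if frames.len() != expected.len() {
        Some(common)
    } else {
        None
    }
}
```

A regression test would then decode a sample, call `first_mismatch` against stored reference hashes and assert that the result is `None`; reporting the frame index makes it easier to find where decoding went wrong.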

And finally the hairiest part: nihav-player. It’s a very primitive player that can only play audio and video (hopefully in sync) and quit when ‘q’ is pressed. And yet it both required a lot of additional code to be written and reduced the complexity of testing: for instance, it makes chroma artefacts (or simply swapped chroma planes) obvious, and playing a file takes less time than waiting for decoding to finish and reviewing frames one by one.

Here’s a list of features I had to add or improve just for the player:

  • scaler performance: I had to change YUV to RGB conversion from straightforward floating-point matrix multiplication to the conventional four tables. I’ve also introduced special cases for nearest-neighbour scaling to make it work faster (yes, I still don’t have a better scaler and it does not bother me);
  • sound format conversion (resampling is still pending): this converts between various sample formats and, to some extent, channel layouts (up-/down-mixing works only for anything to mono and 5.1 to stereo). Internally it works by iterating over a set of samples for each channel using an intermediate 32-bit integer or floating point format. That required adding traits for sample format conversions, like 16-bit integer into float or float into 8-bit unsigned int, plus special readers that produce a set of samples in a given format no matter what the input format is. Maybe I should use the same design for the scaler as well;
  • making all decoders thread-safe (so I can run decoders in different threads). That required non-trivial changes only in the Indeo 4/5 decoder because of its custom frame management;
  • finally adding frame reorderers. Unlike other frameworks, my decoders work on the frame-in, frame-out principle with no reordering. Test decoding outputs frames with PTS as part of their name, so a lazy sort worked fine with no need for reordering. As a result I had to add it only for the player.
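For the scaler item in the list above, the table-based YUV to RGB conversion can be sketched like this. The coefficients are the full-range BT.601 ones (as used by JFIF), which is an assumption; the text does not say which matrix or range the scaler uses, and the type and method names are invented for the example.

```rust
/// Four lookup tables replacing the per-pixel matrix multiplication:
/// V to R, U to G, V to G and U to B contributions.
/// (Hypothetical type; full-range BT.601 coefficients are assumed.)
pub struct YuvToRgbTables {
    r_v: [i16; 256],
    g_u: [i16; 256],
    g_v: [i16; 256],
    b_u: [i16; 256],
}

impl YuvToRgbTables {
    /// Does the floating-point work once, at table construction time.
    pub fn new() -> Self {
        let mut t = YuvToRgbTables {
            r_v: [0; 256],
            g_u: [0; 256],
            g_v: [0; 256],
            b_u: [0; 256],
        };
        for i in 0..256 {
            let c = i as f32 - 128.0; // chroma is centred on 128
            t.r_v[i] = (1.402 * c).round() as i16;
            t.g_u[i] = (-0.344_136 * c).round() as i16;
            t.g_v[i] = (-0.714_136 * c).round() as i16;
            t.b_u[i] = (1.772 * c).round() as i16;
        }
        t
    }

    /// Converts one pixel using only table lookups, additions and clipping.
    pub fn convert(&self, y: u8, u: u8, v: u8) -> [u8; 3] {
        let y = i16::from(y);
        let (u, v) = (usize::from(u), usize::from(v));
        let clip = |x: i16| x.clamp(0, 255) as u8;
        [
            clip(y + self.r_v[v]),
            clip(y + self.g_u[u] + self.g_v[v]),
            clip(y + self.b_u[u]),
        ]
    }
}
```

With neutral chroma (`u == v == 128`) all four table entries are zero, so grey input comes out as the same grey level, which is a handy sanity check when the chroma planes are suspected of being swapped.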
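For the sound conversion item in the list above, a trait-based design might look like the following sketch. The trait name `FromFmt`, its method and the `convert_samples` reader are assumptions made for illustration, as are the exact scaling rules; the real NihAV traits may differ.

```rust
/// Converts a sample from format `T` into the implementing format.
/// (Hypothetical trait.)
pub trait FromFmt<T> {
    fn cvt_from(val: T) -> Self;
}

/// 16-bit signed integer into float in the range [-1.0, 1.0).
impl FromFmt<i16> for f32 {
    fn cvt_from(val: i16) -> f32 {
        f32::from(val) / 32768.0
    }
}

/// Float into 8-bit unsigned, where 128 means silence.
impl FromFmt<f32> for u8 {
    fn cvt_from(val: f32) -> u8 {
        (val * 128.0 + 128.0).round().clamp(0.0, 255.0) as u8
    }
}

/// 16-bit signed integer into the intermediate 32-bit integer format.
impl FromFmt<i16> for i32 {
    fn cvt_from(val: i16) -> i32 {
        i32::from(val) << 16
    }
}

/// A "reader" that yields samples in the requested format `D`
/// regardless of the input format `S`.
pub fn convert_samples<S: Copy, D: FromFmt<S>>(src: &[S]) -> Vec<D> {
    src.iter().map(|&s| D::cvt_from(s)).collect()
}
```

The point of the design is that mixing code only has to deal with one intermediate format, while each new sample format costs one pair of small trait implementations.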

Another thing I had to improve for my player is the sdl crate. Yes, my main development system is too old for SDL2 (with or without the sdl2-build crate) so I stuck with SDL1. The Rust wrapper lacks support for YUV overlays (it was trivial to add) and sane audio handling (i.e. the audio callback is just a function pointer, which is hardly usable; I’ve changed it to accept an audio callback trait in the style of the SDL2 wrapper).

The player itself took a lot of fiddling because I barely remember how I played with SDL more than ten years ago and I’ve not played with multi-threading much. In any case, spawning threads with moved variables and channels for passing data between threads was a bit bumpy, but it works. As a result I have a main thread that demuxes data, passes it to separate audio and video decoding threads, and then receives decoded video surfaces or overlays from the video thread and displays them when the time comes (or discards them if it’s too late; I should add a control for the decoder to skip e.g. B-frames during decoding). The audio thread decodes audio and puts it into a FIFO which gets emptied by the audio callback when SDL calls it. The video thread decodes video, converts and/or scales it into RGB32 or YUV420 and sends the results back to the main thread in either an SDL RGB surface or a YUV overlay (to my surprise they are not mutually exclusive).
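The thread layout described above can be modelled with nothing but the standard library. This is a toy sketch, not the player's code: the `Packet` and `VideoFrame` types are invented, "decoding" just measures the packet, and the final drain of the FIFO stands in for the SDL audio callback.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Stand-in for a demuxed packet (hypothetical type).
pub enum Packet {
    Audio(Vec<i16>),
    Video { pts: u64, data: Vec<u8> },
}

/// Stand-in for a decoded surface or overlay (hypothetical type).
pub struct VideoFrame {
    pub pts: u64,
    pub size: usize,
}

/// Returns the PTS of the frames the main thread received, in order,
/// plus the audio samples that ended up in the FIFO.
pub fn run_pipeline(packets: Vec<Packet>) -> (Vec<u64>, Vec<i16>) {
    let (atx, arx) = mpsc::channel::<Vec<i16>>();
    let (vtx, vrx) = mpsc::channel::<(u64, Vec<u8>)>();
    let (ftx, frx) = mpsc::channel::<VideoFrame>();
    let fifo = Arc::new(Mutex::new(VecDeque::<i16>::new()));

    // Audio thread: "decodes" and appends samples to the shared FIFO.
    let audio_fifo = Arc::clone(&fifo);
    let audio = thread::spawn(move || {
        for samples in arx {
            audio_fifo.lock().unwrap().extend(samples);
        }
    });

    // Video thread: "decodes" and sends frames back to the main thread.
    let video = thread::spawn(move || {
        for (pts, data) in vrx {
            if ftx.send(VideoFrame { pts, size: data.len() }).is_err() {
                break;
            }
        }
    });

    // Main thread: "demuxes" by routing packets to the right decoder.
    for pkt in packets {
        match pkt {
            Packet::Audio(s) => atx.send(s).unwrap(),
            Packet::Video { pts, data } => vtx.send((pts, data)).unwrap(),
        }
    }
    // Dropping the senders lets the worker loops terminate.
    drop(atx);
    drop(vtx);

    let shown: Vec<u64> = frx.iter().map(|f| f.pts).collect();
    audio.join().unwrap();
    video.join().unwrap();
    // In a real player the audio callback would drain this while playing.
    let played: Vec<i16> = fifo.lock().unwrap().drain(..).collect();
    (shown, played)
}
```

A real player would display each frame at its presentation time (or drop it when late) instead of collecting everything at the end, but the ownership pattern is the same: each `move` closure takes its channel endpoints, and only the FIFO is shared through `Arc<Mutex<...>>`.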

In any case, that’s just a first attempt to build a proof-of-concept player just for looking at how decoders work, and eventually I should write a proper one with better syncing, interactivity and other bells and whistles. The current one does its task well, so I could watch a lot of samples for various decoders and see what should be fixed. I’ve already fixed bugs in the Indeo 3-5 decoders plus swapped chroma planes in other decoders, but H.263-based decoders still suck a lot and VP3-7 decoders are even worse. So there’s still a lot of fixing left for me.

At least after all those fixes and rewriting the player I should have something substantial to show the world.
