NihAV: Conceptually Done!

I’m happy to announce that NihAV has finally taken a more or less complete form. Sure, there are some concepts I wanted to play with (like raw stream handling), but I have had no need for them so far, so they can wait until much, much later. But all the major features required to build a transcoder are there, as well as a working transcoder itself.

As I wrote in the previous post, I wanted to play with vector quantisation, so first I implemented image palettisation, but since that was not enough I implemented two encoders using vector quantisation: 15-bit MS Video 1 and Cinepak. I have no doubt that Tomas Härdin has written a much better encoder, but why should that stop me from NIHing? Of course such an encoder is not very useful by itself (and it was useless to begin with), so I needed a muxer to represent the encoder output in some form. Then simply fiddling with parameters and recompiling became boring, so I finally introduced generic options, and in order to use those options without recompiling the binary every time I had to write a transcoder as well. That means I can now use NihAV to recode media into something else, even if it’s just two crappy video encoders plus MS ADPCM and PCM encoders, with a large variety of supported output containers (AVI and WAV!). I called it conceptually done because all the essential concepts are there, not because there’s nothing left to do.

Now about video encoders. I’ll describe the NihAV design and how it works on a separate page; for now I’ll just mention that while decoders work on a “frame in, picture/audio out” principle, encoders accept a single picture or audio buffer for encoding and then may output a series of encoded packets. Why such asymmetry in the design? Because decoders are expected to produce a single output for a single input (and frame reordering is handled externally), while most encoders are expected to have at least a single audio frame or a couple of pictures of lookahead to make decisions about coding the current input. For modern video codecs it may be a decision about what frame type to assign or where to start a new scene; for audio codecs like AAC you may need to change the current frame type if the following frame has transients and the previous one didn’t have them.
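
To make the asymmetry a bit more concrete, here is a minimal sketch in Rust. All the names in it (`Frame`, `Packet`, `Decoder`, `Encoder`, `LookaheadEncoder`) are hypothetical and made up for illustration; this is not the actual NihAV API, just one way the "one in, one out" versus "one in, maybe several out later" split could look.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-ins for real frame and packet types.
#[derive(Debug, Clone, PartialEq)]
struct Frame(u32);
#[derive(Debug, Clone, PartialEq)]
struct Packet(u32);

/// Decoder side: one packet in, one frame out.
trait Decoder {
    fn decode(&mut self, pkt: &Packet) -> Frame;
}

/// Encoder side: feed one frame, then drain zero or more packets.
trait Encoder {
    fn encode(&mut self, frame: Frame);
    fn get_packet(&mut self) -> Option<Packet>;
    fn flush(&mut self);
}

/// Toy decoder that simply turns a packet number into a frame number.
struct PassthroughDecoder;

impl Decoder for PassthroughDecoder {
    fn decode(&mut self, pkt: &Packet) -> Frame {
        Frame(pkt.0)
    }
}

/// Toy encoder that holds back `lookahead` frames before emitting anything.
struct LookaheadEncoder {
    lookahead: usize,
    pending: VecDeque<Frame>,
    ready: VecDeque<Packet>,
}

impl LookaheadEncoder {
    fn new(lookahead: usize) -> Self {
        Self { lookahead, pending: VecDeque::new(), ready: VecDeque::new() }
    }
}

impl Encoder for LookaheadEncoder {
    fn encode(&mut self, frame: Frame) {
        self.pending.push_back(frame);
        // The oldest frame can be coded only once enough future frames are queued.
        while self.pending.len() > self.lookahead {
            let Frame(id) = self.pending.pop_front().unwrap();
            self.ready.push_back(Packet(id));
        }
    }

    fn get_packet(&mut self) -> Option<Packet> {
        self.ready.pop_front()
    }

    /// At the end of the stream, code whatever is still queued.
    fn flush(&mut self) {
        while let Some(Frame(id)) = self.pending.pop_front() {
            self.ready.push_back(Packet(id));
        }
    }
}
```

With a lookahead of two, the first two `encode()` calls produce no packets at all, the third one releases the packet for the first frame, and `flush()` at the end of the stream releases the rest.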

Anyway, back to the technical details about the encoders. MS Video 1 operates on 4×4 blocks that can be coded as skipped, filled with a single colour, filled with two colours in a pattern, or split into 2×2 sub-blocks each filled with its own two colours in a pattern. Sounds perfect for median cut. Cinepak is much more complex. It splits the frame into several strips, and each strip is also split into 4×4 blocks that may be coded as skipped, as a single 2×2 YUV codeword (a 2×2 Y block and single U and V values) scaled up twice, or as four YUV codewords from a different codebook. Essentially, for good encoding you need to determine how to partition the frame into strips optimally, split blocks into single-vector and four-vector ones, and find optimal codebooks for them separately. Since I wanted to write a working encoder mostly to check whether vector quantisation is working, I simply use a fixed number of strips and add every block as a candidate for both coding schemes, without any subsequent refinement steps.
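
As an illustration of why a two-colour block maps so naturally onto median cut, here is a rough sketch of a single median-cut step applied to one 4×4 block. It is not the code from the actual encoder: the `Rgb` type, the `two_colour_block` name, the use of 8-bit components (the real 15-bit format has 5 bits per component) and the bit order of the pattern are all assumptions made for this example.

```rust
/// One RGB pixel; 8-bit components are an assumption for this sketch.
type Rgb = [u8; 3];

/// Quantise a 4x4 block to two colours with a single median-cut step.
/// Returns (colour for 0 bits, colour for 1 bits, 16-bit pattern),
/// where bit `i` of the pattern corresponds to pixel `i` of the block.
fn two_colour_block(block: &[Rgb; 16]) -> (Rgb, Rgb, u16) {
    // Find the component with the widest range in this block.
    let mut axis = 0;
    let mut best_range = 0u8;
    for c in 0..3 {
        let min = block.iter().map(|p| p[c]).min().unwrap();
        let max = block.iter().map(|p| p[c]).max().unwrap();
        if max - min > best_range {
            best_range = max - min;
            axis = c;
        }
    }
    // Flat along every axis: a single fill colour would be enough.
    if best_range == 0 {
        return (block[0], block[0], 0);
    }

    // Sort pixel indices along that axis and cut the box at the median.
    let mut order: Vec<usize> = (0..16).collect();
    order.sort_by_key(|&i| block[i][axis]);
    let (low, high) = order.split_at(8);

    // Each half is represented by the average of its pixels.
    let average = |idx: &[usize]| -> Rgb {
        let mut sum = [0u32; 3];
        for &i in idx {
            for c in 0..3 {
                sum[c] += u32::from(block[i][c]);
            }
        }
        let n = idx.len() as u32;
        [(sum[0] / n) as u8, (sum[1] / n) as u8, (sum[2] / n) as u8]
    };

    // Pixels from the upper half get a 1 bit in the pattern.
    let mut mask = 0u16;
    for &i in high {
        mask |= 1u16 << i;
    }
    (average(low), average(high), mask)
}
```

Cutting strictly at the median always puts eight pixels on each side, which is not ideal for a block with, say, twelve dark pixels and four bright ones; a real encoder would probably want to refine the split or compare the result against the other block modes.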

Here are some numbers if you really care about those. The input is laser05.avi (a 320×240 Indeo2 file with 196 video frames from the standard samples place). Encoding with the MS Video 1 encoder takes about 4 seconds. Encoding Cinepak with median cut takes 6 seconds. Encoding Cinepak with ELBG and randomly generated codebooks takes 36 seconds and the result looks bad (but recognizable). Encoding Cinepak with ELBG that takes codebooks produced by median cut as the initial ones takes 68 seconds, but the quality is higher than with median cut alone and the output file is slightly smaller too.


Now with all of this done I should probably fix the decoders known to be bad (RV6 and Bink2), add whatever missing decoders and features I see fit, and start documenting it all. I have strong doubts about VDD this year but maybe I’ll be able to present my stuff at FOSDEM 2021.

4 Responses to “NihAV: Conceptually Done!”

  1. Paul says:

    What about subtitles?

  2. Kostya says:

    There’s such a stream type and if the need arises I can add some subtitle decoders, but it’s probably better to leave them for the end application to care about, since they’re used to modify the video frame instead of being output independently like e.g. audio.

  3. Marcus says:

    4 seconds to encode just 15 megapixels worth of information?

    holy fuck rust is slow.

  4. Kostya says:

    Why do you think it has anything to do with Rust itself and not my straightforward implementation? My decoders run at comparable speed with their C counterparts after all.

    The main factor here is that vector quantisation is slow by itself (especially when you split the whole frame into 6-byte elements and search for a global codebook) and I’ve used no tricks to speed it up. My goal was to write a simple working yet non-trivial encoder and I achieved it. Writing an optimised anything is not a priority for NihAV and never was.
