NihAV: Progress Report

Since all of us in Europe have to suffer through the lockdown until things get better, I can’t travel for now and as a result I have to spend my free time in different ways (mind you, I still work, even if it’s from home, so it’s not that much extra free time). Nevertheless I’ve spent a significant part of that time working on NihAV and as a result I have some progress to report.

First of all, I have finally improved the H.263-based decoders from the level of “it decodes the bitstream correctly and outputs something recognizable” all the way up to “decodes with just minor artefacts”. That required a lot of debugging and even some reverse engineering.

Currently I have just three decoders based on H.263: RealVideo 1, RealVideo 2 and Intel 263. And each of those decoders has features not present in the others. RealVideo 1 has the wonderful OBMC (overlapped block motion compensation), where you need to know the motion vectors of all four neighbours (yes, including the bottom one) before you can perform motion compensation; RealVideo 2 has proper standalone B-frames; and Intel 263 has PB-frames, where P-frame and B-frame macroblock data is stored interleaved. I mentioned before that in some aspects H.263 is worse than MPEG-4 ASP or H.264. H.263 has more annexes than any other video codec specification, with almost every letter of the Latin alphabet taken, and those annexes change the decoding process significantly. Inter blocks with four motion vectors instead of one? Annex F. An alternative intra block coding mode with coefficient prediction? Annex I. PB-frames? Annex G. PB-frames with the B part not always being bidirectional? That’s Annex M. In-loop filter? Annex J. A different quantisation mode, especially for chroma? Annex T.
At least it all works well enough now and I don’t intend to return to it any time soon.
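To illustrate why OBMC needs all those neighbouring motion vectors: the final prediction for each pixel is a weighted mix of the predictions obtained with the current block’s vector and the neighbours’ vectors. Here is a toy sketch of just the blending step, with flat 4/2/2 weights (summing to 8) instead of the per-pixel weight tables from Annex F, and with a made-up function name:

```rust
/// Toy sketch of OBMC's blending step: each output pixel is a weighted
/// average of three predictions (made with the current block's MV and
/// two neighbours' MVs). Real H.263 Annex F uses per-pixel weight
/// tables; flat weights 4/2/2 are used here only to show the idea.
fn obmc_blend(cur: &[u8], vert: &[u8], horiz: &[u8], dst: &mut [u8]) {
    for i in 0..dst.len() {
        let sum = 4 * cur[i] as u16 + 2 * vert[i] as u16 + 2 * horiz[i] as u16;
        dst[i] = ((sum + 4) >> 3) as u8; // round, then divide by the total weight 8
    }
}

fn main() {
    let mut dst = [0u8; 2];
    obmc_blend(&[8, 10], &[16, 10], &[0, 10], &mut dst);
    println!("{:?}", dst);
}
```

Since the weights sum to eight for every pixel, a region where all three predictions agree passes through unchanged, while disagreeing predictions get smoothed, which is exactly what hides blocking artefacts at block edges.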

Second, I’ve checked and fixed my VMD decoder. The problem was the audio part. First, there are, annoyingly, two variants of stereo audio compression. They both use the same prediction scheme, but one stores predictors at the beginning of each block (IIRC that’s the newer scheme, which I’ve seen used only in Shivers 2) and the other one just stores the deltas. And here’s the annoyance: many clips from Lighthouse store the audio payload in blocks of 4409 bytes. Each delta is exactly one byte, there are two channels and there are only deltas in the block—do the maths yourself. In case you can’t, it’s 2204 samples for one channel and 2205 for the other. I had to buffer that leftover sample and add it back at the beginning of the next block.
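The carry-over trick can be sketched like this (all names are invented, and plain signed deltas stand in for VMD’s real prediction scheme): an odd-sized block leaves one channel with an unpaired sample, which is buffered and completed at the start of the next block.

```rust
/// Minimal sketch of carrying the unpaired sample across odd-sized
/// interleaved stereo blocks (like VMD's 4409-byte payloads).
/// Hypothetical names; simple additive deltas replace the real codec.
struct StereoDeltas {
    pred: [i16; 2],       // running predictor per channel
    pending: Option<i16>, // decoded sample still waiting for its pair
}

impl StereoDeltas {
    fn new() -> Self {
        Self { pred: [0; 2], pending: None }
    }

    /// Decode one block of interleaved deltas into full stereo pairs.
    /// An odd-sized block leaves one sample buffered for the next call.
    fn decode_block(&mut self, deltas: &[i8], out: &mut Vec<(i16, i16)>) {
        for &d in deltas {
            // channel 0 if no sample is pending, channel 1 otherwise
            let ch = if self.pending.is_none() { 0 } else { 1 };
            self.pred[ch] = self.pred[ch].wrapping_add(d as i16);
            match self.pending.take() {
                None => self.pending = Some(self.pred[0]),
                Some(left) => out.push((left, self.pred[1])),
            }
        }
    }
}

fn main() {
    let mut dec = StereoDeltas::new();
    let mut out = Vec::new();
    dec.decode_block(&[1, 2, 3], &mut out); // odd block: last sample is buffered
    dec.decode_block(&[5], &mut out);       // this delta completes the pair
    println!("{:?}", out);
}
```

With a 4409-delta block the loop produces 2204 complete pairs and leaves the 4409th decoded sample in `pending`, which is exactly the leftover-sample buffering described above.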

And finally I’ve added some framework for future support of common compression formats. Currently I have implemented only a deflate decompressor, and it was tricky to do that properly. The format itself is simple and I implemented a decoder in a couple of hours (including distractions by other activities), but making it work under the desired conditions took more than a day. What do I mean by that? While normally a simple uncompress() with known input and output lengths would suffice, sometimes you need to decode the data as a stream with unknown input and output sizes. In order to do that you need either to somehow juggle inputs and outputs, maybe even via callbacks (further complicated by Rust’s ownership concept), or to save the decompressor state on input or output buffer exhaustion and let the caller deal with it. I took the second approach, obviously, and implemented it as a state machine: if there are not enough input bits or no space left in the output buffer, I simply save the bit reader state (without the data pointer) and return an error code telling the caller either to feed more data or to do something with the output data and keep decoding the same block. It’s probably very inefficient but it works fine even with single-byte input and output buffers.
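The resumable state-machine interface can be sketched as follows. The names are invented and a toy format (a length byte followed by that many literal bytes) stands in for real deflate, but the control flow is the same: on buffer exhaustion the decoder saves where it was and reports what it needs, so it keeps working even with single-byte buffers.

```rust
/// Why decoding stopped; the caller reacts by feeding more input,
/// draining the output, or finishing up.
#[derive(Debug, PartialEq)]
enum Status { NeedsMoreInput, OutputFull, Done }

/// What the decoder was doing when it had to stop.
#[derive(Clone, Copy)]
enum State { BlockLen, Literals(u8) }

struct StreamDecoder { state: State }

impl StreamDecoder {
    fn new() -> Self {
        Self { state: State::BlockLen }
    }

    /// Consume as much of `input` as possible while filling `output`.
    /// Returns (bytes read, bytes written, reason for stopping); the
    /// caller re-invokes with the unconsumed input or a drained output.
    fn decode(&mut self, input: &[u8], output: &mut [u8]) -> (usize, usize, Status) {
        let (mut ip, mut op) = (0, 0);
        loop {
            match self.state {
                State::BlockLen => {
                    if ip >= input.len() { return (ip, op, Status::NeedsMoreInput); }
                    let len = input[ip];
                    ip += 1;
                    if len == 0 { return (ip, op, Status::Done); } // empty block ends the stream
                    self.state = State::Literals(len);
                }
                State::Literals(left) => {
                    if ip >= input.len() { return (ip, op, Status::NeedsMoreInput); }
                    if op >= output.len() { return (ip, op, Status::OutputFull); }
                    output[op] = input[ip];
                    ip += 1;
                    op += 1;
                    self.state = if left == 1 { State::BlockLen } else { State::Literals(left - 1) };
                }
            }
        }
    }
}

fn main() {
    let mut dec = StreamDecoder::new();
    let input = [2u8, b'h', b'i', 0];
    let mut out = [0u8; 1]; // deliberately tiny output buffer
    let (read, _written, status) = dec.decode(&input, &mut out);
    println!("first call: {:?}, got {:?}", status, out[0] as char);
    let (_r, _w, status) = dec.decode(&input[read..], &mut out);
    println!("second call: {:?}, got {:?}", status, out[0] as char);
}
```

The real inflate case additionally has to snapshot the bit reader and the Huffman decoding position, but the caller-facing contract is the same three-way status.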
This means that now I can write a decoder for some screen codec without any external dependencies, and if the need arises I might even write an encoder counterpart. I’ve written simple LZ77- and LZ78-based compressors before, so hopefully I can implement something decent for an encoder, even if it’s not as good as zlib.

All in all, this was probably not the most pleasant work to do, but it sure feels good having it done. Now I can move on to something different, like the Vividas container and codec family, or improving nihav-tool, or implementing a proper player, or something else entirely. I don’t make plans because they fail, so only the future will show what gets done.

7 Responses to “NihAV: Progress Report”

  1. Paul says:

    So you reimplemented zlib in rust from scratch?

  2. Kostya says:

I’d not call it reimplementing zlib, because I did just the inflate part (no compression) with a different interface (no special public state for input and output buffer pointers), and I looked only at RFC 1951 and RFC 1952 (the latter for the gzip format).

    But it should be a good enough replacement for my purposes.

  3. DrMcCoy says:

    There’s at least one more VMD sound type, used in Version 5 of the educational game series Addy/Adi.

    The graphics are slightly different too. ffmpeg doesn’t play it at all; my decoder in ScummVM does partially (but no sound). ffmpeg also doesn’t seem to play the Addy/Adi 4.21 files I have here, while my decoder does; it might be that yours fails there as well.

    I can provide samples if you want them.

  4. Kostya says:

    Sounds like the difference between Sierra VMD and Coktel Vision VMD 😉 IIRC no Sierra game used coordinates inside VMD while e.g. Woodruff did. And there’s Urban Runner with its Indeo 3 in VMD as well.

    I’d like to revisit the games and those ADIv5 as well.

  5. DrMcCoy says:

    Yeah, Coktel stuffed a lot of weird stuff into VMD. Well, and their games in general. It was a very weird studio 🙂

  6. MoSal says:

    @Paul @Kostya

    btw, there is already a pure rust zlib implementation:
    https://github.com/Frommi/miniz_oxide/tree/master/miniz_oxide

    The popular flate2 crate uses this implementation as a default backend.

  7. Kostya says:

    I guess there’s more than one implementation in any non-toy programming language, since it’s very easy to implement (and only moderately easy if you need more than just decompressing a full input buffer into a full output buffer). So I NIHed mine, as the project demands.