Archive for the ‘NihAV’ Category

AVC support in NihAV: semi-done

Saturday, September 14th, 2019

I’ve wasted enough time on AVC decoder for On2 family so while it’s not working properly for those special cases I’m moving to VP7 regardless.

For those who don’t know (or forgot; or never had a reason to care) On2 AAC is AAC-LC rip-off with some creative reconstruction modes added to the usual long/short windows. I’ve failed to understand how it works before and I fail to understand how it works still. But at least some details are a bit clearer now that I’ve analysed the whole codec from scratch with less guesswork.

The codec has three IDs that it recognizes: 0x500, 0x501 and 0x1234. First two are different only in the aspect that one handles singular packets and another one handles several packets glued together prefixed with size. The last ID is simply recognized but it does not have any special handling.

The tricky part is some special modes that do some heavy processing of data. For most modes you invoke IMDCT and that’s all, here you do some QMF-like filtering (probably for transients extraction), then you perform RDFT (previously I thought it was plain FFT but after long investigation it turned out to be RDFT after all) on quarters, merge those quarters using filters that look like convolution filters for four sub-bands, perform RDFT again on the whole block and add some transients. And after that you still may need to reverse the data before using permuted window for overlap-add operation. In other words it’s not fun and I lack education for recognizing all those algorithms used, why they’re used and where it goes wrong.

So hopefully I’ll return to it some day to fix it for good but now VP7 awaits (so I can at least formally declare Duck codecs family done and move to implementing missing bits in the framework itself).

NihAV: still ducking

Saturday, July 27th, 2019

While it’s summer and I’d rather travel around (or suffer from heat when I can’t), there has been some progress on NihAV. Now I can decode VP5 and VP6 files. Reconstruction still sucks because it takes a lot of effort to make perfect reconstruction and I’m too lazy to do that when simple demonstration that the decoder works would suffice.

Anyway, now I can decode both VP5 and VP6 files including interlaced ones. Interlacing in VP5/6 is done in very simple way like many other codecs: there’s a bit for each macroblock telling whether macroblock should be output in interlaced form or not.

Of course this being VPx family, they had to do it with some creativity. First you decode base interlaced bit probability, which is stored as 8-bit value while all other bit probabilities are stored in 7 bits. Then you derive actual probability for interlaced bit and decode it before any other macroblock information (including macroblock type—it’s that important). Probability is derived by companding base probability depending on whether last macroblock was interlaced (then probability is halved) or not (then it’s remapped to fit 128-255 range)—except for the first macroblock in a row which would use the base probability without modifications. And for VP6 you also have to use different starting scan order (band assignment for each coefficient, now it’s shuffled). This is so trivial that one would wonder why this has not been done in libavcodec decoder yet.

There are three possible things to do next: polish current implementation, move to AVC (On2 AVC that is) or move to AVC (Duck VP7 which is AVC ripoff). But probably I’ll simply keep doing nothing instead.

NihAV: rust-clippy experience

Saturday, May 18th, 2019

As I’ve mentioned in the previous post, I’ve finally tried rust-clippy to see what issues and suggestions it will have on my code. The results are not disappointing if you take the tool name seriously.
(more…)

NihAV: after clean-up

Friday, May 17th, 2019

Since the clean-up work on NihAV is done and I progress with Truemotion VP3 decoder, it’s a good time to talk about what I’ve actually done—there’s even more material to write waiting in the queue.

The intent was to make all frame-related stuff thread-safe and improve efficiency a bit. In order to do the former I had to replace most of the references from Rc<RefCell<T>> to Arc<T> and while doing it I introduced aliases like type NAFrameRef = Arc<NAFrame> and .into_ref() methods to convert object into ref-counted version. This helped when I tried switching from one implementation of reference counter to another and will make it easy to switch again if I ever need that (hopefully not). Now about improved efficiency and how it’s related to the ref-counting.

There’s a straightforward way of dealing with frames: you allocate the picture, fill it, dispose, allocate a new one, etc etc. And there’s a more effective way: you allocate several pictures at once, select an unused one, fill, return to the pool when it’s not needed any more. That is where reference counting comes into play and where Rust default structures don’t help. Frame pool owns the reference and decoder gets a second copy. And Rust Arc is intended for single ownership: when you try to access the shared object it will simply clone it so you end up working with a copy (which defies the purpose). So I had to NIH my own NABufferRef<T> which keeps reference counts and still allows shared access even for writing (currently it does that in all cases but if I need to add some guards the API won’t have to be changed for that). The implementation is very simple: the structure contains a raw pointer to a structure that contains actual object and AtomicUsize counter. The whole implementation is ~2.2kB relying just on std crate.

And finally I’ve made a picture pool. The difference between picture and frame is all additional metadata picture should not care about (like timestamps, stream information and such). Because of the design decisions I have three different picture formats (implemented for 8-, 16- and 32-bit element sizes, Rust does not like aliasing after all), which means I need to provide decoder with all three picture pools because we can’t say in advance which one codec will use (if at all—the option to allocate new non-pooled pictures is still there). Also I want to keep those pools external in case the code around it wants to do keep more pictures in it (e.g. 2-3 pictures required by decoder and 25 pictures pre-buffered for the display). This resulted in a structure called NADecoderSupport that contains picture pools and may have something else added late. Of course people might argue that it’s much better to have AVCodecContext with a myriad of fields you can set directly or via utility functions but I’d rather not have one single structure. Though it might be a good place to put various decoder options there (so that decoder can ignore them at its leisure).

Since I said I did it to increase efficiency I should probably give some numbers too: RealVideo 3/4/6 decoders now use buffer pool (for three frames obviously) and reallocate it on format change. Decoding time got reduced by 4-5% from using the pool. Currently I don’t care about speed much but I may convert more decoders to it if the need arises.

In conclusion I want to say that even I did not enjoy doing that work much, it was needed and gave me some experience plus some improvements in code and design. So it was not a wasted effort.

P.S. I also installed rust-clippy since it’s in stable now and tried to fix errors and warnings it reported. But that is a story for another post.

NihAV: now with TM2X support!

Thursday, April 11th, 2019

I’m proud to say that NihAV got TrueMotion 2X support. For now only intra frames are supported but 75% of the samples I have (i.e. three samples) have just intra frames. At least I could check that it works as supposed.

First, here’s codec description after I managed to write a working decoder for it. TrueMotion 2X is another of those codecs that’s closer to TrueMotion 1 in design. It still uses the same variable-length codebook instead of Huffman coding (actually only version 5 of this codec uses bit reading for anything). It also uses “apply variable amount of deltas per block” approach but instead of old fixed scheme it now defines twenty-something coding approaches and tells decoder which ones to use in current frame. That is done because block size now can be variable too (but it’s always 8 in all files I’ve seen). And blocks are grouped in tiles (usually equivalent to one row of blocks but again, it may vary). The frame data obfuscation that XORs chunks inside the frame with a 32-bit key derived in a special way is not worth mentioning.

Second, the reference is quite peculiar too. It decodes frame data by filling an array of pointers to the functions that decode each line segment with proper mode, move to the next line and repeat. And those functions are in handwritten assembly—they use stack pointer register for decoder context pointer (that has original ESP saved somewhere inside), which also means they do not use stack space for anything and instead of returning they simply jump to the next routine until the final one restores the stack and returns properly. Thankfully Ghidra allows to assign context argument to ESP and while decompile still looks useless, assembly has proper references in the form mov EDX, dword ptr [ctx->luma_pred + ESP].

And finally, I could not check what binary specification really does because MPlayer could not run it. At first I tried running working combination of WMP+Win98 under OllyDbg in QEMU but it was painfully slow and even more painful to look at the memory state. In result I’ve managed to run TM2X decoder in MPlayer which then served as a good reference. The trick is that you should not try to run tm2X.dll (it’s really hopeless) but rather to take tm2Xdec.ax (or deceptively named tm20dec.ax from the same distribution that can handle TM2X unlike its earlier versions), patch one byte for check in DLL init and it works surprisingly well after that.

So what’s next? Probably I’ll just add missing features for the second TM2X sample (the other two samples are TM2A), maybe add Bink2 deblocking feature—since I’d rather have that decoder complete—and move to improving overall NihAV design. Frame management needs proper rework before I add more codecs—I want to change into a thread-safe version before I add more decoders. Plus I’ll need to add some missing bits for a player. There’s a lot of work still to do but I’m pleased that I still managed to do something.

NihAV: even more Bink2 support!

Wednesday, March 13th, 2019

After managing to decode the first frame of KB2g variant I had three options: try to decode the other frames, try to decode other variants or do nothing. While the third option is the most appealing and the first option is the most logical, I stuck with the second one. So now I can decode the first frame of KB2f variant of Bink2 as well. Unfortunately the only (partial) KB2a sample I know is not supported, probably it’s a beta version that was tried on one game like Bink version b. Beside a small surprise in one place bitstream decoding was rather simple. Inter-frame support should not be that hard but it might get messy because of the DC and MV prediction.

And while talking about REing Bink I should mention that I’ve tried Ghidra while doing KB2f work. It is a nice tool that sucks in some places (not having a good highlight for variables, decompiling SIMD code results in very questionable output, the system being Java-based and requiring recent JDK—that’s the worst issue really) it works and produces decent results (including the decompiler). Also since it has 16-bit decompiler support maybe I’ll manage to figure out how those clips in Monty Python & the Quest for the Holy Grail are stored.

I should start documenting it too.

Insignificant update: okay, now it decodes inter-frame data correctly too and the only thing left is to make it reconstruct them correctly. Also I’ve updated codec information on Multimedia Wiki. Actually now it works quite okay so I’m not going to pursue it further. I have no real interest in Bink2 decoding after all.

NihAV: some Bink2 support

Sunday, March 10th, 2019

It took a long time but finally I can decode the first frame of Bink2 video (just KB2g flavour though but it’s a start).

At least the initial observations were correct: Bink2 codes data in 32×32 macroblocks, two codebooks for AC zero runs, one codebook for motion vector components, simple codes with unary prefix for the rest.

If you wonder why it took so long—that’s because I’m lazy and spend an hour or less a day on it. Also while the codec is simple in design it’s a bit complicated in implementation. While previous version related on format sub-version to decide which feature to use, Bink2 uses frame flags to decide which feature to use. For instance, flag 0x1000 signals that there are two bit arrays coded that tell when to read an additional flag during CBP decoding that tells which one of two codebooks should be used during AC decoding later. And flag 0x2000 essentially tells to use different bitstream decoding (like motion vectors decoding or block type decoding). Or the fact that it employs DC and MV prediction that usually has four cases (top-left macroblock, top block, left block, some block inside) plus WMV1-like handling of DC prediction in inter-frames (i.e. it calculated DC for inter blocks and uses them for prediction). And of course DC prediction for inter blocks works a bit different. Plus it tries to track internal state by packing all flags into 32-bit word and updating it for each block (two bits are for signalling top row, one—for macroblock being the leftmost one, some bits are copied from frame flags etc etc). So there’s a lot of nuances to take care of.

And that’s not counting the fact that current Bink2 player can’t decode versions prior to KB2g at all. Since I have some KB2f samples along with an old Bink player that can handle them, I guess I’ll support them eventually.

NihAV: Overall Design—Current and Intended

Saturday, February 16th, 2019

Okay, I’ve finally done all the low-hanging fruits and now the progress is blocked by the need of reverse engineering (namely, Bink 2, Discworld Noir BMV and TrueMotion 2X) so I don’t expect anything major happening to the project in near time.

Meanwhile it’s a good opportunity to talk about how NihAV is (mis)designed and how it should work in principle.

(more…)

NihAV: first quacks

Sunday, February 10th, 2019

As you can guess from the title NihAV got some support for Duck formats, namely TrueMotion 1 and TrueMotion RT. The implementation was rather straightforward except that it took some additional work to support 16-bit video buffers.

Of course I made sure my new TM1 decoder supports decoding sprites. Here’s an example of such sprite picture:

The hardest part was finding a sample.

I can’t sanely support transparency though since it uses 6-bit alpha with RGB555 image and while I can support such format quite easily I’d rather not.

If you wonder about the details of sprite support, it’s almost the same as ordinary inter-coded 16-bit TM1 with some nuances:

  1. frame header has additional 16-bit fields for sprite position and size (and actual sprite size is used in the decoding—the result is supposed to be put over the destination picture);
  2. sprite has twice as much mask bits as inter frame—two per 4×4 block (LSB first as usual). Bits 00 mean the next four pixels should be skipped (and predictor reset to zero), bits 01 mean it’s opaque sprite data and bits 10 mean it’s sprite data with transparency info present;
  3. sprite data is decoded as standard 4×4 TM1 block data (i.e. on C delta per 4×4 block) except that in transparency mode it also reads transparency data after each pixel pair.

That information comes from our old trusty source of information called VPVision source code dump (which was used to understand TrueMotion 1, 2 and probably DK3/DK4 ADPCM (and maybe VP3 but I’m not sure at all). Also it turns out to contain TrueMotion RT encoder source code as well (which could be used to reconstruct the decoder but I forgot about it at the time and used the binary specification instead).

And now I’d like to talk about Duck codecs in general.

The codecs from this family can be divided into three groups:

  1. The Age of Darkness: the original TrueMotion codec and its evolution plus related ADPCM codecs;
  2. The Age of Enlightenment: game codecs evolving into more generic video codecs and using more mainstream codec design (DCT-based, many ideas borrowed from H.263 and H.264) plus AVC (that’s audio codec if you don’t remember);
  3. The Age of EA Guardian: the codecs produced after Duck was bought by certain company.

The Age of Darkness codecs

Those codecs were used mostly in video games but TM1 was also licensed to Horizons Technology.

The idea behind TM1 is very simple: you split video into 4×4 blocks, predict each pixel from top and pack using quantised deltas and fixed codebook looking more like Tunstall codes (i.e. output code is always a fixed length of one byte but it may correspond to a variable length sequence of input codes). Also depending on quality frame blocks have different number of colour difference deltas per block (1, 2 or 4).

TrueMotion RT is an adaptation of TM1 for real-time video capturing (hence the name). In this case video is coded as planar YUV410 using fixed set of deltas with index taking 2, 3 or 4 bits. But the general coding idea (top and left prediction, delta quantisation and coding its index) remains the same. It uses the same frame header obfuscation so it’s probably an elder sibling of TrueMotion 2 (and its name is more like TrueMotion RT version 2.0 and not TrueMotion 2 RT but the details are unclear). There are different versions of the codec, for example Star Control II: The Ur-Quan Masters on 3DO used a special TM1 format split into several files: .hdr for global information (including quantised delta sets), .tbl with codebook definition, .duk with actual frame data and .frm with the frame offsets for .duk file. It’s a pity I can’t support it without very special handling.

TrueMotion 2 gets rid of single static codebook and packs appropriate data (deltas for different-resolution blocks, motion vector data, actual block types etc etc) in separate segments with their own Huffman codes. There are many improvements but the codec still operates on 4×4 blocks with horizontal and vertical prediction of each symbol.

There is not much known about TrueMotion 2X but so far it looks like maybe slightly improved TM2. Hopefully it will be clearer if I manage to implement a decoder for it.

And finally there were two simple ADPCM codecs accompanying video (usually TM2), there’s nothing much to say about those.

The Age of Enlightenment codecs

This was the age when Duck codecs became widely known and accepted, when various companies licensed them for their own needs and when it was really the golden age for them.

It all starts with TrueMotion VP3 that set the standard for the following codecs. It employed the a bit non-standard 8×8 DCT, referencing last intra frame as an alternative to referencing just the previous frame (later knows as golden frame), with various types of information grouped together instead of interleaving it all, and with coefficients coded as tokens (EOB, zero run, plus-minus one, plus-minus two, large coefficient token and such). The same approach would be used for subsequent codecs as well. Of course it briefly enjoyed the renaissance when Duck decided to put it into open-source and Xiph Theora was created on its base (and since there were no other free and open-source video codec alternatives it was destined to have popularity and success before something better comes).

TrueMotion VP4 was mostly the same but with different coding method for some data types. Maybe it was the first codec to move edge loop filtering from being performed on the frame to being performed on temporary block used in motion compensation but I’m not entirely sure.

TrueCast VP5 was the first in the series to employ their own version of static binary arithmetic coder mostly known as bool coder. That means that instead of updating bit probability after each decoding using that context as CABAC does, frame header encodes fixed probabilities (or just updates from the probabilities in the previous frame) and uses them for decoding.

VP6. Probably the most famous of them all since it was used in Flash videos. From technical point of view it’s just small improvement in some details over VP5. I suspect this was the first codec in the series that introduced selecting random frame as the next golden frame (previously it was just last intra frame, now any inter frame can signal that it should become golden).

VP7. This is the first installation in the series that was based on H.264 ideas like 4×4 transform and spatial prediction.

And of course there’s AVS, an audio codec inspired by AAC LC that accompanied some VP5-VP7 videos.

The Age of Guardian codecs

While the design direction has not changed much, the codecs themselves mostly belong to the niche provided by their current owner and hardly used anywhere else. For now we have VP8, VP9 and VP10 (aka AV1).


I hope there will be more to write about those after I write decoders for the rest of them and learn the shameful details of their design in the process.

NihAV: split and games

Wednesday, February 6th, 2019

I have not written anything about NihAV for the last two months. If you thought it’s because nothing was done you’d be mostly correct (but not in thinking about the project at all—this is pointless). And yet there has been some progress recently (but before that I spent a whole month not doing anything NihAV-related).

First thing, I’ve finally split NihAV into smaller crates. This was done in order to reduce compilation and testing time for separate components. It started to feel a lot like FFmpeg and Libav times except that I don’t have to rerun configure and recompile everything in case I want to add a new decoder. Now codecs reside in standalone crates that contain just relevant decoders and demuxers so “write code—fail to compile—fix—run tests” cycle goes faster now.

And here’s the new structure:

  • nihav-core — the core. It contains the definitions for basic stuff like packets and video/audio buffers, I/O routines (byte- and bitreader), DSP and common video stuff for H.263-based decoders;
  • nihav-commonfmt — for various common formats. Currently it has all leftover codecs I could not move elsewhere (and AVI demuxer). Maybe in the future it will grow too large so I’ll split out some groups like speech codecs into new crates;
  • nihav-duck — for Duck game codecs up to VP7;
  • nihav-game — for various game codecs and containers;
  • nihav-indeo — for Indeo video and audio codecs;
  • nihav-rad — for totally RAD formats like Smacker and Bink;
  • nihav-realmedia — full RealMedia support (except for common decoders in nihav-commonfmt);
  • and finally nihav-allstuff — the crate that binds them all so you don’t have to search for decoders in separate crates.

Of course there’s nihav-tool to perform the actual decoding and test whether the code works but it’s always been a standalone crate.

This split required some changes in the decoder and demuxer handling. Previously I could get away with one table where all decoders (or demuxers) were registered and then search for an appropriate decoder there. Now it is not possible since there are many crates with individual codecs enabled or disabled in configuration. In result I had to introduce a structure called RegisteredDecoders and have crate-specific function for registering all decoders featured in crate in it. And nihav_allstuff::nihav_register_all_codecs() simply invokes them all for convenience. It’s exactly the same with demuxers too.

N.B.: I should probably write a separate post on overall NihAV design, missing parts and excuses for certain decisions.

Second, as you could spot from the crate names above, instead of trying to make NihAV do something useful (you have rust-av for that though) I decided to focus more on supporting various game formats.

So far I want to support the following formats:

  1. Sierra VMD (developed by Coktel Vision but I care more for Sierra games). Should be an easy one-shot;
  2. Discworld II and Noir BMV. I have added support for DW2 BMV format already (and it took a day to reimplement). DW3 BMV format is almost the same, DW3 audio is known but video codec still requires some reverse engineering. Also fun fact—DW2 BMV video decoder is based on reverse engineered decoder from ScummVM where they did not understand how decoding lengths works and the code still looks like a beautified assembly. For libavcodec decoder I tried my best to simplify the code and give better names to the variables. And when implementing the decoder for nihav-game I found out that “read byte, advance pointer, output nibble and save another one for later” leads to the same results but with a much cleaner code;
  3. RAD codecs. I’ve reimplemented Smacker and Bink formats support (the worst part was NIH-y implementations for DCT-II/DCT-III used in Bink audio decoder but that’s a tale for another time). Maybe now I’ll finally write a decoder for Bink2 video;
  4. Duck codecs. Yes, they are game formats that also were tried as general codecs but with lesser success. There are many games using TrueMotion 1, there are many games using TrueMotion 2, there are some games using TrueMotion VP3 and VP6 in their cutscenes. And the only their codec that got widespread is VP6. I want to implement all the family of the codecs that Duck produced before dying ascending to the higher plane of existence. While most of them are supported there are still missing features like sprites in TrueMotion 1, interlaced mode in VP6 or TM2X entirely. It requires some REing of course but that’s the appeal.

After that it would be nice to work on actual player for the supported formats, which opens a new can of worms like proper frame reordering and format conversion, but I guess I’ll have to deal with that one day.

Anyway, back to doing nothing.