Archive for the ‘NihAV’ Category

Raw streams support in NihAV

Thursday, November 18th, 2021

Sadly there are too many MP3s in my music collection to ignore the format, so I’ve finally implemented MP3 decoding support for NihAV. That involved introducing several new concepts which I’d like to review in this post.

Previously NihAV operated on a simple approach: there’s a demuxer that produces full packets, those packets are fed to the corresponding decoder and the decoded audio/video data is used somehow. With MP3 you have a raw stream of audio packets (sometimes with additional metadata). While I could pretend to have a demuxer (that would simply read data and form packets) I decided to do it differently.
(more…)

NihAV: now with Flash support

Tuesday, November 2nd, 2021

During my work on the VP6 encoder I was contacted by a Ruffle developer who was interested in it; one thing led to another and I licensed my decoder for use there (the main issues were cutting off all the NihAV interfaces that are not needed for it and selecting the license). But it’s over and they say it’s working fine. Meanwhile I got curious and decided to finally do what no other bit of open-source code could do: encode VP6 to FLV without relying on any external software.

In addition to the FLV muxer I also implemented all known decoders, and the load was uneven. One evening was enough to implement two and a half codecs: FLV1 (it’s just H.263 with a slightly different header and block format), Flash ADPCM (a slight variation of IMA ADPCM) and a bit of ASAO. Another day was spent on trying to make ASAO work properly (I dislike codecs with parametric bit allocation like this one, at least it’s not a typical speech codec). VP6 modifications took minutes, Flash Screen Video was done in less than an hour, Flash Screen Video 2 took the rest of a day (because I completely forgot how priming works there). I wasted another day on hacking in barely enough support for onMetaData packet parsing and the other codec-specific bits in the FLV demuxer.

And now it’s ready and more or less working. It can even play the H.264+AAC combination (remember when it was popular?); the only codecs it does not support are Speex (I’m not sure if I ever want to touch it) and MP3 (this one I’ll deal with eventually, and FLV will provide me with nicely split MP3 packets for testing before the infrastructure for handling raw streams is ready).

Now what to do next? It would be nice to have SANM/SMUSH support, maybe get to MP3 already (so nihav-sndplay is even more usable for me) or RE all those VoxWare codecs (I hope I can find the samples). There’s some interest in a barely functioning VP7 encoder too.

But who cares about that? I can encode VP6 into FLV now (even if I have no reasons to do so).

NihAV: now with lossless audio encoder

Tuesday, October 26th, 2021

Since I wanted to try my hoof at various encoding concepts it’s no wonder that after lossy audio encoder (IMA ADPCM with trellis encoding), lossless video encoder (ZMBV, using my own deflate implementation for compressing), lossy video encoder (Cinepak, for playing with vector quantisation, and VP6, for playing with many other concepts) it was time for a lossless audio encoder.

To remind you, there are essentially two types of lossless audio compressors—fast and asymmetric (based on LPC filters) and slow and symmetric (based on adaptive filters, usually long LMS ones). The theory behind them is rather simple and described below.
(more…)

VP8: dubious decisions, deficiencies and outright idiocy

Friday, October 15th, 2021

I’ve finally finished VP8 decoder for NihAV (which was done mainly by hacking already existing VP7 decoder) and I have some unpleasant words to say about VP8. If you want to read praises to the first modern open-source patent-free video codec (and essentially the second one since VP3/Theora) then go and read any piece of news from 2011. Here I present my experience implementing the format and what I found not so good or outright bad about the “specification” and the format itself.

(more…)

VP6 encoding guide

Wednesday, October 6th, 2021

As I wanted to do before, I’ve written a short guide on how to encode VP6 to FLV. You can find it here, at NihAV site.

You should be able to encode raw video into VP6 in AVI or (with a slightly custom build) to VP6 in EA format (if you want to test if the encoder is good enough for modding purposes; but I guess even Peter Ross won’t care about that). As usual, it’s not guaranteed to work but it seems to work for me.

And that should be it. I might do VP7 encoder later (much later!) just for lulz but so far I can see way more interesting things to do (more formats to decode, lossless audio encoder and such).

VP6 encoder design

Saturday, October 2nd, 2021

This is the penultimate post in the series (there shall be another post, on how to use the encoder—but if there’s no interest I can simply skip it making this the last post in the series). As promised before, here I’ll present the layout and the details of my encoder.
(more…)

Is VP8 a Duck codec?

Friday, October 1st, 2021

There’s a blog out there with posts dedicated to the history of On2 (née Duck). And one particular post (archived version) brought an unsettling thought that refuses to leave me. Does VP8 belong to Duck or Baidu (yes, I’ll keep calling this company by value) codecs?

Arguments for Duck theory:

  1. it was released in 2008, before acquisition (which happened in 2010);
  2. it can be seen as an improvement of VP7, which is definitely a Duck codec;
  3. its documentation is as lacking as for the previous codecs.

Arguments for Baidu theory:

  1. it became famous after the company was bought and the codec was open-sourced;
  2. as a follow-up from the previous item, there is an open-source library for decoding and encoding it (I think the previous source dump had an encoder just for TMRT and maybe it was an oversight);
  3. it has its own ecosystem (all previous codecs were stored in AVI, this one uses WebMKV);
  4. I don’t have to implement it in NihAV (because I wanted nihav_duck crate to contain decoders for all Duck formats and if VP8 is not really a Duck codec I don’t have to do anything).

So, what do you think?

VP6 — rate control and rate-distortion optimisation

Thursday, September 30th, 2021

First of all, I want to warn you that the “optimisation” part of RDO comes from mathematics, where it means selecting the element that best satisfies certain criteria. Normally we talk about optimisation as a way to make code run faster, but the term has a more general meaning and this is one such case.

Anyway, while there is a lot of theory behind it, the concepts are quite simple (see this description from a RAD guy for a concise explanation). To oversimplify, rate control is the part of an encoder that makes it output a stream with certain parameters (i.e. a certain average bitrate, a limited maximum frame size and such) and RDO is a way to adjust the encoded stream by deciding how much you want to trade bits for quality in each particular case.

For example, if you want to decide which kind of macroblock to encode (intra or one of several kinds of inter), for each candidate you calculate how much the coded blocks differ from the original ones (that’s distortion) and add the cost of coding those blocks (aka rate) multiplied by lambda (which is our weight parameter that tells how much to prefer rate over distortion or vice versa), and then pick the candidate with the smallest sum. So you want to increase bitrate? Decrease lambda so fidelity matters more. You want to decrease frame size? Increase lambda so bits are more important. From the mathematical point of view the problem is solved; from the implementation point of view that’s where the actual problems start.
(more…)
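
The macroblock type decision described in this excerpt can be sketched in Rust roughly as follows. This is only an illustration under stated assumptions: the `Candidate` type, the function names and all the distortion and rate numbers are made up for the example and are not taken from the NihAV encoder.

```rust
/// One way of coding a macroblock, with hypothetical measurements attached.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    name: &'static str,
    /// How much the coded block differs from the original (e.g. sum of squared errors).
    distortion: u32,
    /// Estimated number of bits needed to code the block.
    rate: u32,
}

/// Combined cost: distortion plus lambda-weighted rate.
fn rd_cost(c: &Candidate, lambda: f32) -> f32 {
    c.distortion as f32 + lambda * c.rate as f32
}

/// Returns the candidate with the smallest combined cost, if there is any.
fn pick_best(cands: &[Candidate], lambda: f32) -> Option<Candidate> {
    cands
        .iter()
        .copied()
        .min_by(|a, b| rd_cost(a, lambda).total_cmp(&rd_cost(b, lambda)))
}

/// Invented numbers, chosen only to show how lambda shifts the decision.
fn example_candidates() -> [Candidate; 3] {
    [
        Candidate { name: "intra", distortion: 100, rate: 400 },
        Candidate { name: "inter", distortion: 250, rate: 120 },
        Candidate { name: "skip", distortion: 900, rate: 2 },
    ]
}

fn main() {
    let cands = example_candidates();
    for lambda in [0.1f32, 1.0, 10.0] {
        if let Some(best) = pick_best(&cands, lambda) {
            println!("lambda = {}: {} (cost {})", lambda, best.name, rd_cost(&best, lambda));
        }
    }
}
```

With these invented numbers a small lambda selects the expensive but accurate candidate, while a large lambda selects the cheapest one, which matches the "decrease lambda for fidelity, increase lambda for smaller frames" rule of thumb.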

VP6 encoder: done!

Wednesday, September 29th, 2021

Today I’ve finished work on my VP6 encoder for NihAV and it seems to work as expected (which means poorly but what else to expect from a failure). Unfortunately even if the encoder is complete from my point of view, there are still some things to do: write a couple of posts on rate control/RDO and the overall design of my encoder and make it more useful for the people brave enough to use it in e.g. Red Alert game series modding. That means adding some input format support useful for the encoder (I’ve hacked Y4M input support but if there’s a request for a lossless codec in AVI, I can implement that too) and write a page describing how to use nihav-encoder to encode content in VP6 format (AVI only, maybe I’ll add FLV later as a joke but FLV decoding support should come first).

And now I’d like to talk about what features my encoder has and why it lacks in some areas.

First, what it has:

  • all macroblock types are supported (including 4MV and those referencing golden frame);
  • custom models updated per frame;
  • Huffman encoding mode;
  • proper quarterpel motion estimation;
  • extremely simple golden frame selection;
  • sophisticated macroblock type selection process;
  • rudimentary rate control.

In other words, it can encode a stream using all but a couple of the format’s features, and with varying quality as well.

And what it doesn’t have:

  • interlacing! It should not be that hard to add but my principles say no to supporting it at all (except in some decoders where it can’t be avoided);
  • alpha support—it’s rather easy to add but there’s little use for it;
  • custom scan order—it’s not likely to give a significant gain while it’s quite hairy to implement properly (it’s not that complex per se but it’ll need a lot of debugging to get it right because of its internal representation);
  • advanced profile features like bicubic interpolation filters and selecting parameters for them (again, too much work, too little fun);
  • context-dependent macroblock size approximations (i.e. calculate expected size using the information about already selected preceding macroblocks instead of fixed guesstimates);
  • better macroblock and frame size approximations in general (more about it in the upcoming post);
  • better golden frame selection (I don’t even know what would be a good condition for that);
  • dynamic intra frame selection (i.e. code a frame as I-frame where it’s appropriate instead of each N-th frame);
  • proper rate control (this should be discussed in the upcoming post).

This is an example of a progressive approach to development (in the same sense as progressive JPEG coding): first you implement a rough approximation of what you want to have and keep on expanding and improving various features until some arbitrary limit is reached. A lot of the features that I’ve not implemented properly need a lot of time (and sometimes significant domain-specific knowledge) for a proper implementation so I simply stopped where it was either working well enough or it would be not fun to continue.


So, with the next couple of posts on the details not covered yet (RDO+rate control and overall design) the journey should be complete. Remember, it’s the best open-source VP6 encoder (for the lack of competition) and since I’ve managed to make something resembling an encoder, maybe you can write something even better?

How to perform fast motion search

Saturday, September 25th, 2021

To answer the obvious question with the obvious answer, brute force searching for a decent motion vector takes an insanely long time. For example, the VP6 motion search area can be up to 63×63 pixels and checking all possible positions there requires a lot of tries. And if you remember that VP6 has quarterpel motion compensation precision, you should multiply that number by 16 possible sub-pixel positions. Obviously in order to reduce the number of tries various tricks are employed.
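
To make the cost of the brute force approach concrete, here is a minimal Rust sketch of a full search over integer positions using the sum of absolute differences (SAD) as the metric. Everything in it is an assumption made for illustration: the `Plane` and `SearchResult` types, the function names, the choice of SAD and the tiny synthetic frames are not taken from NihAV. The last line of `main` also assumes the 63×63 area means 63×63 integer positions.

```rust
/// A single 8-bit image plane stored row by row.
struct Plane {
    data: Vec<u8>,
    width: usize,
    height: usize,
}

impl Plane {
    /// Black plane with a 2x2 white square whose top-left corner is at (sx, sy).
    fn with_square(width: usize, height: usize, sx: usize, sy: usize) -> Self {
        let mut data = vec![0u8; width * height];
        for y in sy..sy + 2 {
            for x in sx..sx + 2 {
                data[y * width + x] = 255;
            }
        }
        Plane { data, width, height }
    }
}

/// SAD between the block at (bx, by) in `cur` and the block displaced by (dx, dy) in `reference`.
/// Returns None when the displaced block does not fit inside the reference plane.
/// The caller must make sure the block itself lies inside `cur`.
fn sad(cur: &Plane, reference: &Plane, bx: usize, by: usize, dx: isize, dy: isize, size: usize) -> Option<u32> {
    let rx = bx as isize + dx;
    let ry = by as isize + dy;
    if rx < 0 || ry < 0 {
        return None;
    }
    let (rx, ry) = (rx as usize, ry as usize);
    if rx + size > reference.width || ry + size > reference.height {
        return None;
    }
    let mut sum = 0u32;
    for y in 0..size {
        for x in 0..size {
            let a = cur.data[(by + y) * cur.width + bx + x];
            let b = reference.data[(ry + y) * reference.width + rx + x];
            sum += u32::from(a.abs_diff(b));
        }
    }
    Some(sum)
}

struct SearchResult {
    mv: (isize, isize),
    sad: u32,
    tries: usize,
}

/// Tries every integer displacement in [-range, range] in both directions.
/// If no displacement fits inside the reference, `sad` stays at u32::MAX.
fn full_search(cur: &Plane, reference: &Plane, bx: usize, by: usize, size: usize, range: isize) -> SearchResult {
    let mut res = SearchResult { mv: (0, 0), sad: u32::MAX, tries: 0 };
    for dy in -range..=range {
        for dx in -range..=range {
            if let Some(d) = sad(cur, reference, bx, by, dx, dy, size) {
                res.tries += 1;
                if d < res.sad {
                    res.sad = d;
                    res.mv = (dx, dy);
                }
            }
        }
    }
    res
}

fn main() {
    // The square sits at (6, 5) in the reference and at (4, 4) in the current frame.
    let reference = Plane::with_square(16, 16, 6, 5);
    let cur = Plane::with_square(16, 16, 4, 4);
    let res = full_search(&cur, &reference, 4, 4, 4, 3);
    println!("best vector {:?}, SAD {}, {} positions tried", res.mv, res.sad, res.tries);
    println!("63x63 positions at quarterpel precision: {}", 63 * 63 * 16);
}
```

Even this toy search with a range of ±3 evaluates 49 positions for a single block; the full-size area with all sub-pixel positions is where the tricks for reducing the number of tries come in.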

While the fast motion search methods I describe here are not that complex by themselves, it was rather hard to locate books where such details of developing video encoders are presented. At last I’ve found two or three books with chapters dedicated to motion compensation plus the papers referenced there. The results of this mini-research are given below.
(more…)