Archive for the ‘NihAV’ Category

Spectral Band Replication in various audio codecs

Thursday, July 7th, 2022

While the war continues I try to distract myself with something less unpleasant, recently it was implementing SBR support for AAC decoder in NihAV. And since I’ve done that why not talk about this technology and its varieties?

The idea of saving bits in lossy codec by coding less important high frequencies in an approximate way is not very new, the first case that comes to mind was aacPlus codec from Coding Technologies that served as a base for AAC SBR (and MP3Pro if you remember such thing). It allows to code the upper part of the spectrum very efficiently compared to the normal AAC codec (I’d say it takes 4-12kbps for that compared to 50-60kbps used for the lower frequencies) which allowed to cut audio bitrate significantly comparing to AAC-LC of similar quality. And since it was a good idea, other codecs used it as well but in different format (because patent infringements are fun when both parties employ hordes of lawyers).

So, let’s look at how it’s done in the codecs I can remember (if you know more feel free to comment):

  • AAC SBR (also mp3PRO but who cares)—the original that inspired other implementations. It works by splitting frame into series of 64-band slots (using complex numbers unless it’s a coarse low-power SBR that uses only real numbers), copying lower frequencies into high ones using a certain shape and adding scaled noise or tones (those two are mutually exclusive). For transmission efficiency lots of those parameters are derived from the configuration (that is transmitted once for couple frames) and essentially only envelopes used to shape coefficients and noise plus some flags are coded. You have to generate a lot of tables (like how QMF bands are grouped for four modes of operation, what gains to use on coefficients/noise/tones for each QMF band in each slot and so on). Eventually there were other variants developed (because there are other AAC codecs that could use it) but the approach remains the same;
  • E-AC-3—this codec has SPectral eXtension which divides frame into fixed sub-bands and copies data from lowed sub-bands, applies a specific scale to that data and blends it with noise scaled with another scale;
  • AC-4—this one has A-SPX that looks a lot like the original SBR (and considering that D*lby got the team behind it it’s not that surprising). I can’t be bothered to look at the finer details but from a quick glance it looks very similar (starting with all those FixVar and VarFix envelopes). If you want to know more about the implementation just ask Paul B. Mahol, it should be more fun than the usual questions about AC-4 he gets;
  • ATRAC9 (but not earlier)—this codec seems to split spectrum into four parts, fills them either with mirrored coefficients from below or with noise and applies coarser or finer scaling to those bands;
  • WMA9 (or was it WMAPro or WMA3?)—as usual it’s “we should overengineer AAC” approach. There’s not much known about how it really functions but it seems to split higher frequencies into variable-length bands and code motion vectors for each band telling from which position to copy (and since audio frames in MDCT-based codec are essentially P-frames, this is too close to being a video codec for my taste). There are three modes of operation for the bands too: copy data, fill with noise, or copy only the large coefficients and fill the rest with noise. I have an impression they tried to make it less computation-heavy than AAC SBR while having the similar or larger amount of features.

I guess you can see how these approaches are different yet alike at the same time and why it was not much fun to implement it. Yet I still don’t consider this time wasted as I gained more understanding on how it works (and why I didn’t want to touch it before). Now maybe it’s time to finally play with inline assembly in Rust.

Raw streams support in NihAV

Thursday, November 18th, 2021

Sadly there’s enough MP3s in my music collection to ignore the format and I’ve finally implemented MP3 decoding support for NihAV. That involved introducing several new concepts which I’d like to review in this post.

Previously NihAV operated on a simple approach: there’s a demuxer that produces full packets, those packets are fed to the corresponding decoder and the decoded audio/video data is used somehow. With MP3 you have a raw stream of audio packets (sometimes with an additional metadata). While I could pretend to have a demuxer (that will simply read data and form packets) I decided to do it differently.
(more…)

NihAV: now with Flash support

Tuesday, November 2nd, 2021

During my work on VP6 encoder I was contacted by Ruffle developer who was interested in it, one thing led to another and I licensed my decoder for the use there (the main issues were cutting off all the interfaces from NihAV that are not needed for it and selecting the license). But it’s over and they say it’s working fine. Meanwhile I got curious and decided to finally do what no other bit of open-source code could do: encode VP6 to FLV without relying on any external software.

In addition to the FLV muxer I also implemented all known decoders as well and that was uneven load. One evening was enough to implement two and half codecs: FLV1 (it’s just H.263 with slightly different header and block format), Flash ADPCM (a slight variation of IMA ADPCM) and a bit of ASAO. Another day was spent on trying to make ASAO work properly (I dislike codecs with parametric bit allocation like this one, at least it’s not a typical speech codec). VP6 modifications took minutes, Flash Screen Video was done in less than an hour, Flash Screen Video 2 took the rest of a day (because I completely forgot how priming works there). I wasted another day on hacking barely enough support for onMetaData packet parsing and the other codec-specific bits in FLV demuxer.

And now it’s ready and more or less working. It can even play H.264+AAC combination (remember when it was popular), the only codecs it does not support are Speex (I’m not sure if I ever want to touch it) and MP3 (this one I’ll deal with eventually and FLV will provide me with nicely split MP3 packets for testing before the infrastructure for handling raw streams is ready).

Now what to do next? It would be nice to have SANM/SMUSH support, maybe get to MP3 already (so nihav-sndplay is even more usable for me) or RE all those VoxWare codecs (I hope I can find the samples). There’s some interest for bearly functioning VP7 encoder too.

But who cares about that? I can encode VP6 into FLV now (even if I have no reasons to do so).

NihAV: now with lossless audio encoder

Tuesday, October 26th, 2021

Since I wanted to try my hoof at various encoding concepts it’s no wonder that after lossy audio encoder (IMA ADPCM with trellis encoding), lossless video encoder (ZMBV, using my own deflate implementation for compressing), lossy video encoder (Cinepak, for playing with vector quantisation, and VP6, for playing with many other concepts) it was time for a lossless audio encoder.

To remind you, there are essentially two types of lossless audio compressors—fast and asymmetric (based on LPC filters) and slow and symmetric (based on adaptive filters, usually long LMS ones). The theory behind them is rather simple and described below.
(more…)

VP8: dubious decisions, deficiencies and outright idiocy

Friday, October 15th, 2021

I’ve finally finished VP8 decoder for NihAV (which was done mainly by hacking already existing VP7 decoder) and I have some unpleasant words to say about VP8. If you want to read praises to the first modern open-source patent-free video codec (and essentially the second one since VP3/Theora) then go and read any piece of news from 2011. Here I present my experience implementing the format and what I found not so good or outright bad about the “specification” and the format itself.

(more…)

VP6 encoding guide

Wednesday, October 6th, 2021

As I wanted to do before, I’ve written a short guide on how to encode VP6 to FLV. You can find it here, at NihAV site.

You should be able to encode raw video into VP6 in AVI or (with a slightly custom build) to VP6 in EA format (if you want to test if the encoder is good enough for modding purposes; but I guess even Peter Ross won’t care about that). As usual, it’s not guaranteed to work but it seems to work for me.

And that should be it. I might do VP7 encoder later (much later!) just for lulz but so far I can see way more interesting things to do (more formats to decode, lossless audio encoder and such).

VP6 encoder design

Saturday, October 2nd, 2021

This is the penultimate post in the series (there shall be another post, on how to use the encoder—but if there’s no interest I can simply skip it making this the last post in the series). As promised before, here I’ll present the layout and the details of my encoder.
(more…)

Is VP8 a Duck codec?

Friday, October 1st, 2021

There’s a blog out there with posts dedicated to the history of On2 (née Duck). And one particular post (archived version) brought an unsettling thought that refuses to leave me. Does VP8 belong to Duck or Baidu (yes, I’ll keep calling this company by value) codecs?

Arguments for Duck theory:

  1. it was released in 2008, before acquisition (which happened in 2010);
  2. it can be seen as an improvement of VP7, which is definitely a Duck codec;
  3. its documentation is as lacking as for the previous codecs.

Arguments for Baidu theory:

  1. it became famous after the company was bought and the codec was open-sourced;
  2. as a follow-up from the previous item, there is an open-source library for decoding and encoding it (I think the previous source dump had an encoder just for TMRT and maybe it was an oversight);
  3. it has its own ecosystem (all previous codecs were stored in AVI, this one uses WebMKV);
  4. I don’t have to implement it in NihAV (because I wanted nihav_duck crate to contain decoders for all Duck formats and if VP8 is not really a Duck codec I don’t have to do anything).

So, what do you think?

VP6 — rate control and rate-distortion optimisation

Thursday, September 30th, 2021

First of all, I want to warn you that “optimisation” part of RDO comes from mathematics with its meaning being selecting an element which satisfies certain criteria the best. Normally we talk about optimisation as a way for code to run faster but the term has more general meaning and here’s one of such cases.

Anyway, while there is a lot of theory behind it, the concepts are quite simple (see this description from a RAD guy for a short concise explanation). To put it oversimplified, rate control is the part of an encoder that makes it output stream with the certain parameters (i.e. certain average bitrate, limited maximum frame size and such) and RDO is a way to adjust encoded stream by deciding how much you want to trade bits for quality in this particular case.

For example, if you want to decide which kind of macroblock you want to encode (intra or several kinds of inter) you calculate how much the coded blocks differ from the original one (that’s distortion) and add the cost of coding those blocks (aka rate) multiplied by lambda (which is our weight parameter that tells how much to prefer rate over distortion or vice versa). So you want to increase bitrate? Decrease lambda so fidelity matters more. You want to decrease frame size? Increase lambda so bits are more important. From mathematical point of view the problem is solved, from implementation point of view that’s where the actual problems start.
(more…)

VP6 encoder: done!

Wednesday, September 29th, 2021

Today I’ve finished work on my VP6 encoder for NihAV and it seems to work as expected (which means poorly but what else to expect from a failure). Unfortunately even if the encoder is complete from my point of view, there are still some things to do: write a couple of posts on rate control/RDO and the overall design of my encoder and make it more useful for the people brave enough to use it in e.g. Red Alert game series modding. That means adding some input format support useful for the encoder (I’ve hacked Y4M input support but if there’s a request for a lossless codec in AVI, I can implement that too) and write a page describing how to use nihav-encoder to encode content in VP6 format (AVI only, maybe I’ll add FLV later as a joke but FLV decoding support should come first).

And now I’d like to talk about what features my encoder has and why it lacks in some areas.

First, what it has:

  • all macroblock types are supported (including 4MV and those referencing golden frame);
  • custom models updated per frame;
  • Huffman encoding mode;
  • proper quarterpel motion estimation;
  • extremely simple golden frame selection;
  • sophisticated macroblock type selection process;
  • rudimentary rate control.

In other words, it can encode a stream having all but just a couple of features and with the varying quality as well.

And what it doesn’t have:

  • interlacing! It should not be that hard to add but my principles say no to supporting it at all (except in some decoders where it can’t be avoided);
  • alpha support—it’s rather easy to add but there’s little use for it;
  • custom scan order—it’s not likely to give a significant gain while it’s quite hairy to implement properly (it’s not that complex per se but it’ll need a lot of debugging to get it right because of its internal representation);
  • advanced profile features like bicubic interpolation filters and selecting parameters for it (again, too much work too little fun);
  • context-dependent macroblock size approximations (i.e. calculate expected size using the information about already selected preceding macroblocks instead of fixed guesstimates);
  • better macroblock and frame size approximations in general (more about it in the upcoming post);
  • better golden frame selection (I don’t even know what would be a good condition for that);
  • dynamic intra frame selection (i.e. code a frame as I-frame where it’s appropriate instead of each N-th frame);
  • proper rate control (this should be discussed in the upcoming post).

This is an example of progressive approach to the development (in the same sense as progressive JPEG coding): first you implement rough approximation of what you want to have and keep on expanding and improving various features until some arbitrary limit is reached. A lot of the features that I’ve not implemented properly need a lot of time (and sometimes significant domain-specific knowledge) for a proper implementation so I simply stopped where it was either working good enough or it would be not fun to continue.


So, with the next couple of posts on still not covered details (RDO+rate control and overall design) the journey should be complete. Remember, it’s the best opensource VP6 encoder (for the lack of competition) and since I’ve managed to make something resembling an encoder, maybe you can write something even better?