NihAV: even more Bink2 support!

March 13th, 2019

After managing to decode the first frame of the KB2g variant I had three options: try to decode the other frames, try to decode other variants or do nothing. While the third option is the most appealing and the first option is the most logical, I stuck with the second one. So now I can decode the first frame of the KB2f variant of Bink2 as well. Unfortunately the only (partial) KB2a sample I know of is not supported; probably it’s a beta version that was tried in one game, like Bink version b. Besides a small surprise in one place, bitstream decoding was rather simple. Inter-frame support should not be that hard, but it might get messy because of the DC and MV prediction.

And while talking about REing Bink I should mention that I’ve tried Ghidra while doing the KB2f work. While it is a nice tool that sucks in some places (no good highlighting for variables, very questionable output when decompiling SIMD code, and the system being Java-based and requiring a recent JDK, which is really the worst issue), it works and produces decent results (including the decompiler). Also, since it has 16-bit decompiler support, maybe I’ll manage to figure out how those clips in Monty Python & the Quest for the Holy Grail are stored.

I should start documenting it too.

Insignificant update: okay, now it decodes inter-frame data correctly too and the only thing left is to make it reconstruct them correctly. Also I’ve updated codec information on Multimedia Wiki. Actually now it works quite okay so I’m not going to pursue it further. I have no real interest in Bink2 decoding after all.

NihAV: some Bink2 support

March 10th, 2019

It took a long time but finally I can decode the first frame of Bink2 video (just KB2g flavour though but it’s a start).

At least the initial observations were correct: Bink2 codes data in 32×32 macroblocks, two codebooks for AC zero runs, one codebook for motion vector components, simple codes with unary prefix for the rest.

If you wonder why it took so long, that’s because I’m lazy and spend an hour or less a day on it. Also, while the codec is simple in design, it’s a bit complicated in implementation. While the previous version relied on the format sub-version to decide which features to use, Bink2 uses frame flags for that. For instance, flag 0x1000 signals that there are two bit arrays coded that tell when to read an additional flag during CBP decoding, and that flag tells which one of two codebooks should be used during AC decoding later. And flag 0x2000 essentially tells to use different bitstream decoding (like motion vector decoding or block type decoding). Then there is DC and MV prediction that usually has four cases (top-left macroblock, top block, left block, some block inside) plus WMV1-like handling of DC prediction in inter-frames (i.e. it calculates DC for inter blocks and uses it for prediction). And of course DC prediction for inter blocks works a bit differently. Plus it tracks internal state by packing all flags into a 32-bit word and updating it for each block (two bits signal the top row, one signals the macroblock being the leftmost one, some bits are copied from frame flags etc etc). So there are a lot of nuances to take care of.
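To make the four prediction cases a bit more concrete, here is a rough Rust sketch of DC prediction driven by a packed state word. The bit positions (STATE_TOP_ROW, STATE_LEFT_COLUMN), the default value for the very first block and the median predictor for inner blocks are all my assumptions for illustration, not the actual Bink2 rules.

```rust
// Hypothetical bit positions inside a packed per-block state word.
// Bink2 keeps flags like these in one 32-bit word, but these values are made up.
const STATE_TOP_ROW: u32 = 0x0001;
const STATE_LEFT_COLUMN: u32 = 0x0004;

#[derive(Debug, PartialEq)]
enum PredCase {
    TopLeft,    // first block of the frame: no neighbours at all
    TopRow,     // only the left neighbour is available
    LeftColumn, // only the top neighbour is available
    Inside,     // left, top and top-left neighbours are all available
}

fn pred_case(state: u32) -> PredCase {
    let top = (state & STATE_TOP_ROW) != 0;
    let left = (state & STATE_LEFT_COLUMN) != 0;
    match (top, left) {
        (true, true) => PredCase::TopLeft,
        (true, false) => PredCase::TopRow,
        (false, true) => PredCase::LeftColumn,
        (false, false) => PredCase::Inside,
    }
}

/// Median of three values.
fn median3(a: i32, b: i32, c: i32) -> i32 {
    a.max(b).min(a.min(b).max(c))
}

/// Predicts a DC value from already decoded neighbours (hypothetical rules).
fn predict_dc(state: u32, left: i32, top: i32, top_left: i32) -> i32 {
    match pred_case(state) {
        PredCase::TopLeft => 1024, // assumed default, not taken from Bink2
        PredCase::TopRow => left,
        PredCase::LeftColumn => top,
        PredCase::Inside => median3(left, top, left + top - top_left),
    }
}
```

Keeping the position information in the same word as the other flags means one lookup per block decides the prediction case, which is probably why the decoder bothers with packing.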

And that’s not counting the fact that current Bink2 player can’t decode versions prior to KB2g at all. Since I have some KB2f samples along with an old Bink player that can handle them, I guess I’ll support them eventually.

NihAV: Overall Design—Current and Intended

February 16th, 2019

Okay, I’ve finally picked all the low-hanging fruit and now progress is blocked by the need for reverse engineering (namely Bink 2, Discworld Noir BMV and TrueMotion 2X), so I don’t expect anything major to happen to the project in the near future.

Meanwhile it’s a good opportunity to talk about how NihAV is (mis)designed and how it should work in principle.


NihAV: first quacks

February 10th, 2019

As you can guess from the title NihAV got some support for Duck formats, namely TrueMotion 1 and TrueMotion RT. The implementation was rather straightforward except that it took some additional work to support 16-bit video buffers.

Of course I made sure my new TM1 decoder supports decoding sprites. Here’s an example of such a sprite picture:

The hardest part was finding a sample.

I can’t sanely support transparency though, since it uses 6-bit alpha with an RGB555 image, and while I could support such a format quite easily I’d rather not.

If you wonder about the details of sprite support, it’s almost the same as ordinary inter-coded 16-bit TM1 with some nuances:

  1. frame header has additional 16-bit fields for sprite position and size (and actual sprite size is used in the decoding—the result is supposed to be put over the destination picture);
  2. sprite has twice as many mask bits as an inter frame: two per 4×4 block (LSB first as usual). Bits 00 mean the next four pixels should be skipped (and the predictor reset to zero), bits 01 mean it’s opaque sprite data and bits 10 mean it’s sprite data with transparency info present;
  3. sprite data is decoded as standard 4×4 TM1 block data (i.e. one C delta per 4×4 block) except that in transparency mode it also reads transparency data after each pixel pair.
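Splitting the sprite mask into per-block modes could look like the Rust sketch below. I assume the mask is a plain byte array read LSB first and that each bit pair forms a two-bit value (so 01 is opaque and 10 is transparent). The description says nothing about 11, so the sketch just reports it as reserved.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SpriteBlock {
    Skip,        // 00: skip pixels, predictor reset to zero
    Opaque,      // 01: plain sprite data
    Transparent, // 10: sprite data followed by transparency info
    Reserved,    // 11: not described, kept as-is
}

/// Returns the mode of each of `nblocks` blocks, two mask bits per block, LSB first.
fn sprite_block_modes(mask: &[u8], nblocks: usize) -> Vec<SpriteBlock> {
    (0..nblocks)
        .map(|i| {
            let bit_pos = i * 2;
            let code = (mask[bit_pos / 8] >> (bit_pos % 8)) & 3;
            match code {
                0b00 => SpriteBlock::Skip,
                0b01 => SpriteBlock::Opaque,
                0b10 => SpriteBlock::Transparent,
                _ => SpriteBlock::Reserved,
            }
        })
        .collect()
}
```

For example, a mask byte 0xE1 describes four blocks: opaque, skipped, transparent and the undescribed 11 case.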

That information comes from our old trusty source of information, the VPVision source code dump (which was used to understand TrueMotion 1, 2 and probably DK3/DK4 ADPCM, and maybe VP3 but I’m not sure at all). It also turns out to contain TrueMotion RT encoder source code (which could have been used to reconstruct the decoder, but I forgot about it at the time and used the binary specification instead).

And now I’d like to talk about Duck codecs in general.

The codecs from this family can be divided into three groups:

  1. The Age of Darkness: the original TrueMotion codec and its evolution plus related ADPCM codecs;
  2. The Age of Enlightenment: game codecs evolving into more generic video codecs and using more mainstream codec design (DCT-based, many ideas borrowed from H.263 and H.264) plus AVC (that’s audio codec if you don’t remember);
  3. The Age of EA Guardian: the codecs produced after Duck was bought by certain company.

The Age of Darkness codecs

Those codecs were used mostly in video games but TM1 was also licensed to Horizons Technology.

The idea behind TM1 is very simple: you split video into 4×4 blocks, predict each pixel from the one above it and pack the result using quantised deltas and a fixed codebook that looks more like Tunstall codes (i.e. the output code always has a fixed length of one byte, but it may correspond to a variable-length sequence of input codes). Also, depending on quality, frame blocks have a different number of colour difference deltas per block (1, 2 or 4).
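Here is a toy Rust illustration of both ideas: byte codes that expand into variable-length runs of delta indices, and pixels predicted from the row above. The code table, the delta values and the clamping to 0..255 are made up for the example; the real TM1 tables and reconstruction details are not reproduced here.

```rust
/// Expands byte codes into delta indices: every byte maps to a variable-length
/// run of indices (Tunstall-like). `table` is a hypothetical stand-in for TM1's.
fn expand_codes(bytes: &[u8], table: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for &b in bytes {
        out.extend_from_slice(&table[b as usize]);
    }
    out
}

/// Reconstructs one row by adding a quantised delta to the pixel above.
fn predict_row(top: &[u8], indices: &[u8], deltas: &[i16]) -> Vec<u8> {
    top.iter()
        .zip(indices)
        .map(|(&t, &idx)| (t as i16 + deltas[idx as usize]).clamp(0, 255) as u8)
        .collect()
}
```

The point of the byte-sized output codes is that the decoder never has to do bit-level reading: one table lookup per input byte yields several deltas at once.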

TrueMotion RT is an adaptation of TM1 for real-time video capturing (hence the name). In this case video is coded as planar YUV410 using a fixed set of deltas with the index taking 2, 3 or 4 bits. But the general coding idea (top and left prediction, delta quantisation and coding its index) remains the same. It uses the same frame header obfuscation, so it’s probably an elder sibling of TrueMotion 2 (and its name is more like TrueMotion RT version 2.0 and not TrueMotion 2 RT, but the details are unclear). There are also different versions of TM1: for example, Star Control II: The Ur-Quan Masters on 3DO used a special TM1 format split into several files: .hdr for global information (including quantised delta sets), .tbl with the codebook definition, .duk with the actual frame data and .frm with the frame offsets for the .duk file. It’s a pity I can’t support it without very special handling.
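Reading fixed-width delta indices is simple enough. In this Rust sketch the MSB-first bit order and the example delta sets are assumptions, and prediction is left out entirely; it only shows how a 2-, 3- or 4-bit index selects one of 2^n deltas.

```rust
/// Minimal MSB-first bit reader (bit order is an assumption).
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitReader { data, pos: 0 }
    }
    fn read(&mut self, nbits: u32) -> u32 {
        let mut v = 0;
        for _ in 0..nbits {
            let bit = (self.data[self.pos / 8] >> (7 - self.pos % 8)) & 1;
            v = (v << 1) | bit as u32;
            self.pos += 1;
        }
        v
    }
}

/// Decodes `count` deltas, each coded as an `index_bits`-wide index into `delta_set`.
fn decode_deltas(data: &[u8], index_bits: u32, delta_set: &[i32], count: usize) -> Vec<i32> {
    assert!((2..=4).contains(&index_bits));
    assert_eq!(delta_set.len(), 1 << index_bits);
    let mut br = BitReader::new(data);
    (0..count).map(|_| delta_set[br.read(index_bits) as usize]).collect()
}
```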

TrueMotion 2 gets rid of the single static codebook and packs the appropriate data (deltas for different-resolution blocks, motion vector data, actual block types etc etc) into separate segments with their own Huffman codes. There are many improvements, but the codec still operates on 4×4 blocks with horizontal and vertical prediction of each symbol.

There is not much known about TrueMotion 2X but so far it looks like maybe slightly improved TM2. Hopefully it will be clearer if I manage to implement a decoder for it.

And finally there were two simple ADPCM codecs accompanying video (usually TM2), there’s nothing much to say about those.

The Age of Enlightenment codecs

This was the age when Duck codecs became widely known and accepted, when various companies licensed them for their own needs and when it was really the golden age for them.

It all starts with TrueMotion VP3 that set the standard for the following codecs. It employed a slightly non-standard 8×8 DCT, referencing the last intra frame as an alternative to referencing just the previous frame (later known as the golden frame), various types of information grouped together instead of interleaving it all, and coefficients coded as tokens (EOB, zero run, plus-minus one, plus-minus two, large coefficient token and such). The same approach would be used for subsequent codecs as well. Of course it briefly enjoyed a renaissance when Duck decided to open-source it and Xiph Theora was created on its base (and since there were no other free and open-source video codec alternatives, it was destined to enjoy popularity and success until something better came along).

TrueMotion VP4 was mostly the same but with different coding method for some data types. Maybe it was the first codec to move edge loop filtering from being performed on the frame to being performed on temporary block used in motion compensation but I’m not entirely sure.

TrueCast VP5 was the first in the series to employ their own version of a static binary arithmetic coder, mostly known as the bool coder. That means that instead of updating the bit probability of a context after each decoded bit as CABAC does, the frame header encodes fixed probabilities (or just updates to the probabilities from the previous frame), which are then used for decoding.
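The bool coder survived all the way to VP8, and RFC 6386 publishes the VP8 flavour of it, so the Rust sketch below follows the encoder and decoder described there (VP5/VP6 details may differ slightly). The probability passed in is that of a zero bit in 1/256 units, and nothing in the coder adapts it while coding: whatever the frame header set stays fixed.

```rust
/// Bool encoder after the one published in RFC 6386 (VP8).
struct BoolEncoder { out: Vec<u8>, range: u32, bottom: u32, bit_count: i32 }

impl BoolEncoder {
    fn new() -> Self {
        BoolEncoder { out: Vec::new(), range: 255, bottom: 0, bit_count: 24 }
    }
    /// Propagates a carry into the bytes that were already written.
    fn add_one_to_output(&mut self) {
        for byte in self.out.iter_mut().rev() {
            if *byte == 255 { *byte = 0; } else { *byte += 1; return; }
        }
    }
    /// `prob` is the probability of a zero bit in 1/256 units.
    fn write_bool(&mut self, prob: u8, bit: bool) {
        let split = 1 + (((self.range - 1) * prob as u32) >> 8);
        if bit { self.bottom += split; self.range -= split; } else { self.range = split; }
        while self.range < 128 {
            self.range <<= 1;
            if self.bottom & (1u32 << 31) != 0 {
                self.add_one_to_output();
            }
            self.bottom <<= 1;
            self.bit_count -= 1;
            if self.bit_count == 0 {
                self.out.push((self.bottom >> 24) as u8);
                self.bottom &= (1u32 << 24) - 1;
                self.bit_count = 8;
            }
        }
    }
    /// Flushes the remaining state, padding the output.
    fn finish(mut self) -> Vec<u8> {
        let c = self.bit_count;
        let mut v = self.bottom;
        if v & (1u32 << (32 - c)) != 0 {
            self.add_one_to_output();
        }
        v <<= c & 7;
        for _ in 0..(c >> 3) { v <<= 8; }
        for _ in 0..4 { self.out.push((v >> 24) as u8); v <<= 8; }
        self.out
    }
}

/// Matching bool decoder; reads zeroes past the end of data.
struct BoolDecoder<'a> { data: &'a [u8], pos: usize, value: u32, range: u32, bit_count: u32 }

impl<'a> BoolDecoder<'a> {
    fn new(data: &'a [u8]) -> Self {
        let mut d = BoolDecoder { data, pos: 0, value: 0, range: 255, bit_count: 0 };
        for _ in 0..2 {
            d.value = (d.value << 8) | d.next_byte();
        }
        d
    }
    fn next_byte(&mut self) -> u32 {
        let b = self.data.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        b as u32
    }
    fn read_bool(&mut self, prob: u8) -> bool {
        let split = 1 + (((self.range - 1) * prob as u32) >> 8);
        let big_split = split << 8;
        let bit = if self.value >= big_split {
            self.range -= split;
            self.value -= big_split;
            true
        } else {
            self.range = split;
            false
        };
        while self.range < 128 {
            self.value <<= 1;
            self.range <<= 1;
            self.bit_count += 1;
            if self.bit_count == 8 { self.bit_count = 0; self.value |= self.next_byte(); }
        }
        bit
    }
}
```

Usage is symmetric: call write_bool() with the same probabilities the decoder will pass to read_bool(), then finish() and hand the bytes to BoolDecoder::new().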

VP6. Probably the most famous of them all since it was used in Flash videos. From the technical point of view it’s just a small improvement over VP5 in some details. I suspect this was the first codec in the series that allowed selecting an arbitrary frame as the next golden frame (previously it was just the last intra frame, now any inter frame can signal that it should become golden).

VP7. This is the first installment in the series that was based on H.264 ideas like the 4×4 transform and spatial prediction.

And of course there’s AVC, an audio codec inspired by AAC LC that accompanied some VP5-VP7 videos.

The Age of Guardian codecs

While the design direction has not changed much, the codecs themselves mostly belong to the niche provided by their current owner and hardly used anywhere else. For now we have VP8, VP9 and VP10 (aka AV1).


I hope there will be more to write about those after I write decoders for the rest of them and learn the shameful details of their design in the process.

NihAV: split and games

February 6th, 2019

I have not written anything about NihAV for the last two months. If you thought it’s because nothing was done you’d be mostly correct (but not in thinking about the project at all—this is pointless). And yet there has been some progress recently (but before that I spent a whole month not doing anything NihAV-related).

First thing, I’ve finally split NihAV into smaller crates. This was done in order to reduce compilation and testing time for separate components. It started to feel a lot like FFmpeg and Libav times except that I don’t have to rerun configure and recompile everything in case I want to add a new decoder. Now codecs reside in standalone crates that contain just relevant decoders and demuxers so “write code—fail to compile—fix—run tests” cycle goes faster now.

And here’s the new structure:

  • nihav-core — the core. It contains the definitions for basic stuff like packets and video/audio buffers, I/O routines (byte- and bitreader), DSP and common video stuff for H.263-based decoders;
  • nihav-commonfmt — for various common formats. Currently it has all leftover codecs I could not move elsewhere (and AVI demuxer). Maybe in the future it will grow too large so I’ll split out some groups like speech codecs into new crates;
  • nihav-duck — for Duck game codecs up to VP7;
  • nihav-game — for various game codecs and containers;
  • nihav-indeo — for Indeo video and audio codecs;
  • nihav-rad — for totally RAD formats like Smacker and Bink;
  • nihav-realmedia — full RealMedia support (except for common decoders in nihav-commonfmt);
  • and finally nihav-allstuff — the crate that binds them all so you don’t have to search for decoders in separate crates.

Of course there’s nihav-tool to perform the actual decoding and test whether the code works but it’s always been a standalone crate.

This split required some changes in decoder and demuxer handling. Previously I could get away with one table where all decoders (or demuxers) were registered and then search for an appropriate decoder there. Now that is not possible, since there are many crates with individual codecs enabled or disabled in configuration. As a result I had to introduce a structure called RegisteredDecoders and have a crate-specific function that registers all decoders featured in the crate in it. And nihav_allstuff::nihav_register_all_codecs() simply invokes them all for convenience. It’s exactly the same with demuxers too.
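The registration scheme can be sketched in Rust like this. Only the name RegisteredDecoders comes from NihAV; the Decoder trait, DecoderInfo, add_decoder(), find_decoder() and the per-crate functions are hypothetical stand-ins, and the real code certainly looks different.

```rust
/// Hypothetical decoder interface; NihAV's real trait differs.
trait Decoder {
    fn decode(&mut self, packet: &[u8]) -> usize;
}

/// Hypothetical registry entry: a codec name plus a constructor.
struct DecoderInfo {
    name: &'static str,
    create: fn() -> Box<dyn Decoder>,
}

/// Registry filled in by per-crate registration functions.
#[derive(Default)]
struct RegisteredDecoders {
    list: Vec<DecoderInfo>,
}

impl RegisteredDecoders {
    fn new() -> Self {
        Self::default()
    }
    fn add_decoder(&mut self, info: DecoderInfo) {
        self.list.push(info);
    }
    fn find_decoder(&self, name: &str) -> Option<&DecoderInfo> {
        self.list.iter().find(|d| d.name == name)
    }
}

/// Stand-in decoder so the sketch runs; it just reports the packet size.
struct DummyDecoder;
impl Decoder for DummyDecoder {
    fn decode(&mut self, packet: &[u8]) -> usize {
        packet.len()
    }
}
fn create_dummy() -> Box<dyn Decoder> {
    Box::new(DummyDecoder)
}

/// What individual codec crates might export (hypothetical names).
fn duck_register_all_decoders(rd: &mut RegisteredDecoders) {
    rd.add_decoder(DecoderInfo { name: "truemotion1", create: create_dummy });
}
fn rad_register_all_decoders(rd: &mut RegisteredDecoders) {
    rd.add_decoder(DecoderInfo { name: "bink-video", create: create_dummy });
}

/// The umbrella crate just calls every per-crate function.
fn register_all_decoders(rd: &mut RegisteredDecoders) {
    duck_register_all_decoders(rd);
    rad_register_all_decoders(rd);
}
```

In a real setup each per-crate call (or each decoder inside it) would sit behind a Cargo feature, which is what makes enabling and disabling individual codecs possible without one global table.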

N.B.: I should probably write a separate post on overall NihAV design, missing parts and excuses for certain decisions.

Second, as you could spot from the crate names above, instead of trying to make NihAV do something useful (you have rust-av for that though) I decided to focus more on supporting various game formats.

So far I want to support the following formats:

  1. Sierra VMD (developed by Coktel Vision but I care more for Sierra games). Should be an easy one-shot;
  2. Discworld II and Noir BMV. I have added support for the DW2 BMV format already (and it took a day to reimplement). The DW3 BMV format is almost the same; DW3 audio is known but the video codec still requires some reverse engineering. Also, fun fact: the DW2 BMV video decoder is based on the reverse engineered decoder from ScummVM, where they did not understand how decoding lengths work, and the code still looks like beautified assembly. For the libavcodec decoder I tried my best to simplify the code and give better names to the variables. And when implementing the decoder for nihav-game I found out that “read byte, advance pointer, output nibble and save another one for later” leads to the same results but with much cleaner code;
  3. RAD codecs. I’ve reimplemented Smacker and Bink formats support (the worst part was NIH-y implementations for DCT-II/DCT-III used in Bink audio decoder but that’s a tale for another time). Maybe now I’ll finally write a decoder for Bink2 video;
  4. Duck codecs. Yes, they are game formats that were also tried as general codecs, but with lesser success. There are many games using TrueMotion 1, there are many games using TrueMotion 2, and there are some games using TrueMotion VP3 and VP6 in their cutscenes. And the only one of their codecs that became widespread is VP6. I want to implement the whole family of codecs that Duck produced before dying (sorry, ascending to the higher plane of existence). While most of them are supported, there are still missing features like sprites in TrueMotion 1, interlaced mode in VP6 or TM2X entirely. It requires some REing of course, but that’s the appeal.
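The “read byte, advance pointer, output nibble and save another one for later” approach from the BMV item fits into a few lines of Rust. Which nibble comes first is my assumption (low nibble here); check it against real data before relying on it.

```rust
/// Reads 4-bit values: fetch a byte, return one nibble, keep the other for the next call.
struct NibbleReader<'a> {
    data: &'a [u8],
    pos: usize,
    saved: Option<u8>,
}

impl<'a> NibbleReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        NibbleReader { data, pos: 0, saved: None }
    }

    fn next_nibble(&mut self) -> Option<u8> {
        if let Some(n) = self.saved.take() {
            return Some(n);
        }
        let byte = *self.data.get(self.pos)?;
        self.pos += 1;
        self.saved = Some(byte >> 4); // high nibble saved for later (assumed order)
        Some(byte & 0xF)
    }
}
```

Compared to tracking bit offsets, this keeps all the state in one saved value, which is what makes the decoder loop so much cleaner.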

After that it would be nice to work on actual player for the supported formats, which opens a new can of worms like proper frame reordering and format conversion, but I guess I’ll have to deal with that one day.

Anyway, back to doing nothing.

Dingo Pictures

February 3rd, 2019

In order to celebrate the fact that all important Dingo Pictures works were found (there might be more though) I decided to visit the birthplace of those masterpieces. You can find their address at the website or at Baidu Maps (with some reviews, mostly in English and Swedish).

Exploring the Solar Systems on Earth

January 19th, 2019

So this year for me started in Sweden as usual, and since I had nothing better to do on the 1st of January I decided to visit Örebro. And there I was reminded that there is such a thing as a Solar System model laid out on a real landscape. So today I want to talk about those a bit.

Dingo Pictures: more titles found!

December 16th, 2018

Somebody has managed to find the Dingo Pictures media conceptual work Perseus in German, and some of their video stories were discovered too: Siegfried and a Christmas story with the subtitle Max’s Wonderful Present (somebody even notified me of this one earlier). All of them are available at the usual video hosting.

I still want to see Humpie the Whale Explores the World one day though.

NihAV: now with full RealMedia support!

December 15th, 2018

In late September 2017 I started to work on RealMedia support in NihAV with the intent of having full support for RealMedia. So more than a year later I’ve reached that goal.

In order to do that I had to reverse engineer one and a half codecs and one format. Here’s the full list.

Supported formats:

  • RealAudio (i.e. just single audio stream);
  • plain RealMedia (i.e. just a bunch of audio and video streams);
  • RealMedia with multiple data chunks (i.e. one or several streams are stored in separate chunk, it’s nothing extraordinary but still needs to be accounted for);
  • RealMedia multiple stream variants (i.e. single logical stream is represented by several substreams and you have to select one based on quality);
  • IVR, their own recording format (I had to RE this one).

Supported audio codecs:

  • RealAudio 1 aka 14.4;
  • RealAudio 2 aka 28.8;
  • RealAudio (AC)3 aka DNET;
  • RealAudio 4/5 (licensed from Sipro);
  • RealAudio G2 (cook);
  • RealAudio ATRAC3;
  • RealAudio AAC-LC (no SBR);
  • RealAudio Lossless.

And video codecs:

  • RealVideo 1;
  • RealVideo Fractal aka ClearVideo (I had to finish REing P-frame format for that one);
  • RealVideo 2;
  • RealVideo 3;
  • RealVideo 4;
  • RealVideo 6 or HD (I had to RE this one and now it decodes the sample I have with only minor glitches).

And here are some words about IVR that I had to RE this week.

Update: it turns out Paul had reverse engineered the format before NihAV came to existence but his implementation is even sketchier than mine unfortunately.

There are actually two formats there, one with magic .REC that contains the actual recording and another one with magic .R1M that may contain several of those .rec embedded. Both formats internally reminded me more of Flash than RealMedia, because both files are composed of records that can be distinguished by the first byte (yes, I still remember RTMP and how I had to parse it). R1M has two kinds of records: 0x01 seems to contain recording metadata and 0x02 contains the actual REC.

REC files (or sub-entries in R1M) have a defined number of global properties followed by stream-specific properties, followed by (optional) stream seek tables, followed by the actual packets. All numbers are big-endian and 32-bit (seek table offsets seem to be 64-bit). Strings are coded as the string length (including the terminating zero) followed by string data and a zero terminator.

REC record types:

  1. stream properties start, has a number of properties coded after it;
  2. packet header, more about it below;
  3. key-number pair, has key value (string), a number property length (always 4) and actual number value;
  4. binary data, has key value (string), binary data length and actual data;
  5. key-value pair with both key and value being strings;
  6. end of header record with three numbers, first of which gives an absolute (from the beginning of REC data if embedded) offset for the seek tables or packets;
  7. packet data end, always followed by eight zeroes;
  8. packet data start, always followed by eight zeroes. This record seems to be present only when seek tables are present (to detect the end of those?), otherwise packets follow end-of-header record immediately.
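A minimal Rust sketch of parsing these records, based only on the description above. Assumptions: the record type is a single byte, the stream properties start record carries a 32-bit property count, the second and third numbers of the end-of-header record are kept raw, and type 2 (packet header) is handled elsewhere, so the parser just stops on it.

```rust
#[derive(Debug, PartialEq)]
enum RecRecord {
    StreamPropsStart { num_props: u32 },
    KeyNumber { key: String, value: u32 },
    Binary { key: String, data: Vec<u8> },
    KeyValue { key: String, value: String },
    EndOfHeader { offset: u32, other: [u32; 2] },
    DataEnd,
    DataStart,
}

/// Minimal big-endian reader over an in-memory REC buffer.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Reader { buf, pos: 0 }
    }
    fn read_bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }
    fn read_u8(&mut self) -> Option<u8> {
        Some(self.read_bytes(1)?[0])
    }
    fn read_u32(&mut self) -> Option<u32> {
        let b = self.read_bytes(4)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
    /// String length includes the terminating zero, which is dropped here.
    fn read_string(&mut self) -> Option<String> {
        let len = self.read_u32()? as usize;
        let bytes = self.read_bytes(len)?;
        let text = if bytes.last() == Some(&0) { &bytes[..len - 1] } else { bytes };
        Some(String::from_utf8_lossy(text).into_owned())
    }
}

/// Parses one record; returns None on truncated data, packet headers or unknown types.
fn parse_record(r: &mut Reader) -> Option<RecRecord> {
    let rec = match r.read_u8()? {
        1 => RecRecord::StreamPropsStart { num_props: r.read_u32()? },
        3 => {
            let key = r.read_string()?;
            let _len = r.read_u32()?; // always 4 in known samples
            RecRecord::KeyNumber { key, value: r.read_u32()? }
        }
        4 => {
            let key = r.read_string()?;
            let len = r.read_u32()? as usize;
            RecRecord::Binary { key, data: r.read_bytes(len)?.to_vec() }
        }
        5 => {
            let key = r.read_string()?;
            RecRecord::KeyValue { key, value: r.read_string()? }
        }
        6 => RecRecord::EndOfHeader { offset: r.read_u32()?, other: [r.read_u32()?, r.read_u32()?] },
        7 => { r.read_bytes(8)?; RecRecord::DataEnd }
        8 => { r.read_bytes(8)?; RecRecord::DataStart }
        _ => return None,
    };
    Some(rec)
}
```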

There may be several RJMx chunks at the end of IVR with additional metadata, but they were of no interest to me.

I had some trouble with IVR packets since I expected them to be exactly the same as in RM, but it turned out they have the same payload format with a different header:

  • 32-bit timestamp;
  • 16-bit stream number;
  • 32 bits of flags. I suspect this might code packet group for MLTI substreams, keyframe information and such but I could not find a proper pattern valid for all three samples (and demuxer works fine without it too);
  • 32-bit payload length;
  • 32-bit header checksum (most likely). I was not able to understand how it works but header checksum seems to be the most plausible explanation.
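Parsing that header is straightforward. This Rust sketch assumes the fields directly follow the record type byte, are big-endian like the other numbers in REC, and are packed without padding (18 bytes in total). Flags and checksum are kept as raw values since their meaning is unknown.

```rust
#[derive(Debug, PartialEq)]
struct IvrPacketHeader {
    timestamp: u32,
    stream: u16,
    flags: u32,       // meaning not understood
    payload_len: u32,
    checksum: u32,    // most likely a header checksum, algorithm unknown
}

const IVR_PACKET_HEADER_SIZE: usize = 18;

fn parse_packet_header(buf: &[u8]) -> Option<IvrPacketHeader> {
    if buf.len() < IVR_PACKET_HEADER_SIZE {
        return None;
    }
    let be32 = |off: usize| u32::from_be_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]]);
    Some(IvrPacketHeader {
        timestamp: be32(0),
        stream: u16::from_be_bytes([buf[4], buf[5]]),
        flags: be32(6),
        payload_len: be32(10),
        checksum: be32(14),
    })
}
```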

I am fully aware that my current implementation has bugs and flaws and might not decode all files perfectly but it decodes all kinds of files and that’s good for me. Also what to expect from software written by one lazy guy in his free time for himself?

Next are probably Duck codecs or totally RAD ones. Or maybe I’ll waste time on making NihAV conform to the Rust 2018 edition. This seems to be a task about half as hard as porting code from K&R C to ANSI C (from a quick glance you have to change at least imports from inside the crate, trait objects now want the dyn keyword, and there may be more). Or it may be NAScale for all I care (and I don’t care at all). Time will tell.
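The two changes mentioned above look like this in practice. The module and type names in this Rust sketch are hypothetical, not taken from NihAV.

```rust
mod codecs {
    /// Hypothetical decoder trait.
    pub trait Decoder {
        fn name(&self) -> &'static str;
    }

    pub struct Tm1Decoder;

    impl Decoder for Tm1Decoder {
        fn name(&self) -> &'static str {
            "truemotion1"
        }
    }
}

mod registry {
    // Rust 2018 style: paths to items of the same crate start with `crate::`.
    // Under Rust 2015 this was commonly written as `use codecs::{...};`.
    use crate::codecs::{Decoder, Tm1Decoder};

    // Trait objects are spelled `dyn Decoder`; the bare `Box<Decoder>` form
    // is deprecated in 2018 and rejected in the 2021 edition.
    pub fn make_decoder() -> Box<dyn Decoder> {
        Box::new(Tm1Decoder)
    }

    pub fn decoder_name() -> &'static str {
        make_decoder().name()
    }
}
```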

Why I am sceptical about AV1

December 7th, 2018

I wanted to write this post for a while but I guess AV1.1 is that cherry on top of the whole mess called AV1 that made me finally do this.

Since the time I first heard about AV1 I tried to follow its development as much as it’s possible for a person not subscribed to their main mailing list. And unfortunately while we all expected great new codec with cool ideas we got VP10 instead (yes, I still believe that AV1 is short for “A Vp 1(0) codec”). Below I try to elaborate my view and present what facts I know that formed my opinion.

A promising beginning

It all started with ITU H.EVC and its licensing—or rather its inability to be licensed. In case you forgot the situation here’s a summary: there are at least three licensing entities that claim to have patents on HEVC that you need to license in order to be using HEVC legally. Plus the licensing terms are much greedier than what we had for H.264 to the point where some licensing pool wanted to charge fees per streaming IIRC.

So it was natural that major companies operating video on the Internet wanted to stay out of this and use some common license-free codec, resorting to creating one if the need arose.

That was a noble goal that only HEVC patent holders may object to, so the Alliance for Open Media (or AOM for short) was formed. I am not sure about the details, but IIRC only organisations could join and they had to pay an entrance fee (or be sponsored: IIRC VideoLAN got sponsored by Mozilla), and the development process was coordinated via a members-only mailing list (since I’m not a member I cannot say what actually happened there or how, and have to rely on second- or third-hand information). And that is the first thing that I did not like: the process not being open enough. I understand that they might not have wanted some ideas leaked to the competitors, but even people who were present on that list claim some decisions were questionable at best.

Call for features

In the beginning there were three outlooks on how it will go:

  • AV1 will be a great new codec that will select only the best ideas from all participants (a la MPEG but without their political decisions) and thus it will put H.266 to shame—that’s what optimists thought;
  • AV1 will be a great new codec that will select only the best ideas and since all of those ideas come from Xiph it will be simply Daala under new name—that’s what cultists thought;
  • Baidu wants to push its VP10 on others but since VP8 and VP9 had very limited success it will create an illusion of participation so other companies will feel they’ve contributed something and spread it out (also maybe it will be used mostly for bargaining better licensing terms for some H.26x codecs)—that’s what I thought.

And looks like all those opinions were wrong. AV1 is not that great especially considering its complexity (we’ll talk about it later); its features were not always selected based on the merit (so most of Daala stuff was thrown out in the end—but more about it later); and looks like the main goal was to interest hardware manufacturers in its acceptance (again, more on it later).

Anyway, let’s look what main feature proposals were (again, I could not follow it so maybe there was more):

  • Baidu libvpx with current development snapshot of VP10;
  • Baidu alternative approach to VP10 using Asymmetric Numeric Systems coding;
  • Cisco’s simplified version of ITU H.EVC aka Thor codec (that looks more like RealVideo 6 in result) with some in-house developed filters that improve compression;
  • Mozilla-supported Daala ideas from Xiph.

But it started with a scandal, since Baidu tried to patent an ANS-based video codec (essentially just the idea of a video codec that uses ANS coding) after accepting the ANS inventor’s help and ignoring his existence or wishes afterwards.

And of course they had to use libvpx as the base because. Just because.

Winnowing

So after the initial gathering of ideas it was time to put them all to test to see which ones to select and which ones to reject.

Of course since organisations are not that happy with trying something radically new, AV1 was built on the existing ideas with three main areas where new ideas were compared:

  1. prediction (intra or inter);
  2. coefficient coding;
  3. transform.

I don’t think there were attempts to change the overall codec structure. To clarify: ITU H.263 used 8×8 DCT and intra prediction consisted of copying the top row or left column of coefficients from the reference block, ITU H.264 used a 4×4 integer transform and tried to fill the block from its neighbours’ already reconstructed pixel data, and ITU H.265 used variable-size integer transforms (from 4×4 to 32×32), different block scans and quadtree coding of the blocks. On the other hand, moving from VP9 to AV1 did not involve such significant changes.

So, for prediction there was one radically new thing: combining the Thor and Daala filters into a single constrained directional enhancement filter (or CDEF for short). It works great: it gives an image quality boost at small cost. Another interesting tool is predicting chroma from luma (or CfL for short), which was a rejected idea for ITU H.EVC but was later tried both in Thor and Daala and found good enough (the history is mentioned in the paper describing it). This makes me think that if Cisco joined efforts with the Xiph foundation they’d be able to produce a good and free video codec without any other company. Getting it accepted by others though…
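The core idea of CfL fits in a few lines: take the luma block (already subsampled to chroma resolution), remove its average, scale the remaining AC part by a signalled factor and add it to the ordinary DC prediction for chroma. This Rust sketch is simplified: alpha is in 1/8 units and the rounding is my choice, whereas AV1 itself uses its own fixed-point precision.

```rust
/// Rounds x / 8 to the nearest integer, halves away from zero.
fn round_div8(x: i32) -> i32 {
    if x >= 0 { (x + 4) >> 3 } else { -((-x + 4) >> 3) }
}

/// Simplified chroma-from-luma: chroma = dc_pred + alpha * (luma - mean(luma)),
/// with alpha given in 1/8 units (`alpha_q3`).
fn cfl_predict(luma: &[i32], dc_pred: i32, alpha_q3: i32) -> Vec<u8> {
    assert!(!luma.is_empty());
    let n = luma.len() as i32;
    let avg = (luma.iter().sum::<i32>() + n / 2) / n;
    luma.iter()
        .map(|&l| (dc_pred + round_div8(alpha_q3 * (l - avg))).clamp(0, 255) as u8)
        .collect()
}
```

With luma [10, 20, 30, 40], DC prediction 128 and alpha 1.0 (alpha_q3 = 8), the chroma prediction becomes [113, 123, 133, 143]: the luma texture is carried over and only one scale factor needs to be transmitted.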

Now coefficient coding. There were four approaches initially:

  • VP5 bool coding (i.e. binary coding of bits with fixed probability that gets updated once per frame; it appeared in On2 VP5 and survived all the way until VP10);
  • ANS-based coding;
  • Daala classic range coder;
  • Thor variable-length codes (probably not even officially proposed since it is significantly less effective than any other proposed scheme).

ANS-based coding was rejected, probably because of the scandal and because it requires data to be coded in reverse order (the official reasoning is that while it was faster on a normal CPU it was slower in some hardware implementations; that is a common reason for rejecting a feature in AV1).
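To show what “coded in reverse order” means, here is a small static-model rANS coder in Rust, written after the well-known byte-wise rANS design (12-bit probabilities, 32-bit state, byte renormalisation). It is a generic illustration, not the variant proposed for AV1. The encoder has to walk the symbols from last to first so that the decoder can produce them from first to last.

```rust
const SCALE_BITS: u32 = 12; // frequencies sum to 1 << SCALE_BITS
const RANS_L: u32 = 1 << 23; // lower bound of the normalised state

/// Static model: symbol frequencies and their cumulative starts.
struct Model {
    freq: Vec<u32>,
    start: Vec<u32>,
}

impl Model {
    fn new(freq: Vec<u32>) -> Self {
        assert_eq!(freq.iter().sum::<u32>(), 1 << SCALE_BITS);
        let mut start = Vec::with_capacity(freq.len());
        let mut acc = 0;
        for &f in &freq {
            start.push(acc);
            acc += f;
        }
        Model { freq, start }
    }
    /// Finds the symbol whose range contains `slot` (linear search for simplicity).
    fn lookup(&self, slot: u32) -> usize {
        (0..self.freq.len()).find(|&s| slot < self.start[s] + self.freq[s]).unwrap()
    }
}

/// Encodes symbols (each with non-zero frequency) by walking them in REVERSE.
fn rans_encode(symbols: &[usize], m: &Model) -> Vec<u8> {
    let mut x = RANS_L;
    let mut out = Vec::new(); // bytes collected back to front
    for &s in symbols.iter().rev() {
        let (freq, start) = (m.freq[s], m.start[s]);
        let x_max = ((RANS_L >> SCALE_BITS) << 8) * freq;
        while x >= x_max {
            out.push((x & 0xff) as u8);
            x >>= 8;
        }
        x = ((x / freq) << SCALE_BITS) + (x % freq) + start;
    }
    out.extend_from_slice(&x.to_be_bytes());
    out.reverse(); // final state comes first (little-endian), then renormalisation bytes
    out
}

/// Decodes `count` symbols in forward order.
fn rans_decode(data: &[u8], count: usize, m: &Model) -> Vec<usize> {
    let mut x = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
    let mut pos = 4;
    let mask = (1 << SCALE_BITS) - 1;
    let mut res = Vec::with_capacity(count);
    for _ in 0..count {
        let slot = x & mask;
        let s = m.lookup(slot);
        x = m.freq[s] * (x >> SCALE_BITS) + slot - m.start[s];
        while x < RANS_L {
            x = (x << 8) | data[pos] as u32;
            pos += 1;
        }
        res.push(s);
    }
    res
}
```

In software the reversal is cheap (buffer the symbols, encode backwards), but a hardware encoder has to buffer a whole chunk before emitting anything, which is the kind of argument that tends to win in AV1 discussions.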

The Daala approach won, probably because it’s easier to manipulate a multi-symbol model than to code everything as a context-dependent binarisation of the value (and you’d need to store and/or code a lot of context probabilities that way). In any case it was a clear winner.

Now, transforms. Again, I cannot tell how it went exactly, but all the stories I heard were that Daala transforms were better, and then Baidu had to intervene citing hardware implementation reasons (something along the lines of: it’s hard to implement new transforms, and why do that when we have working VP9 transforms with a tried hardware design), so VP9 transforms were chosen in the end.

The final stage

In April 2018 AOM announced the long-awaited bitstream freeze, which came as a surprise to the developers.

The final final stage

In June it was actually frozen and AV1.0 was released along with the spec. Fun fact: the repository for av1-spec on baidusource.com that once hosted it (there are even snapshots of it from June in the Web Archive) now is completely empty.

And of course because of some hardware implementation difficulties (sounds familiar yet?) now we have AV1.1 which is not fully compatible with AV1.0.

General impressions

This all started with good intentions, but in the process of developing AV1.x it raised so many red flags that I feel suspicious about it:

  • ANS patent;
  • Political games like A**le joining AOM as “founding member” when the codec was almost ready;
  • Marketing games like announcing a frozen bitstream before a large exhibition, while in reality it reached 1.0 status later and without much fanfare;
  • Not very open development process: while individual participants could publish their achievements and it was not all particularly secret, it was more “IBM open” in the sense it’s open if you’re registered at their portal and signed some papers but not accessible to any passer-by;
  • Not very open decision process: hardware implementation was very often quoted as the excuse, even in issues like this;
  • Not very good result (and that’s putting it mildly);
  • Oh, and not very good ecosystem at all. There are test bitstreams but even individual members of AOM have to buy them.

And by “not very good result” I mean that the codec is monstrous in size (tables alone take more than a megabyte in source form and there’s even more code than tables) and its implementation is as slow as the pitch drop experiment.

Usually people trying to defend it say the same two arguments: “but it’s just a reference model, look at JM or HM” and “codecs are not inherently complex, you can write a fast encoder”. Both of those are bullshit.

First, comparing libaom to the reference software of H.264 or H.265. While formally it’s also reference software, there’s one huge difference. JM/HM were plain C/C++ implementations with no optimisation tricks (besides transform speed-up by decomposition in HM), while libaom has all kinds of optimisations including SIMD for ARM, POWER and x86. And the dav1d decoder with a rather full set of AVX optimisations is just 2-3 times faster (more when it can use threading). For H.264, optimised decoders were tens of times faster than JM. I expect a similar range for HM too, and a two to three times speed-up would be a very bad result against an unoptimised reference (which libaom is not).

Second, claiming that codecs are not inherently complex and thus you can write a fast encoder even if the codec is more complex than its predecessor. Well, it is partly true in the sense that you’re not forced to use all possible features and thus can avoid some of the combinatorial explosion by not trying some coding tools. Still, there is a certain expectation built into any codec design (i.e. that you use certain coding tools in a certain sequence, omitting them only in certain corner cases) and there are certain expectations on compression level/quality/speed.

For example, let’s get to the basics and make an H.EVC encoder encode raw video. Since you’re not doing intra prediction, motion compensation or transforms, it’s probably the fastest encoder you can get. But in order to do that you still have to code coding quadtrees and transmit flags saying that the block contains PCM data. As a result your encoder will beat any other on speed, but it will still lose to memcpy(), because memcpy() does not have to invoke an arithmetic coder for the mandatory flags of every coded block (and those flags also take space along with padding to byte boundary, so the encoder loses in compression efficiency too). That’s not counting the fact that such an encoder is useless for any practical purpose.

Now let’s consider some audio codecs: some of them use parametric bit allocation in both encoder and decoder (video codecs might start to use the same technique one day; Daala has tried something like that already), so such a codec needs to run it regardless of how you try to compute a better allocation, since you have to code your allocation as a difference to the implicitly calculated one. And of course such a codec is more complex than one that transmits bit allocation explicitly for each sample or sample group. But it gains in compression efficiency, and that’s the whole point of having a more complex codec in the first place.
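A toy Rust illustration of that idea: both sides derive a baseline allocation from band energies with the same formula, and only the corrections are transmitted. The roughly-one-bit-per-6-dB formula and the offset parameter are made up for the example and do not come from any particular codec.

```rust
/// Toy parametric allocation shared by encoder and decoder:
/// about one bit per 6 dB of band energy above an offset.
fn baseline_allocation(band_energy_db: &[i32], offset: i32) -> Vec<i32> {
    band_energy_db.iter().map(|&e| ((e - offset) / 6).max(0)).collect()
}

/// Encoder side: only the difference to the baseline gets coded.
fn encode_corrections(desired: &[i32], baseline: &[i32]) -> Vec<i32> {
    desired.iter().zip(baseline).map(|(d, b)| d - b).collect()
}

/// Decoder side: recompute the baseline and apply the corrections.
fn decode_allocation(baseline: &[i32], corrections: &[i32]) -> Vec<i32> {
    baseline.iter().zip(corrections).map(|(b, c)| b + c).collect()
}
```

However clever the encoder’s own allocation search is, it still has to run baseline_allocation() to know what to code against, and the decoder always runs it too; in exchange the corrections are mostly zeroes and cheap to code.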

Hence I cannot expect AV1 decoders to magically become ten times faster than libaom, and similarly, while I expect that AV1 encoders will become much faster, they’ll still either measure encoding speed in frames per minute (if not per month) or be on par with x265 in terms of compression efficiency/speed (and x265 is also not the best possible H.265 encoder in my opinion).


The late Sir Terence Pratchett (this world is truly a sadder place without his presence) used the phrase “ladies of negotiable hospitality” to describe a certain profession in Discworld. And to me it looks like AV1 is a codec of negotiated design. In other words, first they tried to design the usual general-purpose codec, but then (probably after seeing how well it performs) they decided to bet on hardware manufacturers (who else would make AV1 encoders and, more importantly, decoders fast enough, especially for mobile devices?). And that resulted in catering to every possible objection any hardware manufacturer in the alliance had (to the point of AV1.1).

This is the only way I can reasonably explain what I observe with AV1. If somebody has a different interpretation, especially based on facts I don’t know or missed, I’d like to hear it and know the truth. Meanwhile, I hope I made my position clear.