Dingo Pictures: more titles found!

December 16th, 2018

Somebody has managed to found Dingo Pictures media conceptual work Perseus in German and some of their video stories were discovered too—Siegfried and some Christmas story with a subtitle Max’s Wonderful Present (somebody even notified me of this one earlier). All of them are available at the usual video hosting.

I still want to see Humpie the Whale Explores the World one day though.

NihAV: now with full RealMedia support!

December 15th, 2018

In late September 2017 I’ve started to work on RealMedia support in NihAV with an intent to have full support for RealMedia. So more than a year later I’ve reached that goal.

In order to do that I had to reverse engineer one and a half codecs and one format. Here’s the full list.

Supported formats:

  • RealAudio (i.e. just single audio stream);
  • plain RealMedia (i.e. just a bunch of audio and video streams);
  • RealMedia with multiple data chunks (i.e. one or several streams are stored in separate chunk, it’s nothing extraordinary but still needs to be accounted for);
  • RealMedia multiple stream variants (i.e. single logical stream is represented by several substreams and you have to select one based on quality);
  • IVR, their own recording format (I had to RE this one).

Supported audio codecs:

  • RealAudio 1 aka 14.4;
  • RealAudio 2 aka 28.8;
  • RealAudio (AC)3 aka DNET;
  • RealAudio 4/5 (licensed from Sipro);
  • RealAudio G2 (cook);
  • RealAudio ATRAC3;
  • RealAudio AAC-LC (no SBR);
  • RealAudio Lossless.

And video codecs:

  • RealVideo 1;
  • RealVideo Fractal aka ClearVideo (I had to finish REing P-frame format for that one);
  • RealVideo 2;
  • RealVideo 3;
  • RealVideo 4;
  • RealVideo 6 or HD (I had to RE this one and now it decodes the sample I have with only minor glitches).

And here are some words about IVR that I had to RE this week.

Update: it turns out Paul had reverse engineered the format before NihAV came to existence but his implementation is even sketchier than mine unfortunately.

There are actually two formats there, one with magic .REC that contains actual recording and another one with magic .R1M that may contain several of those .rec embedded. Both formats internally reminded me more of Flash than RealMedia because both files are composed of records that can be distinguished by the first byte (yes, I still remember RTMP and how I had to parse it). R1M has two kinds of records: 0x01—recording metadata it seems, 0x02 contains actual REC.

REC files (or sub-entries in R1M) have defined amount of global properties followed by stream specific properties followed by (optional) stream seek tables followed by actual packets. All numbers are big-endian and 32-bit (seek table offsets seem to be 64-bit). Strings are coded as string length (including terminating zero) followed by string data and zero terminator.

REC record types:

  1. stream properties start, has a number of properties coded after it;
  2. packet header, more about it below;
  3. key-number pair, has key value (string), a number property length (always 4) and actual number value;
  4. binary data, has key value (string), binary data length and actual data;
  5. key-value pair with both key and value being strings;
  6. end of header record with three numbers, first of which gives an absolute (from the beginning of REC data if embedded) offset for the seek tables or packets;
  7. packet data end, always followed by eight zeroes;
  8. packet data start, always followed by eight zeroes. This record seems to be present only when seek tables are present (to detect the end of those?), otherwise packets follow end-of-header record immediately.

There may be several RJMx chunks at the end of IVR with additional metadata but they posed no interest to me.

I had some trouble with IVR packets since I expected them to be exactly the same as in RM but it turned out to be the same payload format but with different header:

  • 32-bit timestamp;
  • 16-bit stream number;
  • 32 bits of flags. I suspect this might code packet group for MLTI substreams, keyframe information and such but I could not find a proper pattern valid for all three samples (and demuxer works fine without it too);
  • 32-bit payload length;
  • 32-bit header checksum (most likely). I was not able to understand how it works but header checksum seems to be the most plausible explanation.

I am fully aware that my current implementation has bugs and flaws and might not decode all files perfectly but it decodes all kinds of files and that’s good for me. Also what to expect from software written by one lazy guy in his free time for himself?

Next is probably Duck type of codecs or totally RAD ones. Or maybe I’ll waste time on making NihAV conform to Rust 2018 edition. This seems to be a task about half as hard as porting code from K&R C to ANSI C (from a quick glance you have to change at least imports from inside the crate, traits now require word dyn and there may be more). Or it may be NAScale for all I care (and I don’t care at all). The time will show.

Why I am sceptical about AV1

December 7th, 2018

I wanted to write this post for a while but I guess AV1.1 is that cherry on top of the whole mess called AV1 that made me finally do this.

Since the time I first heard about AV1 I tried to follow its development as much as it’s possible for a person not subscribed to their main mailing list. And unfortunately while we all expected great new codec with cool ideas we got VP10 instead (yes, I still believe that AV1 is short for “A Vp 1(0) codec”). Below I try to elaborate my view and present what facts I know that formed my opinion.

A promising beginning

It all started with ITU H.EVC and its licensing—or rather its inability to be licensed. In case you forgot the situation here’s a summary: there are at least three licensing entities that claim to have patents on HEVC that you need to license in order to be using HEVC legally. Plus the licensing terms are much greedier than what we had for H.264 to the point where some licensing pool wanted to charge fees per streaming IIRC.

So it was natural that major companies operating video in Internet wanted to stay out of this and use some common license-free codec. Resorting to creating one if the need arises.

That was a noble goal that only HEVC patent holders may object to, so the Alliance for Open Media (or AOM for short) was formed. I am not sure about the details but IIRC only organisations could join and they had to pay entrance fee (or be sponsored—IIRC VideoLAN got sponsored by Mozilla) and the development process was coordinated via members-only mailing list (since I’m not a member I cannot say what actually happened there or how and have to rely on second- or third-hand information). And that is the first thing that I did not like—the process not being open enough. I understand that they might not wanted some ideas leaked out to the competitors but even people who were present on that list claim some decisions were questionable at best.

Call for features

In the beginning there were three outlooks on how it will go:

  • AV1 will be a great new codec that will select only the best ideas from all participants (a la MPEG but without their political decisions) and thus it will put H.266 to shame—that’s what optimists thought;
  • AV1 will be a great new codec that will select only the best ideas and since all of those ideas come from Xiph it will be simply Daala under new name—that’s what cultists thought;
  • Baidu wants to push its VP10 on others but since VP8 and VP9 had very limited success it will create an illusion of participation so other companies will feel they’ve contributed something and spread it out (also maybe it will be used mostly for bargaining better licensing terms for some H.26x codecs)—that’s what I thought.

And looks like all those opinions were wrong. AV1 is not that great especially considering its complexity (we’ll talk about it later); its features were not always selected based on the merit (so most of Daala stuff was thrown out in the end—but more about it later); and looks like the main goal was to interest hardware manufacturers in its acceptance (again, more on it later).

Anyway, let’s look what main feature proposals were (again, I could not follow it so maybe there was more):

  • Baidu libvpx with current development snapshot of VP10;
  • Baidu alternative approach to VP10 using Asymmetric Numeric Systems coding;
  • Cisco’s simplified version of ITU H.EVC aka Thor codec (that looks more like RealVideo 6 in result) with some in-house developed filters that improve compression;
  • Mozilla’s supported Daala ideas from Xiph.

But it started with a scandal since Baidu tried to patent ANS-based video codec (essentially just an idea of video codec that uses ANS coding) after accepting ANS inventor’s help and ignoring his existence or wishes afterwards.

And of course they had to use libvpx as the base because. Just because.

Winnowing

So after the initial gathering of ideas it was time to put them all to test to see which ones to select and which ones to reject.

Of course since organisations are not that happy with trying something radically new, AV1 was built on the existing ideas with three main areas where new ideas were compared:

  1. prediction (intra or inter);
  2. coefficient coding;
  3. transform.

I don’t think there were attempts to change the overall codec structure. To clarify: ITU ITU H.263 used 8×8 DCT and intra prediction consisted of copying top row or left column of coefficients from the reference block, ITU H.264 used 4×4 integer transform and tried to fill block from its neighbours already reconstructed pixel data, ITU H.265 used variable size integer transform (from 4×4 to 32×32), different block scans and quadree coding of the blocks. On the other hand moving from VP9 to AV1 did not involve such significant changes.

So, for prediction there was one radically new thing: combining Thor and Daala filter into single constrained directional enhancement filter (or CDEF for short). It works great, it gives image quality boost at small cost. And another interesting tool is predicting chroma from luma (or CfL for short) that was a rejected idea for ITU H.EVC but later was tried both in Thor and Daala and found good enough (the history is mentioned in the paper describing it). This makes me think that if Cisco joined efforts with Xiph foundation they’d be able to produce a good and free video codec without any other company. Getting it accepted by others though…

Now coefficient coding. There were four approaches initially:

  • VP5 bool coding (i.e. binary coding of bits with fixed probability that gets updated once per frame; it appeared in On2 VP5 and survived all the way until VP10);
  • ANS-based coding;
  • Daala classic range coder;
  • Thor variable-based codes (probably not even officially proposed since it is significantly less effective than any other proposed scheme).

ANS-based coding was rejected probably because of the scandal and that it requires data to be coded in reverse direction (the official reasoning is that while it was faster on normal CPU it was slower in some hardware implementations—that is a common reason for rejecting a feature in AV1).

Daala approach won, probably because it’s easier to manipulate a multi-symbol model than try to code everything as context-dependent binarisation of the value (and you’ll need to store and/or code a lot of context probabilities that way). In any case it was clear winner.

Now, transforms. Again, I cannot tell how it went exactly but all stories I heard were that Daala transforms were better but then Baidu had to intervene citing hardware implementation reasons (something in the lines that it’s hard to implement new transforms and why do that since we have working transforms for VP9 with tried hardware design) so VP9 transforms had been chosen in the end.

The final stage

In April 2018 AOM has announced long-awaited bitstream freeze which came as a surprise to the developers.

The final final stage

In June it was actually frozen and AV1.0 was released along with the spec. Fun fact: the repository for av1-spec on baidusource.com that once hosted it (there are even snapshots of it from June in the Web Archive) now is completely empty.

And of course because of some hardware implementation difficulties (sounds familiar yet?) now we have AV1.1 which is not fully compatible with AV1.0.

General impressions

This all started as a good intent but in the process of developing AV1.x it raised so many flags that I feel suspicious about it:

  • ANS patent;
  • Political games like A**le joining AOM as “founding member” when the codec was almost ready;
  • Marketing games like announcing frozen bitstream before large exhibition while in reality it reached 1.0 status later and without many fanfares;
  • Not very open development process: while individual participants could publish their achievements and it was not all particularly secret, it was more “IBM open” in the sense it’s open if you’re registered at their portal and signed some papers but not accessible to any passer-by;
  • Not very open decision process: hardware implementation was very often quoted as the excuse, even in issues like this;
  • Not very good result (and that’s putting it mildly);
  • Oh, and not very good ecosystem at all. There are test bitstreams but even individual members of AOM have to buy them.

And by “not very good result” I mean that the codec is monstrous in size (tables alone take more than megabyte in source form and there’s even more code than tables) and its implementation is slow as pitch drop experiment.

Usually people trying to defend it say the same two arguments: “but it’s just a reference model, look at JM or HM” and “codecs are not inherently complex, you can write a fast encoder”. Both of those are bullshit.

First, comparing libaom to the reference software of H.264 or H.265. While formally it’s also the reference software there’s one huge difference. JM/HM were the plain C/C++ implementations with no optimisation tricks (beside transform speed-up by decomposition in HM) while libaom has all kinds optimisations including SIMD for ARM, POWER and x86. And dav1d decoder with rather full set of AVX optimisations is just 2-3 times faster (more when it can use threading). For H.264 optimised decoders were tens of times faster than JM. I expect similar range for HM too but two-three times faster is very bad result for unoptimised reference (which libaom is not).

Second, claiming that codecs are not inherently complex and thus you can write fast encoder even is the codec is more complex than its predecessor. Well, it is partly true in the sense that you’re not forced to use all possible features and thus can avoid some of combinatorial explosion by not trying some coding tools. Well, there is certain expectation built in into any codec design (i.e. that you use certain coding tools in certain sequence omitting them only in certain corner cases) and there are certain expectations on compression level/quality/speed.

For example, let’s get to the basics and make H.EVC encoder encode raw video. Since you’re not doing intra prediction, motion compensation or transforms it’s probably the fastest encoder you can get. But in order to do that you still have to code coding quadtrees and transmit flags that it has PCM data. In result your encoder will beat any other on speed but it will still lose to memcpy() because it does not have to invoke arithmetic coder for mandatory flags for every coded block (which also take space along with padding to byte boundary, so it loses in compression efficiency too). That’s not counting the fact that such encoder is useless for any practical purpose.

Now let’s consider some audio codecs—some of them use parametric bit allocation in both encoder and decoder (video codecs might start to use the same technique one day, Daala has tried something like that already) so such codec needs to run it regardless on how you try to compute better allocation—you have to code it as a difference to the implicitly calculated one. And of course such codec is more complex than the one that transmits bit allocation explicitly for each sample or sample group. But it gains in compression efficiency and that’s the whole point of having more complex codec in the first place.

Hence I cannot expect of AV1 decoders magically being ten times faster than libaom and similarly while I expect that AV1 encoders will become much faster they’ll still either measure encoding speed in frames per month minute or be on par with x265 in terms on compression efficiency/speed (and x265 is also not the best possible H.265 encoder in my opinion).


Late Sir Terence Pratchett (this world is truly sadder place without his presence) used a phrase “ladies of negotiable hospitality” to describe certain profession in Discworld. And to me it looks like AV1 is a codec of negotiated design. In other words, first they tried to design the usual general purpose codec but then (probably after seeing how well it performs) they decided to bet on hardware manufacturers (who else would make AV1 encoders and more importantly decoders perform fast enough especially for mobile devices?). And that resulted in catering to all possible objections any hardware manufacturer of the alliance had (to the point of AV1.1).

This is the only way I can reasonably explain what I observe with AV1. If somebody has a different interpretation, especially based on facts I don’t know or missed, I’d like to hear it and know the truth. Meanwhile, I hope I made my position clear.

NihAV: The Fastest RealVideo 6 Decoder in Rust!

November 29th, 2018

I guess the title shows how stupid marketing something can be if there’s just one contestant in the category—so in order to win you merely need to exist. Like in this case: NihAV can barely decode data and it’s not correct but the images are recognizable (some examples below) and that’s enough to justify this post title. Also I have just one sample clip to test my decoder on but at least RV6 is not a format with many features so only one feature of it is not tested (type 3 frames).

Anyway, here’s the first frame—and it is reconstructed perfectly:

The rest of frames is of significantly worse quality but with more details.
Read the rest of this entry »

RealVideo6: Technical Description

November 10th, 2018

This week I’ve started reading the binary specification for RealVideo 6 again (yes, I had a post on RV6 technical description a year ago) since now I have a goal of implementing a decoder for it. So here I’ll try to document the format with as many details as possible.

Small update: obviously I’m working on a decoder so when it’s more or less ready I’ll document it on The Wiki.
Read the rest of this entry »

RMHD: REing Has Started!

November 10th, 2018

So, after a long series of delays and procrastination (the common story in NihAV development), I’ve finally stared to work on adding support for RMHD to my project. There are two things to be done actually: add support for RMHD container (so that I can demux actual files) and obviously support for RealVideo 6 itself.

As for RMHD container support, this was quite easy. Some person has given me a file about a year ago so I could look at it and improve my existing RealMedia demuxer. As expected the format is about the same but a bit different: magic word in the header now is “.RMP” instead of “.RMF”, some chunks were upgraded to version 2 (previously they all were either version 0 or version 1) with some additional fields shoved in the middle (or maybe they’ve upgraded them to 64-bit). The WTFiest change is how audio header was written: binary fields tell it’s header version 5 (it is) but text identifier says “.rm4” which obviously comes from version 4 and the provided header size is too short compared to the conventional RMVB and actual header data. At last after some hacks added to my RealMedia demuxer I was able to decode AAC track from the file so only the video part remains.

For RV6 I have the binary specification with a lot of debug symbols left. Even though I’ve made a good progress on it, I expect CEmpeg to add the opensource decoder for it first. After all, if you search for strings in the librv11dec.so from the official site you’ll see the function names that will make almost anybody recognizing them cringe:

C_hScale8To15(short*, int, unsigned char const*, short const*, short const*, int)
C_yuv2planeX(short const*, int, short const**, unsigned char*, int, unsigned char const*, int)
sws_pb_64
SSE2_yuv2planeX(short const*, int, short const**, unsigned char*, int, unsigned char const*, int)
SSE2_hScale8To15(short*, int, unsigned char const*, short const*, short const*, int)
yuv2planeX_8_c_opt(short const*, int, short const**, unsigned char*, int, unsigned char const*, int, int)

In case you belong to the blessed majority who do not know what those functions are, these are part of libswscale, a library for frame colourspace conversion and scaling licensed under LGPL. Which means pieces of it were borrowed and adapted for (YUV-only) frame scaling in RV6 decoder and thus it source code should be available under LGPL. And imperfect English messages in the same specification (“not support” consistently occurs throughout it) hint that it was quite probably Chinese developers responsible for this. Since CEmpeg has both the majority of libswscale developers and people who speak Chinese, it should be no problem for them to obtain the codes. Meanwhile I’ll continue my work because NIH in NihAV is there for a reason (that also reminds me I should port NAScale to Rust eventually).

Anyway, the actual description of RealVideo 6 deserves a separate post so I’ll dedicate the following post to it (but who would read that anyway?).

NihAV: Lurching Forward!

October 28th, 2018

Finally NihAV got full* support for RealAudio. Marketing asterisk there is for AAC support since only AAC LC is supported which means decoding only 22kHz from racp (as I said before, no SBR support) plus some other features were cut: it supports just multichannel audio but no coupling feature, noise codebook or e.g. LTP from AAC Main. Also for similar reasons I did not care to optimise its performance. I don’t care much about it and I don’t want to spend a lot of time on AAC. Here’s some fun statistics for you: AAC decoder is about 1800 lines long (IMDCT, window generation and bitstream parsing are common modules but the rest of AAC stuff is there), about 100 lines are spent on subband tables, 350 lines are spent on codebook tables and 250 lines are wasted on “proper” parsing of general MPEG-4 Audio descriptor with type-specific stuff like HILN or ALS configuration being simply left as unimplemented!() (a useful Rust macro that will report it and abort program if it reaches that point so you know when to start caring).

And now for the fun Rust part.

My codebook generator takes something that implements CodebookDescReader trait which is essentially a common interface that specifies how to get information for each codebook element (codeword, its length and associated symbol). While I had some standard implementations, they were not very useful since the most common case is when you have two tables of different types (8 bits are enough for codeword lengths and actual codewords may take 8-32 bits) and you want to use some generic implementation instead of writing the same code again and again.

In theory it should be easy: you create generic structure that stores pointers to those tables and converts them into needed types. It would work after some fiddling with type constraints but the problem is to make the return type what you want i.e. the function takes entry index number (as usize) and returns some more sensible type (so that when I read code later more appropriate type will be used).

And that is rather hard problem because generic return type can’t be converted from usize easily. I asked Luca of rust-av fame (he famously does not work on it) and he provided me with two and a half solutions that won’t work in NihAV:

  • Use unstable TryInto trait that can try to convert input type into something smaller—rejected since it’s unstable;
  • Use num_traits crate for conversion traits that work with stable—rejected since it’s an external crate;
  • Implement such trait yourself. Hmm…

So I NIHed it by taking a function that would map input index value to output type. This way is both simpler and more flexible. For example, AAC scalefactor codebook has values in the range -60..+60 so instead of writing new structure like I did before I can simply do

fn scale_map(idx: usize) -> i8 { (idx as i8) - 60 }
...
let mut coderead = TableCodebookDescReader::new(AAC_SCF_CODEBOOK_CODES, AC_SCF_CODEBOOK_BITS, scale_map);

And this works as expected.

There’s a small fun thing related to it. By old C habit I wrote initially &fn as argument type and got cryptic Rust compiler error about mismatched types:

    = note: expected type `&fn(usize) -> i8`
               found type `&fn(usize) -> i8 {codecs::aac::scale_map}`

Again, my fault but rustc output is a bit WTFy.


Anyway, the next is supposedly RealVideo HD or RealVideo 6 decoder and then I can forget about this codec family and move to something different like DuckMotion or Smacker/Bink.

And in the very unlikely case you were wondering why I’m so slow I can tell you why: I am lazy (big reveal, I know), I prefer to spend my time differently (on workdays I work, on Sunday I travel around, that leaves only couple of hours during workdays, some part of Saturday and a bit of Sunday if I’m lucky) and I work better when the stuff is interesting which is not always the case. Especially if you consider that there is no multimedia framework in Rust (yet?) so I have to NIH every small bit and some of those bits I don’t like much. So don’t expect any news from me soon.

RealAudio Cook aka RealOpus

October 14th, 2018

Let’s start with a bit of history since knowing how things developed often helps to understand how they ended up like they are.

There is an organisation previously called CCITT (phone line modem owners might remember it) later renamed to ITU-T. It is known for standardisation and accepting various standards under the same confusing name e.g. PCM and A-law/mu-law quantisation are G.711 recommendation from 1972 while G.711.0 is lossless audio compression scheme from 2009 and G.711.1 is a weird extension from 2008 that splits audio into two bands, compresses low band with A- or mu-law and uses MDCT and vector quantisation on top band.

And there is also a “family” of G.722 speech codecs: basic G.722 that employs splitting audio into subbands and applying ADPCM on them; G.722.1 is a completely different parametric bit allocation, VQ and MDCT codec we discuss later; G.722.2 is a traditional speech codec better known as AMR-WB.

So, what’s the deal with G.722.1? It comes from PictureTel family of Siren codecs (which later served as a base for G.719 too). Also as I mentioned before this codec employs MDCT, vector quantisation and parametric bit allocation. So you decode envelope defined by quantisers, allocate bits to bands depending on those (no, it’s not 1:1 mapping), unpack bands that are coded using vector quantisation dependent on amount of bits and perform MDCT on them. You might be not familiar but this is exactly how certain RealAudio codec works. And I don’t think you can guess its name even if I mention that it was written by Ken Cooke. But you cannot say nothing was changed: RealAudio codec works with different frame sizes (from 32 to 1024 IIRC), it has different codebooks, it has joint stereo mode and finally it has multichannel coding mode based on pairs. In other words, it has evolved from niche speech codec to general purpose audio codec rivalling AAC and it was indeed a codec of choice for RealMedia before they have finally switched to AAC and HE-AAC years later (which was the first time for them using open standard verbatim instead of licensing a proprietary technology or adding their own touches on standards drafts as before—even DNET had special low-bitrate mode).

Now let’s jump to 2012 and VideoLAN Dev Days ’12. I gave a talk there about reverse engineering codecs (of course) and it was a complete failure so that was my first and last public talk but that’s not important. And before me Timothy Terriberry gave an overview of Opus. So I listen how it combines speech and general audio codec (like USAC which you might still not know under its commercial name xHE-AAC)—boring, how speech codec works (it’s Skype SILK codec they dumped to open source at some point and like with Duck TrueMotion VP3 before, Xiph has picked it up and adopted for own purposes)—looks like typical speech codec that I can barely understand how it functions, and then CELT part comes up. CELT is general audio codec developed by Xiph that is essentially what your Opus files will end as (SILK is used only at extremely low bitrates in files produced by the reference encoder—or so I heard from the person implementing a decoder for it). And a couple of months before VDD12 I actually bothered to enter technical details about Cook into MultimediaWiki (here’s edit history if you want to check that)—I should probably RE some codec and write more pages there for the old times’ sake. So Cook design details were still fresh in my mind when I heard about CELT details…

So CELT codes just single channels or stereo pairs—nothing unusual so far, many codecs do that. It also uses MDCT—even more codecs do that. It codes envelope, uses parametric bit allocation and vector quantisation—wait a bit, I definitely heard about this somewhere before (yes, it sounds suspiciously like ITU G.719). Actually I pointed out that to Xiph guys (there was Monty present as well) immediately but it was dismissed as being nothing similar at all (“we transmit band energies instead of relying on quantisers”—right, and quantisers in audio are rarely chosen depending on energy).

Let’s compare the coding stages of two codecs to see how they fail to match up:

  1. CELT transmits band energy—Cook transmits quantisers (that are still highly correlated with band energy) and variable amount of gains to shape output frame in time domain;
  2. CELT transmits innovation (essentially coefficients for MDCT minus some predicted stuff)—Cook transmits MDCT coefficients;
  3. CELT uses transmitted band energy and bits available for innovation after the rest of frame is coded to determine number of bits for each band and mode in which coefficients are coded (aka parametric bit allocation)—Cook uses transmitted quantisers and bits available after the rest of frame is coded to determine number of bits for each band and mode in which coefficients are coded;
  4. CELT uses Perceptual Vector Quantization (based on Pyramid Vector Quantizer—boy, the won’t cause any confusion at all)—Cook uses fixed vector quantisation based on amount of bits allocated to band and static codebook;
  5. CELT estimates pitch gains and pitch period—that is a speech codec stuff that Cook does not have;
  6. CELT uses MDCT to restore the data—Cook does the same.

Some of you might say: “Hah! Even if it matches at some stages actual coefficient coding is completely different!! And you forgot that CELT uses range coder too.” Well, I didn’t say those two formats were exactly the same, just that their design is very similar. To quote the immortal words from Bell, Cleary and Witten paper on text compression, the progress in data compression is mostly defined by larger amounts of RAM available (and CPU cycles available). So back in the day hardly any audio codec could afford range coder (invented in 1979) except for some slow lossless audio coders. Similarly PVQ was proposed by Thomas Fischer in 1986 but wasn’t employed because it was significantly costlier than some fixed codebook vector quantisation. So while CELT is undeniably more advanced than Cook, the main gains are from using methods that do the same thing more effectively (at expense of RAM and/or CPU) instead of coming up with significantly different scheme. An obligatory car analogy: claiming that modern internal combustion engine car is completely new invention compared to Ford Model T or FIAT 124 because they have more bells and whistleselectronics even while principal scheme remains the same—while radically new car would be an electric one with no transmission or gearbox and engines in each wheel (let’s forget such scheme is very old too—electric cars of such design roamed Moon in 1970s).

So overall, Opus is almost synonymous with CELT and CELT has a lot of common in design with Cook (but greatly improved) so this allows Cook to be called RealOpus or Opus of its era.


BTW when implementing the decoder for this format in Rust I’ve encountered a problem: the table for 6-bit stereo coupling was never tested because its definition is wrong (some code definitions repeating with the same bit lengths) and looks like the first half of it got corrupted. Just compare for yourselves.

libavcodec version (lengths array added for the reference):

static const uint16_t ccpl_huffcodes6[63] = {
    0x0004,0x0005,0x0005,0x0006,0x0006,0x0007,0x0007,0x0007,0x0007,0x0008,0x0008,0x0008,
    0x0008,0x0009,0x0009,0x0009,0x0009,0x000a,0x000a,0x000a,0x000a,0x000a,0x000b,0x000b,
    0x000b,0x000b,0x000c,0x000d,0x000e,0x000e,0x0010,0x0000,0x000a,0x0018,0x0019,0x0036,
    0x0037,0x0074,0x0075,0x0076,0x0077,0x00f4,0x00f5,0x00f6,0x00f7,0x01f5,0x01f6,0x01f7,
    0x01f8,0x03f6,0x03f7,0x03f8,0x03f9,0x03fa,0x07fa,0x07fb,0x07fc,0x07fd,0x0ffd,0x1ffd,
    0x3ffd,0x3ffe,0xffff,
};

static const uint8_t ccpl_huffbits6[63] = {
    16,15,14,13,12,11,11,11,11,10,10,10,
    10,9,9,9,9,9,8,8,8,8,7,7,
    7,7,6,6,5,5,3,1,4,5,5,6,
    6,7,7,7,7,8,8,8,8,9,9,9,
    9,10,10,10,10,10,11,11,11,11,12,13,
    14,14,16,
};

NihAV corrected version (extracted from the reference of course):

const COOK_CPL_6BITS_CODES: &[u16; 63] = &[
    0xFFFE, 0x7FFE, 0x3FFC, 0x1FFC, 0x0FFC, 0x07F6, 0x07F7, 0x07F8,
    0x07F9, 0x03F2, 0x03F3, 0x03F4, 0x03F5, 0x01F0, 0x01F1, 0x01F2,
    0x01F3, 0x01F4, 0x00F0, 0x00F1, 0x00F2, 0x00F3, 0x0070, 0x0071,
    0x0072, 0x0073, 0x0034, 0x0035, 0x0016, 0x0017, 0x0004, 0x0000,
    0x000A, 0x0018, 0x0019, 0x0036, 0x0037, 0x0074, 0x0075, 0x0076,
    0x0077, 0x00F4, 0x00F5, 0x00F6, 0x00F7, 0x01F5, 0x01F6, 0x01F7,
    0x01F8, 0x03F6, 0x03F7, 0x03F8, 0x03F9, 0x03FA, 0x07FA, 0x07FB,
    0x07FC, 0x07FD, 0x0FFD, 0x1FFD, 0x3FFD, 0x3FFE, 0xFFFF
];

NihAV, RealMedia, Rust and Everything Else

October 13th, 2018

Looks like it’s been about two months since I last wrote anything about NihAV but that does not mean I did not have anything to write about. On the contrary, I’m glad to report about significant progress in RealAudio support.

Previously I’ve reported about RealVideo 3 and 4 support (as for RealVideo 1/2 and ClearVideo before), so video part was covered quite well but audio part was missing and I went on to rectify the situation.

Now NihAV supports RealAudio 1.0 (speech codec), RealAudio 2.0 (speech codec), RealAudio DNET (a bit about it later), RealAudio 4.0 (speech codec from Sipro), RealAudio Cook (this one deserves a separate post so the next one should be about this codec) and RealAudio Lossless. So there are only three codecs missing now: RealAudio 8 (ATRAC3), RealAudio 9/10 (AAC) and RealVideo 6(HD). Of course I’m going to add support for those as well.

This is actually a good time to implement those. As you might know, there is a Holy Trinity of Licensors: D.vX, D*lby and DT$. They are famous for ‘nice’ licensing terms. While I’ve never had to deal with them, I’ve heard from people who did that they like licensing single product they’re most famous for at outrageous prices (i.e. it’ll cost you a magnitude more per unit using their technology than e.g. H.264 decoder) and it’s a viral license too because if you sell stuff not oriented for consumers then you have to force your customers into the same deal (it’s GPL—Greedy Private License) and you have to report your sales to them for obvious reasons. Funny how two of the companies were bought out already. Now let’s look at them in some details:

  • D.vX This one is remarkable since it licensed the product it had nothing to do with (aka M$MPEG-4 adapted for non-ASF containers and MPEG-4 ASP). At least it seems hardly relevant now unless I dig out some old movies.
  • D*lby This one is mostly known (outside cinema equipment) for codec with several names: ATSC A/52, RealAudio DNET, ETSI TS 102 366, D*lby Digital and even something you can make out of letters A C and 3 (I heard rumours that it does not like its trademarks mentioned so I’d better avoid directly naming it). At least the last patents for that format has expired and support for it can be implemented freely. And it also owns a company that manages licensing of AAC. Fun fact is that patents for MPEG2 NBC are expired so I can implement AAC-LC decoder just fine but that does not stop them for licensing it. How they do it? By refusing to license the separate parts and forcing a whole package of AAC-LC, HE-AACv1, HE-AACv2 and xHE-AAC onto you. I guess if the situation won’t change in twenty years all current stuff will expire but they’ll still license it along with Ultra-Enhanced-Hyper-Expanded-Radically-Extended High-Efficiency AAC (which will have nothing to do with all those previous formats).
  • DT$ A company similar to D*lby and its (former?) prime competition. Also known for single format with many extensions making it essentially a homebrew AAC. At least it seems to be exclusively DVD/Blu-ray format and I’m satisfied with Xine for playing the former and avoiding the latter completely.

And I want to talk a bit more about my RealAudio DNET decoder. Internally it’s called ts102366 for obvious reasons and I have just a primitive implementation for it (i.e. it seems to work and should handle multichannel fine but no extended features). The extension for more than 5.1 channels also seems to be HD-DVD/Blu-ray only so I don’t care, it’s quite rare in RealMedia format and other containers seem to contain it as contiguous stream so I’d need to introduce support for NAElementaryStream in demuxing code and also proper parser to split it into frames. Not worth the effort for me at this moment. Another fun fact is that bitstream comes in 16-bit words that can have any endianness. In my case I just had to detect the proper endianness from first two bytes and simply initialise bitstream reader in BE or LE16 mode depending on it (again, it’s funnier with DT$ format where you have three different bitstream reading modes and you might need two modes simultaneously in some cases; again, good thing I don’t have to care about that stuff). Also it’s still one of two codecs I currently have that support multichannel audio (Cook is the second of course and AAC will be third).

And finally some words about Rust issues I had to deal with.

Rust as a language is more or less fine but compiler sucks. I’ve ran into several issues while writing code.

First, I had a fixed array of Codebooks to initialise in RALF decoder (one of 15 codebooks, another one of 125 codebooks and yet another one of 10×11 codebooks). If I use simply mem::uninitialized() with filling it up it works fine. In debug mode. In release mode it segfaults at the end. Probably I should’ve used ptr::write() instead of assigning and it would work fine but I gave up and used a vector instead of an array even if it’s not as efficient. Obviously it’s all my fault and not Rust issue but still that was weird.

Second, when I tried to create a generic codebook reader that would accept table of codes of any primitive type (u8, u16 or u32) I ran into funnier issue of Rust compiler spewing weird errors like “cannot convert u16 to u32 because it’s not a primitive type”. Obviously it’s my mistake and it’s caught by a tool (that is still not in stable) so the developers don’t care (yes, Luca even bothered to file an issue on that). Still, I’d rather have a clearer error message in that case (e.g. “… because it’s X and not a primitive type”).

And finally, an example that is definitely rustc stupidity and not mine. Again, developers don’t consider this to be an issue but I do (and Luca seemed to agree with me since he opened an issue about it). Essentially, there is a thing called DCE (dead code elimination), so when compilers see that certain block won’t be executed they might print a warning and just check inside code for syntactic validity. Current rustc might ignore condition value and optimise code inside even if it clearly makes no sense (to the point where it crashed because of that on some nightly version, see the issue for details). And while you argue that one should not write such code, I had quite plausible use case for it: a macro that took 2- or 3-element array and did something to its values so if third value was present it had to do something special with it. But of course compilation failed because you tried to do if ARR.len() > 2 { a = ARR[2]; } with two-element array. But when I tried to check whether I got indexing correct by using large constants as indices, cargo check passed just fine—probably because const propagation did not go that deep inside my code (it was in a function called from a long chain in some sub-sub-sub-module and standalone example errors out fine). This feels quite unpolished to me.

Oh, and final final fun thing: the calls like foo.bar(foo.baz) would still fail borrow check probably because they can’t (I guess) formalise function calling convention i.e. “if function is called then first its arguments are evaluated and copied if needed in certain order, then function address is evaluated and called with the arguments”. BTW you still have the situation like this:

struct Foo { foo: u8 }
impl Foo {
    fn bar(&mut self) -> u8 { self.foo += 1; self.foo }
}

fn fee(a: u8, b: u8) {
    println!("{} {}", a, b);
}

fn main() {
    let mut foo = Foo { foo: 42 };
    fee(foo.bar(), foo.bar());
}

And if you don’t know what’s wrong here I’ll tell you: in C argument evaluation is implementation-defined because back in the day there were very different calling conventions and thus compiler needed to start with evaluating from last argument to first to store them in order instead of widespread pushing arguments in order to stack. So depending on ABI the function would be called either as fee(43, 44) or as fee(44, 43).

Now I see two ways out of it: either detect such situation where the same object is mutably called several times and give an error or, which is better IMO, make formal calling convention so the code won’t be undefined. And fix borrow checker while doing that.


Overall, Rust is a nice experience so far since it allows code to structure much better but sometimes you hit such silly issues that spoil all the fun.

Anyway, next post should be about RealAudio Cook, the Opus of its era.

Some Notes on Saarland Railways

October 3rd, 2018

Since today is the state holiday (some time ago two Germanies united into one—which looks more and more like DDR for some reason) why not look at Zoidberg of German lands—Saarland? Well, you might have many reasons (first, it being Saarland) but today I’ve completed my voyage on all of their accessible railways and hence this post.

First, a bit of history. As you all remember, after World War II Germany was split into four occupation zones and while I haven’t heard anything in particular about British occupation zone, the rest of occupation forces were behaving not nice at all: USA installed their military bases everywhere (and most of them are still there—at least it meant less military expenses for West Germany back in the day), USSR tried to convert its piece of Germany into a copy of itself (partly successfully, hopefully it will recover) and France was not satisfied with mere occupation and also tried to seize the part of Germany as its own but it bit more than it could chew and so back in 1957 Saarland was reunited with the rest of Germany (and that day is the state holiday too but I doubt many think of 1st of January as of Saarland reunification day).

Saarland still honours France

Second, a bit of railway network overview. Essentially you can think about it as a cross: there’s a main East-West line going from Mannheim to Trier (or Alt-Chemnitz) via Homburg and Saarbrücken, there’s a North-South line going from Bad Kreuznach to Saarbrücken, there’s a line going South from Saarbrücken to France (and another one served by tram but more about it later), there is a branch Dillingen—Niedaltdorf, there’s a line from Rohrbach to Pirmasens, there’s line Trier—Perl—Metz that goes partly through Saarland and there are several parallel lines connecting Homburg and Saarbrücken. Let’s count: Homburg—Rohrbach—Saarbrücken (that’s what trains from Mannheim to Saarbrücken use), Homburg—Neunkirchen—Saarbrücken (part of it is Nahetalbahn to Bad Kreuznach), Homburg—Neunkirchen—Merchweiler—Saarbrücken (serviced by regional trains Homburg—Illingen and Saarbrücken—Lebach) and finally there’s Homburg—Neunkirchen—Lebach—Saarbrücken via tram line that goes all the way from Lebach to Saarbrücken to Saargemünd (or Sarreguemines as some people write it).

Yes, there’s a tram line in Saarland that essentially crosses half of it. And it’s impossible to confuse it since there’s only one tram line and one tram route in Saarland.

Also I’ve found mentions of three museum lines but looks like only one is functioning: Ottweiler—Schwarzerden line (or Ostertalbahn for short). And I’ve tried it as well. Unlike many other museum lines, this one uses diesel locomotives from the 1960s (but hopefully they’ll manage to rebuild the steam locomotive from the parts they have one day). It was what can be experienced in Ukrainian regional trains—going at about 30km/h while sitting on wooden benches and enjoying looking at the nature outside. At least they boast that they work in any weather (while other museum lines close in Autumn they keep running trains in winter too).

There are many weird things there I’d like to talk about but I’ll leave them to the time when I finish travelling on all railways of Rheinland-Pfalz (should be done next year unless they decide not to open Zellertalbahn again) but here are some of them for now.

First, the train service Saarbrücken—Lebach-Jabach. Fischbachtalbahn (Saarbrücken—Wemmetsweiler) and Primstalbahn from Wemmetsweiler to Illingen are electrified (in Illingen only track 41 is electrified, track 51 is not). Then only a bit of track at Lebach is electrified but about fourteen kilometres in-between are not. We had similar situation here with Bruhrainbahn (between Graben-Neudorf and Germersheim) being not electrified so train Karlsruhe—Mainz ran mostly on electrified rails but still had to be a diesel one. At least this was fixed in 2011 by electrifying the missing piece.

Second, it’s the only tram line in Germany I know that has exit directions repeated in French too.

And third, to make Saarland feel even more like Switzerland, they have the same cryptic booking system: when I bought a ticket from Saargemünd to Lebach it offered me to choose one of three or four possible alternatives—just like buying a rail ticket in Switzerland! Come to think of it, Swiss rail system is exactly like German regional system:

  • Choosing route is the same;
  • German general rail tickets have a whole day of validity (or more for longer distances). German regional tickets are valid for just a couple of hours after purchase—and same in Switzerland (unless it’s some snowy route that might be closed for days);
  • When I bought a ticket from Schaffhausen to Zürich (two different kantons) the ticket also listed zones—like some German regional tickets do;
  • Like with German regional trains, the type does not really matter. It may be S-Bahn, RegioBahn, RegioExpress or InterRegioExpress—the ticket is valid regardless. Same in Switzerland: the same ticket valid for any kind of train and trains change classes during the travel (i.e. train Basel—Chur was labelled as InterCity up to Zürich and InterRegio after that, the difference is only how many intermediate stops it makes);
  • And finally, the famous Swiss train punctuality. Well, it’s a known effect that regional trains have much better punctuality than long-distance ones (and all trains in Switzerland are essentially slow regional trains).

So despite all local jokes about Saarland being very backward place (some even call it “rear end of Germany”) it’s quite European place in some aspects. And remember that it has a real Schengen border (i.e. it borders with Luxembourg where town of Schengen known for some treaty is located).