NihAV « Kostya's Boring Codec World

Archive for the ‘NihAV’ Category

NihAV: even more Bink2 support!

Wednesday, March 13th, 2019

After managing to decode the first frame of KB2g variant I had three options: try to decode the other frames, try to decode other variants or do nothing. While the third option is the most appealing and the first option is the most logical, I stuck with the second one. So now I can decode the first frame of KB2f variant of Bink2 as well. Unfortunately the only (partial) KB2a sample I know is not supported, probably it’s a beta version that was tried on one game like Bink version b. Beside a small surprise in one place bitstream decoding was rather simple. Inter-frame support should not be that hard but it might get messy because of the DC and MV prediction.

And while talking about REing Bink I should mention that I’ve tried Ghidra while doing KB2f work. It is a nice tool that sucks in some places (not having a good highlight for variables, decompiling SIMD code results in very questionable output, the system being Java-based and requiring recent JDK—that’s the worst issue really) it works and produces decent results (including the decompiler). Also since it has 16-bit decompiler support maybe I’ll manage to figure out how those clips in Monty Python & the Quest for the Holy Grail are stored.

I should start documenting it too.

Insignificant update: okay, now it decodes inter-frame data correctly too and the only thing left is to make it reconstruct them correctly. Also I’ve updated codec information on Multimedia Wiki. Actually now it works quite okay so I’m not going to pursue it further. I have no real interest in Bink2 decoding after all.

Posted in Bink, NihAV | 4 Comments »

NihAV: some Bink2 support

Sunday, March 10th, 2019

It took a long time but finally I can decode the first frame of Bink2 video (just KB2g flavour though but it’s a start).

At least the initial observations were correct: Bink2 codes data in 32×32 macroblocks, two codebooks for AC zero runs, one codebook for motion vector components, simple codes with unary prefix for the rest.

If you wonder why it took so long—that’s because I’m lazy and spend an hour or less a day on it. Also while the codec is simple in design it’s a bit complicated in implementation. While previous version related on format sub-version to decide which feature to use, Bink2 uses frame flags to decide which feature to use. For instance, flag 0x1000 signals that there are two bit arrays coded that tell when to read an additional flag during CBP decoding that tells which one of two codebooks should be used during AC decoding later. And flag 0x2000 essentially tells to use different bitstream decoding (like motion vectors decoding or block type decoding). Or the fact that it employs DC and MV prediction that usually has four cases (top-left macroblock, top block, left block, some block inside) plus WMV1-like handling of DC prediction in inter-frames (i.e. it calculated DC for inter blocks and uses them for prediction). And of course DC prediction for inter blocks works a bit different. Plus it tries to track internal state by packing all flags into 32-bit word and updating it for each block (two bits are for signalling top row, one—for macroblock being the leftmost one, some bits are copied from frame flags etc etc). So there’s a lot of nuances to take care of.

And that’s not counting the fact that current Bink2 player can’t decode versions prior to KB2g at all. Since I have some KB2f samples along with an old Bink player that can handle them, I guess I’ll support them eventually.

Posted in Bink, NihAV | Comments Closed

NihAV: Overall Design—Current and Intended

Saturday, February 16th, 2019

Okay, I’ve finally done all the low-hanging fruits and now the progress is blocked by the need of reverse engineering (namely, Bink 2, Discworld Noir BMV and TrueMotion 2X) so I don’t expect anything major happening to the project in near time.

Meanwhile it’s a good opportunity to talk about how NihAV is (mis)designed and how it should work in principle.

(more…)

Posted in NihAV | Comments Closed

NihAV: first quacks

Sunday, February 10th, 2019

As you can guess from the title NihAV got some support for Duck formats, namely TrueMotion 1 and TrueMotion RT. The implementation was rather straightforward except that it took some additional work to support 16-bit video buffers.

Of course I made sure my new TM1 decoder supports decoding sprites. Here’s an example of such sprite picture:

The hardest part was finding a sample.

I can’t sanely support transparency though since it uses 6-bit alpha with RGB555 image and while I can support such format quite easily I’d rather not.

If you wonder about the details of sprite support, it’s almost the same as ordinary inter-coded 16-bit TM1 with some nuances:

frame header has additional 16-bit fields for sprite position and size (and actual sprite size is used in the decoding—the result is supposed to be put over the destination picture);
sprite has twice as much mask bits as inter frame—two per 4×4 block (LSB first as usual). Bits 00 mean the next four pixels should be skipped (and predictor reset to zero), bits 01 mean it’s opaque sprite data and bits 10 mean it’s sprite data with transparency info present;
sprite data is decoded as standard 4×4 TM1 block data (i.e. on C delta per 4×4 block) except that in transparency mode it also reads transparency data after each pixel pair.

That information comes from our old trusty source of information called VPVision source code dump (which was used to understand TrueMotion 1, 2 and probably DK3/DK4 ADPCM (and maybe VP3 but I’m not sure at all). Also it turns out to contain TrueMotion RT encoder source code as well (which could be used to reconstruct the decoder but I forgot about it at the time and used the binary specification instead).

And now I’d like to talk about Duck codecs in general.

The codecs from this family can be divided into three groups:

The Age of Darkness: the original TrueMotion codec and its evolution plus related ADPCM codecs;
The Age of Enlightenment: game codecs evolving into more generic video codecs and using more mainstream codec design (DCT-based, many ideas borrowed from H.263 and H.264) plus AVC (that’s audio codec if you don’t remember);
The Age of EA Guardian: the codecs produced after Duck was bought by certain company.

The Age of Darkness codecs

Those codecs were used mostly in video games but TM1 was also licensed to Horizons Technology.

The idea behind TM1 is very simple: you split video into 4×4 blocks, predict each pixel from top and pack using quantised deltas and fixed codebook looking more like Tunstall codes (i.e. output code is always a fixed length of one byte but it may correspond to a variable length sequence of input codes). Also depending on quality frame blocks have different number of colour difference deltas per block (1, 2 or 4).

TrueMotion RT is an adaptation of TM1 for real-time video capturing (hence the name). In this case video is coded as planar YUV410 using fixed set of deltas with index taking 2, 3 or 4 bits. But the general coding idea (top and left prediction, delta quantisation and coding its index) remains the same. It uses the same frame header obfuscation so it’s probably an elder sibling of TrueMotion 2 (and its name is more like TrueMotion RT version 2.0 and not TrueMotion 2 RT but the details are unclear). There are different versions of the codec, for example Star Control II: The Ur-Quan Masters on 3DO used a special TM1 format split into several files: .hdr for global information (including quantised delta sets), .tbl with codebook definition, .duk with actual frame data and .frm with the frame offsets for .duk file. It’s a pity I can’t support it without very special handling.

TrueMotion 2 gets rid of single static codebook and packs appropriate data (deltas for different-resolution blocks, motion vector data, actual block types etc etc) in separate segments with their own Huffman codes. There are many improvements but the codec still operates on 4×4 blocks with horizontal and vertical prediction of each symbol.

There is not much known about TrueMotion 2X but so far it looks like maybe slightly improved TM2. Hopefully it will be clearer if I manage to implement a decoder for it.

And finally there were two simple ADPCM codecs accompanying video (usually TM2), there’s nothing much to say about those.

The Age of Enlightenment codecs

This was the age when Duck codecs became widely known and accepted, when various companies licensed them for their own needs and when it was really the golden age for them.

It all starts with TrueMotion VP3 that set the standard for the following codecs. It employed the a bit non-standard 8×8 DCT, referencing last intra frame as an alternative to referencing just the previous frame (later knows as golden frame), with various types of information grouped together instead of interleaving it all, and with coefficients coded as tokens (EOB, zero run, plus-minus one, plus-minus two, large coefficient token and such). The same approach would be used for subsequent codecs as well. Of course it briefly enjoyed the renaissance when Duck decided to put it into open-source and Xiph Theora was created on its base (and since there were no other free and open-source video codec alternatives it was destined to have popularity and success before something better comes).

TrueMotion VP4 was mostly the same but with different coding method for some data types. Maybe it was the first codec to move edge loop filtering from being performed on the frame to being performed on temporary block used in motion compensation but I’m not entirely sure.

TrueCast VP5 was the first in the series to employ their own version of static binary arithmetic coder mostly known as bool coder. That means that instead of updating bit probability after each decoding using that context as CABAC does, frame header encodes fixed probabilities (or just updates from the probabilities in the previous frame) and uses them for decoding.

VP6. Probably the most famous of them all since it was used in Flash videos. From technical point of view it’s just small improvement in some details over VP5. I suspect this was the first codec in the series that introduced selecting random frame as the next golden frame (previously it was just last intra frame, now any inter frame can signal that it should become golden).

VP7. This is the first installation in the series that was based on H.264 ideas like 4×4 transform and spatial prediction.

And of course there’s AVS, an audio codec inspired by AAC LC that accompanied some VP5-VP7 videos.

The Age of Guardian codecs

While the design direction has not changed much, the codecs themselves mostly belong to the niche provided by their current owner and hardly used anywhere else. For now we have VP8, VP9 and VP10 (aka AV1).

I hope there will be more to write about those after I write decoders for the rest of them and learn the shameful details of their design in the process.

Posted in NihAV, TrueMotion | 4 Comments »

NihAV: split and games

Wednesday, February 6th, 2019

I have not written anything about NihAV for the last two months. If you thought it’s because nothing was done you’d be mostly correct (but not in thinking about the project at all—this is pointless). And yet there has been some progress recently (but before that I spent a whole month not doing anything NihAV-related).

First thing, I’ve finally split NihAV into smaller crates. This was done in order to reduce compilation and testing time for separate components. It started to feel a lot like FFmpeg and Libav times except that I don’t have to rerun configure and recompile everything in case I want to add a new decoder. Now codecs reside in standalone crates that contain just relevant decoders and demuxers so “write code—fail to compile—fix—run tests” cycle goes faster now.

And here’s the new structure:

nihav-core — the core. It contains the definitions for basic stuff like packets and video/audio buffers, I/O routines (byte- and bitreader), DSP and common video stuff for H.263-based decoders;
nihav-commonfmt — for various common formats. Currently it has all leftover codecs I could not move elsewhere (and AVI demuxer). Maybe in the future it will grow too large so I’ll split out some groups like speech codecs into new crates;
nihav-duck — for Duck game codecs up to VP7;
nihav-game — for various game codecs and containers;
nihav-indeo — for Indeo video and audio codecs;
nihav-rad — for totally RAD formats like Smacker and Bink;
nihav-realmedia — full RealMedia support (except for common decoders in nihav-commonfmt);
and finally nihav-allstuff — the crate that binds them all so you don’t have to search for decoders in separate crates.

Of course there’s nihav-tool to perform the actual decoding and test whether the code works but it’s always been a standalone crate.

This split required some changes in the decoder and demuxer handling. Previously I could get away with one table where all decoders (or demuxers) were registered and then search for an appropriate decoder there. Now it is not possible since there are many crates with individual codecs enabled or disabled in configuration. In result I had to introduce a structure called RegisteredDecoders and have crate-specific function for registering all decoders featured in crate in it. And nihav_allstuff::nihav_register_all_codecs() simply invokes them all for convenience. It’s exactly the same with demuxers too.

N.B.: I should probably write a separate post on overall NihAV design, missing parts and excuses for certain decisions.

Second, as you could spot from the crate names above, instead of trying to make NihAV do something useful (you have rust-av for that though) I decided to focus more on supporting various game formats.

So far I want to support the following formats:

Sierra VMD (developed by Coktel Vision but I care more for Sierra games). Should be an easy one-shot;
Discworld II and Noir BMV. I have added support for DW2 BMV format already (and it took a day to reimplement). DW3 BMV format is almost the same, DW3 audio is known but video codec still requires some reverse engineering. Also fun fact—DW2 BMV video decoder is based on reverse engineered decoder from ScummVM where they did not understand how decoding lengths works and the code still looks like a beautified assembly. For libavcodec decoder I tried my best to simplify the code and give better names to the variables. And when implementing the decoder for nihav-game I found out that “read byte, advance pointer, output nibble and save another one for later” leads to the same results but with a much cleaner code;
RAD codecs. I’ve reimplemented Smacker and Bink formats support (the worst part was NIH-y implementations for DCT-II/DCT-III used in Bink audio decoder but that’s a tale for another time). Maybe now I’ll finally write a decoder for Bink2 video;
Duck codecs. Yes, they are game formats that also were tried as general codecs but with lesser success. There are many games using TrueMotion 1, there are many games using TrueMotion 2, there are some games using TrueMotion VP3 and VP6 in their cutscenes. And the only their codec that got widespread is VP6. I want to implement all the family of the codecs that Duck produced before ~~dying~~ ascending to the higher plane of existence. While most of them are supported there are still missing features like sprites in TrueMotion 1, interlaced mode in VP6 or TM2X entirely. It requires some REing of course but that’s the appeal.

After that it would be nice to work on actual player for the supported formats, which opens a new can of worms like proper frame reordering and format conversion, but I guess I’ll have to deal with that one day.

Anyway, back to doing nothing.

Posted in NihAV | 3 Comments »

NihAV: now with full RealMedia support!

Saturday, December 15th, 2018

In late September 2017 I’ve started to work on RealMedia support in NihAV with an intent to have full support for RealMedia. So more than a year later I’ve reached that goal.

In order to do that I had to reverse engineer one and a half codecs and one format. Here’s the full list.

Supported formats:

RealAudio (i.e. just single audio stream);
plain RealMedia (i.e. just a bunch of audio and video streams);
RealMedia with multiple data chunks (i.e. one or several streams are stored in separate chunk, it’s nothing extraordinary but still needs to be accounted for);
RealMedia multiple stream variants (i.e. single logical stream is represented by several substreams and you have to select one based on quality);
IVR, their own recording format (I had to RE this one).

Supported audio codecs:

RealAudio 1 aka 14.4;
RealAudio 2 aka 28.8;
RealAudio (AC)3 aka DNET;
RealAudio 4/5 (licensed from Sipro);
RealAudio G2 (cook);
RealAudio ATRAC3;
RealAudio AAC-LC (no SBR);
RealAudio Lossless.

And video codecs:

RealVideo 1;
RealVideo Fractal aka ClearVideo (I had to finish REing P-frame format for that one);
RealVideo 2;
RealVideo 3;
RealVideo 4;
RealVideo 6 or HD (I had to RE this one and now it decodes the sample I have with only minor glitches).

And here are some words about IVR that I had to RE this week.

Update: it turns out Paul had reverse engineered the format before NihAV came to existence but his implementation is even sketchier than mine unfortunately.

There are actually two formats there, one with magic .REC that contains actual recording and another one with magic .R1M that may contain several of those .rec embedded. Both formats internally reminded me more of Flash than RealMedia because both files are composed of records that can be distinguished by the first byte (yes, I still remember RTMP and how I had to parse it). R1M has two kinds of records: 0x01—recording metadata it seems, 0x02 contains actual REC.

REC files (or sub-entries in R1M) have defined amount of global properties followed by stream specific properties followed by (optional) stream seek tables followed by actual packets. All numbers are big-endian and 32-bit (seek table offsets seem to be 64-bit). Strings are coded as string length (including terminating zero) followed by string data and zero terminator.

REC record types:

stream properties start, has a number of properties coded after it;
packet header, more about it below;
key-number pair, has key value (string), a number property length (always 4) and actual number value;
binary data, has key value (string), binary data length and actual data;
key-value pair with both key and value being strings;
end of header record with three numbers, first of which gives an absolute (from the beginning of REC data if embedded) offset for the seek tables or packets;
packet data end, always followed by eight zeroes;
packet data start, always followed by eight zeroes. This record seems to be present only when seek tables are present (to detect the end of those?), otherwise packets follow end-of-header record immediately.

There may be several RJMx chunks at the end of IVR with additional metadata but they posed no interest to me.

I had some trouble with IVR packets since I expected them to be exactly the same as in RM but it turned out to be the same payload format but with different header:

32-bit timestamp;
16-bit stream number;
32 bits of flags. I suspect this might code packet group for MLTI substreams, keyframe information and such but I could not find a proper pattern valid for all three samples (and demuxer works fine without it too);
32-bit payload length;
32-bit header checksum (most likely). I was not able to understand how it works but header checksum seems to be the most plausible explanation.

I am fully aware that my current implementation has bugs and flaws and might not decode all files perfectly but it decodes all kinds of files and that’s good for me. Also what to expect from software written by one lazy guy in his free time for himself?

Next is probably Duck type of codecs or totally RAD ones. Or maybe I’ll waste time on making NihAV conform to Rust 2018 edition. This seems to be a task about half as hard as porting code from K&R C to ANSI C (from a quick glance you have to change at least imports from inside the crate, traits now require word dyn and there may be more). Or it may be NAScale for all I care (and I don’t care at all). The time will show.

Posted in NihAV, RealVideo | 2 Comments »

NihAV: The Fastest RealVideo 6 Decoder in Rust!

Thursday, November 29th, 2018

I guess the title shows how stupid marketing something can be if there’s just one contestant in the category—so in order to win you merely need to exist. Like in this case: NihAV can barely decode data and it’s not correct but the images are recognizable (some examples below) and that’s enough to justify this post title. Also I have just one sample clip to test my decoder on but at least RV6 is not a format with many features so only one feature of it is not tested (type 3 frames).

Anyway, here’s the first frame—and it is reconstructed perfectly:

The rest of frames is of significantly worse quality but with more details.
(more…)

Posted in NihAV, RealVideo | 3 Comments »

RMHD: REing Has Started!

Saturday, November 10th, 2018

So, after a long series of delays and procrastination (the common story in NihAV development), I’ve finally stared to work on adding support for RMHD to my project. There are two things to be done actually: add support for RMHD container (so that I can demux actual files) and obviously support for RealVideo 6 itself.

As for RMHD container support, this was quite easy. Some person has given me a file about a year ago so I could look at it and improve my existing RealMedia demuxer. As expected the format is about the same but a bit different: magic word in the header now is “.RMP” instead of “.RMF”, some chunks were upgraded to version 2 (previously they all were either version 0 or version 1) with some additional fields shoved in the middle (or maybe they’ve upgraded them to 64-bit). The WTFiest change is how audio header was written: binary fields tell it’s header version 5 (it is) but text identifier says “.rm4” which obviously comes from version 4 and the provided header size is too short compared to the conventional RMVB and actual header data. At last after some hacks added to my RealMedia demuxer I was able to decode AAC track from the file so only the video part remains.

For RV6 I have the binary specification with a lot of debug symbols left. Even though I’ve made a good progress on it, I expect CEmpeg to add the opensource decoder for it first. After all, if you search for strings in the librv11dec.so from the official site you’ll see the function names that will make almost anybody recognizing them cringe:

C_hScale8To15(short*, int, unsigned char const*, short const*, short const*, int)
C_yuv2planeX(short const*, int, short const**, unsigned char*, int, unsigned char const*, int)
sws_pb_64
SSE2_yuv2planeX(short const*, int, short const**, unsigned char*, int, unsigned char const*, int)
SSE2_hScale8To15(short*, int, unsigned char const*, short const*, short const*, int)
yuv2planeX_8_c_opt(short const*, int, short const**, unsigned char*, int, unsigned char const*, int, int)

In case you belong to the blessed majority who do not know what those functions are, these are part of libswscale, a library for frame colourspace conversion and scaling licensed under LGPL. Which means pieces of it were borrowed and adapted for (YUV-only) frame scaling in RV6 decoder and thus it source code should be available under LGPL. And imperfect English messages in the same specification (“not support” consistently occurs throughout it) hint that it was quite probably Chinese developers responsible for this. Since CEmpeg has both the majority of libswscale developers and people who speak Chinese, it should be no problem for them to obtain the codes. Meanwhile I’ll continue my work because NIH in NihAV is there for a reason (that also reminds me I should port NAScale to Rust eventually).

Anyway, the actual description of RealVideo 6 deserves a separate post so I’ll dedicate the following post to it (but who would read that anyway?).

Posted in NihAV, RealVideo | 2 Comments »

NihAV: Lurching Forward!

Sunday, October 28th, 2018

Finally NihAV got full* support for RealAudio. Marketing asterisk there is for AAC support since only AAC LC is supported which means decoding only 22kHz from racp (as I said before, no SBR support) plus some other features were cut: it supports just multichannel audio but no coupling feature, noise codebook or e.g. LTP from AAC Main. Also for similar reasons I did not care to optimise its performance. I don’t care much about it and I don’t want to spend a lot of time on AAC. Here’s some fun statistics for you: AAC decoder is about 1800 lines long (IMDCT, window generation and bitstream parsing are common modules but the rest of AAC stuff is there), about 100 lines are spent on subband tables, 350 lines are spent on codebook tables and 250 lines are wasted on “proper” parsing of general MPEG-4 Audio descriptor with type-specific stuff like HILN or ALS configuration being simply left as unimplemented!() (a useful Rust macro that will report it and abort program if it reaches that point so you know when to start caring).

And now for the fun Rust part.

My codebook generator takes something that implements CodebookDescReader trait which is essentially a common interface that specifies how to get information for each codebook element (codeword, its length and associated symbol). While I had some standard implementations, they were not very useful since the most common case is when you have two tables of different types (8 bits are enough for codeword lengths and actual codewords may take 8-32 bits) and you want to use some generic implementation instead of writing the same code again and again.

In theory it should be easy: you create generic structure that stores pointers to those tables and converts them into needed types. It would work after some fiddling with type constraints but the problem is to make the return type what you want i.e. the function takes entry index number (as usize) and returns some more sensible type (so that when I read code later more appropriate type will be used).

And that is rather hard problem because generic return type can’t be converted from usize easily. I asked Luca of rust-av fame (he famously does not work on it) and he provided me with two and a half solutions that won’t work in NihAV:

Use unstable TryInto trait that can try to convert input type into something smaller—rejected since it’s unstable;
Use num_traits crate for conversion traits that work with stable—rejected since it’s an external crate;
Implement such trait yourself. Hmm…

So I NIHed it by taking a function that would map input index value to output type. This way is both simpler and more flexible. For example, AAC scalefactor codebook has values in the range -60..+60 so instead of writing new structure like I did before I can simply do

fn scale_map(idx: usize) -> i8 { (idx as i8) - 60 }
...
let mut coderead = TableCodebookDescReader::new(AAC_SCF_CODEBOOK_CODES, AC_SCF_CODEBOOK_BITS, scale_map);

And this works as expected.

There’s a small fun thing related to it. By old C habit I wrote initially &fn as argument type and got cryptic Rust compiler error about mismatched types:

    = note: expected type `&fn(usize) -> i8`
               found type `&fn(usize) -> i8 {codecs::aac::scale_map}`

Again, my fault but rustc output is a bit WTFy.

Anyway, the next is supposedly RealVideo HD or RealVideo 6 decoder and then I can forget about this codec family and move to something different like DuckMotion or Smacker/Bink.

And in the very unlikely case you were wondering why I’m so slow I can tell you why: I am lazy (big reveal, I know), I prefer to spend my time differently (on workdays I work, on Sunday I travel around, that leaves only couple of hours during workdays, some part of Saturday and a bit of Sunday if I’m lucky) and I work better when the stuff is interesting which is not always the case. Especially if you consider that there is no multimedia framework in Rust (yet?) so I have to NIH every small bit and some of those bits I don’t like much. So don’t expect any news from me soon.

Posted in NihAV, Rust | 1 Comment »

NihAV, RealMedia, Rust and Everything Else

Saturday, October 13th, 2018

Looks like it’s been about two months since I last wrote anything about NihAV but that does not mean I did not have anything to write about. On the contrary, I’m glad to report about significant progress in RealAudio support.

Previously I’ve reported about RealVideo 3 and 4 support (as for RealVideo 1/2 and ClearVideo before), so video part was covered quite well but audio part was missing and I went on to rectify the situation.

Now NihAV supports RealAudio 1.0 (speech codec), RealAudio 2.0 (speech codec), RealAudio DNET (a bit about it later), RealAudio 4.0 (speech codec from Sipro), RealAudio Cook (this one deserves a separate post so the next one should be about this codec) and RealAudio Lossless. So there are only three codecs missing now: RealAudio 8 (ATRAC3), RealAudio 9/10 (AAC) and RealVideo 6(HD). Of course I’m going to add support for those as well.

This is actually a good time to implement those. As you might know, there is a Holy Trinity of Licensors: D.vX, D*lby and DT$. They are famous for ‘nice’ licensing terms. While I’ve never had to deal with them, I’ve heard from people who did that they like licensing single product they’re most famous for at outrageous prices (i.e. it’ll cost you a magnitude more per unit using their technology than e.g. H.264 decoder) and it’s a viral license too because if you sell stuff not oriented for consumers then you have to force your customers into the same deal (it’s GPL—Greedy Private License) and you have to report your sales to them for obvious reasons. Funny how two of the companies were bought out already. Now let’s look at them in some details:

D.vX This one is remarkable since it licensed the product it had nothing to do with (aka M$MPEG-4 adapted for non-ASF containers and MPEG-4 ASP). At least it seems hardly relevant now unless I dig out some old movies.
D*lby This one is mostly known (outside cinema equipment) for codec with several names: ATSC A/52, RealAudio DNET, ETSI TS 102 366, D*lby Digital and even something you can make out of letters A C and 3 (I heard rumours that it does not like its trademarks mentioned so I’d better avoid directly naming it). At least the last patents for that format has expired and support for it can be implemented freely. And it also owns a company that manages licensing of AAC. Fun fact is that patents for MPEG2 NBC are expired so I can implement AAC-LC decoder just fine but that does not stop them for licensing it. How they do it? By refusing to license the separate parts and forcing a whole package of AAC-LC, HE-AACv1, HE-AACv2 and xHE-AAC onto you. I guess if the situation won’t change in twenty years all current stuff will expire but they’ll still license it along with Ultra-Enhanced-Hyper-Expanded-Radically-Extended High-Efficiency AAC (which will have nothing to do with all those previous formats).
DT$ A company similar to D*lby and its (former?) prime competition. Also known for single format with many extensions making it essentially a homebrew AAC. At least it seems to be exclusively DVD/Blu-ray format and I’m satisfied with Xine for playing the former and avoiding the latter completely.

And I want to talk a bit more about my RealAudio DNET decoder. Internally it’s called ts102366 for obvious reasons and I have just a primitive implementation for it (i.e. it seems to work and should handle multichannel fine but no extended features). The extension for more than 5.1 channels also seems to be HD-DVD/Blu-ray only so I don’t care, it’s quite rare in RealMedia format and other containers seem to contain it as contiguous stream so I’d need to introduce support for NAElementaryStream in demuxing code and also proper parser to split it into frames. Not worth the effort for me at this moment. Another fun fact is that bitstream comes in 16-bit words that can have any endianness. In my case I just had to detect the proper endianness from first two bytes and simply initialise bitstream reader in BE or LE16 mode depending on it (again, it’s funnier with DT$ format where you have three different bitstream reading modes and you might need two modes simultaneously in some cases; again, good thing I don’t have to care about that stuff). Also it’s still one of two codecs I currently have that support multichannel audio (Cook is the second of course and AAC will be third).

And finally some words about Rust issues I had to deal with.

Rust as a language is more or less fine but compiler sucks. I’ve ran into several issues while writing code.

First, I had a fixed array of Codebooks to initialise in RALF decoder (one of 15 codebooks, another one of 125 codebooks and yet another one of 10×11 codebooks). If I use simply mem::uninitialized() with filling it up it works fine. In debug mode. In release mode it segfaults at the end. Probably I should’ve used ptr::write() instead of assigning and it would work fine but I gave up and used a vector instead of an array even if it’s not as efficient. Obviously it’s all my fault and not Rust issue but still that was weird.

Second, when I tried to create a generic codebook reader that would accept table of codes of any primitive type (u8, u16 or u32) I ran into funnier issue of Rust compiler spewing weird errors like “cannot convert u16 to u32 because it’s not a primitive type”. Obviously it’s my mistake and it’s caught by a tool (that is still not in stable) so the developers don’t care (yes, Luca even bothered to file an issue on that). Still, I’d rather have a clearer error message in that case (e.g. “… because it’s X and not a primitive type”).

And finally, an example that is definitely rustc stupidity and not mine. Again, developers don’t consider this to be an issue but I do (and Luca seemed to agree with me since he opened an issue about it). Essentially, there is a thing called DCE (dead code elimination), so when compilers see that certain block won’t be executed they might print a warning and just check inside code for syntactic validity. Current rustc might ignore condition value and optimise code inside even if it clearly makes no sense (to the point where it crashed because of that on some nightly version, see the issue for details). And while you argue that one should not write such code, I had quite plausible use case for it: a macro that took 2- or 3-element array and did something to its values so if third value was present it had to do something special with it. But of course compilation failed because you tried to do if ARR.len() > 2 { a = ARR[2]; } with two-element array. But when I tried to check whether I got indexing correct by using large constants as indices, cargo check passed just fine—probably because const propagation did not go that deep inside my code (it was in a function called from a long chain in some sub-sub-sub-module and standalone example errors out fine). This feels quite unpolished to me.

Oh, and final final fun thing: the calls like foo.bar(foo.baz) would still fail borrow check probably because they can’t (I guess) formalise function calling convention i.e. “if function is called then first its arguments are evaluated and copied if needed in certain order, then function address is evaluated and called with the arguments”. BTW you still have the situation like this:

struct Foo { foo: u8 }
impl Foo {
    fn bar(&mut self) -> u8 { self.foo += 1; self.foo }
}

fn fee(a: u8, b: u8) {
    println!("{} {}", a, b);
}

fn main() {
    let mut foo = Foo { foo: 42 };
    fee(foo.bar(), foo.bar());
}

And if you don’t know what’s wrong here I’ll tell you: in C argument evaluation is implementation-defined because back in the day there were very different calling conventions and thus compiler needed to start with evaluating from last argument to first to store them in order instead of widespread pushing arguments in order to stack. So depending on ABI the function would be called either as fee(43, 44) or as fee(44, 43).

Now I see two ways out of it: either detect such situation where the same object is mutably called several times and give an error or, which is better IMO, make formal calling convention so the code won’t be undefined. And fix borrow checker while doing that.

Overall, Rust is a nice experience so far since it allows code to structure much better but sometimes you hit such silly issues that spoil all the fun.

Anyway, next post should be about RealAudio Cook, the Opus of its era.

Posted in Audio, NihAV, Rust, Useless Rants | 2 Comments »

Kostya's Boring Codec World