Archive for the ‘Rust’ Category

NihAV: Lurching Forward!

Sunday, October 28th, 2018

Finally NihAV got full* support for RealAudio. The marketing asterisk is there for AAC: only AAC LC is supported, which means decoding only the 22kHz core from racp (as I said before, there is no SBR support). Some other features were cut too: it supports multichannel audio but not coupling, the noise codebook or e.g. LTP from AAC Main. For similar reasons I did not bother to optimise its performance: I don’t care much about AAC and I don’t want to spend a lot of time on it. Here are some fun statistics for you: the AAC decoder is about 1800 lines long (IMDCT, window generation and bitstream parsing are common modules but the rest of the AAC stuff is there). About 100 lines are spent on subband tables and 350 lines on codebook tables. Another 250 lines are wasted on “proper” parsing of the general MPEG-4 Audio descriptor, with type-specific stuff like HILN or ALS configuration simply left as unimplemented!(). That is a useful Rust macro that panics (terminating the program) if execution ever reaches that point, so you know when to start caring.

And now for the fun Rust part.

My codebook generator takes something that implements the CodebookDescReader trait, which is essentially a common interface specifying how to get the information for each codebook element (codeword, its length and associated symbol). While I had some standard implementations, they were not very useful, since the most common case is two tables of different types (8 bits are enough for codeword lengths while actual codewords may take 8-32 bits) and you want a generic implementation instead of writing the same code again and again.

In theory it should be easy: you create a generic structure that stores pointers to those tables and converts their entries into the needed types. It works after some fiddling with type constraints, but the problem is making the return type what you want. The function takes an entry index (as usize) and should return some more sensible type, so that when I read the code later the more appropriate type is used.

And that is a rather hard problem, because a usize cannot easily be converted into a generic return type. I asked Luca of rust-av fame (he famously does not work on it) and he provided me with two and a half solutions that won’t work in NihAV:

  • Use unstable TryInto trait that can try to convert input type into something smaller—rejected since it’s unstable;
  • Use num_traits crate for conversion traits that work with stable—rejected since it’s an external crate;
  • Implement such trait yourself. Hmm…

So I NIHed it by taking a function that maps the input index to the output type. This is both simpler and more flexible. For example, the AAC scalefactor codebook has values in the range -60..+60, so instead of writing a new structure like I did before I can simply do

fn scale_map(idx: usize) -> i8 { (idx as i8) - 60 }
...
let mut coderead = TableCodebookDescReader::new(AAC_SCF_CODEBOOK_CODES, AAC_SCF_CODEBOOK_BITS, scale_map);

And this works as expected.
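Here is a minimal sketch of such a table-driven reader. It assumes a simplified CodebookDescReader with bits/code/sym/len methods; NihAV's actual trait and TableCodebookDescReader may differ in names and signatures. The codeword table can be of any type convertible into u32, and symbols come from a plain fn(usize) -> S mapping like scale_map above:

```rust
/// Simplified stand-in for the codebook description trait (hypothetical signatures).
pub trait CodebookDescReader<S> {
    fn bits(&self, idx: usize) -> u8;
    fn code(&self, idx: usize) -> u32;
    fn sym(&self, idx: usize) -> S;
    fn len(&self) -> usize;
}

/// Generic reader over a codeword table of any type `C` (u8, u16, u32...)
/// plus a table of codeword lengths; symbols are produced by `map`.
pub struct TableCodebookDescReader<'a, C, S> {
    codes: &'a [C],
    lens: &'a [u8],
    map: fn(usize) -> S,
}

impl<'a, C: Copy + Into<u32>, S> TableCodebookDescReader<'a, C, S> {
    pub fn new(codes: &'a [C], lens: &'a [u8], map: fn(usize) -> S) -> Self {
        assert_eq!(codes.len(), lens.len(), "code and length tables must match");
        TableCodebookDescReader { codes, lens, map }
    }
}

impl<'a, C: Copy + Into<u32>, S> CodebookDescReader<S> for TableCodebookDescReader<'a, C, S> {
    fn bits(&self, idx: usize) -> u8 { self.lens[idx] }
    fn code(&self, idx: usize) -> u32 { self.codes[idx].into() }
    fn sym(&self, idx: usize) -> S { (self.map)(idx) }
    fn len(&self) -> usize { self.codes.len() }
}

/// Scalefactor mapping from the post: indices 0..=120 map to -60..=60.
fn scale_map(idx: usize) -> i8 { (idx as i8) - 60 }
```

The mapping function sidesteps the usize-to-generic conversion entirely: the caller decides how an index becomes a symbol.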

There’s a small fun thing related to it. Out of old C habit I initially wrote &fn as the argument type and got a cryptic Rust compiler error about mismatched types:

    = note: expected type `&fn(usize) -> i8`
               found type `&fn(usize) -> i8 {codecs::aac::scale_map}`

Again, my fault but rustc output is a bit WTFy.
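Here is what happens there. Every function has its own zero-sized item type (the {codecs::aac::scale_map} part of the message). That type coerces to a fn pointer when passed by value, but a reference to it does not coerce to a reference to a fn pointer. A small sketch of both variants, reusing scale_map (apply and apply_ref are made-up names):

```rust
/// Mapping function from the post.
fn scale_map(idx: usize) -> i8 { (idx as i8) - 60 }

/// Takes a function pointer by value: a plain function name coerces to it.
fn apply(map: fn(usize) -> i8, idx: usize) -> i8 {
    map(idx)
}

/// Takes a reference to a function pointer. Passing `&scale_map` here fails with
/// the mismatched-types error, because it borrows the unique fn item type.
fn apply_ref(map: &fn(usize) -> i8, idx: usize) -> i8 {
    (*map)(idx)
}

/// Creating an actual `fn` pointer first makes the reference version work too.
fn apply_ref_ok(idx: usize) -> i8 {
    let ptr: fn(usize) -> i8 = scale_map;
    apply_ref(&ptr, idx)
}
```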


Anyway, next up is supposedly the RealVideo HD (aka RealVideo 6) decoder, and then I can forget about this codec family and move to something different like DuckMotion or Smacker/Bink.

And in the very unlikely case you were wondering why I’m so slow, I can tell you. I am lazy (big reveal, I know). I prefer to spend my time differently: on workdays I work and on Sundays I travel around, which leaves only a couple of hours on workdays, some part of Saturday and a bit of Sunday if I’m lucky. And I work better when the stuff is interesting, which is not always the case, especially considering that there is no multimedia framework in Rust (yet?), so I have to NIH every small bit and some of those bits I don’t like much. So don’t expect any news from me soon.

NihAV, RealMedia, Rust and Everything Else

Saturday, October 13th, 2018

Looks like it’s been about two months since I last wrote anything about NihAV, but that does not mean I did not have anything to write about. On the contrary, I’m glad to report significant progress in RealAudio support.

Previously I reported RealVideo 3 and 4 support (and RealVideo 1/2 and ClearVideo before that), so the video part was covered quite well. The audio part was missing, though, and I set out to rectify the situation.

Now NihAV supports RealAudio 1.0 (speech codec), RealAudio 2.0 (speech codec), RealAudio DNET (a bit about it later), RealAudio 4.0 (speech codec from Sipro), RealAudio Cook (this one deserves a separate post, so the next one should be about this codec) and RealAudio Lossless. So there are only three codecs missing now: RealAudio 8 (ATRAC3), RealAudio 9/10 (AAC) and RealVideo 6 (HD). Of course I’m going to add support for those as well.

This is actually a good time to implement those. As you might know, there is a Holy Trinity of Licensors: D.vX, D*lby and DT$. They are famous for ‘nice’ licensing terms. I’ve never had to deal with them myself, but I’ve heard from people who did that they like licensing the single product they’re most famous for at outrageous prices (i.e. it’ll cost you an order of magnitude more per unit than e.g. an H.264 decoder). The license is viral too: if you sell products not aimed at consumers, you have to force your customers into the same deal (it’s GPL, the Greedy Private License), and you have to report your sales to them for obvious reasons. Funny how two of the companies have been bought out already. Now let’s look at them in some detail:

  • D.vX This one is remarkable since it licensed a product it had nothing to do with (aka M$MPEG-4 adapted for non-ASF containers, and MPEG-4 ASP). At least it seems hardly relevant now unless I dig out some old movies.
  • D*lby This one is mostly known (outside cinema equipment) for a codec with several names: ATSC A/52, RealAudio DNET, ETSI TS 102 366, D*lby Digital and even something you can make out of the letters A, C and 3 (I heard rumours that it does not like its trademarks mentioned, so I’d better avoid naming it directly). At least the last patents for that format have expired, so support for it can be implemented freely. It also owns a company that manages AAC licensing. The fun fact is that the patents for MPEG-2 NBC have expired, so I can implement an AAC-LC decoder just fine, but that does not stop them from licensing it. How do they do it? By refusing to license the separate parts and forcing a whole package of AAC-LC, HE-AACv1, HE-AACv2 and xHE-AAC onto you. I guess if the situation does not change, in twenty years all the current stuff will expire but they’ll still license it along with Ultra-Enhanced-Hyper-Expanded-Radically-Extended High-Efficiency AAC (which will have nothing to do with all those previous formats).
  • DT$ A company similar to D*lby and its (former?) prime competitor. Also known for a single format with many extensions, making it essentially a homebrew AAC. At least it seems to be an exclusively DVD/Blu-ray format, and I’m satisfied with using Xine for the former and avoiding the latter completely.

And I want to talk a bit more about my RealAudio DNET decoder. Internally it’s called ts102366 for obvious reasons, and it is just a primitive implementation (i.e. it seems to work and should handle multichannel fine, but there are no extended features). The extension for more than 5.1 channels also seems to be HD-DVD/Blu-ray only, so I don’t care. It’s quite rare in RealMedia, and other containers seem to store it as a contiguous stream, so I’d need to introduce support for NAElementaryStream in the demuxing code and also a proper parser to split it into frames. Not worth the effort for me at this moment. Another fun fact is that the bitstream comes in 16-bit words that can have either endianness. In my case I just had to detect the proper endianness from the first two bytes and initialise the bitstream reader in BE or LE16 mode accordingly. It’s funnier with the DT$ format, where you have three different bitstream reading modes and might need two of them simultaneously in some cases; again, good thing I don’t have to care about that stuff. Also it’s still one of only two codecs I currently have that support multichannel audio (Cook is the second of course, and AAC will be the third).
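The endianness check can be sketched like this, assuming the frame starts with the A/52 syncword 0x0B77 (which therefore appears as 0x77 0x0B in byte-swapped 16-bit words). Instead of switching the reader into an LE16 mode as described above, this hypothetical helper swaps byte pairs up front so that a plain MSB-first reader can be used. The BitstreamMode enum is made up for illustration and is not NihAV's own type:

```rust
/// Bitstream reading modes (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum BitstreamMode { BE, LE16 }

/// The A/52 syncword is 0x0B77; byte-swapped 16-bit words show up as 0x77 0x0B.
fn detect_mode(src: &[u8]) -> Option<BitstreamMode> {
    match src {
        [0x0B, 0x77, ..] => Some(BitstreamMode::BE),
        [0x77, 0x0B, ..] => Some(BitstreamMode::LE16),
        _ => None,
    }
}

/// Returns the frame in big-endian word order by swapping every byte pair if needed.
fn to_be(src: &[u8], mode: BitstreamMode) -> Vec<u8> {
    match mode {
        BitstreamMode::BE => src.to_vec(),
        BitstreamMode::LE16 => src.chunks(2).flat_map(|w| w.iter().rev().copied()).collect(),
    }
}
```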

And finally some words about Rust issues I had to deal with.

Rust as a language is more or less fine but the compiler sucks. I’ve run into several issues while writing code.

First, I had fixed arrays of Codebooks to initialise in the RALF decoder (one of 15 codebooks, another of 125 codebooks and yet another of 10×11 codebooks). If I simply use mem::uninitialized() and fill it up, it works fine. In debug mode. In release mode it segfaults at the end. Probably I should’ve used ptr::write() instead of assignment and it would have worked fine, but I gave up and used a vector instead of an array, even if it’s not as efficient. Obviously it’s all my fault and not a Rust issue, but still that was weird.
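A likely explanation for the release-mode crash is the assignment itself. Plain assignment into uninitialised memory first drops the garbage “old” value, which for a type owning heap memory means freeing a random pointer, while ptr::write skips that drop. mem::uninitialized() has since been deprecated in favour of MaybeUninit. In current Rust (1.63+, well after this post), std::array::from_fn builds an array element by element with no uninitialised state at all. A sketch with a hypothetical Codebook stand-in:

```rust
/// Hypothetical stand-in for a codebook: it owns heap memory, so it is not `Copy`
/// and an array of them cannot be written as `[Codebook::new(0); 15]`.
#[derive(Debug)]
struct Codebook {
    table: Vec<u32>,
}

impl Codebook {
    fn new(id: usize) -> Self {
        Codebook { table: vec![id as u32; 4] }
    }
}

/// Every element is constructed by the closure, so no garbage value is ever dropped.
fn make_set() -> [Codebook; 15] {
    std::array::from_fn(Codebook::new)
}

/// Nested arrays work the same way, e.g. a 10x11 grid.
fn make_grid() -> [[Codebook; 11]; 10] {
    std::array::from_fn(|i| std::array::from_fn(|j| Codebook::new(i * 11 + j)))
}
```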

Second, when I tried to create a generic codebook reader that would accept a table of codes of any primitive type (u8, u16 or u32), I ran into a funnier issue: the Rust compiler spewed weird errors like “cannot convert u16 to u32 because it’s not a primitive type”. Obviously it’s my mistake, and it’s caught by a tool (that is still not in stable), so the developers don’t care (yes, Luca even bothered to file an issue on that). Still, I’d rather have a clearer error message in that case (e.g. “… because it’s X and not a primitive type”).
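The error comes from as casts. They only work between concrete primitive types, and inside a generic function the compiler knows nothing about T beyond its trait bounds. A sketch of the usual fix uses the standard From/Into conversions, which u8, u16 and u32 all provide for u32 (widen_codes is a made-up name):

```rust
/// Widens a table of u8/u16/u32 codes to u32. Writing `c as u32` here would fail
/// with a "non-primitive cast" error because `T` is generic; the `Into<u32>`
/// bound supplies the conversion instead.
fn widen_codes<T: Copy + Into<u32>>(codes: &[T]) -> Vec<u32> {
    codes.iter().map(|&c| c.into()).collect()
}
```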

And finally, an example that is definitely rustc stupidity and not mine. Again, the developers don’t consider this to be an issue but I do (and Luca seemed to agree with me since he opened an issue about it). Essentially, there is a thing called DCE (dead code elimination): when a compiler sees that a certain block won’t be executed, it might print a warning and only check the code inside for syntactic validity. Current rustc might ignore the condition value and optimise the code inside even if it clearly makes no sense (to the point where it crashed because of that on some nightly version, see the issue for details). And while you might argue that one should not write such code, I had quite a plausible use case for it. It was a macro that took a 2- or 3-element array and did something to its values, so if the third value was present it had to do something special with it. But of course compilation failed, because it tried to do if ARR.len() > 2 { a = ARR[2]; } with a two-element array. Yet when I tried to check whether I got indexing correct by using large constants as indices, cargo check passed just fine. Probably const propagation did not go that deep into my code: it was in a function called from a long chain in some sub-sub-sub-module, and a standalone example errors out fine. This feels quite unpolished to me.
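One way to sidestep the problem in such a macro is to avoid constant indexing for the optional element altogether. slice::get() returns an Option, so the same expansion is valid for both array lengths. The macro below is a hypothetical stand-in for the one described above, with made-up arithmetic:

```rust
/// Takes a 2- or 3-element array; the optional third element is doubled and added.
/// `get(2)` yields `None` for a two-element array instead of an out-of-bounds index.
macro_rules! sum_with_extra {
    ($arr:expr) => {{
        let arr = &$arr;
        let mut acc = arr[0] + arr[1];
        if let Some(&third) = arr.get(2) {
            acc += third * 2;
        }
        acc
    }};
}
```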

Oh, and the final final fun thing: calls like foo.bar(foo.baz) would still fail the borrow check. Probably (I guess) that is because they can’t formalise the function calling convention, i.e. “when a function is called, first its arguments are evaluated and copied if needed in a certain order, then the function address is evaluated and called with the arguments”. BTW you still have situations like this:

struct Foo { foo: u8 }
impl Foo {
    fn bar(&mut self) -> u8 { self.foo += 1; self.foo }
}

fn fee(a: u8, b: u8) {
    println!("{} {}", a, b);
}

fn main() {
    let mut foo = Foo { foo: 42 };
    fee(foo.bar(), foo.bar());
}

And if you don’t know what’s wrong here, I’ll tell you: in C the order of argument evaluation is unspecified. Back in the day there were very different calling conventions, and some compilers evaluated arguments from the last one to the first so they could be pushed onto the stack in the required order. So depending on the ABI the function would be called either as fee(43, 44) or as fee(44, 43).

Now I see two ways out of it. One is to detect situations where the same object is mutably borrowed several times and report an error. The other, which is better IMO, is to define a formal calling convention so the behaviour is well-defined. And fix the borrow checker while doing that.
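For comparison, here is a checkable variant of the example above. As far as I can tell from the Rust Reference, the operands of a call expression are evaluated left to right, so in Rust (unlike C) this always ends up as fee(43, 44). In this sketch fee returns its arguments instead of printing them so the order can be asserted. The second function shows that with two-phase borrows (introduced with NLL in Rust 2018) calls like v.push(v.len()) are accepted too:

```rust
struct Foo { foo: u8 }

impl Foo {
    fn bar(&mut self) -> u8 { self.foo += 1; self.foo }
}

/// Returns the arguments in the order `fee` receives them.
fn fee(a: u8, b: u8) -> (u8, u8) { (a, b) }

fn call_twice() -> (u8, u8) {
    let mut foo = Foo { foo: 42 };
    // Each `foo.bar()` borrow ends before the next argument is evaluated,
    // and call arguments are evaluated left to right.
    fee(foo.bar(), foo.bar())
}

/// A `foo.bar(foo.baz)`-style call: `v.len()` is read while `v` is only reserved
/// for the mutable borrow that `push` needs.
fn push_len() -> Vec<usize> {
    let mut v = vec![10, 20];
    v.push(v.len());
    v
}
```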


Overall, Rust is a nice experience so far since it allows structuring code much better, but sometimes you hit such silly issues that spoil all the fun.

Anyway, next post should be about RealAudio Cook, the Opus of its era.

NihAV: Some Progress to Report!

Friday, August 24th, 2018

Finally a large chunk of work is finished: NihAV has got support for RealVideo 3 and 4!

As I’ve learned a great deal more about codecs since the last time I wrote a RealVideo 3/4 decoder (and the specifications for both were leaked; they have mistakes but still clarify some things), I was able to write a new decoder that also seems to reconstruct frames better.

Some words on the design: as usual, I’ve split it into several parts. These are common RV3/4 code, RV3/4 DSP, the RV3 bitstream parser, RV3 DSP, and the RV4 bitstream parser and DSP. That’s the approach I’ve been using before and I’ll probably use it in future decoders as well. The only more or less interesting thing is how I did weighted motion compensation. Instead of a temporary buffer I allocate a 16×16 frame that I use for storing temporary results, which are later averaged. Since the motion compensation routines in RealVideo 3 and 4 differ while weighted averaging is the same, it makes sense to split the averaging into a separate operation.
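Here is a sketch of such a shared averaging step under stated assumptions. The first prediction has already been written into the destination and the second one into a separate 16×16 block, and the two are blended with 8-bit fixed-point weights that sum to 256. The weight precision and rounding here are illustrative, not RealVideo's actual formula, and weighted_avg is a hypothetical name:

```rust
const BLK: usize = 16;

/// Blends the second prediction (kept in a 16x16 temporary block) into `dst`,
/// which already holds the first prediction. Weights: w0 + w1 == 256.
fn weighted_avg(dst: &mut [u8], dstride: usize, tmp: &[u8; BLK * BLK], w0: u32, w1: u32) {
    assert!(dstride >= BLK && w0 + w1 == 256);
    for (drow, trow) in dst.chunks_mut(dstride).zip(tmp.chunks(BLK)).take(BLK) {
        for (d, &t) in drow[..BLK].iter_mut().zip(trow.iter()) {
            // Weighted sum with rounding, then back to 8 bits.
            *d = ((u32::from(*d) * w0 + u32::from(t) * w1 + 128) >> 8) as u8;
        }
    }
}
```

Keeping the blend separate means each codec's motion compensation only has to fill the two predictions.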

And now for the juicy part: benchmarks and performance. I’ve tested one of the RealVideo 4 trailers (namely swordfish.rmvb) and avconv -threads 1 -cpuflags 0 decodes it in 15 seconds, nihav-tool needs almost 25.

NihAV: Progress Report

Monday, July 2nd, 2018

I’m still working (barely) on NihAV and I’ve managed to make my code decode both RealVideo 3 and 4. It’s not always correct, especially for B-frames and some corner cases, but at least it produces a sane picture in most cases.

And this time I’d like to write about disadvantages of writing motion compensation functions in Rust instead of C.

NihAV: progress report

Sunday, June 10th, 2018

Well, since I had no incentive to work on NihAV and recently the weather has not been very encouraging for any kind of intellectual activity, there was almost no progress. And yet now I have something to write about: NihAV has finally managed to decode a non-trivial (i.e. not fully black) RealVideo 3 I-frame properly (i.e. without any visible distortions). The loop filter is still missing but it’s a start. And it’s not a small feat, considering one has to implement both coefficient decoding and intra prediction. So essentially motion vector juggling and motion compensation are all that is missing for P- and B-frame support. Maybe it will go faster from here (but most likely not).

And since doing that involved rewriting some C code into Rust here are some notes on how oxidising went:

  • match is a nice replacement for the cases when you have to partly remap values. In my case I had to adjust intra prediction directions when the top, left or bottom reference was missing, which means changing three or four values into other values. match looks more compact than several } else if (itype == FOO) { and does not lose readability either;
  • while in C foo = bar = 42; is a common thing, Rust does not allow this (I can understand why), and I’m surprised I ran into it only now (with intra prediction functions that assign the same calculated value to several output pixels at once);
  • loops in Rust are fine for basic use, but when you need something more complex like for (i = 0; i < block_size; i += 4) or for (i = 99; i > 0; i--), you have two options. You can write a simpler loop and remap indices inside, or remember it’s Rust and permute the range in less intuitive ways like for i in (0..block_size).filter(|x| x&3 == 0) and for i in (1..99+1).rev(). While this works and even somewhat conveys the meaning, it’s a bit unwieldy IMO;
  • and it might be a bit too esoteric, but it looks like I cannot easily write fn clip_u8(val: N) -> u8 that would take any primitive numeric type as input, do comparisons inside and return the value clipped and converted to u8. The best answer on how to do it I found was “you can’t, it’s against Rust practices”. I don’t need it much and I care even less, so I’ll just mark it as a neutral language feature and forget about it.
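For the record, the first two loop shapes can be written more directly with Iterator::step_by (stabilised in Rust 1.28, shortly after this post) and inclusive ranges. A generic clip also becomes possible if “any primitive numeric type” is narrowed to “any type that converts losslessly into i64”, which excludes u64, usize, the 128-bit types and floats. A sketch with made-up function names:

```rust
/// `for (i = 0; i < block_size; i += 4)`
fn every_fourth(block_size: usize) -> Vec<usize> {
    (0..block_size).step_by(4).collect()
}

/// `for (i = 99; i > 0; i--)`
fn countdown() -> Vec<u32> {
    (1..=99u32).rev().collect()
}

/// Clips to 0..=255 for any type that widens losslessly into i64
/// (i8/i16/i32/i64 and u8/u16/u32).
fn clip_u8<T: Into<i64>>(val: T) -> u8 {
    let v: i64 = val.into();
    if v < 0 { 0 } else if v > 255 { 255 } else { v as u8 }
}
```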

And now the small but constantly irritating thing: arrays. While slices are nice and easy to use (including extracting sub-slices), in my area I often need a slice with arbitrary start and end bounds. To clarify my use case: quite often you need a piece of memory that is addressable with both positive and negative indices that make sense on a certain interval.

One such common array is the clipping array, which essentially takes an input index and returns it clipped, usually to the 0-255 range. So you have the part [-255..-1] filled with zeroes, [0..255] filled with values in the same range and [256..511] filled with 255. I repeat, such a clipping table is a very common and useful thing that’s currently not easy to implement in Rust.
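Here is a minimal sketch of one way to get such signed indexing today: wrap the table in a type that implements std::ops::Index<isize> and adds a fixed bias. ClipTable and the exact -255..=511 layout are assumptions for illustration:

```rust
use std::ops::Index;

/// Offset of index 0 inside the backing array.
const BIAS: isize = 255;
/// Number of entries for indices -255..=511.
const LEN: usize = 767;

/// [-255..-1] -> 0, [0..255] -> identity, [256..511] -> 255.
struct ClipTable {
    data: [u8; LEN],
}

impl ClipTable {
    fn new() -> Self {
        let mut data = [0u8; LEN];
        for (i, el) in data.iter_mut().enumerate() {
            let idx = i as isize - BIAS;
            *el = idx.max(0).min(255) as u8;
        }
        ClipTable { data }
    }
}

impl Index<isize> for ClipTable {
    type Output = u8;
    /// Out-of-range indices still panic: the biased index goes through normal
    /// bounds checking (a negative result wraps to a huge usize).
    fn index(&self, idx: isize) -> &u8 {
        &self.data[(idx + BIAS) as usize]
    }
}
```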

Another, less common case is when the block of pixels being processed requires information from its top, left and top-left neighbours, which are addressed as src[-stride + i], src[-1 + stride*i] and src[-stride - 1]. Or a whole frame of a GDI-related codec (no, not from Westwood), or even a simple BMP/DIB that stores lines upside-down, so after you process line 0 you have to move to line -1.

I currently deal with it by keeping an additional variable pointing to the current position in the array, which I use as a reference and from which I can subtract other numbers if needed, but it’s a bit clunky and error-prone. Since Rust checks indices on slice access, I wonder if extending it to work with e.g. negative indices is possible. IIRC FORTRAN and Pascal allowed you to define an array starting at an arbitrary index; it might be possible in Rust too.
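The “current position” approach can at least be wrapped so that the signed arithmetic lives in one place. A sketch with a hypothetical PlaneView helper (not from NihAV) that addresses neighbours by (dx, dy) offsets relative to the current position:

```rust
/// View into a plane with a movable origin; offsets may be negative,
/// and the final index is still bounds-checked by the slice.
struct PlaneView<'a> {
    data: &'a [u8],
    pos: usize,
    stride: usize,
}

impl<'a> PlaneView<'a> {
    /// Pixel at (dx, dy) relative to the current position,
    /// e.g. at(0, -1) is the top neighbour and at(-1, -1) the top-left one.
    fn at(&self, dx: isize, dy: isize) -> u8 {
        let idx = self.pos as isize + dy * self.stride as isize + dx;
        assert!(idx >= 0, "offset points before the start of the plane");
        self.data[idx as usize]
    }
}
```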

Oh well, I’ll just keep using my approach meanwhile and wait to see what rust-av does in this regard.

Rust: Lifetimes Sugar

Sunday, May 27th, 2018

One of the Rust language features is explicit object lifetimes that help the compiler correctly track memory usage and free objects without using a garbage collector. A neat idea, but it leads to lifetime specifiers being used everywhere, including places where the compiler should be smart enough to deal with them without explicit mentions.

Maybe I’m using Rust wrong, but in most cases I create objects that either have no need for a lifetime specifier or have the same lifetime for both their members and themselves. Thus I argue that in addition to the generic lifetime specifier 'a (or whatever name you give it) and the obviously named 'static there should be 'self, which specifies the lifetime to be exactly the same as that of the object itself.

So, instead of current:

struct Foo<'a> {
  myref: &'a [u8],
  subobj: Bar<'a>,
}

impl<'a> Foo<'a> {
  pub fn new(myref: &'a [u8], subobj: Bar<'a>) -> Self { ... }
}

it should be possible to write:

struct Foo {
  myref: &'self [u8],
  subobj: Bar,
}

impl Foo {
  pub fn new(myref: &'self [u8], subobj: Bar) -> Self { ... }
}

I am not sure whether the compiler needs to do anything additional for such objects compared to objects without a lifetime specifier. But it should be easy to assign the proper lifetime after parsing the structure definition, and I’m pretty sure the compiler does something like this already.
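Part of this noise was reduced later. Rust 2018 (1.31, released after this post) added the anonymous lifetime '_, so impl headers and function signatures that never refer to the lifetime do not have to name it. The struct definition itself still needs the parameter. A sketch reusing Foo from above, with a hypothetical Bar:

```rust
/// Hypothetical sub-object that borrows data.
struct Bar<'a> { data: &'a [u8] }

struct Foo<'a> {
    myref: &'a [u8],
    subobj: Bar<'a>,
}

// '_ in the impl header: methods that don't mention the lifetime need no name for it.
impl Foo<'_> {
    fn total_len(&self) -> usize { self.myref.len() + self.subobj.data.len() }
}

// A named lifetime is still needed where two parameters must share it.
impl<'a> Foo<'a> {
    fn new(myref: &'a [u8], subobj: Bar<'a>) -> Self { Foo { myref, subobj } }
}

// Elided lifetime on a free function returning a borrowing struct.
fn make_bar(data: &[u8]) -> Bar<'_> { Bar { data } }
```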

And I see only these reasons why this has not been done yet:

  • Considerations for compiler simplicity (i.e. parsing process should be kept as simple as possible)—I still think it should be easy for compiler to recognize the lifetime definition by the time structure declaration parsing is over and it’s used externally (i.e. for objects using this one);
  • Considerations for language clarity and consistency (i.e. it’s immediately obvious when you look at the object that it deals with lifetimes but not with the proposed change). I’d argue that explicit lifetimes should be kept for complex cases only, when you have to juggle lifetimes from several complex sources, and the objects with references not outliving themselves should be fine;
  • Simple oversight (i.e. “we did not think of such a simplification”) or developers’ bias (i.e. “we got so used to writing lifetime specifiers everywhere that we didn’t think it annoys anybody”). You should be able to guess what I have to say about such an argument.

So all in all I’d be happy to either hear why it cannot be done (besides compatibility with existing code) or see it implemented. But most likely this will be ignored (and I’m fine with that too).

Rust in multimedia: unwieldy features

Sunday, March 18th, 2018

Today I wanted to talk about two features that are quite important for multimedia decoding but are quite inconvenient in the current state.

First, macros. I know that macros in Rust are both very powerful and quite flexible, but they are hard to use for data definition, and I ranted about it before. The problem is that quite often you have tables with some internal structure that would benefit from macro substitutions. If you have a codebook constructed from entries following patterns like a, b, -a, -b and a, b, a, -b, -a, b, -a, -b, it would be easier and less error-prone to represent them as e.g. FLIP2#(a, b) and FLIP4#(a, b) inside the data definition. But macro! does not allow you to do that easily, since it’s supposed to expand into complete syntax elements (i.e. code or full data definitions), not into a part of a list. Of course you can work around it by making a set of macros to define the whole array and some bits inside it, but that’s what makes it unwieldy. And that’s why I believe there should be another macro substitution mechanism, maybe named macro#, that would work just on data and be much easier to use in that particular case.
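For reference, here is roughly what the “set of macros” workaround looks like with macro_rules!. The flip_table! macro is hypothetical (not from NihAV). It walks its input item by item and accumulates the expanded elements, because a macro invocation cannot expand to just a part of an array literal. It works, but the internal @acc rules show why it feels unwieldy compared to a data-only substitution:

```rust
macro_rules! flip_table {
    // All items consumed: emit the accumulated elements as one array literal.
    (@acc [$($out:expr),*]) => { [$($out),*] };
    // FLIP2(a, b) -> a, b, -a, -b
    (@acc [$($out:expr),*] FLIP2($a:expr, $b:expr), $($rest:tt)*) => {
        flip_table!(@acc [$($out,)* $a, $b, -$a, -$b] $($rest)*)
    };
    // FLIP4(a, b) -> a, b, a, -b, -a, b, -a, -b
    (@acc [$($out:expr),*] FLIP4($a:expr, $b:expr), $($rest:tt)*) => {
        flip_table!(@acc [$($out,)* $a, $b, $a, -$b, -$a, $b, -$a, -$b] $($rest)*)
    };
    // Any other element is copied as is.
    (@acc [$($out:expr),*] $v:expr, $($rest:tt)*) => {
        flip_table!(@acc [$($out,)* $v] $($rest)*)
    };
    // Entry point: every item, including the last one, is followed by a comma.
    ($($items:tt)*) => { flip_table!(@acc [] $($items)*) };
}

const CB: [i8; 13] = flip_table!(FLIP2(1, 2), FLIP4(3, 4), 0,);
```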

The second issue is assembly integration. Despite Rust being fast and such, it’s still better to write small critical functions in assembly. And obviously it would be better if Cargo supported including assembler files in a crate. You can point out there’s stdsimd for using the power of SIMD without much hassle. I can point out that compiler-generated code is still far from perfect even with intrinsics, and assembly is still better; supporting querying SIMD capabilities via a standard package is good though. And you can point out that there’s already a special crate for supporting various files with various compilers/assemblers. I’d say that it’s a bit too generic, but at least it can serve as a base for what I need. Again, there’s a more or less standard way to deal with assembly files, so making a common standard is not hard.

And in the unlikely case somebody reads this and asks why I don’t write an RFC: from what I heard it involves proposing code as well, and I don’t want to study the compiler nor waste days compiling it.

Rust: Annoyance-Driven Design

Sunday, December 3rd, 2017

I’ve finally made NihAV decode RealVideo 2 content, including B-frames (there are still 4 video codecs to support (and I don’t have any samples for RMHD) and all audio codecs too so it’s a long way) and so I have some more words to say about Rust and my experience with it.

To me it looks like most decomposition decisions in Rust are consequences of how annoying it is to do things the other way. Too large structures mean you have to either pass too many arguments into new() or fill them with some defaults (and I’m pretty sure that #[derive(Default)] won’t save you with more complex types) and initialise them to sane values later. As a result, it’s easier to split everything into smaller structures, which are (at least subjectively) much easier to handle, especially if you reference them as Option<YourStruct>. Modules and imports, on the other hoof, are more annoying to manage, since you have to take care of proper dependencies, visibility and imports. As a result I find it easier to import all stuff from all modules and just comment out currently unused imports (because I still can’t bring myself to make it all a single mega-module). And now for the even higher level: crates. Yes, I’m going to beat that undead horse again.
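When most fields of a large structure have natural defaults, #[derive(Default)] combined with struct update syntax keeps new() short. Here is a sketch with a hypothetical DecoderState. The derive only works when every field type implements Default; for enums that means a manual impl, or a #[default] variant attribute in Rust 1.62+:

```rust
/// Hypothetical decoder state: numbers default to 0, bool to false, Vec to empty.
#[derive(Debug, Default)]
struct DecoderState {
    width: usize,
    height: usize,
    qscale: u8,
    has_b_frames: bool,
    mv_buf: Vec<(i16, i16)>,
}

impl DecoderState {
    /// Set only what matters; struct update syntax fills the rest from `Default`.
    fn new(width: usize, height: usize) -> Self {
        DecoderState { width, height, qscale: 16, ..Default::default() }
    }
}
```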

First of all, I’m aware of incremental building enabled in nocturnal Rust, but I’m not going to use nightly for rather obvious reasons. Mostly I’m not here to experiment with all the potential bells and whistles of the language; I want to see what it can offer right out of the box and how it suits my needs. So, the compilation times are horrible: when I change a single non-public function, it rebuilds the whole crate (which is expected behaviour, I know) and that takes 15 seconds. Obviously it’s laughable for people doing “serious” projects, but it’s a basic fact that humans expect a response (any response) within about five seconds of an action or they get impatient. As a result, instead of one crate with optional features (in my case decoders and demuxers) I’d rather have several smaller crates, and that creates new issues too. There’s the obvious npm.js kind of issue of making packages for every small thing, so your program ends up with more package dependencies than a modern Linux distribution. But there’s also the issue of package splitting. I’d like to split my code into packages that each encompass a certain family of features: e.g. nihav-core for common stuff, nihav-avi for the AVI demuxer, nihav-indeo for all Indeo codecs (audio and video) and nihav-realmedia for the RealMedia demuxer and related codecs. Then some of them may depend on a common package (like the H.263 common core for the Intel I.263 and RealVideo 1 and 2 decoders), but probably with different features requested (one of them does not need B-frame support, another one does not need PB-frame support). Since I don’t know quantum cargodynamics, I don’t know how it will all be resolved. So it will end in either dead code or code duplication (in an additional crate too, I suppose).

My theory is that people behind Rust are biased by their development environment. In other words, you don’t care much about compilation times when you have to build browsers (or compilers) on a daily basis. Meanwhile my main development machine is a laptop I bought in 2010 with 8GB of RAM (which I believed to be future-proof). So the Rust language designers might either have beefy machines that make compilation fast or be conditioned to long development cycles. I know that back in the day “start compiling the Linux kernel and go make some coffee to pass 45 minutes of compilation time” was quite common. But I guess it’s Jevons’ paradox all over again: the more computing power there is, the more of it is wasted on compilation. Like modern C++ or single-header libraries: you actually have to compile a very large corpus of code as a single file. Back in the day my laptop with 64MB RAM spent most of its time compiling libavcodec/dsputil.c (a monstrous file full of templates that old FFmpeg developers might remember even today), so I had to install more RAM to make compilation time reasonable. The solution was to split the file instead of upgrading every developer’s machine, but nowadays it’d be seen as a ridiculous solution.

And now documentation. I find it rather poor (but that’s common with programming languages). If I know more or less what feature I want, I can find it in the standard documentation. If I don’t, I end up complaining about non-overlapping multiple &mut [range] borrows not working instead of using slice.split_at_mut() (and I did). So the documentation does not really tell me what I should be looking for in the first place. I call it Excel complexity. In Excel there’s probably a function that does anything you want, but it’s much easier to reimplement it yourself than to look up in the documentation what it’s called and what its less obvious parameters are. And even if you combine The Rust Programming Language Second Edition and Rust By Example, you still won’t get it right. Now that Rust aspires to be a JavaScript replacement, it should take an example from it too: provide an extensive overview of how to do things in it instead of showcasing features. IMO there are two chapters in TRPLv2, 11 and 12, that are close to that ideal: they talk about testing and how to make a console program. In other words, good practical tasks that one would like to achieve with Rust. Not many people care about features per se; they want something done with a language: build a multi-threaded application, parse a Web server reply, make an efficient number cruncher etc. etc. I can rant more about how it should be organised, but nobody reads documentation, including me.
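For anyone hitting the same wall: borrowing &mut buf[..mid] and &mut buf[mid..] at the same time is rejected even though the ranges do not overlap, while split_at_mut() returns both halves as independent mutable slices. A sketch with a made-up butterfly operation:

```rust
/// In-place butterfly between the two halves of `buf`:
/// lo[i] becomes lo[i] + hi[i], hi[i] becomes lo[i] - hi[i].
fn butterfly(buf: &mut [i32]) {
    let mid = buf.len() / 2;
    let (lo, hi) = buf.split_at_mut(mid);
    for (l, h) in lo.iter_mut().zip(hi.iter_mut()) {
        let (a, b) = (*l, *h);
        *l = a + b;
        *h = a - b;
    }
}
```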

There’s still this annoyance with tuples as such too. Why can’t I declare let foo, bar; if baz { foo = 4; bar = 2; } else { foo = bar = 0; } instead of having to use two separate lets? Why can’t I have let (foo, bar); if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0,0); } either? As a result, while named tuples are there, I end up using only unnamed tuples.
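There are two ways around this. Since if is an expression, both values can be bound in a single let. And since Rust 1.59, destructuring assignment makes the second form from the question compile as written. A sketch (the function names are made up):

```rust
/// Both values bound at once, since `if` is an expression.
fn pick(baz: bool) -> (i32, i32) {
    let (foo, bar) = if baz { (4, 2) } else { (0, 0) };
    (foo, bar)
}

/// Declared first and assigned later via destructuring assignment (Rust 1.59+).
fn pick_later(baz: bool) -> (i32, i32) {
    let (foo, bar);
    if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0, 0); }
    (foo, bar)
}
```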

So while Rust offers some nice things, it does not shape development in a very nice way. And this also explains why C was so popular and still is: it does not enforce any particular behaviour on you. The exception is recent editions, where the standard and compilers suddenly started to care about arithmetic and bit operations being non-portable; you might make your own CPU that does not use two’s complement arithmetic after all. There is no enforced coding style, you can compile code in any order you like, and you can interface with almost anything without special tools or wrappers. And the freedom it offered along with effectiveness is what is often lacking in more modern languages (the saddest thing is that it’s traded not for memory safety but rather for sacks of syntactic sugar).

Anyway, I’ll keep experimenting and we’ll see how things will turn out. In either case I should start thinking about splitting NihAV into several crates, registering codecs and such. Too much work, too many opportunities to procrastinate!

Rust: Optimising Decoder Experience

Thursday, August 3rd, 2017

Okay, I’ve made some changes, so hopefully the server will withstand the curiosity of more than two people if it goes like last time.

So, after implementing Indeo 4/5 decoders for NihAV I nano-benchmarked them, and my decoder was about twice as slow as libavcodec. Since neither has SIMD optimisations, the comparison should be fair enough.

The tested file was 00186002.avi, an Indeo 4 sample with the scalability feature (i.e. luma is split into four bands and a Haar wavelet is used to compose the output plane) and a duration of over ten minutes. The results I got will be given in Linux perf sample counts as those should be representative enough.

avconv: 13.4 seconds, 10K cycles. About 24% is spent in luma plane recombination (with Haar wavelet), about 40% of the time is taken by bitstream decoding and the rest is mostly transforms and motion compensation.

nihav-tool: 31.6 seconds, 20K cycles. 30% is spent in luma plane recombination, 48% of the time is taken by bitstream decoding, 11% is for motion compensation and the rest is mostly transforms. In samples, compared with libavcodec:
  • recombination: 9900 against 3300;
  • bitstream decoding (a dirty estimate, it includes some inlined DSP functions): 15800 against 5600;
  • motion compensation: 3500 against 1700;
  • transforms: 1300 against 1500 (they are not equivalent though: my code only transforms the block, and output costs are hidden in bitstream decoding).
Overall, my code is consistently worse. Is there any way to optimise it a bit?

Rust: Not So Great for Codec Implementing

Monday, July 31st, 2017

Disclaimer: obviously it’s my opinion, feel free to prove me wrong or just ignore.

Now that I should qualify as a zoidberg (slang name for a lowly Rust programmer who lives somewhere in a dumpster and is also completely ignored, a perfect definition for me), I want to express my thoughts about the experience of programming in Rust. After all, NihAV was restarted to find out how modern languages fare at my favourite task, and there was only about one language that was promising enough. So here’s a short rant about the aspects of this programming language that I found good and not so good.

Good things

  • Modern language features: standard library containers, generics, units and their visibility etc. etc. And at least it looks like Rust won’t degrade into a metaprogramming language any time soon (that’s left for the upcoming Rust+=1 programming language);
  • Reasonable encapsulation: I mean both (sub)modules organisation and the fact that functions can be implemented just for some structure;
  • Powerful enums that can act both as a plain C set of values and as tagged objects, e.g. the standard Result enum has two variants, Ok(result) and Err(error), where result and error can be two different user-defined types, so a returned value can contain either while still being of the same type (Result);
  • More helpful error messages (e.g. it tries to suggest a correction for a mistyped variable name or explains an error in a bit more detail). Sure, Real Programmers™ don’t need that but it’s still nice;
  • No need for dependency resolving: you can have stuff in one module referencing stuff in another module and vice versa at the same time, same for no need
  • Traits (standard interfaces for objects) and the fact that operations are implemented as specific traits (i.e. if you need a + b to work with your custom object, you can implement std::ops::Add for it). It’s also nice to extend the functionality of some object by implementing a trait for it: e.g. my bitstream reader is defined in one place, but in another module I made a codebook-reading trait for it, so I can invoke let val = bitread.read_codebook(&cb)?; later.
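To illustrate the extension-trait pattern from the last point, here is a minimal sketch. BitReader, Codebook, CodebookError and CodebookReader are all hypothetical stand-ins, not NihAV’s actual types, and the codebook lookup is a deliberately naive linear search. The point is only that the trait can live in a different module from the reader and still add a read_codebook() method to it.

```rust
/// Hypothetical MSB-first bit reader standing in for a real one.
pub struct BitReader<'a> {
    src: &'a [u8],
    pos: usize, // current position in bits
}

impl<'a> BitReader<'a> {
    pub fn new(src: &'a [u8]) -> Self {
        BitReader { src, pos: 0 }
    }

    /// Returns the next bit, or None once the data is exhausted.
    pub fn read_bit(&mut self) -> Option<u32> {
        let byte = *self.src.get(self.pos >> 3)?;
        let bit = (byte >> (7 - (self.pos & 7))) & 1;
        self.pos += 1;
        Some(u32::from(bit))
    }
}

/// Hypothetical codebook: (codeword, length in bits, symbol) entries.
pub struct Codebook {
    entries: Vec<(u32, u8, u16)>,
}

impl Codebook {
    pub fn new(entries: Vec<(u32, u8, u16)>) -> Self {
        Codebook { entries }
    }
}

#[derive(Debug, PartialEq)]
pub enum CodebookError {
    InvalidCode,
}

/// Extension trait: it could be defined in a different module from BitReader.
pub trait CodebookReader {
    fn read_codebook(&mut self, cb: &Codebook) -> Result<u16, CodebookError>;
}

impl<'a> CodebookReader for BitReader<'a> {
    fn read_codebook(&mut self, cb: &Codebook) -> Result<u16, CodebookError> {
        let mut code = 0u32;
        // Read bit by bit and look for a matching (codeword, length) pair;
        // a linear search is slow but keeps the sketch short.
        for len in 1..=32u8 {
            let bit = self.read_bit().ok_or(CodebookError::InvalidCode)?;
            code = (code << 1) | bit;
            if let Some(&(_, _, sym)) = cb.entries.iter().find(|e| e.1 == len && e.0 == code) {
                return Ok(sym);
            }
        }
        Err(CodebookError::InvalidCode)
    }
}
```

Any code that has CodebookReader in scope can then write let val = br.read_codebook(&cb)?; exactly as if the method were defined on the reader itself.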

Unfortunately, it’s not all rosy and peachy: Rust has some things that irritate me. Some of them follow from the advantages (i.e. you pay for many features with compilation time) and others come from the language design or implementation complexity.

Irritating things that can probably be fixed

  • Compilation time is too large IMO. While similar code in Libav is recompiled in less than a second, NihAV (test configuration) takes about ten seconds to build, and any wait above five seconds is irritating. I understand why it is so and I hope it will be improved in the future, but for now it’s irritating;
  • And, on a similar note, benchmarks. While the overall built-in testing capabilities in Rust are good (file that under good things too), the fact that benchmarking is available only in limbo nightly Rust is annoying;
  • No control over allocation. On one hoof I like that I don’t have to worry about it, on the other hoof I’d like to be able to handle it myself;
  • Poor functionality for primitive types. If you claim that Rust is a systems programming language, then you should care more about primitive types than just relying on the as keyword. If you cared about systems programming and safety, you’d have at least one or two functions to convert a type into a smaller one (e.g. i16/u16 -> u8) and/or check whether the result fits. That’s one of the main annoyances when writing codecs: you often have to convert a result into a byte with range clipping;
  • The macro system is lacking. It’s great for code, but if you want to use macros for a more compact data representation: tough luck. For example, in Indeo3 codebooks have sequences like (a,b), (-a,-b), (b,a), (-b,-a) which would be nice to shorten with a macro. But the best solution I saw in Rust was to declare the whole array in a macro, using token tree manipulation for proper submacro expansion. And I fear it might be a similar story with motion compensation functions, where macros are used to generate the required functions for specific block sizes and operations (simple put or average). I’ve managed to work around it a bit in one case with lambdas, but it might not work so well for more complex motion compensation functions;
  • Also tuple assignments. I’d like to be able to assign multiple variables from a tuple, but it’s not possible now. And maybe it would be nice to be able to declare several variables with one let;
  • There are many cases where the compiler could do the work automatically. For example, I can’t take a pointer to a const, but if I declare another const as a pointer to the first one, it works fine. In my opinion the compiler should be able to generate an intermediate second constant (if needed) by itself. Same for function calls: why does bitread.seek(bitread.tell() - 42); fail the borrow check while let pos = bitread.tell() - 42; bitread.seek(pos); doesn’t?
  • Borrow checker and arrays. Oh, borrow checker and arrays.

This is probably the main showstopper for implementing complex video codecs in Rust effectively. Rust is anti-FORTRAN in the sense that FORTRAN was all about arrays and could operate on arrays safely, while Rust safely prevents you from operating on arrays.

Video codecs usually operate on planes, and there you’d like to work with different chunks of the frame buffer (or plane) at the same time. Rust does not allow you to mutably borrow parts of the same array even when it should be completely safe, like let mut a = &mut arr[0..pivot]; let mut b = &mut arr[pivot..];. Don’t tell me about ChunksMut, it does not allow you to work with them both simultaneously. And don’t tell me about the Bytes crate: it should not be a separate crate, it should be core language functionality. As a result, I have to resort to using indices inside the frame buffer and Rc<RefCell<...>> for the frames themselves. And I can only dream about being able to invoke mem::swap(&mut arr[idx1], &mut arr[idx2]);.

Update: so there’s slice::split_at_mut(), which does some of the things I want; thanks to Tomas for pointing it out.
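A minimal sketch of what split_at_mut() allows for the plane case: two rows of one buffer, one read and one written, borrowed at the same time. The function name add_prev_row and the row-adding operation itself are hypothetical and only serve to give both halves something to do.

```rust
/// Adds the previous row of a plane to row `row`, both rows coming from
/// the same buffer. Assumes `row > 0` and that the plane holds at least
/// `row + 1` full rows of `stride` bytes (otherwise the slicing panics).
fn add_prev_row(plane: &mut [u8], stride: usize, row: usize) {
    // split_at_mut yields two non-overlapping mutable slices of one buffer:
    // `top` is everything before the current row, `bottom` starts at it.
    let (top, bottom) = plane.split_at_mut(row * stride);
    let prev = &top[(row - 1) * stride..];
    let cur = &mut bottom[..stride];
    for (dst, &src) in cur.iter_mut().zip(prev.iter()) {
        *dst = dst.wrapping_add(src);
    }
}
```

For the mem::swap() wish, the slice method arr.swap(idx1, idx2) covers the common case of exchanging two elements of the same slice without needing two mutable borrows.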

And it gets even more annoying when I try to initialise an array of codebooks for further use. The codebook structure does not implement Clone because there’s no good reason for it to be cloned or copied around, but when I initialise an array of them I cannot simply declare it and fill the contents in a loop; I have to resort to unsafe { arr = mem::uninitialized(); for i in 0..arr.len() { ptr::write(&mut arr[i], Codebook::new(...)); } }. I know that if there’s an error creating a new element the compiler won’t be able to ensure that it drops only the already initialised elements, but it’s still a problem of the compiler not being smart enough yet. A certain somebody had the idea of using a generator to initialise arrays, but I’m not sure even that will be implemented any time soon.
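For what it’s worth, later Rust releases added safe ways to do this: std::array::from_fn (Rust 1.63) builds an array by calling a closure for each index, and a Vec can be converted into a fixed-size array with TryFrom (Rust 1.48), while mem::uninitialized itself has been deprecated in favour of MaybeUninit. The sketch below uses a hypothetical Codebook type that implements neither Clone nor Copy; the fallible variant shows how collecting into a Vec first sidesteps the drop-on-error concern mentioned above.

```rust
use std::convert::TryInto;

/// Hypothetical codebook that deliberately implements neither Clone nor Copy.
struct Codebook {
    table: Vec<u16>,
}

impl Codebook {
    /// Stand-in for real codebook construction.
    fn new(idx: usize) -> Codebook {
        Codebook { table: vec![idx as u16; idx + 1] }
    }

    /// Fallible variant, failing on an arbitrary made-up condition.
    fn try_new(idx: usize) -> Result<Codebook, String> {
        if idx > 16 {
            return Err(format!("no codebook #{}", idx));
        }
        Ok(Codebook::new(idx))
    }
}

/// Infallible case: from_fn calls Codebook::new for indices 0..4 in order.
fn make_codebooks() -> [Codebook; 4] {
    std::array::from_fn(Codebook::new)
}

/// Fallible case: collect into a Vec first, so an error drops only the
/// elements built so far, then convert the Vec into a fixed-size array.
fn try_make_codebooks() -> Result<[Codebook; 4], String> {
    let books = (0..4).map(Codebook::try_new).collect::<Result<Vec<_>, _>>()?;
    books.try_into().map_err(|_| "unexpected codebook count".to_string())
}
```

The Vec detour costs one heap allocation, which should be negligible for one-time codebook setup.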

And speaking about cloning, why does the compiler refuse to derive the Clone trait for a structure that has a pointer to a function?
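I can’t say which compiler release changed this, but current stable Rust does accept #[derive(Clone, Copy)] on a structure containing a function pointer, including one whose arguments are references. A minimal check, with a hypothetical BlockOps structure and put_copy function:

```rust
/// Hypothetical table of block operations holding a function pointer.
#[derive(Clone, Copy)]
struct BlockOps {
    // Function pointer whose arguments are references.
    put: fn(&mut [u8], &[u8]),
    name: &'static str,
}

/// Plain copy used as the "put" operation; slice lengths must match.
fn put_copy(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}
```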

And that’s why C is still the best language for systems programming: it still lets you do what you mean (the problem is that most programmers don’t really know what they mean) without many magical incantations. Sure, it’s very good to have many common errors eliminated by design, but when you can’t do basic things in a simple way, then what is it good for?
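Two of the smaller complaints from the list above can be sketched concretely: converting to a byte with range clipping, and building a table with the (a,b), (-a,-b), (b,a), (-b,-a) pattern. Ord::clamp arrived in Rust 1.50 and u8::try_from(i16) covers the "does it fit" check; the macro follows the declare-the-whole-array approach described above, since a macro_rules! invocation inside an array literal cannot expand to a bare comma-separated run of elements. The names clip_u8, checked_u8, sym_pairs and DELTAS are hypothetical, and the table values are made up rather than taken from Indeo 3.

```rust
use std::convert::TryFrom;

/// Converts a wider intermediate value into a pixel with range clipping.
fn clip_u8(val: i16) -> u8 {
    // clamp guarantees 0..=255, so the `as` cast cannot wrap.
    val.clamp(0, 255) as u8
}

/// Checked conversion: None when the value does not fit into u8.
fn checked_u8(val: i16) -> Option<u8> {
    u8::try_from(val).ok()
}

/// Expands every (a, b) into (a, b), (-a, -b), (b, a), (-b, -a),
/// building the whole array in one go.
macro_rules! sym_pairs {
    ($( ($a:expr, $b:expr) ),* $(,)?) => {
        [ $( ($a, $b), (-$a, -$b), ($b, $a), (-$b, -$a) ),* ]
    };
}

/// Hypothetical delta table, not real codec data.
const DELTAS: [(i8, i8); 8] = sym_pairs!((1, 2), (3, 4));
```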

Annoying things that cannot be fixed

  • type keyword. Since it’s a keyword, it can’t be used as a variable name, and many objects have a type, you know. And it’s not always reasonable to give a longer name or rewrite using an enum. Similar story with ref, but I hardly ever need it as a variable name and ref_<something> works even better. Still, it would have been better if the language designers had picked typedef instead of type;
  • Not being able to combine if let with some other condition (those nested conditions tend to accumulate rather fast);
  • Sometimes I fear that compilation time belongs to this category too.
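For the if let point, the usual workarounds are a match guard or Option::filter, both of which fold the extra condition into a single test instead of nesting it; newer Rust (2024 edition) also accepts let chains of the form if let Some(h) = hdr && h.version >= 4. A small sketch with a hypothetical Header type:

```rust
/// Hypothetical header type for illustration.
struct Header {
    version: u8,
}

/// Match guard: the pattern and the extra condition share one arm,
/// instead of `if let Some(h) = hdr { if h.version >= 4 { ... } }`.
fn classify(hdr: Option<&Header>) -> &'static str {
    match hdr {
        Some(h) if h.version >= 4 => "new",
        Some(_) => "old",
        None => "missing",
    }
}

/// Option::filter: the condition is applied before the if let binds.
fn new_version(hdr: Option<&Header>) -> Option<u8> {
    if let Some(h) = hdr.filter(|h| h.version >= 4) {
        return Some(h.version);
    }
    None
}
```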

Overall, Rust is not that bad and I’ll keep developing NihAV in it, but keep in mind that it’s still far from perfect (maybe about as far as C, but in a different direction).

P.S. I also find the phrase “rewrite in Rust” quite stupid. Rust seems to be significantly different from other languages, especially C, so while “Real Programmers can write FORTRAN programs in any language”, it’s better to use new language features to redesign interfaces and create a new overall design instead of carrying the same mistakes over from the old code. That’s why NihAV will lurch where somebody might have stepped before, but not necessarily along the existing roads.