Archive for the ‘Rust’ Category

NihAV: Progress Report

Monday, July 2nd, 2018

I’m still working (barely) on NihAV and I’ve managed to make my code decode both RealVideo 3 and 4. It’s not always correct, especially B-frames and some corner cases, but at least it produces a sane picture in most cases.

And this time I’d like to write about disadvantages of writing motion compensation functions in Rust instead of C.

NihAV: progress report

Sunday, June 10th, 2018

Well, since I had no incentive to work on NihAV and recently the weather is not very encouraging for any kind of intellectual activity there was almost no progress. And yet now I have something to write about: NihAV has finally managed to decode non-trivial (i.e not fully black) RealVideo 3 I-frame properly (i.e. without any visible distortions). Loop filter is still missing but it’s a start. And it’s not a small feat considering one has to implement both coefficients decoding and intra prediction. So essentially it’s just motion vector juggling and motion compensation are all the things that are missing for P- and B-frames support. Maybe it will go faster from here (but most likely not).

And since doing that involved rewriting some C code into Rust here are some notes on how oxidising went:

  • match is a nice replacement for the cases when you have to partly remap values—in my case I had to adjust intra prediction directions in case top or left or bottom reference were missing and that means changing three or four values into other values, match looks more compact than several } else if (itype == FOO) { and does not lose readability either;
  • while in C foo = bar = 42; is a common thing, Rust does not allow this (I can understand why) and I’m surprised I ran into it only now (with intra prediction functions that assign the same calculated value to several output pixels at once);
  • loops in Rust are fine for basic use but when you need to deal with something more complex like for (i = 0; i < block_size; i += 4) or for (i = 99; i > 0; i--) you need either to write a simpler loop and remap indices inside or to remember it’s Rust and permute range in less intuitive ways like for i in (0..block_size).filter(|x| x&3 == 0) and for i in (1..99+1).rev(). While this works and even somewhat conveys the meaning it’s a bit unwieldy IMO;
  • and it might be a bit too esoteric but looks like I cannot easily write fn clip_u8(val: N) -> u8 that would take any primitive numeric type as input, do comparisons inside and return value either clipped to converted to u8. The best answer on how to do it I found was “you can’t, it’s against Rust practices”. I don’t need it much and I care even less, so I’ll just mark it as a neutral language feature and forget about it.

And now the small but constantly irritating thing: arrays. While slices are nice and easy to use (including extracting sub-slices), in my area I often need a slice with arbitrary start and end bounds. To clarify my use case: quite often you need a piece of memory that’s addressable with both positive and negative indices and those make sense on certain interval.

One of such common arrays is clipping array which essentially takes input index and returns it clipped usually to 0-255 range. So you have part [-255..-1] filled with zeroes, [0..255] filled with values in the same range and [256..511] filled with 255. I repeat, such clipping table is very common and useful thing that’s currently not easy to implement in Rust.

Another less common case is the block of pixels we process may require information from its top, left and top-left neighbours—and those are addressed as src[-stride + i], src[-1 + stride*i] and src[-stride - 1]. Or a whole frame of GDI-related codec (no, not from Westwood) or even simple BMP/DIB that stores lines upside-down so after you process line 0 you have to move to line -1.

I currently deal with it by keeping an additional variable pointing to the current position in array that I use as a reference and from which I can subtract other numbers if needed, but it’s a bit clunky and error-prone. Since Rust checks indices on slice access I wonder if extending it to work with e.g. negative indices is possible. IIRC FORTRAN and Pascal allowed you to define an array starting with arbitrary index, it might be possible in Rust too.

Oh well, I’ll just keep using my approach meanwhile and waiting to see what rust-av does in this regard.

Rust: Lifetimes Sugar

Sunday, May 27th, 2018

One of the Rust language features is explicit object lifetimes that help compiler correctly track memory usage and free objects without using garbage collector. A neat idea but it leads to lifetime specifiers being used everywhere including places where compiler should be smart enough to deal with them without explicit mentions in every place.

Maybe I’m using Rust wrong but in most of the cases I create objects that have no need for lifetime specifier or the objects that have the same lifetime for both its members and itself. Thus I argue that in addition to generic lifetime specifier 'a (or whatever the name you give it) and obviously named 'static there should be 'self that specifies the lifetime to be exactly the same as the object itself.

So, instead of current:

struct Foo<'a> {
  myref: &'a [u8],
  subobj: Bar<'a>,

impl<'a> Foo<'a> {
  pub fn new(myref: &'a [u8], subobj: Bar<'a>) -> Self { ... }

it should be possible to write:

struct Foo {
  myref: &'self [u8],
  subobj: Bar,

impl Foo {
  pub fn new(myref: &'self [u8], subobj: Bar) -> Self { ... }

I am not sure whether compiler needs to perform some additional things in such objects compared to objects without no lifetime specifier but it should be easy to assign proper lifetime after parsing the structure definition anyway and I’m pretty sure the compiler does something like this anyway.

And I see only these reasons why this has not been done yet:

  • Considerations for compiler simplicity (i.e. parsing process should be kept as simple as possible)—I still think it should be easy for compiler to recognize the lifetime definition by the time structure declaration parsing is over and it’s used externally (i.e. for objects using this one);
  • Considerations for language clarity and consistency (i.e. it’s immediately obvious when you look at the object that it deals with lifetimes but not with the proposed change). I’d argue that explicit lifetimes should be kept for complex cases only, when you have to juggle lifetimes from several complex sources, and the objects with references not outliving themselves should be fine;
  • Simple oversight (i.e. “we did not think of such simplification”) or developers’ bias (i.e. “we got used to writing lifetime specifiers everywhere that we didn’t think it annoys anybody”). You should be able to guess what I have to say about such argument.

So all in all I’d be happy to either hear why it cannot be done (beside the compatibility with the existing code) or see it implemented. But most likely this will be ignored (and I’m fine with that too).

Rust in multimedia: unwieldy features

Sunday, March 18th, 2018

Today I wanted to talk about two features that are quite important for multimedia decoding but are quite inconvenient in the current state.

First, macros. I know that macros in Rust are both very powerful and quite flexible but they are hard to use for data definition and I ranted about it before. The problem is that quite often you have tables with some internal structure that would benefit from macro substitutions: if you have a codebook constructed from entries following patterns like a, b, -a, -b and a, b, a, -b, -a, b, -a, -b it would be easier and less error-prone to represent them as e.g. FLIP2#(a, b) and FLIP4#(a, b) inside the data definition. The problem is that macro! does not allow you to do that easily since it’s supposed to expand into valid statements (i.e. code or full data definitions). Of course you can work it around by making a set of macros to define the whole array and some bits inside it but that’s what makes it unwieldy. And that’s why I believe there should be another macro substitution mechanism, maybe named macro#, that would work just on data but it’d be much easier to use in that particular case.

The second issue is assembly integration. Despite Rust being fast and such it’s still better to write small critical functions in assembly. And obviously it would be better if Cargo supported including assembler files into crate. You can point out there’s stdsimd for using the power of SIMD without much hassle. I can point out that compiler-generated code is still far from being perfect even with intrinsics and assembly is still better; supporting querying SIMD capabilities via standard package is good though. And you can point out that there’s a special crate for supporting various files with various compilers/assemblers already. I’d say that it’s a bit too generic but at least it can serve as a base for what I need. Again, there’s more or less standard way to deal with assembly files so making a common standard is not hard.

And in the unlikely case somebody reads this and asks why I don’t form an RFC—from what I heard it involves proposing code as well and I don’t want to study the compiler nor waste days compiling it.

Rust: Annoyance-Driven Design

Sunday, December 3rd, 2017

I’ve finally made NihAV decode RealVideo 2 content, including B-frames (there are still 4 video codecs to support (and I don’t have any samples for RMHD) and all audio codecs too so it’s a long way) and so I have some more words to say about Rust and my experience with it.

To me it looks like the most decisions on decompositions in Rust are the consequences of annoyance of making it other way? Too large structures mean you have to either pass too many arguments into new() or fill it with some defaults (and I’m pretty sure that #derive[Default] won’t save you with more complex types) and initialise to sane values later. In result it’s easier to split everything into smaller structures which are (at least) subjectively are much easier to handle, especially if you reference them as Option<YourStruct>. Modules and imports, on the other hoof, are more annoying to manage since you have to take care of proper dependencies, visibility and imports—in result I find it easier to import all stuff from all modules and just keep comment out currently unused imports (because I still can’t bring myself to make it all a single mega-module). And now for the even higher level: crates. Yes, I’m going to beat that undead horse again.

First of all, I’m aware of incremental building enabled in nocturnal Rust but I’m not going to use nightly for rather obvious reasons (mostly because I’m not here to experiment with the all potential bells and whistles of the language but rather what it can offer right out of the box and how it suits my needs). So, the compilation times are horrible: when I change a single non-public function it rebuilds the whole crate (which is supposed behaviour, I know) and it takes 15 seconds to do that. Obviously it’s laughable for people doing “serious” projects but it’s basic fact that humans expect response (any response) in about five seconds after the action or they get impatient. In result instead of one crate with optional features (in my case decoders and demuxers) I’d rather have several smaller crates and that creates new issues too. There’s this obvious npm.js kind of issue of making packages for every small thing so your programs ends with more package dependencies than modern Linux distribution. But there’s also the issue with package splitting: I’d like to split my code into packages that encompass certain family of features—e.g. nihav-core for common stuff, nihav-avi for AVI demuxer, nihav-indeo for all Indeo codecs (audio and video) and nihav-realmedia for RealMedia demuxer and related codecs—then some of them may depend on some common package (like H.263 common core for Intel I.263 and RealVideo 1 and 2 decoders) but probably with different features requested (one of them does not need B-frame support, another one does not need PB-frame support). Since I don’t know quantum cargodynamics I don’t know how it will all be resolved. So it will either end in dead code or code duplication (in an additional crate too, I suppose).

My theory is that people behind Rust are biased by their development environment. In other words you don’t care much about compilation times when you have to build browsers (or compilers) on daily basis. While my main development machine is a laptop I bought in 2010 with 8GB of RAM (which I believed to be future-proof). So the Rust language designers might either have beefy machines to deal with fast compilation or be conditioned to long development cycles. I know that back in the day “start compiling Linux kernel and go make some coffee to pass 45 minutes of compilation time” was quite common but I guess it’s Jevons’ paradox all over again: the more computing power is there the more it’s wasted on compilation times. Like modern C++ or single-header libraries: you actually have to compile a very large corpus of code as single file. Back in the days my laptop with 64MB RAM was spending most of the time compiling libavcodec/dsputil.c (a monstrous file full of templates that old FFmpeg developers might remember even today) so I had to install more RAM in order to make compilation time reasonable. The solution was to split the file instead of upgrading the machines for every developer but nowadays it’d be seen as a ridiculous solution.

And now documentation. I find it rather poor (but that’s common with programming languages). If I know more or less what feature I want I can find it in the standard documentation (if I don’t I would complain about non-overlapping multiple &mut [range] borrows not working instead of using slice.split_at_mut()—and I did) but it does not really tell me what I should be looking for in the first place. I call it Excel complexity. In Excel there’s probably a function that does anything you want but it’s much easier to reimplement it yourself than to look up in the documentation how it’s called and what are its less obvious parameters. And even if you combine both The Rust Programming Language Second Edition and Rust By Example you still won’t get it right. Now that Rust aspires to be a JavaScript replacement it should take an example from it too: provide extensive overview how to do things in it instead of showcasing features. IMO in TRPLv2 there are two chapters—11 and 12—that are close to that ideal: they talk about testing and how to make a console program. In other words, good practical tasks that one would like to achieve with Rust (in other words, not so many people care about features per se, they want something done with a language: build multi-threaded application, parse Web server reply, make an efficient number cruncher etc etc). I can rant more about how it should be organised but nobody reads documentation including me.

There’s still this annoyance with tuples as such too: why I can’t declare let foo, bar; if baz { foo = 4; bar = 2; } else { foo = bar = 0; } and have to use two separate lets? why I can’t have let (foo, bar); if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0,0); } either? In result while named tuples are there I end up using only unnamed tuples.

So while Rust offers some nice things it has not a very nice way to shape development. And this also explains why C was so popular and still is: it does not enforce any particular behaviour on you (except in recent editions when the standard and compilers suddenly started to care about arithmetic and bit operations being non-portable—you might make your own CPU that does not use two’s complement arithmetic after all), no enforced coding style, you can compile code in any order you like and interface almost anything without special tools or wrappers. And the freedom it offered along with effectiveness is what is often lacking in more modern languages (the saddest thing is that it’s traded not for memory security but rather for sacks of syntactic sugar).

Anyway, I’ll keep experimenting and we’ll see how things will turn out. In either case I should start thinking about splitting NihAV into several crates, registering codecs and such. Too much work, too many opportunities to procrastinate!

Rust: Optimising Decoder Experience

Thursday, August 3rd, 2017

Okay, I’ve made some changes so hopefully the server will withstand the curiosity of more than two people if it will go like the last time.

So, after implementing Indeo 4/5 decoders for NihAV I nano-benchmarked it and my decoder was about twice as slow compared to libavcodec. And since neither has SIMD optimisations they should be good enough to compare.

The tested file was 00186002.avi — Indeo 4 sample with scalability feature(i.e. luma is split into four bands and uses Haar wavelet to compose the output plane) and duration over ten minutes. The results I got will be given in Linux perf sample counts as those should be representative enough.

avconv — 13.4 seconds, 10K cycles. About 24% spent in luma plane recombination (with Haar wavelet), about 40% of time is taken by bitstream decoding and the rest is mostly transforms and motion compensation.

nihav-tool — 31.6 seconds, 20K cycles. 30% spend in luma plane recombination, 48% of time is taken by bitstream decoding, 11% is for motion compensation and the rest is mostly transforms. Or in samples: recombination — 9900 (against 3300 in libavcodec), bitstream decoding (dirty estimate, it includes some DSP functions inlined) — 15800 against
5600. Motion compensation — 3500 against 1700. Transforms — 1300 against 1500 (they are not equivalent though, my code only transforms the block and output costs are hidden in bitstream decoding). Overall, my code is consistently worse. Is there any way to optimise it a bit?

Rust: Not So Great for Codec Implementing

Monday, July 31st, 2017

Disclaimer: obviously it’s my opinion, feel free to prove me wrong or just ignore.

Now I should qualify for zoidberg (slang name for lowly programmer in Rust who lives somewhere in a dumpster and who is also completely ignored—perfect definition for me) I want to express my thoughts about programming experience with Rust. After all, NihAV was restarted to find out how modern languages fare for my favourite task and there was about one language that was promising enough. So here’s a short rant about the aspects of this programming language that I found good and not so good.

Good things

  • Modern language features: standard library containers, generics, units and their visibility etc etc. And at least looks like Rust won’t degrade into metaprogramming language any time soon (that’s left for upcoming Rust+=1 programming language);
  • Reasonable encapsulation: I mean both (sub)modules organisation and the fact that functions can be implemented just for some structure;
  • Powerful enums that can act both as plain C set of values and also as tagged objects, e.g. the standard Result enum has two values—Ok(result) and Err(error) where both result and error are two different user-defined types, so returned value can contain either while being the same type (Result);
  • More helpful error messages (e.g. it tries to suggest a correction for mistyped variable name or explains an error a bit more detailed). Sure, Real Programmers™ don’t need that but it’s still nice;
  • No need for dependency resolving: you can have stuff in one module referencing stuff in another module and vice versa at the same time, same for no need
  • Traits (standard interfaces for objects) and the fact that operations are implemented as specific traits (i.e. if you need to have a + b with your custom object you can implement std::ops::Add for it and it will work). Also it’s nice to extend functionality of some object by making an implementation for some trait: e.g. my bitstream reader is defined in one place but in another module I made another trait for it for reading codebooks so I can invoke let val = bitread.read_codebook(&cb)?; later.

Unfortunately, it’s not all rosy and peachy, Rust has some things that irritate me. Some of them are following from the advantages (i.e. you pay for many features with compilation time) and other are coming from language design or implementation complexity.

Irritating things that can probably be fixed

  • Compilation time is too large IMO. While the similar code in Libav is recompiled in less than a second, NihAV (test configuration) is built in about ten seconds. And any time above five seconds is irritating to wait. I understand why it is so and I hope it will be improved in the future but for now it’s irritating;
  • And, on the similar note, benchmarks. While overall built-in testing capabilities in Rust are good (file it under good things too), the fact that benchmarking is available only for limbo nightly Rust is annoying;
  • No control over allocation. On one hoof I like that I can not worry about it, on the other hoof I’d like to have an ability to handle it.
  • Poor primitive types functionality. If you claim that Rust is systems programming language then you should care more about primitive types than just relying on as keyword. If you care about systems programming and safety you’d have at least one or two functions to convert type into a smaller one (e.g. i16/u16 -> u8) and/or check whether the result fits. That’s one of the main annoyances when writing codecs: you often have to convert result into byte with range clipping;
  • Macros system is lacking. It’s great for code but if you want to use macros to have more compact data representation—tough luck. For example, in Indeo3 codebooks have sequences like (a,b), (-a,-b), (b,a), (-b,-a) which would be nice to shorten with a macro. But the best solution I saw in Rust was to declare whole array in a macro using token tree manipulation for proper submacro expansion. And I fear it might be the similar story with implementing motion compensation functions where macros are used generate required functions for specific block sizes and operations (simple put or average). I’ve managed to work it around a bit in one case with lambdas but it might not work so well for more complex motion compensation functions;
  • Also the tuple assignments. I’d like to be able to assign multiple variables from a tuple but it’s not possible now. And maybe it would be nice to be able to declare several variables with one let;
  • There are many cases where compiler could do the stuff automatically. For example, I can’t take a pointer to const but if I declare another const as a pointer to the first one it works fine. In my opinion compiler should be able to generate an intermediate second constant (if needed) by itself. Same for function calling—why does - 42); fail borrow check while let pos = bitread.tell() - 42;; doesn’t?
  • Borrow checker and arrays. Oh, borrow checker and arrays.

This is probably the main showstopper for implementing complex video codecs in Rust effectively. Rust is anti-FORTRAN in a sense that FORTRAN was all about arrays and could operate arrays safely while Rust safely prevents you from operating arrays.

Video codecs usually operate on planes and there you’d like to operate with different chunks of the frame buffer (or plane) at the same time. Rust does not allow you to mutably borrow parts of the same array even when it should be completely safe like let mut a = &mut arr[0..pivot]; let mut b = &mut arr[pivot..];. Don’t tell me about ChunksMut, it does not allow you to work with them both simultaneously. And don’t tell me about Bytes crate—it should not be a separate crate, it should be a core language functionality. In result I have to resort to using indices inside frame buffer and Rc<RefCell<...>> for frames themselves. And only dream about being able to invoke mem::swap(&mut arr[idx1], &arr[idx2]);.

Update: so there’s slice::split_at_mut() which does some of the things I want, thanks Tomas for pointing it out.

And it gets even more annoying when I try to initialise an array of codebooks for further user. The codebook structure does not implement Clone because there’s no good reason for it to be cloned or copied around, but when I initialise an array of them I cannot simply declare it and fill the contents in a loop, I have to resort to unsafe { arr = mem::uninitialized(); for i in 0..arr.len() { ptr::write(&arr[i], Codebook::new(...); } }. I know that if there’s an error creating new element compiler won’t be able to ensure that it drops only already initialised elements but it’s still a problem for compiler not being smart enough yet. Certain somebody had an idea of using generator to initialise arrays but I’m not sure even that will be implemented any time soon.

And speaking about cloning, why does compiler refuse to generate Clone trait for a structure that has a pointer to function?

And that’s why C is still the best language for systems programming—it still lets you to do what you mean (the problem is that most programmers don’t really know what they mean) without many magical incantations. Sure, it’s very good to have many common errors eliminated by design but when you can’t do basic things in a simple way then what it is good for?

Annoying things that cannot be fixed

  • type keyword. Since it’s a keyword it can’t be used as a variable name and many objects have type, you know. And it’s not always reasonable to give a longer name or rewrite using enum. Similar story with ref but I hardly ever need it for a variable name and ref_<something> works even better. Still, it would be better if language designers picked typedef instead of type;
  • Not being able to combine if let with some other condition (those nested conditions tend to accumulate rather fast);
  • Sometimes I fear that compilation time belongs to this category too.

Overall, Rust is not that bad and I’ll keep developing NihAV using it but keep in mind it’s still far from being perfect (maybe about as far as C but in a different direction).

P.S. I also find the phrase “rewrite in Rust” quite stupid. Rust seems to be significantly different from other languages, especially C, so while “Real Programmers can write FORTRAN program in any language” it’s better to use new language features to redesign interfaces and make new overall design instead of translating the same mistakes from the old code. That’s why NihAV will lurch where somebody might have stepped before but not necessarily using the existing roads.