Archive for the ‘Useless Rants’ Category

Rust: Optimising Decoder Experience

Thursday, August 3rd, 2017

Okay, I’ve made some changes so hopefully the server will withstand the curiosity of more than two people if it will go like the last time.

So, after implementing Indeo 4/5 decoders for NihAV I nano-benchmarked it and my decoder was about twice as slow compared to libavcodec. And since neither has SIMD optimisations they should be good enough to compare.

The tested file was 00186002.avi — Indeo 4 sample with scalability feature(i.e. luma is split into four bands and uses Haar wavelet to compose the output plane) and duration over ten minutes. The results I got will be given in Linux perf sample counts as those should be representative enough.

avconv — 13.4 seconds, 10K cycles. About 24% spent in luma plane recombination (with Haar wavelet), about 40% of time is taken by bitstream decoding and the rest is mostly transforms and motion compensation.

nihav-tool — 31.6 seconds, 20K cycles. 30% spend in luma plane recombination, 48% of time is taken by bitstream decoding, 11% is for motion compensation and the rest is mostly transforms. Or in samples: recombination — 9900 (against 3300 in libavcodec), bitstream decoding (dirty estimate, it includes some DSP functions inlined) — 15800 against
5600. Motion compensation — 3500 against 1700. Transforms — 1300 against 1500 (they are not equivalent though, my code only transforms the block and output costs are hidden in bitstream decoding). Overall, my code is consistently worse. Is there any way to optimise it a bit?

Rust: Not So Great for Codec Implementing

Monday, July 31st, 2017

Disclaimer: obviously it’s my opinion, feel free to prove me wrong or just ignore.

Now I should qualify for zoidberg (slang name for lowly programmer in Rust who lives somewhere in a dumpster and who is also completely ignored—perfect definition for me) I want to express my thoughts about programming experience with Rust. After all, NihAV was restarted to find out how modern languages fare for my favourite task and there was about one language that was promising enough. So here’s a short rant about the aspects of this programming language that I found good and not so good.

Good things

  • Modern language features: standard library containers, generics, units and their visibility etc etc. And at least looks like Rust won’t degrade into metaprogramming language any time soon (that’s left for upcoming Rust+=1 programming language);
  • Reasonable encapsulation: I mean both (sub)modules organisation and the fact that functions can be implemented just for some structure;
  • Powerful enums that can act both as plain C set of values and also as tagged objects, e.g. the standard Result enum has two values—Ok(result) and Err(error) where both result and error are two different user-defined types, so returned value can contain either while being the same type (Result);
  • More helpful error messages (e.g. it tries to suggest a correction for mistyped variable name or explains an error a bit more detailed). Sure, Real Programmers™ don’t need that but it’s still nice;
  • No need for dependency resolving: you can have stuff in one module referencing stuff in another module and vice versa at the same time, same for no need
  • Traits (standard interfaces for objects) and the fact that operations are implemented as specific traits (i.e. if you need to have a + b with your custom object you can implement std::ops::Add for it and it will work). Also it’s nice to extend functionality of some object by making an implementation for some trait: e.g. my bitstream reader is defined in one place but in another module I made another trait for it for reading codebooks so I can invoke let val = bitread.read_codebook(&cb)?; later.

Unfortunately, it’s not all rosy and peachy, Rust has some things that irritate me. Some of them are following from the advantages (i.e. you pay for many features with compilation time) and other are coming from language design or implementation complexity.

Irritating things that can probably be fixed

  • Compilation time is too large IMO. While the similar code in Libav is recompiled in less than a second, NihAV (test configuration) is built in about ten seconds. And any time above five seconds is irritating to wait. I understand why it is so and I hope it will be improved in the future but for now it’s irritating;
  • And, on the similar note, benchmarks. While overall built-in testing capabilities in Rust are good (file it under good things too), the fact that benchmarking is available only for limbo nightly Rust is annoying;
  • No control over allocation. On one hoof I like that I can not worry about it, on the other hoof I’d like to have an ability to handle it.
  • Poor primitive types functionality. If you claim that Rust is systems programming language then you should care more about primitive types than just relying on as keyword. If you care about systems programming and safety you’d have at least one or two functions to convert type into a smaller one (e.g. i16/u16 -> u8) and/or check whether the result fits. That’s one of the main annoyances when writing codecs: you often have to convert result into byte with range clipping;
  • Macros system is lacking. It’s great for code but if you want to use macros to have more compact data representation—tough luck. For example, in Indeo3 codebooks have sequences like (a,b), (-a,-b), (b,a), (-b,-a) which would be nice to shorten with a macro. But the best solution I saw in Rust was to declare whole array in a macro using token tree manipulation for proper submacro expansion. And I fear it might be the similar story with implementing motion compensation functions where macros are used generate required functions for specific block sizes and operations (simple put or average). I’ve managed to work it around a bit in one case with lambdas but it might not work so well for more complex motion compensation functions;
  • Also the tuple assignments. I’d like to be able to assign multiple variables from a tuple but it’s not possible now. And maybe it would be nice to be able to declare several variables with one let;
  • There are many cases where compiler could do the stuff automatically. For example, I can’t take a pointer to const but if I declare another const as a pointer to the first one it works fine. In my opinion compiler should be able to generate an intermediate second constant (if needed) by itself. Same for function calling—why does - 42); fail borrow check while let pos = bitread.tell() - 42;; doesn’t?
  • Borrow checker and arrays. Oh, borrow checker and arrays.

This is probably the main showstopper for implementing complex video codecs in Rust effectively. Rust is anti-FORTRAN in a sense that FORTRAN was all about arrays and could operate arrays safely while Rust safely prevents you from operating arrays.

Video codecs usually operate on planes and there you’d like to operate with different chunks of the frame buffer (or plane) at the same time. Rust does not allow you to mutably borrow parts of the same array even when it should be completely safe like let mut a = &mut arr[0..pivot]; let mut b = &mut arr[pivot..];. Don’t tell me about ChunksMut, it does not allow you to work with them both simultaneously. And don’t tell me about Bytes crate—it should not be a separate crate, it should be a core language functionality. In result I have to resort to using indices inside frame buffer and Rc<RefCell<...>> for frames themselves. And only dream about being able to invoke mem::swap(&mut arr[idx1], &arr[idx2]);.

Update: so there’s slice::split_at_mut() which does some of the things I want, thanks Tomas for pointing it out.

And it gets even more annoying when I try to initialise an array of codebooks for further user. The codebook structure does not implement Clone because there’s no good reason for it to be cloned or copied around, but when I initialise an array of them I cannot simply declare it and fill the contents in a loop, I have to resort to unsafe { arr = mem::uninitialized(); for i in 0..arr.len() { ptr::write(&arr[i], Codebook::new(...); } }. I know that if there’s an error creating new element compiler won’t be able to ensure that it drops only already initialised elements but it’s still a problem for compiler not being smart enough yet. Certain somebody had an idea of using generator to initialise arrays but I’m not sure even that will be implemented any time soon.

And speaking about cloning, why does compiler refuse to generate Clone trait for a structure that has a pointer to function?

And that’s why C is still the best language for systems programming—it still lets you to do what you mean (the problem is that most programmers don’t really know what they mean) without many magical incantations. Sure, it’s very good to have many common errors eliminated by design but when you can’t do basic things in a simple way then what it is good for?

Annoying things that cannot be fixed

  • type keyword. Since it’s a keyword it can’t be used as a variable name and many objects have type, you know. And it’s not always reasonable to give a longer name or rewrite using enum. Similar story with ref but I hardly ever need it for a variable name and ref_<something> works even better. Still, it would be better if language designers picked typedef instead of type;
  • Not being able to combine if let with some other condition (those nested conditions tend to accumulate rather fast);
  • Sometimes I fear that compilation time belongs to this category too.

Overall, Rust is not that bad and I’ll keep developing NihAV using it but keep in mind it’s still far from being perfect (maybe about as far as C but in a different direction).

P.S. I also find the phrase “rewrite in Rust” quite stupid. Rust seems to be significantly different from other languages, especially C, so while “Real Programmers can write FORTRAN program in any language” it’s better to use new language features to redesign interfaces and make new overall design instead of translating the same mistakes from the old code. That’s why NihAV will lurch where somebody might have stepped before but not necessarily using the existing roads.

A Bit about Airlines

Tuesday, July 25th, 2017

I did not want to have personal rants in my restarted blog but sometimes material just comes and presents itself.

As some of you might know, I prefer travelling by rail; yet sometimes I travel by plane because it’s faster. Most of those flights are semiannual flights to Sweden and an occasional flight to elenril-city. And here’s the list of unpleasant things I had with flights:

  • Planes being late for more than an hour (because of technical reasons) — two Lufthansa flights from Arlanda;
  • Plane being late just because — Aerosvit, once;
  • Baggage not loaded on plane — Cimber Sterling (aka Danish Aerosvit);
  • Flights cancelled because of strike — once SAS and once Lufthansa;
  • Flight being cancelled because of plane malfunction — Lufthansa once;
  • Flight being cancelled because they didn’t want to wait for the passengers — Lufthansa once (yup, people were waiting at the gate but they decided to skip boarding entirely and send the plane away without passengers);
  • Flight where I could not check in — Lufthansa once.

To repeat myself, most flights I make during the year are with SAS to/from Sweden though sometimes segment is operated by LH. So far return trips to Frankfurt with LH were mostly okay except for some delays but the last “flight” was something different.

I booked a flight FRA-PRG-FRA. The flight to Praha was cancelled because plane arrived to Frankfurt at least half an hour later than expected and after another hour it was decided it’s not good enough to fly again. Okay. So they could not find a replacement plane and rebooked me to flight at 22:15. Fine, but it turned out that I could reach Praha by train faster and cheaper (twice as cheap actually) so I decided not to wait.

Then the time for return flight came and I could not check in at all because they have modified something (that’s the message: “Cannot check in to your flight because of modifications, refer to Lufthansa counter.” And there’s no LH representative there. And if you can withstand their call centre, you’re a much better person than I am). So it was another train back to Germany (which also broke down in the middle of nowhere but at least it was resolved in an hour and a half). Maybe it’s because of the selected Cattle Lowcostish fare (Economy but without check-in baggage or seat selection) instead of the usual one but at least with SAS when I wasn’t able to take flight to Arlanda (because of Frankfurt Airport staff strike) I still had no problems flying back from there.

Call me picky (and I shan’t argue, I am picky) but I expect better statistics because the most irritating cases were happening with the certain company that I don’t fly with often and that’s comparable with SAS in quantity (but, sadly, not quality).

And that means I’ll avoid using it in the future even if that means not being able to get to some places by plane in reasonable time. There are still trains for me.

P.S. This rant is just to vent off my anger and frustration from the recent experience. And it should make me remember not to take Air Allemagne flights ever again.
P.P.S. Hopefully the next post will be more technical.

Why Modern Video Codecs Suck and Will Keep on Sucking

Friday, May 12th, 2017

If you look at the modern video codecs you’ll spot one problem: they get designed for large resolutions and follow one-size-does-not-fit-exactly-anybody approach. By that I mean that codecs are following the model introduced by ITU H.261—split image into blocks, predict block from the previous frame if possible, apply DCT, quantise and code resulting coefficients (using zigzag scan order and special treatment for runs of zeroes). The same was later applied to pictures in JPEG format that is still staying strong.

Of course modern codecs are much more complex that that, current ITU H.EVC standard enhanced every stage:

  • image is no longer split into 8×8 blocks, you have quadtrees coding blocks from 64×64 down to 4×4 pixels;
  • block prediction got more complicated, now you have intra (or spatial prediction) that tries to fill block with gradient derived from already decoded neighbour blocks) and inter prediction (the old prediction from the previous frame);
  • and obviously inter prediction is not that simple either: now it’s decoupled from transformed block and can have completely different sizes (like 16×4 or 24×32), instead of single previous frame you can use two reference frames selected from two separate lists of references and even motion vectors are often predicted using motion vectors from the reference frames (does anybody like implementing those colocated MV prediction modes BTW?);
  • DCT is replaced with some bitexact integer approximations (and the dequantisation and/or transform stages may be skipped completely);
  • there are more scan types used and all values are coded using some context-adaptive coder.

Plus some hacks for low-resolution mode (e.g. special 4×4 transform for luma), lossless (or as they call it, “PCM coding”) and now also special coding mode for screen content (i.e. images with fewer distinct colours and where fine details matter).

The enhancements on streamline coding process are enhancements, they don’t change principles of coding but rather adapt them to modern conditions (meaning that there’s demand in higher compression and there’s more CPU power and RAM can be thrown at the processing—mostly RAM though).

And what the hacks do? They try to deal with the fact that this model works fine for smooth changing continuous tone images and it does not work that good on other types of video source. There are several ways to deal with the problem but keep in mind that the problem of distinguishing video types and selecting proper coding is AI-complete:

  1. JPEG+PNG approach. You select best coder for the source manually and transmit it like that. Obviously it works well in limited scenarios but even people quite often don’t bother and compress everything with the single format even if that hurts quality or compression ratio. Plus you need to handle two different formats, make sure that the receiving end also supports them etc etc.
  2. MPEG-4 approach. You have single format that has various “coding tools” embedded, they can be both full alternative coding features (like WebP has VP8 compression and lossless compression and nothing common between them or MPEG-4 Audio can be coded as conventional AAC, TwinVQ, speech codec or even as a description for synthesised audio) or various enhancement applied to the main coding method (like you have AAC-LC, AAC-Main that enables several features or HE-AACv2 which takes AAC-LC audio and applies SBR and Parametric Stereo to double its channels and frequency range). Actually there are more than forty various MPEG-4 Audio object types (various coding modes) already, do you think there’s any software that supports everything? And looks like modern video codecs head this way too: they introduce various coding tools (like for screen content) and it would be fun to support all possible features in the decoder. Please consider how much effort should be spent on effectively applying all those tools too (and that’s obviously beside the scope of standards).
  3. ZPAQ approach. The terminal AI-complete solution. You are not merely generating bitstream but first you need to transmit bytecode for a program that will decode this bytestream. It’s the ultimate solution—if you can describe the perfect model for the stream then you can compress it the best. Finding an optimal model for given bitstream is left as an exercise for the reader (in TAoCP it would be marked with M60 I guess).

The second thing I find sucky is combinatorial explosion of encoding parameters. Back in the day you had to worry about selecting the best quantisation matrix (or merely a quantiser) and motion vector if you decided to code it as inter-block. Now you have countless ways to split large tile into smaller blocks, many ways to select prediction mode (inter/intra, prediction angle for intra, partitioning, reference frames and motion vectors) and whether to skip transform stage or not and if not whether it’s worth to subdivide block further or not… The result is as good as string theory—you can get a good one if you can guess zillions of parameters right.

It would be nice to have encoder actually splitting video into scene and actors and transmitting just the changes to the objects (actors, scene) instead of blocks. But then you have a problem of coding those descriptions efficiently and even greater problem of automatically classifying the video into such objects (obviously software can do that, that’s why MPEG-4 Synthetic Video is such a great success). Actually it had some use: there was AVS-S standard for coding video specifically from surveillance cameras (why would China need such standard anyway?). In this standard there was special kind of frame for the whole scene and the main share of video was supposed to be just objects moving around the scene. Even if the standard is obsolete its legacy was included into HEVSAVS2 as three or four new special frame types.

Personally I believe that current video formats are being optimised to local minimum, there are probably other coding methods that give larger gain on certain kinds of data, preferably with less tweaking. For example, that was probably the best thing about Daala, its PVQ coding; the rest was nor crazy enough. I have a gut feeling that vector quantisation might be a good base for an alternative approach to building video codecs. And I think it’s better to have different formats oriented for e.g. low-latency broadcasting and video distributing. If you remember, back in the days people actually spent time to decide which segment was coded better with DivX ;-) 3 Fast-Motion or DivX ;-) 3 Low-Motion, so those who care will be able to select proper format. And the rest can keep watching content in VP11/AV2 format. Probably only the last sentence will come to life.

That’s why I don’t expect bright future in video codecs and that’s why my blog is titled like this.

NihAV Development Progress

Saturday, May 6th, 2017

After long considerations and much hesitation NihAV finally accepts it’s first developer (that would be me). And for a change it will be written in Rust. First, it’s an interesting language worth learning; second, it seems to offer needed functionality without much hassle; third, it offers new features that are tempting to try. By new features I mostly mean enums, traits and functions bound to structures.

Previously I expressed the intent to do a completely new design of multimedia (mostly decoding) framework with decoders being assembled from smaller blocks. For example, if I’d implement VIVO H.263 decoder (just as a troll) it would contain these bits:

  • generic 8×8 block decoder interface that does common stuff for such decoders (maintaining block indices, filling frame information e.g. block type, motion vectors etc etc);
  • trait for 8×8 block decoder implementation that does actual bitstream decoding (functions for decoding GOP/picture/slice headers, MV prediction, block data decoding and such);
  • IPB frame shuffler as implementation of generic frame shuffler (i.e. that piece of code that selects which frames to use as references and which frame to output after decoding the current one);
  • maybe even custom codebook accessor (the piece of code that tells codebook generator what is the code and symbol at position N) so it doesn’t need to be converted into some fixed form.

There’s not much code written yet but there are some bits implemented: rudimentary universal bitstream reader, bytestream reader (the same for memory and file I/O) and semi-working framework for demuxing. That’s a start at least.

Side rant: it seems that visible languages (i.e. not completely obscure ones) that use := form assignment have rather unpleasant evangelists (full of themselves in the best case, actively pushing their language to replace everything else in the worst case). That includes Go, Oberon and Wolfram Language. I don’t mean that other languages are free of that problem but in these cases looks like the majority of posts or articles about such language are written from this position.

Blog Restarted

Saturday, May 6th, 2017

As you (imaginary person who actually read this blog) I’ve stopped blogging last year. Mostly I did it because I ran out of material to write about. Well, now I have some new thoughts that I’d rather dump on the blog and forget so the blog restarted.

While the previous blog was mostly centred on codec reverse engineering and design and rants about random topics, this version should be more about NihAV development and codecs design. And obviously useless rants on random topics—except on opensource multimedia projects (I’ve said enough about CEmpeg, Libav and such; corporate ones I might still have something about).

Let’s see how well it goes this time…

Telegony in FLOSS

Tuesday, July 5th, 2016

In case you don’t know telegony is a belief that all offspring of a female will look like her first partner (there’s nothing wrong with it unless you know biology). In the metaphorical sense it means that the original circumstances of the company or project founding (i.e. who, with what goal and such) will affect its development to the end. Like there was a small company producing so-called “ever sharp” pencils that’s still afloat after a century—but who cares about pencils from that Sharp Corporation nowadays? Anyway, let’s talk about various opensource multimedia projects and see how it applies to them.

FFmpeg. The project was created by FFabrice Bellard who’s known for his brilliant software with horrible source code (he’s won IOCCC twice after all). FFmpeg obviously follows the suit and still has a lot of horrible code that people can’t clean up to this day. And yet it’s as ubiquitous as LZEXE packer back in DOS days or even more.

MPlayer. The project was originally created because its author could not find a video player that would support MPEG-2 video and AVI codecs at the same time—so he hacked something from libmpeg2 and avifile. And what do you know, after all these years it strives to play everything by means of ugly hacks. Those include but not limited to: catching SIGSEGV during MPEG-2 video decoding and restarting decoder, patching loaded VfW/DShow/DMO decoders based on .dll name, calling private internals of bundled FFmpeg.

MPlayer forks. mplayer2 was created by Uoti Urpala who had his disagreements with the original MPlayer team and made his own version with streamlined build system and throwing out many hacks of the original code. mpv seems to follow the suit and I guess if the next fork appears it’ll be also done by a disgruntled developer who wants to throw more hacks out and make it more average-user-friendly.

libvpx. Originally (VP3 times) it was a codec using rather well known coding methods, definitely not state of the art, with a confusing source code that was opensourced without any format description beside source code because the company didn’t care about it but wanted it to become multimedia standard anyway. The company name was On2 then, mind you! But if you look at VP8 or VP9 nothing has changed much (I still have the impression that VP9 format specification was written by engineers from that company implementing hardware decoder and benevolently edited and released by Baidu when they don’t care about VP9 any more because there’s VP X coming).

Derek B. A brilliant reverse engineer who did his first codec (VBLE) by struggling with disassembly and then asking the codec author for the sources. I’ve mentioned before that his best decoders were REd in a very original way: by declaring the intent to do that and waiting for somebody else to do it. Obviously not a project but his story fits so I thought it’s appropriate to mention him here.

VLC. As you might remember the project started to justify a high-speed LAN in some French École Abnormale (because it was not the only high school in France named École Normale in case you wonder). And what do you know—the project went fine as a business project and does so till this day. Though if they ever switch from Discworld-themed naming to Ubuntu-like they should go with Glorified Gstreamer. Why? Because while venerable old projects like XAnim, Xine or MPlayer demonstrated high-class by having their own codec reverse engineering work, VideoLAN went the way similar to the previous story—waiting until somebody does it elsewhere and then announcing support for that format.

Moral of the story: think before you start a project, people may make fun of it afterwards.

A Rant on Actimagine VX Codec

Saturday, July 2nd, 2016

Well, since I don’t do anything useful these days and just give rants on subjects nobody cares about here’s another one.

This codec (there’s also VX container with IIRC PCM audio to accompany video) can be named a Very Mobile H.264 Rip-off. Why? Because the only available binary specification I got seems to come from some game, in ARMv6 I guess and has stride hardcoded to 256 pixels which would be appropriate only for some hand-held consoles. As for the second part—it uses Elias Gamma’ codes (like H.264 does—under different name) and suspiciously similar spatial prediction. Obviously it also differs a lot because it’s intended for a low-power devices and low resolutions too.

So, it operates on 16×16 macroblocks, each one can be coded with one of 24 possible modes that are really just combinations of one of the following techniques with optional residue coding:

  • splitting into 16×8 or 8×16 sub-blocks and processing them in the same way;
  • copying data from one of the previous three frames with or without motion vector adjustment (it’s full-pixel only);
  • copying data from the previous (frame?) block with some offset added to it (actually three offsets—one per component) and motion vector optionally;
  • applying intra prediction that also comes in two flavours—four modes applied to any block size or nine modes applied for 4×4 luma blocks (still only four modes for chroma).

Residual 16×16 block is coded as 5-bit CBP (again, Elias Gamma’ code mapped to CBP value) for four 4×4 luma blocks and two 4×4 chroma blocks. Coefficients are coded like this:

  1. Mode from Xine table (it’s predicted from neighbouring blocks too) that defines how many coded coefficients are there and how many of them are ones;
  2. Signs for known ones;
  3. Elias Gamma’ codes for other coefficients and their signs;
  4. Zero run value for skips between elements (from Xine table depending on maximum coefficient level seen so far).

In other words—nothing like H.264 CAVLC mode at all.

And if you think it was fun to RE I can tell you it was not and there are still challenges to overcome. First, the specification is badly written and optimised too much that decompiler is almost worthless there (for example, refilling bits is done by jumping to the end of the function that reads unsigned Elias Gamma’ code), functions expecting certain registers to be used for the state (like block functions expect R11 and R12 to contain motion vector, bitstream reading functions operate on the context stored in R1-R3 and return result in R6 etc etc), Hex-Rays also can’t decompile anything with switch statements and block decoding functions are full of them, it often decompiles function just to the first function call and ignores the rest of function code (happened to me on x86 too once where it decided to decompile only the head of main Smacker video decompression function without the block decoding loop, that’s why I trust decompilers even less that compilers). Second, the specification seems to miss data for lookup tables used in coefficients decoding.

So if you want to have it fully REd find a better specification and/or more patient and persistent man to deal with it. As for me—it’s dead, Jim®.

On FFmpeg and Voting

Monday, June 27th, 2016

Sometimes I hear that voting in FFmpeg is worse than in North Korea. Obviously people don’t know much about voting in North Korea or they wouldn’t make such statements. Here I shall try to have a short overview on voting in two similar yet different Asian countries and compare it to FFmpeg. My knowledge about North Korea comes from posts of Russian researchers, at least one of them lives in Korea and their information looks legit to me (because it lacks political agenda and has even small details mentioned that correlate well with other information from people visiting DPRK). For the other country it’s various sources not from state media (because such information is often omitted there or you hear simply outright lies). For FFmpeg I obviously have my own experience and observations from their mailing list.

North Korea

Maybe the main reason why it makes people think about FFmpeg is the fact they claim sovereignty over the whole Korean peninsula and even appoints special people to do merges govern over regions still under occupation and even promotes them time from time. South Korea actually does the same except for useless staff but they recognize people with North Korean passports as their own citizens.

Anyway, voting. The system is rather simple: you have one candidate per region and all people vote for him. The ruling member of Kim family should get 100% support in his region (around Paektu Mountain if you forgot that)—and 100% means that all voters should come and vote (I guess it’s obvious how). In other regions some voters may skip it in case they are very ill. I don’t remember whether they can vote against but it’s rather unthinkable too. There’s a story about North Koreans fleeing to China and seeing a demonstration in South Korea on the TV—their reaction was “Why do they allow it?! How can the President run such a country?!”. So people there are quite well conditioned and voting goes smooth.


This is a country with a variety of approaches to voting—some elections nobody cares about and they can be even fair (until an inappropriate candidate is elected and then they have to correct it), some elections try to keep an image of honesty in order to show outsiders that the system works fine and elected candidates are legitimate (in that case they simply try not to rig it as blatantly), some elections are rigged in the most blatant way and even more and in the ideal case there are no more elections (under some stupid pretext they’ve got rid of governor elections and now in some towns and cities mayors are not elected either because that saves money). And referendums there while still theoretically possible can’t be on any question related to government or status of some region or the questions that are decided by government.

So depending on luck in some elections a vote can matter, in other cases it doesn’t matter and in some cases it matters that much that the votes are cast and counted without voter’s participation (some regions constantly report that over 100% of voters were present and there’s a Russian meme 146% that appeared because when they reported result for parliament elections in Rostov region IIRC the percents from different parties added up to that number and there were some other regions with total sum over 100%).

There are many tricks to get desired results—inventing new demands to filter out unwanted candidates (their neighbour Belarus has a joke “In order to be eligible as presidential candidate the person must have at least five years of presidential experience”), the same voters voting again and again (because there’s a special document allowing a voter to vote in other region—and groups of people can have a dozen of such permits per member and thus vote repeatedly in several places), putting a sheaf of “properly” filled ballots into ballot box or simply counting them as you see fit no matter what was the actual vote there.

Similar story with petitions—they are often masked with similar petitions or later an expert group finds that petition to be infeasible or contradicting the law.


In the old days voting was usually called mostly on naming issues and it worked. But then disagreement with the practices of the FFmpeg leader (like committing code without any review and despite objections) escalated to the point of The Split but before that there were ugly votings that included old MPlayer members that nobody cared about in FFmpeg (because the leader said everybody with commit rights on should have a vote). So after The Split and another ugly voting there were two projects. I don’t remember any voting in Libav but in FFmpeg this tradition still holds. Like recently there was a voting committee formed and there were at least two serious votes (not just a new season logo in trac)—for code of conduct (because every project should have one but not necessarily follow it) and for banning Carl Eugen Hoyos for some time. The first one obviously passed, the second one rather expectedly failed. But then A Person Known For Resigning As Leader took an action that should be allowed only to leaders and banned some people for 24 hours from the mailing list.

Well, I think it’s clear now that FFmpeg voting is not on North Korea level because there’s animosity there that can be expected only from current MPlayer team but not FFmpeg. But is FFmpeg on par with Russia? Maybe not yet but it tries hard IMO.

A Tale of Two Failed Projects

Saturday, June 25th, 2016

Yes, it’s about FFmpeg and Libav again. And yes, I consider them both to be failed projects (not that their basic goal is failed and they provide even less multimedia support than GStreamer with no external libraries used), I mean the state of the project as living and developing entity.

Even if I mostly emulate Derek nowadays—i.e. unsubscribed from FFmpeg and Libav mailing lists, do nothing productive, wait for somebody to reverse engineer codecs I care about somewhat (that would be ClearVideo, thank you very much). Yet I peruse development-related resources for both projects (mostly for finding laughs) and sometimes I see gems like this (it was pointed at in the comments as well since it answers some questions I’ve asked before).

First, I’d like to outline how large projects are organised and what to expect in general. So, if you have a large and used project you’ll have at least these components:

  • codebase (normal projects have some code to run after all);
  • developers (to add features, fix bugs and such);
  • users (to annoy developers and once in a while to provide sensible bugreport or feature request);
  • infrastructure (hosting for code, means to communicate for developers, maybe even support for users).

Developers can be also divided into three main categories:

  1. core developers—the ones who do main work on the codebase and do it in regular manner (they might intersect with the next category too);
  2. corporate developers—the ones who do work mostly on behalf of their companies (e.g. add a feature they need internally so they don’t have to maintain it themselves);
  3. contributors—developers who add some feature or provide some bugfix because they needed it themselves, they do it irregularly or even just once (again, they might intersect with the previous category).

This division is by no means perfect but it shows the main forces behind development: those who treat is as a hobby, those who do it for their benefit (i.e. making money with/from it) and those who use it and just want to be it a bit more suited to their personal needs.

So, with that all in mind let’s look at the projects:


Codebase. It’s a complete mess. And its git history is even worse. The running joke is that who cares what that piece of code does, it’s FFeature so it must be kept at whatever cost (that’s how you get double decoders, demuxers and encoders; an outstanding example there would be libutvideo wrappers—refer for the details to ffmpeg-devel mailing list).

Developers. Because of the merging policy (that is likely to be codified soon—see this document again) many developers of FFmpeg code are not FFmpeg developers. And yet they are dictating API to be used in FFmpeg: the first example that also involves me—I’ve proposed side data for packets in Libav, FFmpeg hesitated for a bit yet included it with such flattering message; the most of examples include Anton’s work from introducing refcounted buffers to splitting codec parameters into separate structure—in any case FFmpeg simply takes it and converts their code to comply with a new practice (even if it has to include some horrible hacks). If that doesn’t cry out loud “a failed project” I don’t know what does.

Also (even if I’m stepping onto minefield) some FFmpeg developers are completely unfit for collective work because of their personal qualities. People may make jokes about providing full console output of ffmpeg command but it’s not Carl who’s the main problem in FFmpeg (yes, people who didn’t work on MPlayer might think otherwise; I still believe he’d be a decent leader for FFmpeg—mostly because he doesn’t focus just on technical side and he’s unlikely to be treated as a technical god who can’t make any mistake or write less than perfect code). Here it’s more about Michael and Clément—the former never really understood what being a leader really is or what resigning from a leader means (anyone disagreeing please ban yourself from a mailing list of your choice for 24 hours), the latter does not understand people at all (neither does Michael)—I’m not going to paste the link to the same document for the third time, I’ll simply quote the relevant part:

Any Libav developer is of course welcome anytime to contribute directly to the
FFmpeg tree. Of course, we fully understand and are forced to accept that very
few Libav developers are interested in doing so, but we still want to recognize
their work.

Here’s an excerpt from Michael’s mail:

> Don’t you think you should remove Diego, Måns, Kostya, … as well?

They didnt ask me to remove them, they didnt remove themselfs even
though they could, they didnt post a patch to remove themselfs.
No contributor said that he contacted them and they no longer maintain
the code they are listed for. (or i missed that)

Well, if it’s hard to realize that Libav developers don’t want to contribute to FFmpeg and don’t want to do anything with it even though it’s been over five years then you really have a problem. And I’ve expressed my thought on reuniting both projects already.

Users. You know, there’s a difference between catering to your users and selling out completely (to put it mildly). When you see some changes being done in interests of some third party often without mentioning it that looks suspicious. I’m not against making money off your work but when it’s not even mentioning the fact it looks strange; when you have a decoder with a copyright assigned to some company it’s fine, but when you have fixes for files nobody has seen or FFv1 features added because it was all paid by somebody (see here slide 12) it looks not completely honest even if there’s nothing wrong with it.

Infrastructure. From what I understood FFmpeg services are now hosted on various boxes with no plan or idea (i.e. if somebody could provide a box for something they took it) and there’s no system administrator for these boxes. Again, as I understand it, they were kicked out of Hungary for some reason and even though they got a free server and hosting in Bulgaria they cannot use that box properly because there’s nobody to set it up properly and maintain afterwards. Sounds like fail to me.


This project is failed for the different reasons but failed nevertheless.

Codebase. While it’s mostly fine sadly new features hardly come in. Just two examples—there have been talks about replacing libswscale since ages, two years ago they’d started to design it (and it went nowhere), then I offered my design with a PoC (yes, piece of that) code to test it (that’s how NAScale was born), people work on integrating it into Libav a bit and that’s all—nothing has happened yet; the second example is bitstream reader replacement—since its submission in April nothing has come out of it as all traction was lost in bikeshedding. Is it failure or what?

Developers. Here we have two problems—some FFmpeg folks and some core developers. I’ve written about the former before so let’s talk about the latter. Surprisingly or not there are counterparts for Austrian FFmpeg developers in Libav. Where in FFmpeg you have Carl Eugen, in Libav there’s Diego and I guess many have suffered from his perfectionism (in form of proper formatting). And instead of Michael there’s Anton. While he is not that leadery in general sense, he’s the one introducing big changes in API that are hardly discussed before. And even worse thing—he tries to make all nontrivial code go through him, QSV support is a good example: Maxym Dmytrychenko had submitted initial support but it was not deemed good enough so Luca Barbato had to rework it into proper form. And what do you know? It turned out to be not good enough for Anton so he worked on it himself with the result not being much different from Luca’s. And since nothing is being done about that I consider it to be a failure.

Users. Sadly, there seems to exist not so many of them which is a fail. On the other hoof they don’t need to deal with distros and Baidu and that’s a blessing by itself. Though there is still an issue with FFmpeg users who bother (ex-)developers for features present in FFmpeg but not in Libav (or present in different form), like Blackmagic card support or prores_ks encoder (hint: there’s no encoder with such name in Libav and it’s my personal pleasure to ignore mails about it).

Infrastructure. From what I heard thanks to Attila and Janne everything is working fine.

Well, maybe I should continue with Actimagine VX codec at last and forget about multimedia outside work matters afterwards (insert the obvious joke about this not hurting NihAV development at all).