Telegony in FLOSS

July 5th, 2016

In case you don’t know, telegony is a belief that all offspring of a female will look like her first partner (there’s nothing wrong with it unless you know biology). In the metaphorical sense it means that the original circumstances of a company’s or project’s founding (i.e. who, with what goal and such) will affect its development to the end. Like there was a small company producing so-called “ever sharp” pencils that’s still afloat after a century, but who cares about pencils from that Sharp Corporation nowadays? Anyway, let’s talk about various open-source multimedia projects and see how it applies to them.

FFmpeg. The project was created by FFabrice Bellard who’s known for his brilliant software with horrible source code (he’s won IOCCC twice after all). FFmpeg obviously follows suit and still has a lot of horrible code that people can’t clean up to this day. And yet it’s as ubiquitous as the LZEXE packer back in DOS days, or even more.

MPlayer. The project was originally created because its author could not find a video player that would support MPEG-2 video and AVI codecs at the same time, so he hacked something together from libmpeg2 and avifile. And what do you know, after all these years it strives to play everything by means of ugly hacks. Those include but are not limited to: catching SIGSEGV during MPEG-2 video decoding and restarting the decoder, patching loaded VfW/DShow/DMO decoders based on .dll name, calling private internals of bundled FFmpeg.

MPlayer forks. mplayer2 was created by Uoti Urpala who had his disagreements with the original MPlayer team and made his own version with a streamlined build system and many hacks of the original code thrown out. mpv seems to follow suit and I guess if the next fork appears it’ll also be done by a disgruntled developer who wants to throw more hacks out and make it more average-user-friendly.

libvpx. Originally (VP3 times) it was a codec using rather well known coding methods, definitely not state of the art, with confusing source code that was opensourced without any format description beside the source code because the company didn’t care about it but wanted it to become a multimedia standard anyway. The company name was On2 then, mind you! But if you look at VP8 or VP9 nothing has changed much (I still have the impression that the VP9 format specification was written by engineers from that company implementing a hardware decoder and benevolently edited and released by Google when they didn’t care about VP9 any more because there’s VP X coming).

Derek B. A brilliant reverse engineer who did his first codec (VBLE) by struggling with disassembly and then asking the codec author for the sources. I’ve mentioned before that his best decoders were REd in a very original way: by declaring the intent to do that and waiting for somebody else to do it. Obviously not a project but his story fits so I thought it’s appropriate to mention him here.

VLC. As you might remember the project started to justify a high-speed LAN in some French École Abnormale (because it was not the only high school in France named École Normale in case you wonder). And what do you know—the project went fine as a business project and does so till this day. Though if they ever switch from Discworld-themed naming to Ubuntu-like they should go with Glorified Gstreamer. Why? Because while venerable old projects like XAnim, Xine or MPlayer demonstrated high-class by having their own codec reverse engineering work, VideoLAN went the way similar to the previous story—waiting until somebody does it elsewhere and then announcing support for that format.

Moral of the story: think before you start a project, people may make fun of it afterwards.

A Rant on Actimagine VX Codec

July 2nd, 2016

Well, since I don’t do anything useful these days and just give rants on subjects nobody cares about, here’s another one.

This codec (there’s also VX container with IIRC PCM audio to accompany video) can be named a Very Mobile H.264 Rip-off. Why? Because the only available binary specification I got seems to come from some game, in ARMv6 I guess, and has stride hardcoded to 256 pixels which would be appropriate only for some hand-held consoles. As for the second part: it uses Elias Gamma’ codes (like H.264 does, under a different name) and suspiciously similar spatial prediction. Obviously it also differs a lot because it’s intended for low-power devices and low resolutions too.

So, it operates on 16×16 macroblocks, each one can be coded with one of 24 possible modes that are really just combinations of one of the following techniques with optional residue coding:

  • splitting into 16×8 or 8×16 sub-blocks and processing them in the same way;
  • copying data from one of the previous three frames with or without motion vector adjustment (it’s full-pixel only);
  • copying data from the previous (frame?) block with some offset added to it (actually three offsets—one per component) and motion vector optionally;
  • applying intra prediction that also comes in two flavours—four modes applied to any block size or nine modes applied for 4×4 luma blocks (still only four modes for chroma).

Residual 16×16 block is coded as 5-bit CBP (again, Elias Gamma’ code mapped to CBP value) for four 4×4 luma blocks and two 4×4 chroma blocks. Coefficients are coded like this:

  1. Mode from a variable-length code table (it’s predicted from neighbouring blocks too) that defines how many coded coefficients are there and how many of them are ones;
  2. Signs for known ones;
  3. Elias Gamma’ codes for other coefficients and their signs;
  4. Zero run value for skips between elements (from a variable-length code table depending on maximum coefficient level seen so far).

In other words—nothing like H.264 CAVLC mode at all.

And if you think it was fun to RE I can tell you it was not, and there are still challenges to overcome. First, the specification is badly written and optimised so much that a decompiler is almost worthless there. For example, refilling bits is done by jumping to the end of the function that reads unsigned Elias Gamma’ code. Functions expect certain registers to be used for the state (block functions expect R11 and R12 to contain the motion vector, bitstream reading functions operate on the context stored in R1-R3 and return the result in R6, etc.). Hex-Rays also can’t decompile anything with switch statements, and block decoding functions are full of them. It often decompiles a function just up to the first function call and ignores the rest of the function code (this happened to me on x86 too once, where it decided to decompile only the head of the main Smacker video decompression function without the block decoding loop; that’s why I trust decompilers even less than compilers). Second, the specification seems to miss data for lookup tables used in coefficients decoding.

So if you want to have it fully REd find a better specification and/or more patient and persistent man to deal with it. As for me—it’s dead, Jim®.

On FFmpeg and Voting

June 27th, 2016

Sometimes I hear that voting in FFmpeg is worse than in North Korea. Obviously people don’t know much about voting in North Korea or they wouldn’t make such statements. Here I shall try to have a short overview on voting in two similar yet different Asian countries and compare it to FFmpeg. My knowledge about North Korea comes from posts of Russian researchers, at least one of them lives in Korea and their information looks legit to me (because it lacks political agenda and has even small details mentioned that correlate well with other information from people visiting DPRK). For the other country it’s various sources not from state media (because such information is often omitted there or you hear simply outright lies). For FFmpeg I obviously have my own experience and observations from their mailing list.

North Korea

Maybe the main reason why it makes people think about FFmpeg is the fact that they claim sovereignty over the whole Korean peninsula and even appoint special people to do merges, er, govern over regions still under occupation, and even promote them from time to time. South Korea actually does the same except for useless staff but they recognize people with North Korean passports as their own citizens.

Anyway, voting. The system is rather simple: you have one candidate per region and all people vote for him. The ruling member of Kim family should get 100% support in his region (around Paektu Mountain if you forgot that)—and 100% means that all voters should come and vote (I guess it’s obvious how). In other regions some voters may skip it in case they are very ill. I don’t remember whether they can vote against but it’s rather unthinkable too. There’s a story about North Koreans fleeing to China and seeing a demonstration in South Korea on the TV—their reaction was “Why do they allow it?! How can the President run such a country?!”. So people there are quite well conditioned and voting goes smooth.


Russia

This is a country with a variety of approaches to voting. Some elections nobody cares about and they can even be fair (until an inappropriate candidate is elected and then they have to correct it). Some elections try to keep an image of honesty in order to show outsiders that the system works fine and elected candidates are legitimate (in that case they simply try not to rig it as blatantly). Some elections are rigged in the most blatant way and even more, and in the ideal case there are no more elections (under some stupid pretext they’ve got rid of governor elections and now in some towns and cities mayors are not elected either because that saves money). And referendums there, while still theoretically possible, can’t be on any question related to government or status of some region or the questions that are decided by government.

So depending on luck in some elections a vote can matter, in other cases it doesn’t matter and in some cases it matters that much that the votes are cast and counted without voter’s participation (some regions constantly report that over 100% of voters were present and there’s a Russian meme 146% that appeared because when they reported result for parliament elections in Rostov region IIRC the percents from different parties added up to that number and there were some other regions with total sum over 100%).

There are many tricks to get desired results—inventing new demands to filter out unwanted candidates (their neighbour Belarus has a joke “In order to be eligible as presidential candidate the person must have at least five years of presidential experience”), the same voters voting again and again (because there’s a special document allowing a voter to vote in other region—and groups of people can have a dozen of such permits per member and thus vote repeatedly in several places), putting a sheaf of “properly” filled ballots into ballot box or simply counting them as you see fit no matter what was the actual vote there.

Similar story with petitions—they are often masked with similar petitions or later an expert group finds that petition to be infeasible or contradicting the law.


FFmpeg

In the old days voting was usually called mostly on naming issues and it worked. But then disagreement with the practices of the FFmpeg leader (like committing code without any review and despite objections) escalated to the point of The Split, and before that there were ugly votes that included old MPlayer members that nobody cared about in FFmpeg (because the leader said everybody with commit rights should have a vote). So after The Split and another ugly vote there were two projects. I don’t remember any voting in Libav but in FFmpeg this tradition still holds. Like recently there was a voting committee formed and there were at least two serious votes (not just a new season logo in trac): for a code of conduct (because every project should have one but not necessarily follow it) and for banning Carl Eugen Hoyos for some time. The first one obviously passed, the second one rather expectedly failed. But then A Person Known For Resigning As Leader took an action that should be allowed only to leaders and banned some people for 24 hours from the mailing list.

Well, I think it’s clear now that FFmpeg voting is not on North Korea level because there’s unanimity there that can be expected only from the current MPlayer team but not FFmpeg. But is FFmpeg on par with Russia? Maybe not yet but it tries hard IMO.

A Tale of Two Failed Projects

June 25th, 2016

Yes, it’s about FFmpeg and Libav again. And yes, I consider them both to be failed projects. Not in the sense that they fail at their basic goal and provide even less multimedia support than GStreamer with no external libraries used; I mean the state of the project as a living and developing entity.

I mostly emulate Derek nowadays, i.e. I’ve unsubscribed from FFmpeg and Libav mailing lists, do nothing productive and wait for somebody to reverse engineer codecs I care about somewhat (that would be ClearVideo, thank you very much). Yet I peruse development-related resources for both projects (mostly for finding laughs) and sometimes I see gems like this (it was pointed at in the comments as well since it answers some questions I’ve asked before).

First, I’d like to outline how large projects are organised and what to expect in general. So, if you have a large and used project you’ll have at least these components:

  • codebase (normal projects have some code to run after all);
  • developers (to add features, fix bugs and such);
  • users (to annoy developers and once in a while to provide sensible bugreport or feature request);
  • infrastructure (hosting for code, means to communicate for developers, maybe even support for users).

Developers can be also divided into three main categories:

  1. core developers—the ones who do main work on the codebase and do it in regular manner (they might intersect with the next category too);
  2. corporate developers—the ones who do work mostly on behalf of their companies (e.g. add a feature they need internally so they don’t have to maintain it themselves);
  3. contributors—developers who add some feature or provide some bugfix because they needed it themselves, they do it irregularly or even just once (again, they might intersect with the previous category).

This division is by no means perfect but it shows the main forces behind development: those who treat it as a hobby, those who do it for their benefit (i.e. making money with/from it) and those who use it and just want it to be a bit more suited to their personal needs.

So, with that all in mind let’s look at the projects:


FFmpeg

Codebase. It’s a complete mess. And its git history is even worse. The running joke is that who cares what that piece of code does, it’s FFeature so it must be kept at whatever cost (that’s how you get double decoders, demuxers and encoders; an outstanding example there would be libutvideo wrappers; refer to ffmpeg-devel mailing list for the details).

Developers. Because of the merging policy (that is likely to be codified soon, see this document again) many developers of FFmpeg code are not FFmpeg developers. And yet they are dictating the API to be used in FFmpeg. The first example also involves me: I’ve proposed side data for packets in Libav, FFmpeg hesitated for a bit yet included it with such flattering message. Most of the other examples involve Anton’s work, from introducing refcounted buffers to splitting codec parameters into a separate structure. In any case FFmpeg simply takes it and converts their code to comply with a new practice (even if it has to include some horrible hacks). If that doesn’t cry out loud “a failed project” I don’t know what does.

Also (even if I’m stepping onto minefield) some FFmpeg developers are completely unfit for collective work because of their personal qualities. People may make jokes about providing full console output of ffmpeg command but it’s not Carl who’s the main problem in FFmpeg (yes, people who didn’t work on MPlayer might think otherwise; I still believe he’d be a decent leader for FFmpeg—mostly because he doesn’t focus just on technical side and he’s unlikely to be treated as a technical god who can’t make any mistake or write less than perfect code). Here it’s more about Michael and Clément—the former never really understood what being a leader really is or what resigning from a leader means (anyone disagreeing please ban yourself from a mailing list of your choice for 24 hours), the latter does not understand people at all (neither does Michael)—I’m not going to paste the link to the same document for the third time, I’ll simply quote the relevant part:

Any Libav developer is of course welcome anytime to contribute directly to the
FFmpeg tree. Of course, we fully understand and are forced to accept that very
few Libav developers are interested in doing so, but we still want to recognize
their work.

Here’s an excerpt from Michael’s mail:

> Don’t you think you should remove Diego, Måns, Kostya, … as well?

They didnt ask me to remove them, they didnt remove themselfs even
though they could, they didnt post a patch to remove themselfs.
No contributor said that he contacted them and they no longer maintain
the code they are listed for. (or i missed that)

Well, if it’s hard to realize that Libav developers don’t want to contribute to FFmpeg and don’t want to do anything with it even though it’s been over five years then you really have a problem. And I’ve expressed my thought on reuniting both projects already.

Users. You know, there’s a difference between catering to your users and selling out completely (to put it mildly). When you see some changes being done in interests of some third party often without mentioning it that looks suspicious. I’m not against making money off your work but when it’s not even mentioning the fact it looks strange; when you have a decoder with a copyright assigned to some company it’s fine, but when you have fixes for files nobody has seen or FFv1 features added because it was all paid by somebody (see here slide 12) it looks not completely honest even if there’s nothing wrong with it.

Infrastructure. From what I understood FFmpeg services are now hosted on various boxes with no plan or idea (i.e. if somebody could provide a box for something they took it) and there’s no system administrator for these boxes. Again, as I understand it, they were kicked out of Hungary for some reason and even though they got a free server and hosting in Bulgaria they cannot use that box properly because there’s nobody to set it up properly and maintain afterwards. Sounds like fail to me.


Libav

This project is failed for different reasons but failed nevertheless.

Codebase. While it’s mostly fine sadly new features hardly come in. Just two examples—there have been talks about replacing libswscale since ages, two years ago they’d started to design it (and it went nowhere), then I offered my design with a PoC (yes, piece of that) code to test it (that’s how NAScale was born), people work on integrating it into Libav a bit and that’s all—nothing has happened yet; the second example is bitstream reader replacement—since its submission in April nothing has come out of it as all traction was lost in bikeshedding. Is it failure or what?

Developers. Here we have two problems—some FFmpeg folks and some core developers. I’ve written about the former before so let’s talk about the latter. Surprisingly or not there are counterparts for Austrian FFmpeg developers in Libav. Where in FFmpeg you have Carl Eugen, in Libav there’s Diego and I guess many have suffered from his perfectionism (in form of proper formatting). And instead of Michael there’s Anton. While he is not that leadery in general sense, he’s the one introducing big changes in API that are hardly discussed before. And even worse thing—he tries to make all nontrivial code go through him, QSV support is a good example: Maxym Dmytrychenko had submitted initial support but it was not deemed good enough so Luca Barbato had to rework it into proper form. And what do you know? It turned out to be not good enough for Anton so he worked on it himself with the result not being much different from Luca’s. And since nothing is being done about that I consider it to be a failure.

Users. Sadly, there seem to be not so many of them, which is a fail. On the other hoof they don’t need to deal with distros and Google and that’s a blessing by itself. Though there is still an issue with FFmpeg users who bother (ex-)developers for features present in FFmpeg but not in Libav (or present in different form), like Blackmagic card support or prores_ks encoder (hint: there’s no encoder with such name in Libav and it’s my personal pleasure to ignore mails about it).

Infrastructure. From what I heard thanks to Attila and Janne everything is working fine.

Well, maybe I should continue with Actimagine VX codec at last and forget about multimedia outside work matters afterwards (insert the obvious joke about this not hurting NihAV development at all).

A Quick Look on Perseus

June 21st, 2016

So, unlike those breakthrough codecs everybody talks about (I mean RMHD and ORBX.js), V-Nova Perseus was delivered (but what do you expect from a codec announced on the first of April?) and is available in some Android app. So I’ve looked at it.

The implementation seems bafflingly simple: there’s a base layer, it gets upscaled 2x and an enhancement is applied to the upscaled image. And those enhancements are essentially quantised differences after 2×2 Haar transform plus runs, all coded with context-dependent Huffman codes. If that reminds you of RealVideo—don’t worry, they code those codebook descriptions too so it’s different.
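To make the idea a bit more tangible, here is a minimal Python sketch of such an enhancement layer. It is only an illustration under my own assumptions and not the actual Perseus bitstream logic: the nearest-neighbour upscaler, the unnormalised Haar butterflies, the flat quantiser `q` and all function names are hypothetical.

```python
def haar2x2(a, b, c, d):
    """Forward 2x2 Haar on the block [[a, b], [c, d]] (unnormalised)."""
    return (a + b + c + d,   # sum (average-like) term
            a - b + c - d,   # horizontal detail
            a + b - c - d,   # vertical detail
            a - b - c + d)   # diagonal detail


def inverse_haar2x2(s, h, v, g):
    """Inverse of haar2x2; exact for unquantised input, floors otherwise."""
    return ((s + h + v + g) // 4, (s - h + v - g) // 4,
            (s + h - v - g) // 4, (s - h - v + g) // 4)


def upscale2x(image):
    """Nearest-neighbour 2x upscale of a list-of-rows image (assumed filter)."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def enhance(base, coeffs, q=1):
    """Add dequantised 2x2 Haar residuals on top of the upscaled base layer.

    coeffs[y][x] holds the four quantised coefficients for the 2x2 block
    that base pixel (x, y) expands to.
    """
    up = upscale2x(base)
    for y, row in enumerate(coeffs):
        for x, quantised in enumerate(row):
            a, b, c, d = inverse_haar2x2(*(value * q for value in quantised))
            up[2 * y][2 * x] += a
            up[2 * y][2 * x + 1] += b
            up[2 * y + 1][2 * x] += c
            up[2 * y + 1][2 * x + 1] += d
    return up
```

With `q=1` the round trip is lossless; a real codec would lose precision at the quantisation step and then entropy-code the coefficients and zero runs.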

I don’t know if it really works as good as promised (or rather marketed) but it’s an interesting approach and it introduces some variety in the world of codecs that look alike, mostly because they all use the same principles as the standard video codec with some small enhancements or building blocks replaced with functional analogues. Yes, I completely forgot about Daala; please remind me about it when they settle on the final design (it might be the codec of choice for GNU HURD NG by then too).

On H.264 Coding Schemes Names

June 3rd, 2016

Continuing the theme set by the previous post, let’s talk more about confusing names introduced by H.264. I mean CAVLC and CABAC.

CAVLC stands for Context-based Adaptive Variable Length Coding. While technically true because it employs variable-length codes and the code set is selected based on context, it’s nothing special (and I’ve not spotted anything there that would make it “adaptive”). Again, it’s a trivial thing less exercised before because they had less ROM for codebooks. The idea of “let’s select codebook depending on top and/or left decoded values” is too trivial to get its own name IMO.
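Just to show how little there is to it, here is a rough Python sketch of that kind of codebook selection. It is modelled on how I remember H.264 picking its coefficient token table from the neighbours’ non-zero coefficient counts; treat the thresholds as my recollection and the function name and table indices as hypothetical.

```python
def select_codebook(left_count, top_count):
    """Pick a codebook index from the coefficient counts of the left and top
    neighbouring blocks; None means that neighbour is not available."""
    if left_count is not None and top_count is not None:
        context = (left_count + top_count + 1) >> 1   # rounded average
    elif left_count is not None:
        context = left_count
    elif top_count is not None:
        context = top_count
    else:
        context = 0
    # Busier neighbourhoods get codebooks tuned for more coefficients.
    if context < 2:
        return 0
    if context < 4:
        return 1
    if context < 8:
        return 2
    return 3   # in H.264 this last case is a fixed-length code, as far as I recall
```

Nothing here changes while decoding: the tables are static and only the choice between them depends on already decoded data.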

CABAC stands for Context-based Adaptive Binary Arithmetic Coding and the name is partly stupid and partly misleading. But before I explain why I want to present some history and terminology.

Arithmetic coding was developed in late sixties to early seventies but mostly known by work of Rissanen and Langdon that resulted in many IBM patents. The idea is that you can assign probabilities to various symbols, send them to the coder and the coding result is a long fraction belonging to the range obtained by multiplying the ranges in sequence. I.e. if we have probabilities for A, B and C as ranges [0; 1/3), [1/3; 2/3) and [2/3; 1) then AB is coded in [1/9; 2/9) range and BA is coded in [1/3; 4/9) range. It’s the ideal coding method since it codes probabilities in the minimum possible amount of bits (unless you remember it’s real world and we don’t have infinite-precision arithmetic; still, the losses are very small and there’s no better coding method).
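The interval arithmetic from that example can be checked with a few lines of Python. This is a toy sketch with exact fractions and no bit output or renormalisation, so it is nowhere near a practical coder; the model table and function names are made up for illustration.

```python
from fractions import Fraction

# Three-symbol model from the text: each symbol owns a third of [0; 1).
MODEL = {
    "A": (Fraction(0), Fraction(1, 3)),
    "B": (Fraction(1, 3), Fraction(2, 3)),
    "C": (Fraction(2, 3), Fraction(1)),
}


def encode_interval(message, model=MODEL):
    """Narrow [0; 1) symbol by symbol and return the final [low; high) range."""
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = model[symbol]
        low += width * sym_low           # move to the symbol's sub-range
        width *= sym_high - sym_low      # and shrink to its size
    return low, low + width


def decode(value, length, model=MODEL):
    """Recover `length` symbols from any fraction inside the coded range."""
    out = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in model.items():
            if sym_low <= value < sym_high:
                out.append(symbol)
                # Rescale so the next symbol is again looked up in [0; 1).
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)
```

Here `encode_interval("AB")` gives the range from 1/9 to 2/9 and `encode_interval("BA")` gives 1/3 to 4/9, matching the numbers above. Any fraction inside the range identifies the message as long as the decoder knows its length.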

And in 1979 G.*.*. Martin (no, not the writer known for Tuf Voyaging) introduced range coding. Which is absolutely the same thing except that (de)coder maintains low and range values instead of low and high values in a conventional arithmetic coder (hence the name). Since it was kinda not covered by patents it got more popularity over the years.

And because dealing with arbitrary probabilities usually involves division by an arbitrary integer (and maybe increased coder precision) the further improvements were for sacrificing efficiency for speed until it boiled down to coding just two symbols and creating more elaborate models to code input that takes more than one bit. Arithmetic mode in JPEG seems to be simply feeding bits from Huffman codes and such to Q-coder (patented by IBM and thus extremely popular in the wild) that squeezes a bit more entropy out of them. Then you create an advanced version (MQ-coder) and push it into JPEG-2000 until binary coding is popular in image and video coding.

So, CABAC is:

  1. Context-based: yes, static coding would be a tad more effective than Huffman coding applied to bits (hint: it gives no savings). The problem is that it’s the first step of the classical scheme: modelling and providing probability to the entropy coder;
  2. Adaptive: see above;
  3. Binary: true (just remember it codes not bits but most and least probable symbols);
  4. Arithmetic: actually it uses range coding;
  5. Coding: nothing to argue with here.

In general, the naming is a lot like USSR which was hardly a union, probably not soviet (whatever that word means—literal meaning is “belonging to the councils”), republics were just provinces (or local despoties) but it was more or less socialistic (to the point its ideology can be called international socialism and it was founded by SDAPR(B) too).

And I’d like to point out that CABAC should refer to the whole process of binarisation+context selection plus coding the result, not just the exact implementation used in ITU H.264 and ITU H.EVC (even if it’s called CBAC in AVS and “we have completely different coding” in VPx). And if you want an example of context-based adaptive binary non-arithmetic coding look at ELS used in G2M2 (and if you drop binary coding then you have examples in every other advanced lossless audio codec).

On Variable-Length Codes Names

May 31st, 2016

After seeing the recent commit in Libav this rant simply wrote itself.

People, Solomon Wolf Golomb was a genius whose work influenced various areas of science (I’ve read about his work in Martin Gardner’s books plus some of his papers) but please stop attributing to him stuff he did not invent. I’m talking about universal variable-length codes for integers (or UVLC for short).

He has introduced (in late sixties) a specific kind of universal code for optimal coding of integers with certain distributions (I’ve recommended reading the paper introducing them before, it’s awesome and I wish more papers were written like that one). Those codes have a parameter k that is the distribution parameter and is also used to split the code into two parts: a unary prefix coding N/k and log2(k) bits coding N%k (for some values it’s rounded down, for others it’s rounded up). Later Robert Rice had a similar idea and independently introduced codes that were Golomb codes with parameter 2^k (and thus they’re often called Rice codes and they’re used more because they are easier to manipulate on a computer). And that’s all: there are no other Golomb codes.
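For reference, both can be sketched in a few lines of Python. Bits are returned as strings for readability, the unary convention (ones terminated by a zero) is my own choice, and the remainder uses the usual truncated binary code, which is where the rounding down or up of log2(k) comes from.

```python
def unary(q):
    """q ones terminated by a zero (the opposite polarity works just as well)."""
    return "1" * q + "0"


def truncated_binary(r, k):
    """Code remainder r in 0..k-1 using floor(log2 k) or ceil(log2 k) bits."""
    if k == 1:
        return ""                        # nothing to code
    bits = (k - 1).bit_length()          # ceil(log2 k)
    cutoff = (1 << bits) - k             # how many values get the short form
    if r < cutoff:
        return format(r, "0{}b".format(bits - 1))
    return format(r + cutoff, "0{}b".format(bits))


def golomb(n, k):
    """Golomb code of a non-negative integer n with parameter k >= 1."""
    return unary(n // k) + truncated_binary(n % k, k)


def rice(n, k):
    """Rice code: the Golomb code with parameter 2^k."""
    return golomb(n, 1 << k)
```

With parameter 3 the remainders 0, 1 and 2 become `0`, `10` and `11`, so `golomb(7, 3)` is `11010`. For Rice codes the remainder is always exactly k bits, which is why they are so convenient to implement.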

Yet thanks to ITU H.264 standard (aka MPEG-4 AVC) we have exp-Golomb codes and interleaved exp-Golomb codes. I don’t know who decided on the name but it’s misleading and wrong (but because it’s in the standard people insist on using it; that also shows how much people designing codecs know about general compression methods). Maybe if some other universal codes were rediscovered they’d go under equally ridiculous names like geo(metric)-Golomb or norm(al)-Golomb or quad-Golomb or recursive Golomb codes (because people have never heard of Levenshtein coding).

Again, back in the seventies Peter Elias proposed a scheme for coding: let’s call unary code (i.e. the one that codes an integer as a series of one value terminated with the other value like 000001 or 1111110) alpha code and the fixed bit representation of an integer beta code, then we can arrive at gamma code that combines both.

So “exp-Golomb” code is really Elias gamma code? NOPE! Like with TV, interlacing came first and the actual Elias gamma code is what is incorrectly called “interleaved exp-Golomb” (i.e. first you have a flag bit telling whether the code is over or there are more bits left, then a data bit, then a flag bit again, rinse, repeat). And the progressive version is Elias gamma prime, which has a unary prefix (i.e. alpha code for the length of the second part) concatenated with beta code for that part (I’ve rechecked the original paper freely available at sci-hub, because the only time I paid for IEEE papers access was when my scientific supervisor sent me with money to pay for IEEE membership renewal for our chair). Then you can construct delta code that codes an integer value in three parts (actual bits, their length and length of the length part) and jump to omega codes that code mostly lengths of the following length part (very meta!).
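Here is a small Python sketch of the two gamma variants side by side. The bit polarities (0 as the “more bits follow” flag and as the unary filler) are my assumption since any consistent convention works, and the function names are mine. Both code integers starting from one, which is why formats that need zero usually code the value plus one.

```python
def gamma_prime(n):
    """Elias gamma prime: unary length prefix (alpha), then the binary value (beta)."""
    assert n >= 1
    beta = format(n, "b")                 # always starts with a 1 bit
    return "0" * (len(beta) - 1) + beta


def gamma_interleaved(n):
    """Elias gamma: flag bit, data bit, flag bit, data bit, ..., stop flag."""
    assert n >= 1
    tail = format(n, "b")[1:]             # data bits after the implicit leading 1
    return "".join("0" + bit for bit in tail) + "1"


def decode_gamma_prime(bits):
    zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[zeros:2 * zeros + 1], 2)


def decode_gamma_interleaved(bits):
    value, pos = 1, 0
    while bits[pos] == "0":               # flag says another data bit follows
        value = (value << 1) | int(bits[pos + 1])
        pos += 2
    return value
```

For 5 (binary 101) this gives `00101` for gamma prime and `00011` for the interleaved form: the same bits in a different order, so the codes always have equal length.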

So there’s another thing to dislike in H.264 standard beside all interlaced modes and scalable and multi-view coding, it’s forcing a wrong terminology on the world (feel free to correct me if there were earlier uses). The same way there are confusions between arithmetic and range coding, various binary coders are not free from it too. But that’s a rant for another day.

On Italian Literature

May 28th, 2016

One cannot be called a true reverse engineer unless he tried (and failed) REing Italian literature collection. I’ve finally tried it (and, obviously, failed).

What’s so special about it? Here is Mike’s description. From what I’ve seen on the first CD videos occupy 280MB out of total maybe 300MB (and over 200MB of it is a single tutorial video), while the actual library data occupies about five megabytes there.

The main library application is written in Visual Basic 4 (16-bit version) and it’s not a compiled version but rather P-code, and I’ve failed to find a decompiler for that exact version (32-bit? seems no problem; 16-bit or even 32-bit Visual Basic 3? also no problem; 16-bit VB4? keep searching). There are some utility apps of unknown purpose there written in Borland Delphi (also 16-bit and I’m pretty sure it was simply Borland Delphi then, no additional versions needed). And while those are in sane machine code (well, 16-bit x86 machine code is hardly sane but manageable) there’s a lot of Delphi cruft compiled in with TThis and TThat and TOtherThing and such (plus additions in Italian).

Despite files having extension .LZ[1-3] I doubt they employ any kind of Lempel-Ziv compression, I’d expect some different dictionary-based scheme (you have an index with all possible words after all). And looks like they’ve licensed some DBT thing (obviously it stands for Text DataBase in Italian) from some Italian Institute of Computational Linguistics and this DBT is responsible for the file formats but I’m too lazy to RE those half-megabyte .dlls without a decompiler (written in Delphi too).

A Quick Look on DLI

May 26th, 2016

So yesterday I had a quick look at DLI image format. It turns out to be somewhat related to video codecs (and JPEG of course): there’s 8×8 fast integer DCT approximation, quantisation and bit coding of the block. And bit coding is the most interesting part really: this format employs a binary model with old-school arithmetic coding and context selection for the model. Coefficients are coded as first an array of coded coefficients flags (plus a flag for the last coded coefficient); each non-zero coefficient has an additional flag to signal whether it’s larger than one and in that case it’s coded as unary code (bits coded with arithmetic coder of course).

And I still don’t like the notion of “let’s make our video codec I-frames into an image codec” (the reverse of “motion <image format>” is not much better but at least it makes sense for intermediate formats). Images and video codecs have different use cases and required features but I think I ranted about it once.

When Old Will Beat New Again?

May 19th, 2016

Since my previous post hasn’t brought me the answers I sought, here’s another philosophical (i.e. no answers again) post on a question that bothers me.

The concept is rather simple: some old tricks and methods become more appealing over the time when other more competitive methods lose their traction. So I often wonder when those old methods, approaches and tricks will become relevant again.

For instance, quadtree coding was not popular some time ago and yet we see it again in codecs where it handles blocks of smaller sizes inside some coding unit (ITU H.EVC, VP9, AOMv1—you name the codec). There’s similar story with vector quantisation—it still lives in some GPU-assisted form and is interesting again.

Now let’s talk about classical arithmetic coding. In the flow of time it was mostly supplanted by some variation of binary coding. But with time binary coding becomes more and more unwieldy since you have to code bits with different contexts and you often don’t code bits per se but rather bits for some variable-length code for integers. So I wonder if classical arithmetic coding may come into use again and return saner coding while still being faster? Of course one could point me to One Xiphophorus, the company that made the best VP3 encoder, since they’ve found this approach worked fine in Celt and should work fine in Daala (unrelated to them: is FFA1 still a thing?). But really, is CABAC/boolean coder still the coder(s) of the future or will we see more interesting things from the past? And yes, I’m aware how rANS can be used for faster coding of probabilities and that ANS is used in VP10 experiments. But what about, say, better modelling with, say, order-10 contexts (or the ones that take parameters from both neighbour blocks and blocks upper in hierarchy into account)?

And another one is not related to my usual stuff but is still quite interesting. Will raytracing return again? From what I know the current way is to have lots of triangles, lots of textures, lots of crazy additional maps and lots of even crazier shaders. I believe it went this way:

  • let’s approximate everything by triangles and draw them;
  • simple colours are not good enough—let’s add textures;
  • not good enough—let’s add shading (like Gouraud or Phong);
  • not good enough—let’s introduce bump maps for better realism;
  • not good enough—let’s introduce light maps;
  • not good enough—let’s introduce computable shaders;
  • still not good enough—let’s render scene once, calculate different parameters from it, create new light/shadow/whatever maps, add them to the scene and rerender again;
  • you know what, it’s still not good enough—let’s …

(I don’t know much about computer graphics since our university course didn’t go much farther than Bresenham’s line algorithms and simple image formats)

With all this trickery you still have not achieved a realistic picture, especially when it comes to dynamic light, shadows and reflections. Yet during all this time there was raytracing which is simple as hell (and equally slow): you have a scene and for each pixel you simply trace its path until you end in some light source or simply give up. With massive parallelism of GPUs and complex shaders it looks to me that switching to raytracing might be easier (sure, there’s a problem of legacy, making all those developers switch from Magma and Vulcan to a new approach etc etc) but I still wonder if it makes sense from a technical point of view or will make sense in the near future.
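To show just how simple the core idea is, here is a toy Python sketch. It is heavily simplified compared to the description above: one hard-coded sphere, one point light, a single bounce with plain diffuse shading, no shadows or reflections, and ASCII output instead of pixels. The scene values and all names are invented for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c                # quadratic with a == 1
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0 else None


def trace(direction, center=(0.0, 0.0, -3.0), radius=1.0, light=(5.0, 5.0, 0.0)):
    """Brightness in 0..1 seen along a ray cast from the origin."""
    origin = (0.0, 0.0, 0.0)
    t = hit_sphere(origin, direction, center, radius)
    if t is None:
        return 0.0                        # nothing hit: give up
    point = tuple(o + t * d for o, d in zip(origin, direction))
    normal = normalize(tuple(p - c for p, c in zip(point, center)))
    to_light = normalize(tuple(l - p for l, p in zip(light, point)))
    return max(0.0, sum(n * l for n, l in zip(normal, to_light)))


def render(width=32, height=16):
    """Cast one ray per character cell and map brightness to ASCII shades."""
    shades = " .:-=+*#%@"
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            u = (x + 0.5) / width * 2 - 1
            v = 1 - (y + 0.5) / height * 2
            brightness = trace(normalize((u, v, -1.0)))
            row += shades[min(9, int(brightness * 10))]
        rows.append(row)
    return rows
```

Printing `"\n".join(render())` should show a small shaded blob. Everything that makes real raytracing expensive (many objects, secondary rays for shadows and reflections, many samples per pixel) is just more of the same loop.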

And as usual, I hope for answers but I don’t expect to receive any.