Archive for May, 2016

On Variable-Length Codes Names

Tuesday, May 31st, 2016

After seeing the recent commit in Libav this rant simply wrote itself.

People, Solomon Wolf Golomb was a genius whose work influenced various areas of science (I’ve read about his work in Martin Gardner’s books plus some of his papers) but please stop attributing to him stuff he did not invent. I’m talking about universal variable-length codes for integers (or Xine for short).

He has introduced (in late sixties) a specific kind of Xine for optimal coding of integers with certain distributions (I’ve recommended to read the paper introducing them before, it’s awesome and I wish more papers were written like that one). Those codes have a parameter k that is distribution parameter and also it’s used to split code into two parts—an unary prefix coding N/k and log2k bits coding N%k (for some values it’s rounded down, for another it’s rounded up). Later Robert Rice had a similar idea and independently introduced codes that were Golomb codes with parameter 2^k (and thus they’re often called Rice codes and they’re used more because they are easier to manipulate on computer). And that’s all—there are no other Golomb codes.

Yet thanks to ITU H.264 standard (aka GNUMPEG-4 AVC) we have exp-Golomb codes and interleaved exp-Golomb codes. I don’t know who decided on the name but it’s misleading and wrong (but because it’s in the standard people insist on using it; that also shows how much people designing codecs know about general compression methods). Maybe if some other Xines were rediscovered they’d go under equally ridiculous names like geo(metric)-Golomb or norm(al)-Golomb or quad-Golomb or recursive Golomb codes (because people have never heard of Levenstein coding).

Again, back in seventies Peter Elias proposed a scheme for coding: let’s call unary code (i.e. the one that codes an integer as a series on one value terminated with the other value like 000001 or 1111110) alpha code and fixed bit representation of an integer beta code then we can arrive to gamma code that combines both.

So “exp-Golomb” code is really Elias gamma code? NOPE! Like with TV interlacing came first and actual Elias gamma code is what is incorrectly called “interleaved exp-Golomb” (i.e. first you have flag bit telling whether the code is over or there are more bits left, then data bit, then flag bit again, rinse, repeat). And progressive version is Elias gamma prime has unary prefix i.e. alpha code for the length of the second part concatenated with beta code for that part (I’ve rechecked the original paper freely available at sci-hub—because the only time I paid for IEEE papers access was when my scientific supervisor sent me with money to pay for IEEE membership renewal for our chair). Then you can construct delta code that codes integer value in three parts (actual bits, their length and length of the length part) and jump to omega codes that code mostly lengths of the following length part (very meta!).

So there’s another thing to dislike in H.264 standard beside all interlaced modes and scalable and multi-view coding, it’s forcing a wrong terminology on the world (feel free to correct me if there were earlier uses). The same way there are confusions between arithmetic and range coding, various binary coders are not free from it too. But that’s a rant for another day.

On Italian Literature

Saturday, May 28th, 2016

One cannot be called a true reverse engineer unless he tried (and failed) REing Italian literature collection. I’ve finally tried it (and, obviously, failed).

What’s so special about it? Here is Mike’s description. From what I’ve seen on the first CD videos occupy 280MB out of total maybe 300MB (and over 200MB of it is a single tutorial video). While the actual library data occupies about five megabytes there.

The main library application is written in Visual Basic 4 (16-bit version) and it’s not a compiled version but rather P-code and I’ve failed to find a decompiler for that exact version (32-bit? seems no problem; 16-bit or even 32-bit Visual Basic 3? also no problem; 16-bit VB3? keep searching). There are some utility apps of unknown purpose there written in Borland Delphi (also 16-bit and I’m pretty sure it was simply Borland Delphi then, no additional versions needed). And while those are in sane machine code (well, 16-bit x86 machine code is hardly sane but manageable) there’s a lot of Delphi cruft compiled in with TThis and TThat and TOtherThing and such (plus additions in Italian).

Despite files having extension .LZ[1-3] I doubt they employ any kind of Lempel-Ziv compression, I’d expect some different dictionary-based scheme (you have an index with all possible words after all). And looks like they’ve licensed some DBT thing (obviously it stands for Text DataBase in Italian) from some Italian Institute of Computational Linguistics and this DBT is responsible for the file formats but I’m too lazy to RE those half-megabyte .dlls without a decompiler (written in Delphi too).

A Quick Look on DLI

Thursday, May 26th, 2016

So yesterday I had a quick look on DLI image format. It turns out to be somewhat related to video codecs (and JPEG of course): there’s 8×8 fast integer DCT approximation, quantisation and bit coding of the block. And bit coding is the most interesting part really—this format employs binary model with old-school arithmetic coding and context selection for the model; coefficients are coded as first an array of coded coefficients flags (plus a flag for last coded coefficient), each non-zero coefficient has an additional flag to signal whether it’s larger than one and in that case it’s coded as unary code (bits coded with arithmetic coder of course).

And I still don’t like the notion of “let’s make our video codec I-frames into an image codec” (the reverse of “motion <image format>” is not much better but at least it makes sense for intermediate formats). Images and video codecs have different use cases and required features but I think I ranted about it once.

When Old Will Beat New Again?

Thursday, May 19th, 2016

Since my previous post hasn’t brought me answers I sought here’s another philosophical (i.e. no answers again) post on question that bothers me.

The concept is rather simple: some old tricks and methods become more appealing over the time when other more competitive methods lose their traction. So I often wonder when those old methods, approaches and tricks will become relevant again.

For instance, quadtree coding was not popular some time ago and yet we see it again in codecs where it handles blocks of smaller sizes inside some coding unit (ITU H.EVC, VP9, AOMv1—you name the codec). There’s similar story with vector quantisation—it still lives in some GPU-assisted form and is interesting again.

Now let’s talk about classical arithmetic coding. In the flow of time it was mostly supplanted by some variation of binary coding. But with the time binary coding becomes more and more unwieldy since you have to code bits with different contexts and you often don’t code bits per se but rather bits for some variable-length code for integers. So I wonder if the classical arithmetic coding may come to use again and return saner coding while still being faster? Of course one could point me to One Xiphophorus, the company that made the best VP3 encoder, since they’ve found this approach worked fine in Celt and should work fine in Daala (unrelated to them: is FFA1 still a thing?). But really, is CABAC/boolean coder still the coder(s) of future or we’ll see more interesting things from the past? And yes, I’m aware how rANS can be used for faster coding of probabilities and that ANS is used in VP10 experiments. But what about, say, better modelling with, say, order-10 contexts (or the ones that take parameters from both neighbour blocks and blocks upper in hierarchy into account)?

And another one is not related to my usual stuff but is still quite interesting. Will raytracing return again? From what I know the current way is to have lots of triangles, lots of textures, lots of crazy additional maps and lots of even crazier shaders. I believe it went this way:

  • let’s approximate everything by triangles and draw them;
  • simple colours are not good enough—let’s add textures;
  • not good enough—let’s add shading (like Gouraud or Phong);
  • not good enough—let’s introduce bump maps for better realism;
  • not good enough—let’s introduce light maps;
  • not good enough—let’s introduce computable shaders;
  • still not good enough—let’s render scene once, calculate different parameters from it, create new light/shadow/whatever maps, add them to the scene and rerender again;
  • you know what, it’s still not good enough—let’s …

(I don’t know much about computer graphics since our university course didn’t went much farther than Bresenham’s line algorithms and simple image formats)

With all this trickery you still have not achieved realistic picture especially when it comes to dynamic light, shadows and reflections. Yet during all this time there was raytracing which is simple as hell (and equally slow): you have a scene and for each pixel you simply trace its path until you end in some light source or simply give up. With massive parallelism of GPUs and complex shaders it looks to me that switching to raytracing might be easier (sure, there’s a problem of legacy, making all those developers switch from Magma and Vulcan to a new approach etc etc) but I still wonder if it makes sense from technical point of view or will make in the near future.

And as usual—I hope for the answers but I don’t expect that I receive any.

Schizophrenia in Open-source Projects

Saturday, May 14th, 2016

Disclaimer: the word “schizophrenia” is used here as it’s perceived by majority, not to denote a certain psychological condition. Feel free to be offended.

I’ve wasted about a decade working at two multimedia projects (plus a patch or two to unrelated projects) and what I’ve seen there leads me to the conclusion they both suffer from schizophrenia, albeit in different forms.

FFmpeg

FFmpeg features two forms of schizophrenia—developers and code.

Developers schizophrenia can be seen in how some developers believe they are also Libav developers. Mostly they brag because they’ve sent a patch or two to Libav and now can use it as a free review service. While I dislike Carl Eugen he’s at least honest and acts to his beliefs (here, an amended Elenril’s Law fulfilled; in case you’ve forgotten it says “Every FFmpeg-related discussion ends up mentioning Michael. Or Carl Eugen.”).

Code schizophrenia is more celebrated. The most prominent example is ProRes support—they offer two decoders and two encoders for it. There are two ASF demuxers as well. And two audio resampling libraries. And there are talks about adding second libswscale (*shudder*). The best part is that if you ask why it will probably go like this:

— Why do you have feature X in two versions?
— Because Libav has it.
— But why do you take it if you have your own version?
— To make merging Libav codebase easier.
— But why do you need to merge it at all?
— To make merging Libav codebase easier.

Please please tell me I’m wrong and provide proper reasons why FFmpeg keeps merging Libav stuff and keeps several versions of the same feature.

Libav

Here it’s somewhat more interesting—you have developers with physical multiple personalities disorder. Unlike the case in FFmpeg here you have people that work solely on Libav but as several different people. Most prominent examples are Luca (known as lu_zero and koda on IRC) who is really several instances not always agreeing with themselves (and if you subscribe to Lu_zianism then he’s also both Michael and Carl Eugen too, if you don’t believe it—you should because it annoys him/them). And there’s also Alexandra (aka beta elenril) and Anton (aka sasshka 2.0). And that’s the majority of core Libav developers anyway.

But at least the project seems to be more happy with itself and probably has a dream similar to the Ukrainian dream (which is “bugger off you damned Russians” in case you didn’t know).


It was somewhat fun to watch the fate of proposed bitstream reader replacement. Alexandra (she’s also Top Libav Blogger ?2 by the way—simply because she blogs) proposed a new bitstream reader to replace the old horror (which is a good idea), and that new bitstream reader turned out to be faster than the old mess too, and what was the result? If I’d be British I’d call it sheep-worrying.

Those developers from FFmpeg that believe they also should have a say in Libav process started to express their opinions. While there were independent benchmarking proving the new implementation is indeed faster (which is a good thing to provide) those benchmarks were also run on the decoder not present in Libav, with badly converted functions for code reading, and that turned out to have some problems because of the encoder used (also not in Libav) that produced nonconforming stream and screwed multi-threaded decoding benchmarks (that one can be seen as both trollish and arrogant—kinda like judging The Beatles performance from an excerpt sung by your not very talented neighbour).

But mostly it was bikeshedding and asking why it was not using old get_bits interface. The answer for the latter is simple—because it was built from horrible macros used in half of the places directly so you should either to make everything follow those macros design or convert the old UPDATE_CACHE(); LAST_SKIP_BITS(); ... CLOSE_READER(); into saner get_bits(); skip_bits(); anyway. And Libav developers decided it’s better to have fully new interface anyway and to make it consistent with bytestream reading while at it.

So why did people who should have nothing to do with it bikeshedded that much? Probably because they know in their hearts that as soon as it hits Libav the work on copying it into FFmpeg starts and sooner or later it will reside in FFmpeg codebase probably along with old get_bits.h with most decoders switched to the new bitstream reader anyway. Why? See the theoretical conversation above. I’d like to know the answer why merges are really done but I guess I’ll get it no sooner than this bitstream reader is accepted into Libav master (i.e. never).

On QuickTime Codecs

Saturday, May 7th, 2016

The amount of interesting codecs is dangerously low so I’ll probably stop writing about them at all (and that rises a question whether this blog should be kept alive at all).

So, scraping bottom of the barrel I come to QuickTime codecs.

There are two codecs from the standard QuickTime package that are yet to be implemented in opensource: QDesign Music and Apple Pixlet. The former is (obviously) an audio codec with simple tones+noise coding, I hope to document it soon. The latter is an intermediate codec based on wavelets, so it should not be that hard to RE. The main problem is that I don’t know where to find a decoder (and I’m too lazy to search for one actively). It’s said that the only version of QuickTime being able to decode it was on Mac OS X Panther (yes, not just when it was called Mac OS X but also when it was purely PowerPC only). I estimate this codec would be rather simple—on par with SMPTE VC-5 (and probably even without codebooks but rather with generic variable-length codes like in Pear Intermediate Codec and AmateurRes). And PowerPC assembly is not that bad after you get hold of rlwinm instruction, I’ve REd most of AIC from PowerPC binary after all.

And there are some third-party extensions even Compn doesn’t know about like NewTek SpeedHQ or Digital Anarchy Microcosm codec. The former is an ordinary DCT-based intermediate codec any koda can RE, the latter is somewhat funny lossless codec (funny because it uses range coder just to decode bytes and use them in 8- or 16-bit RLE) that is better left to Derek to RE. SheerVideo has been documented long time ago, ZyGo video was just another DiVX, VP3 and Indeo 4 have other decoders etc etc.

Life is boring.

Update: so there is a more modern Pixlet decoder. I’ve looked at it. There’s per-plane wavelet compression, parametrised Rice codes, everything rather trivial. The only interesting things are coding of the zeroeth subband (it’s splitted into first coefficient, top row, left column and all other coefficients coded with top+left prediction) and the fact they have subband header with magic 0xDEADBEEF. Nice touch!

Life is still boring though.