A Quick Look at DLI

May 26th, 2016

So yesterday I had a quick look at the DLI image format. It turns out to be somewhat related to video codecs (and JPEG of course): there’s an 8×8 fast integer DCT approximation, quantisation and bit coding of the block. And the bit coding is the most interesting part really—this format employs a binary model with old-school arithmetic coding and context selection for the model; coefficients are coded as follows: first an array of coded-coefficient flags (plus a flag for the last coded coefficient), then each non-zero coefficient gets an additional flag to signal whether it’s larger than one, and in that case the rest of its value is coded as a unary code (with all bits coded by the arithmetic coder, of course).
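
Here’s a rough sketch of that scheme, just for illustration. The adaptive binary arithmetic decoder is replaced by a stub that reads pre-decoded bits so the example runs, and the context names (and the sign handling) are my inventions, not DLI’s actual ones:

    #include <stdint.h>

    enum { CTX_CODED, CTX_LAST, CTX_GT1, CTX_UNARY, CTX_SIGN };

    /* stand-in for the real adaptive binary arithmetic decoder;
       the real one selects an adaptive probability model by ctx */
    typedef struct { const int *bits; int pos; } ACDecoder;
    static int ac_decode_bit(ACDecoder *ac, int ctx)
    {
        (void)ctx;
        return ac->bits[ac->pos++];
    }

    static void decode_block(ACDecoder *ac, int16_t coef[64])
    {
        int coded[64] = { 0 };
        /* pass 1: the array of coded-coefficient flags, with a
           "this was the last coded one" flag after each set flag */
        for (int i = 0; i < 64; i++) {
            if (ac_decode_bit(ac, CTX_CODED)) {
                coded[i] = 1;
                if (ac_decode_bit(ac, CTX_LAST))
                    break;
            }
        }
        /* pass 2: values; a "larger than one" flag and, when set,
           a unary-coded remainder (the sign bit is my assumption) */
        for (int i = 0; i < 64; i++) {
            if (!coded[i]) { coef[i] = 0; continue; }
            int level = 1;
            if (ac_decode_bit(ac, CTX_GT1))
                do { level++; } while (ac_decode_bit(ac, CTX_UNARY));
            coef[i] = ac_decode_bit(ac, CTX_SIGN) ? -level : level;
        }
    }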

And I still don’t like the notion of “let’s turn our video codec’s I-frames into an image codec” (the reverse, “motion <image format>”, is not much better but at least it makes sense for intermediate formats). Image and video codecs have different use cases and required features, but I think I ranted about that once already.

When Will Old Beat New Again?

May 19th, 2016

Since my previous post hasn’t brought me the answers I sought, here’s another philosophical (i.e. no answers again) post on a question that bothers me.

The concept is rather simple: some old tricks and methods become more appealing over time as other, more competitive methods lose their traction. So I often wonder when those old methods, approaches and tricks will become relevant again.

For instance, quadtree coding was not popular for a while and yet we see it again in codecs, where it handles splitting a coding unit into smaller blocks (ITU H.EVC, VP9, AOMv1—you name the codec). There’s a similar story with vector quantisation—it still lives on in some GPU-assisted form and is becoming interesting again.
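
For illustration, here’s roughly what that quadtree splitting looks like in code; the two helper functions are hypothetical placeholders, not taken from any particular codec:

    /* one split flag per node decides whether to subdivide further */
    int  read_split_flag(int x, int y, int size);  /* from the bitstream */
    void code_leaf(int x, int y, int size);        /* code one actual block */

    static void code_block(int x, int y, int size, int min_size)
    {
        if (size > min_size && read_split_flag(x, y, size)) {
            int h = size / 2;
            code_block(x,     y,     h, min_size);
            code_block(x + h, y,     h, min_size);
            code_block(x,     y + h, h, min_size);
            code_block(x + h, y + h, h, min_size);
        } else {
            code_leaf(x, y, size);
        }
    }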

Now let’s talk about classical arithmetic coding. Over time it was mostly supplanted by some variation of binary coding. But binary coding becomes more and more unwieldy, since you have to code bits with different contexts and often you don’t really code bits per se but rather the bits of some variable-length code for integers. So I wonder if classical arithmetic coding may come into use again and bring back saner coding while still being fast enough. Of course one could point me to One Xiphophorus, the company that made the best VP3 encoder, since they’ve found this approach works fine in Celt and should work fine in Daala (unrelated to them: is FFA1 still a thing?). But really, are CABAC/boolean coders still the coders of the future, or will we see more interesting things from the past? And yes, I’m aware of how rANS can be used for faster coding of probabilities and that ANS is used in VP10 experiments. But what about, say, better modelling with order-10 contexts (or ones that take parameters from both neighbouring blocks and blocks higher in the hierarchy into account)?
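
For reference, here’s a textbook multi-symbol range-decoding step (my simplified sketch, not any specific codec’s implementation): one decode operation yields a whole symbol via cumulative frequencies, instead of a chain of context-coded binary decisions:

    #include <stdint.h>

    typedef struct {
        uint32_t low, range, code;   /* decoder state */
    } RangeDecoder;

    /* find which symbol the current code value falls into, given
       cumulative frequencies cum[0..nsyms] with cum[nsyms] == total */
    static int decode_symbol(RangeDecoder *rd, const uint32_t *cum, int nsyms)
    {
        uint32_t total = cum[nsyms];
        uint32_t r     = rd->range / total;
        uint32_t val   = (rd->code - rd->low) / r;
        int sym = 0;
        while (cum[sym + 1] <= val)
            sym++;
        /* narrow the interval to the chosen symbol */
        rd->low  += r * cum[sym];
        rd->range = r * (cum[sym + 1] - cum[sym]);
        /* renormalisation (reading in more bytes) omitted for brevity */
        return sym;
    }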

And another question is not related to my usual stuff but is still quite interesting. Will raytracing return? From what I know, the current way is to have lots of triangles, lots of textures, lots of crazy additional maps and lots of even crazier shaders. I believe it went this way:

  • let’s approximate everything by triangles and draw them;
  • simple colours are not good enough—let’s add textures;
  • not good enough—let’s add shading (like Gouraud or Phong);
  • not good enough—let’s introduce bump maps for better realism;
  • not good enough—let’s introduce light maps;
  • not good enough—let’s introduce computable shaders;
  • still not good enough—let’s render scene once, calculate different parameters from it, create new light/shadow/whatever maps, add them to the scene and rerender again;
  • you know what, it’s still not good enough—let’s …

(I don’t know much about computer graphics since our university course didn’t go much farther than Bresenham’s line algorithm and simple image formats.)

With all this trickery you still don’t achieve a realistic picture, especially when it comes to dynamic light, shadows and reflections. Yet all this time there has been raytracing, which is simple as hell (and equally slow): you have a scene, and for each pixel you simply trace its path until you end up at some light source or simply give up. With the massive parallelism of GPUs and complex shaders, it looks to me like switching to raytracing might be easier (sure, there’s the problem of legacy, making all those developers switch from Magma and Vulcan to a new approach etc etc), but I still wonder whether it makes sense from a technical point of view, or will in the near future.
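
That “simple as hell” core really does fit in a dozen lines; here’s a sketch with the scene and intersection helpers left as hypothetical placeholders (names are mine, not from any real renderer):

    #define MAX_BOUNCES 8

    typedef struct { float x, y, z; } Vec3;
    typedef struct { Vec3 origin, dir; } Ray;
    typedef struct {
        Vec3 point, albedo, emission;
        int  is_light;
    } Hit;

    /* hypothetical scene helpers an actual renderer would provide */
    int  scene_intersect(Ray ray, Hit *hit);
    Vec3 reflect_or_scatter(Hit hit, Vec3 incoming);
    Vec3 background_colour(Ray ray);
    Vec3 modulate(Vec3 a, Vec3 b);

    static Vec3 trace(Ray ray, int depth)
    {
        Hit hit;
        if (depth > MAX_BOUNCES || !scene_intersect(ray, &hit))
            return background_colour(ray);   /* gave up or left the scene */
        if (hit.is_light)
            return hit.emission;             /* ended up at a light source */
        /* otherwise bounce and keep tracing */
        Ray bounced = { hit.point, reflect_or_scatter(hit, ray.dir) };
        return modulate(hit.albedo, trace(bounced, depth + 1));
    }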

And as usual—I hope for answers but I don’t expect to receive any.

Schizophrenia in Open-source Projects

May 14th, 2016

Disclaimer: the word “schizophrenia” is used here as it’s perceived by the majority, not to denote a certain psychological condition. Feel free to be offended.

I’ve wasted about a decade working on two multimedia projects (plus a patch or two for unrelated projects) and what I’ve seen there leads me to the conclusion that they both suffer from schizophrenia, albeit in different forms.

FFmpeg

FFmpeg features two forms of schizophrenia—developer and code.

Developer schizophrenia can be seen in how some developers believe they are also Libav developers. Mostly they brag about it because they’ve sent a patch or two to Libav and can now use it as a free review service. While I dislike Carl Eugen, he’s at least honest and acts according to his beliefs (here, an amended Elenril’s Law is fulfilled; in case you’ve forgotten, it says “Every FFmpeg-related discussion ends up mentioning Michael. Or Carl Eugen.”).

Code schizophrenia is more celebrated. The most prominent example is ProRes support—they offer two decoders and two encoders for it. There are two ASF demuxers as well. And two audio resampling libraries. And there are talks about adding a second libswscale (*shudder*). The best part is that if you ask why, it will probably go like this:

— Why do you have feature X in two versions?
— Because Libav has it.
— But why do you take it if you have your own version?
— To make merging Libav codebase easier.
— But why do you need to merge it at all?
— To make merging Libav codebase easier.

Please, please, tell me I’m wrong and provide proper reasons why FFmpeg keeps merging Libav stuff and keeps several versions of the same feature.

Libav

Here it’s somewhat more interesting—you have developers with a physical multiple personality disorder. Unlike in FFmpeg, here you have people who work solely on Libav but as several different people. The most prominent example is Luca (known as lu_zero and koda on IRC), who is really several instances not always agreeing with each other (and if you subscribe to Lu_zianism then he’s also both Michael and Carl Eugen too; if you don’t believe it, you should, because it annoys him/them). And there’s also Alexandra (aka beta elenril) and Anton (aka sasshka 2.0). And that’s the majority of the core Libav developers anyway.

But at least the project seems to be happier with itself and probably has a dream similar to the Ukrainian dream (which is “bugger off you damned Russians”, in case you didn’t know).


It was somewhat fun to watch the fate of the proposed bitstream reader replacement. Alexandra (she’s also Top Libav Blogger №2 by the way—simply because she blogs) proposed a new bitstream reader to replace the old horror (which was a good idea), and that new bitstream reader turned out to be faster than the old mess too. And what was the result? If I were British I’d call it sheep-worrying.

Those FFmpeg developers who believe they should also have a say in the Libav process started to express their opinions. While there was independent benchmarking proving the new implementation is indeed faster (which is a good thing to provide), those benchmarks were also run on a decoder not present in Libav, with badly converted bit-reading functions, and that turned out to have some problems because the encoder used (also not in Libav) produced a nonconforming stream and screwed up the multi-threaded decoding benchmarks (that one can be seen as both trollish and arrogant—kinda like judging a Beatles performance from an excerpt sung by your not very talented neighbour).

But mostly it was bikeshedding and asking why it did not use the old get_bits interface. The answer to the latter is simple—because the old reader was built from horrible macros that are used directly in half of the places, so you’d either have to make everything follow that macro design or convert the old UPDATE_CACHE(); LAST_SKIP_BITS(); ... CLOSE_READER(); sequences into saner get_bits(); skip_bits(); calls anyway. And the Libav developers decided it was better to have a fully new interface and to make it consistent with bytestream reading while at it.
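
To show the difference, here’s a simplified sketch (this is neither the actual get_bits.h nor the proposed reader, just the shape of the two interfaces):

    #include <stdint.h>

    typedef struct {
        const uint8_t *buf;
        uint32_t bitpos, size;   /* position and size in bits */
    } BitReader;

    /* function style, as in the new reader: state handled internally
       (a slow but obvious bit-by-bit read; real readers use a cached word) */
    static uint32_t get_bits(BitReader *br, int n)
    {
        uint32_t val = 0;
        while (n-- > 0 && br->bitpos < br->size) {
            val = (val << 1) |
                  ((br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1);
            br->bitpos++;
        }
        return val;
    }

    static void skip_bits(BitReader *br, int n) { br->bitpos += n; }

    /* the old macro style in comparison (shape only):
     *   OPEN_READER(re, gb);
     *   UPDATE_CACHE(re, gb);          // refill the cache manually
     *   val = SHOW_UBITS(re, gb, n);   // peek bits from the cache
     *   LAST_SKIP_BITS(re, gb, n);     // advance within the cache
     *   CLOSE_READER(re, gb);          // write the position back
     */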

So why did people who should have nothing to do with it bikeshed that much? Probably because they know in their hearts that as soon as it hits Libav, the work on copying it into FFmpeg starts, and sooner or later it will reside in the FFmpeg codebase, probably along with the old get_bits.h, with most decoders switched to the new bitstream reader anyway. Why? See the theoretical conversation above. I’d like to know the real answer to why the merges are done, but I guess I’ll get it no sooner than this bitstream reader is accepted into Libav master (i.e. never).

On QuickTime Codecs

May 7th, 2016

The number of interesting codecs is dangerously low, so I’ll probably stop writing about them at all (and that raises the question of whether this blog should be kept alive at all).

So, scraping the bottom of the barrel, I come to QuickTime codecs.

There are two codecs from the standard QuickTime package that are yet to be implemented in opensource: QDesign Music and Apple Pixlet. The former is (obviously) an audio codec with simple tones+noise coding; I hope to document it soon. The latter is an intermediate codec based on wavelets, so it should not be that hard to RE. The main problem is that I don’t know where to find a decoder (and I’m too lazy to search for one actively). It’s said that the only version of QuickTime able to decode it shipped with Mac OS X Panther (yes, back when it was not merely called Mac OS X but was also purely PowerPC-only). I estimate this codec to be rather simple—on par with SMPTE VC-5 (and probably even without codebooks, using generic variable-length codes instead like Pear Intermediate Codec and AmateurRes do). And PowerPC assembly is not that bad once you get the hang of the rlwinm instruction; I REd most of AIC from a PowerPC binary after all.

And there are some third-party extensions even Compn doesn’t know about, like NewTek SpeedHQ or the Digital Anarchy Microcosm codec. The former is an ordinary DCT-based intermediate codec any koda can RE; the latter is a somewhat funny lossless codec (funny because it uses a range coder just to decode bytes and then uses them in 8- or 16-bit RLE) that is better left to Derek to RE. SheerVideo was documented a long time ago, ZyGo video was just another DiVX, VP3 and Indeo 4 have other decoders, etc etc.

Life is boring.

Update: so there is a more modern Pixlet decoder. I’ve looked at it. There’s per-plane wavelet compression, parametrised Rice codes, everything rather trivial. The only interesting things are the coding of the zeroth subband (it’s split into the first coefficient, the top row, the left column and all the other coefficients coded with top+left prediction) and the fact that they have a subband header with the magic 0xDEADBEEF. Nice touch!
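
A hedged sketch of that zeroth-subband layout, as I read the description: the function name is mine, I assume the “top+left” predictor is the average of the two neighbours, and read_delta stands in for the Rice-coded residual reader.

    #include <stdint.h>

    static void decode_subband0(int16_t *dst, int w, int h,
                                int16_t (*read_delta)(void *ctx), void *ctx)
    {
        dst[0] = read_delta(ctx);                             /* first coefficient */
        for (int x = 1; x < w; x++)                           /* top row: left pred */
            dst[x] = dst[x - 1] + read_delta(ctx);
        for (int y = 1; y < h; y++) {
            dst[y * w] = dst[(y - 1) * w] + read_delta(ctx);  /* left column: top pred */
            for (int x = 1; x < w; x++)                       /* the rest: top+left */
                dst[y * w + x] = (dst[y * w + x - 1] + dst[(y - 1) * w + x]) / 2
                               + read_delta(ctx);
        }
    }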

Life is still boring though.

Some Thoughts on Reuniting

April 28th, 2016

Before I move to the point I’d like to give some historical examples based on countries.

Ukraine

Well, as you remember, in 1917–1918 there were several Ukrainian republics; the best known are the Ukrainian People’s Republic, the West Ukrainian People’s Republic and the Ukrainian Soviet Socialist Republic. There were some other small states like an anarchist republic, but they are not relevant here.

So, the Ukrainian People’s Republic and the West Ukrainian People’s Republic willingly united in 1919 and that day is a national holiday now (later the country was obviously occupied by Soviet Russia, Poland, Romania and Czechoslovakia). But why did that union happen? Because people wanted it, and there had been a dream of a united Ukraine for ages.

Germanies

You should’ve learned about it at school (or witnessed it if you’re old enough). Why did the unification happen? Because people on both sides wanted it and the Soviet Union could not prevent it any more.

Moldova and Romania

These countries share a common history, have the same language, and people like the idea of a single country. While the unification has not happened yet, it might happen even in this century.

Chinese Republics

Here the situation is funnier. The People’s Republic of China doesn’t recognize the Republic of China, yet they somehow co-exist and probably in the distant future they will be one again. Why? Because the PRC is changing and it’s not what it was in Chairman Mao’s times.

Korea

Here the situation is even funnier. There are two governments that think they are the only True Upstream Korean state; it’s just that half of it is still occupied. And while there are constant talks about reunification, neither state really wants it. One country has suffered under homebrew Socialism (just look up what ‘juche’ means) for too long, so it will take an enormous amount of time and money to make both parts equal (even funnier if you consider that before the 1960s North Korea was the industrially developed part and South Korea was an underdeveloped agrarian region). The Germanies had it easier (as a person paying Solidaritätszuschlag I know that). So will the reunion ever happen? I wouldn’t bet on it.


And now, on to our favourite projects.

From time to time somebody outside the projects or from the FFmpeg side asks about reuniting the projects. There are talks about it at VDDs. And yet there are no results. Have you noticed that I mentioned no such talks initiated by Libav? Why? Probably because Libav does not want to merge back. And there you have it—reunification cannot happen peacefully because you don’t have a majority on both sides wanting it.

And that raises two questions: why does FFmpeg want reunification and why doesn’t Libav want it (or, as a single question, what prevents it)?

It seems that for some reason not clear to me FFmpeg keeps merging all the stuff from Libav (feel free to enlighten me; otherwise FFmpeg developers themselves might forget the reason and it’ll turn into a tradition), and having both projects together would solve two problems: the need for merges and the lack of skilled developers (that’s always an issue).

What does Libav gain from the merging? Relief from constant merges? Unlikely, since no merging is done there. More developers? That’s nice, but the Libav project seems to be happy as is. A return to the known brand and to distributions? See above and here.

Let’s assume the projects decided to play nice out of nowhere and please the people who’d like them to reunite. What would happen then? Multiple discussions about the development process (the thing that led to the split in the first place), including but not limited to: the review process (relaxed and not applicable to some people, or mandatory for any change), code standards (especially formatting), which features to have in the united tree (flat history or merges, one native decoder for a certain format or two, use a code snippet the way it was done in FFmpeg or in Libav). And at that stage it would all start to fall apart again.

So there you have it: a clash of different development ideologies and more benefits for one side than for the other. Also, it’s rather hard to force people to work on a project they don’t like (and now they can at least choose).

And since this discussion cannot avoid certain names, here it is: I believe that Carl Eugen Hoyos deserves to be the next FFmpeg leader. Obviously my opinion doesn’t matter there and I could not convince anybody at VDD’15, but I firmly believe it. He’s the one with a passion for the project, he cares about codec support (even for fringe formats), he likes to follow the guidelines, he respects Michael and is unlikely to go and ruin what Michael created. And at VDD he looked kinda like the most responsible adult too, so he can be the face of the project. Again, this is merely my opinion and it won’t change anything.

Sincerely yours, NihAV project developer (it’s still vapourware, thanks for asking).

Some Information on Micronas SC4 and VoxWare MetaVoice

April 24th, 2016

So I’ve looked at them.

Micronas SC4 seems to be rather unusual as it brings elements of LPC to ADPCM. So it’s not just the old conventional “get nibble, multiply by step, output prediction, update index and step values”—it keeps a history of the last 6 decoded samples and predictions and uses them to calculate a new prediction value. Details might appear in the Wiki one day.
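
A sketch of the general idea only—the actual SC4 predictor, its coefficients, scaling and step adaptation are different (and undocumented here); every constant below is a placeholder. It merely shows ADPCM where the prediction comes from a history of past samples instead of just the previous one:

    #include <stdint.h>

    #define HIST 6

    typedef struct {
        int32_t hist[HIST];  /* last decoded samples, hist[0] = most recent */
        int32_t step;        /* current quantiser step */
    } AdpcmLpcCtx;

    static int32_t decode_sample(AdpcmLpcCtx *c, int nibble,
                                 const int32_t coef[HIST])
    {
        /* LPC-style prediction from the sample history */
        int64_t pred = 0;
        for (int i = 0; i < HIST; i++)
            pred += (int64_t)coef[i] * c->hist[i];
        pred >>= 6;                               /* fixed-point scale, assumed */

        /* conventional ADPCM part: signed nibble scaled by the step */
        int32_t diff   = (nibble < 8 ? nibble : nibble - 16) * c->step;
        int32_t sample = (int32_t)pred + diff;

        /* update the history and (crudely) adapt the step */
        for (int i = HIST - 1; i > 0; i--)
            c->hist[i] = c->hist[i - 1];
        c->hist[0] = sample;
        c->step    = c->step + (c->step >> 3);    /* placeholder adaptation */
        return sample;
    }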

VoxWare MetaVoice is three families of 2-3 codecs bundled under the same brand. I’ve not looked at the technical details but they seem to have lots and lots of tables with floating-point numbers (or just a few tables if you’ve looked at MetaSound first).
Here are the codecs:

  • RT24 2400bps “Real-Time” codec (ID is VOXa)
  • RT28 2844bps “Real-Time” codec (ID is VOXh)
  • RT29 2978bps “High Quality” codec (ID is VOXg)
  • VR12 1260bps Variable Rate codec (ID is VOXb)
  • VR15 1537bps Variable Rate codec (ID is VOXc)
  • SC3 3200bps “Embedded” codec (no ID)
  • SC6 6400bps “Embedded” codec (no ID)

Ask for support by grabbing j-b and demanding it be supported. I know there are other players besides VLC, but that’s the only project advertising that it “plays it all”, even on T-shirts. It’s time to be responsible for your own words. And ask for Bink2 too while you’re at it.

General Thoughts about Reverse Engineering Speech Codecs

April 23rd, 2016

Spoiler: they are not nice.

Speech codecs are probably the worst from my REing point of view. Why? Not because they are particularly hard to RE but rather because they are unpleasant. Here’s my list of reasons:

  • They are math-heavy. Of course you need to know mathematics to understand most codecs, but I had no DSP courses at the university, yet I can understand how video codecs work even in fine detail, and the same goes for many audio codecs. With speech codecs I have only a general idea of how they work.
  • Even worse, there are hardly any conventions on how to do things, and as a result codecs are built in a process that puts designing ARM SoCs to shame.
  • Even worse, because the codecs are math-heavy they have to be implemented with efficiency in mind, which results in horrible fixed-point math, usually in 16-bit variables.
  • And as if all of this wasn’t enough, if a codec supports several bitrates it might have additional postprocessing functions for the lower bitrates in the best case. In the usual case it’s a different codec.

And all that is the source of the REing problems. The bitstream format is usually easy to find and parse; the functions that do something with it are not easy to understand at all. I often end up not understanding what a function does, let alone what concept it implements. I might recognize only some of them, like the LPC filter.

So why are speech codecs so badly designed? In my opinion it comes from the design decisions. The initial idea was to use as few bits as possible by having a synthetic model and transmitting only its parameters. It worked great (at least compared to MPEG-4 with its synthetic scene and audio descriptions, aka the key parts of the standard that people pretend do not exist at all). So: the human throat is basically a variable tube, vowels are modulated tone and consonants are modulated noise. Transmit the filter coefficients (original LPC, LSF, parcor form or something else), a noise flag and the pitch frequency, and you’re done, right? It works fine for some sounds, but the quality is not that great and it fails completely with some sounds (the sounds often used by the French for example, so there’s still a need for a French Speech Codec, or j-bc for short). How to improve it? By adding impulses to “excite” the model (i.e. tell it when to start/stop the sounds). Not good enough? Add pitch tilting! Still not good enough? Add a postprocessing filter (and it was mandatory there long before video codecs got one). And what if we want to code not just voice but a higher frequency range too? Well, add…
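
The basic model from that paragraph fits in a few lines; here’s a rough floating-point sketch of it (real codecs replace this crude excitation with codebooks, pitch prediction and postfilters, all in fixed point):

    #include <stdlib.h>

    #define LPC_ORDER 10

    /* synthesise one frame: an all-pole LPC filter driven by either an
       impulse train (voiced, pitch_period > 0) or white noise (unvoiced) */
    static void lpc_synth_frame(float *out, int len,
                                const float lpc[LPC_ORDER],
                                int voiced, int pitch_period, float gain)
    {
        static float mem[LPC_ORDER];  /* filter memory, persists across frames */
        for (int i = 0; i < len; i++) {
            /* excitation: impulses for vowels, noise for consonants */
            float exc = voiced ? (i % pitch_period == 0 ? gain : 0.0f)
                               : gain * ((float)rand() / RAND_MAX - 0.5f);
            /* all-pole filter: s[n] = e[n] + sum lpc[k] * s[n-1-k] */
            float s = exc;
            for (int k = 0; k < LPC_ORDER; k++)
                s += lpc[k] * mem[k];
            for (int k = LPC_ORDER - 1; k > 0; k--)
                mem[k] = mem[k - 1];
            mem[0] = s;
            out[i] = s;
        }
    }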

And thus it became a pile of hacks upon a pile of hacks with a side dish of hacks. And each stage can be done in several different ways, which only adds to the confusion. That’s not even starting to talk about the smart ways to save bits by splitting a frame into several subframes and omitting some information for selected subframes (it can be interpolated from the other subframes’ information after all). Or using codebooks and vector quantisation. Or how to generate noise for silent frames. Or using better coding than just writing a fixed amount of bits for every element. Or…


I’ve finished looking at the Lernout & Hauspie CELP+SBC codecs (as usual I don’t understand most of the things they do there, but maybe I’ll still document them) and that, plus my past experience, made me write this post. Next is VoxWare MetaVoice and maybe Micronas SC4. And something saner afterwards. Or maybe the usual nothing.

On Opensource Projects Support

April 22nd, 2016

Today I’d like to talk about how opensource projects are supported by the “community”, using FFmpeg/Libav as an example.

Obviously I’ve chosen this example because I know some facts about the internal politics, and because libavcodec is the de facto multimedia decoding standard, used on every platform and by most multimedia processing tools out there.

What good did it bring to the project(s)? Some fame, but that’s probably it—the largest users don’t even bother to acknowledge in public that they use it (cough, BaidUTube, cough). There’s an enormous amount of code (which serves as a good compiler test suite too) but it’s maintained mostly by volunteers and people who have to use it at work (as they were hired because they had worked on it in the first place). That’s it: the best material gain is employment because you’ve shown off your skills, or the occasional consulting work with varying quality of tasks and pay. Some people are paid to improve a decoder or write a new one. Some are hired to work on improving protocol support and hardly paid at all (true story). Some simply sigh at yet another “please implement this for my app” mail. That’s good, but where’s the money to pay for the tasks the project itself needs (e.g. refactoring old horrible code, adding tests, implementing new features or fixing some old problems)?

There was an attempt to set up a foundation to gather money and use it for the project, but it didn’t work well even before the split and got completely derailed after it; and of course it was not a good idea to set it up in the USA, since the IRS refused to recognize it as a non-profit organisation, as other open-source projects (including X.org) have experienced too. The best part is how little money it could raise—I don’t remember the actual sum but it was relatively low, like less than $20,000 (please correct me if I’m wrong), and it came mostly from caught (L)GPL violators IIRC.

Let’s take a successful opensource project that uses libavcodec—that would be VLC. For example, the last VideoLAN Developer Days definitely cost tens and tens of thousands of ${proper_currency}: accommodating about a hundred people, reimbursing (at least some of) their travel costs, food etc etc; the event was sponsored by the largest Internet advertising company, the largest French advertising company and the largest French VideoLAN advertising company. I remember talk that they could even employ one developer full-time. And would VLC be of any use without libavcodec? And while they are probably the biggest opensource supporter of FFmpeg (they host its Git repository after all) and their developers write some code from time to time, they hardly do anything else—there is a bounty program, but it’s a complete joke since it lists mostly done tasks nobody will claim a reward for (and despite me pointing out that fact to them, nothing has been done). And obviously it doesn’t have tasks that would benefit FFmpeg/Libav but not VLC directly.

Let’s take a look at some commercial users. All those video hosting sites are viable mostly because there’s enough bandwidth to stream video and because there is free code supporting most of the formats uploaded by users, so they don’t have to bother with the details. And that free code is… (no points for guessing). And the biggest video hosting site was mentioned above. They don’t provide any support, nor even acknowledge that they use certain opensource projects. Of course one can point to the Baidu Summer of Code program, but that’s about involving students in opensource (and probably finding fresh meat for themselves) and it doesn’t work that well for the projects—you mostly get students willing to get money and/or credit for their résumé. They tend to do the task and disappear completely. I’m sure the projects would prefer the financial freedom to sponsor specific tasks instead, tasks that could be undertaken by anyone (not just students) and last whatever time is needed (not just a summer). If a project gets some support from large companies, it’s usually because a developer from that project working at that company convinced the management to do so.

Even worse is the situation with distributions, because they tend to demand free technical support from you: fixing your own bugs, fixing other projects that use your code and such. What do you get in return? Nothing. The recent Debian and XScreenSaver mess-up is not some special example; it’s all too typical.

It’s funny that the best sponsor for Libav might be Lu_minem.it—a small Italian company whose founder is a Libav developer and thus knows what the project needs.

How do I fit into all of this? I was an FFmpeg and then a Libav developer; later, multimedia-related work found me because of my work on decoders (I’d been trying to find a good job myself but failed and had to accept the best offer I got). I was around the core developers of the projects and thus could learn the facts written above. What have I got besides development experience, acquaintances and a pessimistic outlook on life? BSoC 2006-2009 participation (where, despite formally passing, I really managed to finish only one project in time) plus some smaller sum of money for rewriting a component of swscale to make it LGPL. And some free dinners from the VideoLAN foundation. So I’m fine, but seeing that the project cannot afford to pay a developer for some internal task (like writing a new RealMedia demuxer) is still very sad.

On Multimedia Player Names

April 17th, 2016

Warning: if you do not recognize names mentioned here you might be too young.

I’ve been using computers for about nineteen years and during that time I’ve tried various players for various formats. My curiosity about internal format design made me search for information about compression methods, source code for decoders and such. So it led me to the current state (doing nothing). Yet among the many multimedia players I know there are some naming issues, and that’s what I want to talk about.

My first versatile player was PLAYSND.COM by Yuri Tumarin. This 13kB DOS program could play a lot of various sound formats like WAV, VOC, MIDI variants and AdLib tracker music (RAD, HSC and lots of other variants). The best part is that it played some compressed WAV files too (various ADPCM variants and more). An excellent tool, but the name is too bland and hard to search for.

Speaking of AdLib tracker formats, there’s an opensource player with an AdLib emulator supporting lots and lots of them. The problem with it (besides being outdated now)? It’s called AdPlug. A good name to get blocked by a generic ad-filtering rule!

Let’s move to video players.

My first Linux player for VideoCDs was MpegTV. It was a commercial program but, again, it was in a country where nobody bothered about piracy, and it was the only player on Linux I knew that could decode VideoCDs without stuttering. The player did its task fine, but its name is rather cringeworthy.

Then I found out about DVD-oriented players like Ogle and Xine. Good names. And I still use Xine sometimes when I need to play DVDs.

And the gold standard of multimedia on Unix systems—XAnim. The only bad thing is that the last time I checked it didn’t work correctly in 32-bit X11 mode. But it did its job well and I’m still grateful it exists (and its binary plugins also served as a good binary specification for missing codecs).

And there’s MPlayer. It was fast, it had many useful features and supported many codecs (I still use it as a testbed for running some 32-bit VfW/DMO codecs when I cannot write a decoder without debugging), but its codebase was horrible (in some cases legendarily horrible) and the name is both bland and reminiscent of mplayer32.exe (which crashed and hung a lot too).

And one of its forks is named after one of the horrible chunks of libavcodec, and its author operates under a pseudonym. So was it really worth it to name the player MPV?

And I conclude this review with a well-known multimedia player that I won’t use. When I think about VLC, the first meaning that comes to my mind is variable-length codes. And when the only good thing about your player’s name is the number of puns you can make with it, I’d rather use something with a more decent name, thank you very much.

TwilightMotion Saga: The End

April 17th, 2016

I’ve finally documented what I know about VP4 in the wiki, so now I can unload it from my memory. Implementing decoders and such is left as an exercise for the TrueMotion-loving reader.

Probably I’ll look at ClearVideo (for the N-th time) or some speech codec suite. The funny thing is that even if it’s marketed as a single speech codec, you have a good chance of finding several codecs for different bitrates (for Lernout & Hauspie, for example, you have CELP for 4.8 kbps and SBC with different parameters for 8, 12 and 16 kbps), and don’t get me started on VoxWare MetaSpeech (don’t confuse it with MetaSound—that one is not a speech codec—or with MetaAudio—that one doesn’t exist); that’s a rant for another day.