Archive for the ‘Useless Rants’ Category

Freudian Slip?

Saturday, November 7th, 2015

Even though I’m no longer a Libav or FFmpeg developer I still look at both projects’ development mailing lists (on FFmpeg’s one mostly in faint hope that Peter Ross submits anything awesome again).

So one day I see this message. The “former” leader calls a large share of commits they get “enemy merges” (and it cannot be humorous, it’s not mean enough to be Austrian humor). Well, nice attitude you have there. And you know what? This might be a semi-official position there.

I was present at FFmpeg-Libav discussion at VDD (since I was not noticed by Jean-Baptiste I remained while other outsiders were kicked out — here’s the recording of public part). There I even managed to ask a single question — what’s really changed since Michael’s resignation. FFmpeg people failed to answer that. Beside not making merges anymore Michael still announces and makes releases and does whatever changes he likes without reviews; he’s still a de facto leader in my opinion. I’m yet to see FFmpeg having defined rules stating something different (even Libav has something). Another fun fact from that meeting was some FFmpeg people openly stating they hate Libav merely because it exists.

Again, I don’t have to care about FFmpeg community but working at Libav in such conditions is no fun either (and it’s no fun for many other reasons many of which sadly have something to do with FFmpeg).

So I’d rather follow the advice from the great philosopher Eric Theodor Cartman — “screw you guys, I’m going home”. Developing NihAV at slow pace (i.e. when I feel like doing it) in a neutral one-developer atmosphere is much better.

Rants on Data Compression

Friday, October 9th, 2015

… When I was a young piglet I liked to read the rather famous paper by Bell, Cleary and Witten discussing general data compression and PPM. The best phrase there was that the progress in data compression is mostly defined by larger amounts of RAM available. I still believe those words to be true and below I present my thoughts on current state of data compression. Probably it’s trivial, well-known, obvious or wrong to anybody knowing a bit about data compression but well, it’s my blog and my discarded thoughts dumpster.

General data compression

Let’s start from the very end — entropy coding. There are two approaches: coding into integer amount of bits or coding as close to Shannon’s entropy limit as possible. For both we have had optimal coding methods for about half a century (Huffman coding — 1952, arithmetic coding — mid-1970s). You cannot improve compression ratio here, so the following schemes are mostly tradeoffs sacrificing a bit of compression for speed gains (especially in form of (pseudo-)arithmetic coders operating only on binary). The only outstanding thing is so-called Asymmetric Numeral Systems but I suspect they are isomorphic to traditional entropy coders.
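
To make the "integer amount of bits" versus "Shannon's limit" distinction concrete, here is a minimal sketch that compares the order-0 entropy of some data with the size a Huffman code would give it. The function names are made up for illustration, and the model is the simplest possible one (plain byte frequencies, no cost counted for storing the code table).

```python
import heapq
import math
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Total information content of data in bits under an order-0 model."""
    counts = Counter(data)
    total = len(data)
    return -sum(c * math.log2(c / total) for c in counts.values())


def huffman_code_lengths(data: bytes) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built from data."""
    counts = Counter(data)
    if len(counts) == 1:
        # A single distinct symbol still costs one whole bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths


def huffman_size_bits(data: bytes) -> int:
    """Size of data in bits when coded with its own Huffman code."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[s] for s in data)
```

On skewed input like b"aaaaaaab" the Huffman code cannot go below one bit per symbol, while the entropy is noticeably lower; that gap is what arithmetic coding closes. On uniform input like b"abcd" both numbers coincide.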

Now let’s look at what feeds data to entropy coders. There are two main approaches (often combined): context modeling (probably the real foundation for current highest compression methods — PPM — was proposed in mid-1980s) and LZ77 (guess the year yourselves). Are there improvements in this area? Yes! The principle is simple — the better you can predict input the better you can code it. So if you combine different methods to better handle your data you can get some gains.

And yet the main compression gain here lies in proper preprocessing. From table or executable code preprocessing (table data usually differs only a bit between entries and for executables you can get some gains if you replace jump/call addresses with absolute values) to Burrows–Wheeler transform plus move-to-front plus RLE if needed etc.
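
As an illustration of the preprocessing chain mentioned above, here is a naive sketch of the Burrows–Wheeler transform followed by move-to-front. It sorts full rotations, which is fine for a demonstration and hopeless for real data (real implementations use suffix arrays); the function names are mine.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform.

    Sorts all rotations of data and returns the last column together with
    the row index of the original string (needed to invert the transform).
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # rotations[k] is the start offset of the k-th smallest rotation.
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)


def move_to_front(data: bytes) -> List[int]:
    """Replace each byte by its position in a recency list, then move it to the front."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out
```

BWT groups equal bytes together, so the move-to-front output tends to contain many zeroes and small numbers, which is exactly what RLE and an entropy coder like to see.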

Audio compression

You have four main targets here: general lossy compression, speech compression, lossless fast compression and lossless crazy compression.

General lossy compression follows the scheme established in 1990s or earlier: transform to frequency domain, grouping frequencies and coding frequency bands. Most of the methods are quite old and progress is defined mostly by how much RAM and CPU users are willing to sacrifice on it. For example, Celt (main part of Opus; the other part, SILK, is an ordinary speech codec) is not that much different in design from G.722.1 from late 1990s.

Speech coding follows canons from 1980s too — performing LPC, coding filter coefficients and other information enhancing signal reconstruction (pulse position, pitch tilt etc.).

Lossless fast compression (aka for normal usage) follows suit too — you have LPC or some adaptive filters used for prediction plus residue coding (usually with Golomb/Rice codes from 1960s-1970s, BTW the original Golomb paper is AWESOME, they don’t write papers like that nowadays).
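
A toy sketch of that prediction-plus-residue scheme, under loud assumptions: the "filter" is the most trivial predictor possible (next sample equals the previous one), the Rice parameter k is fixed by the caller, and the bitstream is a Python string of '0' and '1' characters. The unary convention (ones terminated by a zero) is just one of the possible choices, and none of this matches any particular codec.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map signed residuals to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(u: int) -> int:
    """Inverse of zigzag()."""
    return (u >> 1) if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(value: int, k: int) -> str:
    """Rice code: quotient in unary (ones ended by a zero), then k low bits."""
    bits = "1" * (value >> k) + "0"
    if k:
        bits += format(value & ((1 << k) - 1), f"0{k}b")
    return bits


def encode_samples(samples: List[int], k: int) -> str:
    prev = 0
    out = []
    for s in samples:
        residual = s - prev  # first-order prediction: same as previous sample
        out.append(rice_encode(zigzag(residual), k))
        prev = s
    return "".join(out)


def decode_samples(bits: str, k: int, count: int) -> List[int]:
    pos = 0
    prev = 0
    out = []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # unary part
            q += 1
            pos += 1
        pos += 1  # skip the terminating zero
        low = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | low)
        out.append(prev)
    return out
```

Real codecs differ mainly in how good the predictor is and how k is chosen (per block, adaptively and so on), not in the overall shape.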

Lossless crazy compression (aka spend hours compressing it and as much for decompressing) follows suit as well, but with longer filters, usually even several filters of different size applied one after another, plus better residue coding schemes.

Image compression

Here you have more variety of coding methods but most of them are very old (just look when Haar wavelet was proposed). Especially funny is that JPEG is still holding strong despite being more than twenty years old. I still remember so-called fourth generation image compression (separating image into region borders and textures to fill them and coding those), it didn’t lift off yet despite being introduced in late 1980s or so.

The only interesting development happens in lossless image compression but neither 2-D LZ77 (WebP) nor context modeling (FLIF) are particularly new ideas.

Video compression

Modern codecs are all so similar and they are usually ripoffs of H.26x (there are two exceptions — Thor, which is not a ripoff just because it was designed with openly acknowledging that some parts are taken from H.265, and Daala, which is more original and it’s discussed below).

So nowadays you have a very limited subset of ideas that were present in video codecs from 1990s — it’s boring macroblocks (now with quadtree partitioning instead of fixed size), motion compensation (now you have more reference frames to choose from though) and binary entropy coder (except for Thor, it went the way of RealVideo 3/4 with context-adaptive VLCs). Even the trend of adding special coding tools for special content doesn’t look that original (if you remember countless screen codecs and MPEG-4 Audio, *barf*).

The only exception for now is Daala that uses more original ideas but I fear it will end the same boring codec because it is not crazy enough to make a breakthrough. I believe it should do more crazy preprocessing at least and maybe better modeling, e.g. taking more than nearest neighbours into account (maybe even use something PPM-like for element coding and not just probabilities mixing). Look at JBIG for inspiration maybe 😉

Conclusions

Don’t expect miracles in data compression to happen anytime soon but couple of percent improvements for specialised fields at least in a decade is possible and even expected.

FAQ

Saturday, October 3rd, 2015

Since I’ve been asked the same questions over and over again I’ve decided to make a short (for now) FAQ page.

  • How many years does it take to get a citizenship in Germany? 7-8 years.
  • How long have you been living in Germany? Since Spring 2010, do the math yourself.
  • So you’ll get your German citizenship in a couple of years, right? Maybe. It’s the same kind of maybe as in ‘Berlin-Brandenburg airport will be open in a couple of years.’ And it does not depend on me much.
  • Can you help me with ProRes issue … I can but I have no desire nor obligations. All Trocadero I got writing an encoder is gone long time ago and I don’t participate in projects that offer any ProRes support, inquire there.
  • Can you look at this codec … I can but no promises — I rarely have a desire to do anything these days.
  • Is NihAV real? More or less, it still lacks a lot of design and code but there are some bits implemented already. Design is described in this blog when it appears, code is developed as who-cares-source.
  • Why do you blame lu_zero? Oh, there are so many reasons for that and new ones keep appearing almost every day. Mostly it’s for the things he was supposed to do but still hasn’t done (and unlikely to do in foreseeable future): AVScale design and implementation, writing blog posts on certain topics (often I end writing them, which is yet another reason to blame him), not doing much about ASF or RealMedia demuxers and related delayed work, for personal stuff (like preventing me trying Torino trams and underground), for missing technical stuff in a wiki. Oh, and for being at least two different persons. There’s more that I can’t remember right now.
  • When will you visit Pelhřimov? Dunno, maybe when I have more than three free days.

Random Thoughts on Format Design Process

Wednesday, August 12th, 2015

From my experience a lot of codecs have some wrong things in them and those things are usually introduced during codec creation. As for containers, I’ve expressed my opinion before.

It is very bad when some codec is being developed and then suddenly it’s declared released. You’re left with a pile of code that has somehow evolved into current shape and probably even the author has no idea how it works. Two examples — Snow and Speex. The first one is wavelet-based codec that performed quite well back in DiVX 3/4 days, the other one is a speech codec that also gained some popularity and was even included as one of Flash audio codecs. So the codecs by themselves should not be that bad but there’s only one implementation and no specification. There were several attempts to make Snow developer write a specification for it (for money!) but he always refused. FFV1 is faring somewhat better since it has some rudimentary specification and hopefully standardisation efforts will bring us independent implementations and full specification (yes, I’m an idiot optimist). What would be a proper way to design a codec in my opinion? Create test version, play with it till you achieve good result or release a known beta, write specification, throw away old code and reimplement version 1 from scratch. Repeat for version 2 etc.

I think I’ve complained before that this situation is very common with proprietary codecs too. They have inhouse encoder and decoder implementation with encoder bugs compensated in the decoder. Stupid motion compensation in RealVideo 4 is one of those “features”. Or pre-RTM WMV9 with its block pattern coding though it’s supposed to be beta anyway.

There is an even worse case — when codec author decides to embed all development history into decoder maintaining backward compatibility. The worst offender is Monkey Audio with its subtle bitstream changes at every version and having two dozen versions. Another “good” example is HEVC with its ever-changing bitstream format. Different major versions of reference software introduce serious bitstream changes, like HM8 -> HM10 transition remapped all NAL IDs. IIRC superseded version of ITU H.265 was for 4:2:0 subsampling only. Honestly, I shan’t cry if this codec dies because of idiotic licensing terms (and maybe it should really be contained only in FLV). Speaking of HEVC idiocies, VP9 got new features in new profiles including 4:4:0 subsampling. In my opinion one should kill this creeping featurism especially if you don’t have proper profiling/versioning system and even then introduce new features sparingly.

At least there’s still hope for Daala to be developed properly.

Springtime for H.265 clones!

Wednesday, July 15th, 2015

Previously I feared there won’t be any H.265 clones beside VP<git-experimental> codec but luckily I was proved wrong.

There’s the second announcement of Really?Networks RMHD, intended for China (RealMedia was popular there after all). Either it’s their completely new codec (NGV) that has finally buffered 100% based on some original ideas or it’s H.265 ripoff. I’d bet on the latter.

Second, I’ve finally read a book describing upcoming AVS2 (again, intended for China and being a Chinese standard). Well, if the first paragraph describing it has such abbreviations as CU, PU and TU you may be sure it’s an original codec that has nothing to do with H.265. Coding concepts like variable block transform, splitting motion compensating block unevenly and having 34 intra prediction modes — those concepts are completely original and are not used anywhere else for sure. Of course there’s some Chinese logic involved in some decisions and thus codec has such gems ripped off HEVC like coding motion vectors in integer precision instead of quarterpel if they exceed certain limit or coding coefficients in zigzags of 4×4 blocks or having special treating for 64×64 blocks (this block is downscaled first and then transformed with conventional 32×32 transform — and they call it Logical Transform BTW) or special motion vector prediction mode for F-frames.

But that’s not all — they’ve introduced special “scene coding”. It relies on G-frames or GB-frames that contain scene background and it may be not displayed (who said VPx?!), and S-frames contain foreground motion. Though I’m pretty sure one can emulate it using H.265 features too, maybe longrefs plus no_display flag. I’m also pretty sure that if HEVC lacks some coding approach for now it will be added soon as a special extension (at least what I’ve read in screen coding extension looked completely logical — like a saddle as one of car seats).


Now I can be sure at last that codec future is looking good.

UPD: And there’s Cisco Thor now as well (simplified HEVC with VLC instead of CABAC). It does two things simultaneously — expands H.265 ripoffs family and borrows more from H.264. Now the only thing missing is Sorenson SVQ5 (or Double Spark or whatever name they want to give it).

On Greece

Sunday, July 12th, 2015

I see too much bullshit about Greece in Internet these days, so much of it that I could not refrain from writing this post.

First of all, I come from a country with even worse economic situation (fun fact — the former Ukrainian ostrich supported president complained how hard it’s to repay debts on his visit to Greece during the first Greek debt crisis). Unlike Greece most of people got no money from government, companies had large tax burden (in the latter years the government decided to press companies to pay taxes in advance and in amount decided by the tax inspection, tax returns working only for selected companies), lots of debts that went to no good purpose…

But enough about similarities between countries (certain Italians are not happy about similarities between Ukraine and Italy either), let’s get to the bullshit statements.

It’s not their fault. Of course it is, they had to forge their financial statistics under gunpoint in order to join and remain in Eurozone. Of course they share blame with Eurobureaucracy that wanted to extend EU even with a Greece and was willing to overlook their faults in order to keep it. Yet active part had been done by Greek government — it’s easy to buy voters with borrowed money that somebody else has to return in the future (in other words — not our problem). Another point of tension is Schengen area membership: because of good border control they have a lot of illegal immigrants and that’s what EU needs, hopefully when some neighbouring lands will connect Greece to the rest of Schengen area it will bring joy to everyone, especially to the UK.

The whole world is in debt to Greece for their achievements in culture and science. First of all, that sounds like typical copyright. “My grandfather once wrote a song that was played on a radio, I deserve not to work ever in my life.” (some Slashdot comment as I remember it from a decade ago or so). Second, most of the current countries have nothing to do with the nations that were on that territory a thousand or two thousand years ago. Look at Arab Republic Egypt — there was nothing Arabic in the people who built pyramids, temples and sphinxes. If you believe David Ben-Gurion’s thesis, then Palestinians are true Israeli people who lost their culture because of Arab conquests — they seem to oppose their original religion even to this day. Same story with Balkan nations and Ottoman Empire: modern Greece has nothing to do with the ancient Greece except in territory (say hello to Macedonia) and similar language. So, nice knowing you but don’t claim the old history to yourself; and while I’m grateful for those past achievements, they are not yours. I’d been living in a country that tried to exploit that (mostly in form of Soviet legacy and what colloquial “they” did for everyone), no thanks.

LZ77-based compressors — a story similar to lossless codecs

Tuesday, May 12th, 2015

What do LZ77 compressors and lossless codecs have in common? They both perform lossless compression and there are too many of them because everyone tries to invent their own. And like lossless audio codecs — quite often in their own container too.

In case you don’t know (shame on you!) LZ77 scheme parses input into pieces like <literal> <copy> <literal> ... Literal means “copy these input bytes verbatim”, copy is “we had that substring some time ago, copy N bytes from the history at offset M”.
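
For those who prefer code to words, here is a minimal sketch of decoding such a parse. The token representation (Python tuples tagged "literal" or "copy") is purely an assumption for illustration; real formats pack this into bits or bytes.

```python
def lz77_decode(tokens) -> bytes:
    """Rebuild the data from a list of LZ77 tokens.

    Each token is either ("literal", some_bytes) or ("copy", offset, length),
    where offset counts back from the current end of the output.
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out += token[1]
        elif token[0] == "copy":
            _, offset, length = token
            if not 0 < offset <= len(out):
                raise ValueError("copy offset points outside the history")
            start = len(out) - offset
            # Copy byte by byte: the source may overlap the bytes being
            # written, which is how a short pattern gets repeated many times.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError("unknown token type")
    return bytes(out)
```

Note the overlapping copy: ("literal", b"a") followed by ("copy", 1, 4) produces five letters 'a', so a run costs just one literal and one copy.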

The idea by itself is rather simple and thus it’s easy to implement some LZ77 parsing with the following coding, slap your name on it and present as some new algorithm. There are three branches of implementation goals there — fast (but somewhat decent) compression, high (but not so fast) compression and experimental research that may lead to implementations in the first two branches.

Fast compression schemes usually pack everything into bytes so no time is wasted on bit reading. Usually format is like this — if top three bits of the next byte are something, then read literal copy length, otherwise determine offset size, read it and copy string from the dictionary. Quite often there are small tweaks to make compression faster (like using hashes) or slightly better (using escape values to code long values and coding small offsets/lengths into opcode etc.). There are so many implementations like that and they still keep appearing. LZO, LZF, FastLZ, snappy, chameleon… And lots of old games used such compression for their resources (including video) too.
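
Here is a sketch of a decoder for such a byte-oriented format. The layout is made up for this example and is not meant to be bit-exact with any of the compressors named above: top three bits of the opcode equal to zero mean a literal run of (low five bits + 1) bytes, anything else is a copy of (top three bits + 2) bytes with a 13-bit offset, and the value 7 in the top bits is an escape that adds the next byte to the length.

```python
def decode_bytewise(packed: bytes) -> bytes:
    """Decode a hypothetical byte-aligned LZ77 stream (input checks are minimal)."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        op = packed[pos]
        pos += 1
        top = op >> 5      # top three bits select the operation
        low = op & 0x1F    # low five bits: run length or offset high bits
        if top == 0:
            run = low + 1
            out += packed[pos:pos + run]  # literal bytes are copied verbatim
            pos += run
        else:
            length = top + 2
            if top == 7:
                # Escape value: a longer length continues in the next byte.
                length += packed[pos]
                pos += 1
            offset = ((low << 8) | packed[pos]) + 1
            pos += 1
            start = len(out) - offset
            if start < 0:
                raise ValueError("copy offset points outside the history")
            for i in range(length):
                out.append(out[start + i])  # byte by byte, overlap allowed
    return bytes(out)
```

There is no bit reader anywhere, just byte loads, shifts and masks, which is where the speed comes from.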

High compression schemes use much better compressing of the data produced by LZ77 parsing and spending more cycles on finding the best parsing of the input. It all started essentially with LZHUF when someone decided to employ Huffman codes instead of writing values in a fixed amount of bits. If you’ve never heard about LHA/LZH you need your Amiga box confiscated. This approach reached its peak with Deflate — by modern standards it’s not the best format to compress (i.e. not fast enough, does not compress high enough etc etc.) but it’s the standard available everywhere and in any form. Deflate uses custom per-block Huffman codes with their definition stored in compressed form as well so there’s hardly anything to improve there radically. And thus (patent expiration helped greatly too) another form of LZ77-based compression started to bloom — LZA (using modelling and arithmetic coding on LZ77 parsing results). Current favourite LZMA (and main RAR compression scheme) uses this approach too albeit in very sophisticated form — preprocessors to increase compression ratio on some kinds of known data, Markov models, you name it.

And here’s my rant — leave Deflate alone! It’s like JPEG of data compression — old and seemingly not very effective but it’s ubiquitous, well-supported and still has some improvement potential (like demonstrated by e.g. 7-zip and zopfli). I hate it to have as many compression schemes to support as video codecs. Deflate and LZMA are enough for now and I doubt there will be something significantly more effective appearing soon. Work on something lossy — like H.265 encoder optimisations — instead.

Some Travel Notes

Monday, May 4th, 2015

So I’ve finally visited the disunited state of Austria-Hungary and can share some feelings for those who like to read my travel notes (all zero people).

First, I’d like to talk about rail magazines that are present in InterCity or express trains in different countries. The ones I know are issued monthly and have national peculiarities (for starters, they are written in the national language). The one from Deutsche Bahn (German railways) covers a lot of different topics — culture, travel, some short story or an excerpt from one, DB plans, kids corner etc. ÖBB (Austrian railways) one is mostly dedicated to advertising Austria for tourists (and maybe a bit or two about neighbouring resorts to visit). TGV magazine (obviously French) is something in-between (not fully advertisements but not much serious stuff either) plus advertisements for night clubs. Yet it’s the only one of three that features a scheme for IC and TGV routes. And the best one is of course Kupe from SJ (Swedish railways). It has articles on various topics and it also includes things close to my heart: a full map of Swedish railways (I need to travel more there!), SJ fleet description (I like to ride all those kinds of trains plus Inlandsbanan’s Y1, SL X60 and X10 and I definitely need to go to Lennakatten again!) and the most important thing — a page where a locomotive driver (it was Peter and now Jenny) answers railway-related questions (e.g. what’s the difference between trains like X2 and X40, what’s the longest route they have to travel, why train goes slowly sometimes etc.). Anyway, back to actual travel.

For Hungarian part I’ve visited Budapest. If you ignore the river, buildings in the centre and people it looks and feels like Kharkiv. The same neglected buildings (often in the same architectural style), the same neglected streets. The transport is verily the same — Tatra trams, Ikarus buses, even underground rolling stock is the same and even painted the same! Heck, even most people I talked with there were from Kharkiv. And their suburban rail lines (like H5, H6 or H8/H9) are shaky as Ukrainian roads.

Also as I’m, to speak politically correct, a fat cripple I really appreciated how lines are connected there — you often have to cross a road or use an underground pass without any elevators. Tram routes are so well designed that they simply end somewhere in the middle of the street with no loop to turn around. And the airport reminds of Kharkiv too — it’s connected only by a bus (on a Ukrainian-grade road), they check your documents thoroughly. The only difference is that in Kharkiv airport I had never had to take off my shoes on security check. At least after visiting it I don’t have a desire to go back to Ukraine (not that I had it before…).

Austrian part is represented by Innsbruck. It’s a stereotypical town in Austrian Alps. Transport system is rather strange — trams have numbers like 1, 3, 6, STB and buses have numbers like D, H, LK, O or TS. For skiers there are Alps with funiculars all around the town, for idiots who believe that fake should cost more than real there are tours to Swarovski, for me there was a museum of local rail lines (that means both local trams and railways in different part of Tirol including Italy). Museum ticket also gives a right to get a ride on museum tram around the town. While the museum by itself is small (only two rooms with mostly photos and plans) it also has a depot full of museum trams from probably 1920s to 1970s (that feeling when you see DÜWAG GT6 only in a museum while they are still common here). Two tram lines (6 and STB) go into the mountains, at least STB being one-track there with passing loop on some stations (and trams take left track there like on proper railways). One of those stations surprised me by having an emergency broom tied to the pole there.

It’s also worth noting that there are two rivers flowing through Innsbruck — Inn, obviously, and Sill. I don’t care what it means for them, I know what it means for me — salt water herring in Swedish and that’s what I was thinking about.

Overall, Innsbruck looked nice and a bit like Bavaria, I honestly expected it to be worse (mostly because of Austrians I know). And understanding German is much easier than understanding Hungarian unless you’ve been born one. It’s worth visiting again sometime.

NihAV — A New Approach to Multimedia Pt. 5

Saturday, April 25th, 2015

Structures and functions

The problem with structures in libav* is that they quite often contain a lot of useless information and easily break ABI when someone needs to add yet another crucial field like grandmother’s birthday. My idea to solve some of those problems was adding side data — something that is passed along the main data (e.g. packet) and decoders don’t have to care about it. It would be even better to make it more generic, so you don’t have to care about enums for that either. For instance, most of the codecs don’t have to care about broadcast grade metadata (but some containers and codecs like ATSC A/52 provide a lot of it) or stupid DVD shit (pan&scan anyone?). So if demuxer or decoder wants to provide it — fine, just don’t clutter existing structures with it, add it to metadata and if consumer (encoder/muxer/application) cares it can check whether such non-standard information is present and use it. That’s the general approach I want to have quite similar to FCC certification rule: producers (any code that outputs data) can have any kind of additional data but consumers (code that takes that data for input) do not have to care about it and can ignore it freely. It’s easy to add options marked as essential (like PNG chunks — they are self-marked that you can distinguish chunks that can be ignored from those that should be handled in any case) to ensure that this option won’t be ignored and input handler can error out on not understanding it.
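
A rough sketch of how that producer/consumer rule could look. Everything here (the Packet and Option classes, the consume function, the option names) is hypothetical and invented for this illustration; it is not NihAV code and not any libav* API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Option:
    """One piece of side data; 'essential' plays the role of the PNG chunk marking."""
    value: Any
    essential: bool = False


@dataclass
class Packet:
    """Main data plus free-form side data keyed by name (no enum required)."""
    data: bytes
    side_data: Dict[str, Option] = field(default_factory=dict)


def consume(packet: Packet, understood: Set[str]) -> Dict[str, Any]:
    """Pick the options this consumer understands.

    Unknown options are ignored silently unless they are marked essential,
    in which case the consumer errors out instead of producing garbage.
    """
    used = {}
    for name, option in packet.side_data.items():
        if name in understood:
            used[name] = option.value
        elif option.essential:
            raise ValueError(f"essential option {name!r} is not understood")
        # Non-essential unknown options are simply skipped.
    return used
```

A producer can attach, say, pan&scan information as a non-essential option and a palette as an essential one: a consumer that knows only about the palette works fine, while one that knows about neither fails loudly.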

As for proper function calls — Luca has described it quite well here (pity no one reads his blog).

NihAV — A New Approach to Multimedia Pt. 4

Friday, April 24th, 2015

On colourspaces and such

I think current situation with pixel formats is brain-damaged as well. You have a list of pixel formats longer than two arms and yet it’s insufficient for many use cases (e.g. Canopus HQX needs 12-bit YUVA422 but there’s no such format supported and thus 16-bit had to be used instead or ProRes with 8- or 16-bit alpha channel and 10-bit YUV). In this case it’s much better to have pixel format descriptor with all essential properties covered and all exotic stuff (e.g. Bayer to RGB conversion coefficients) in options. Why introduce a dozen IDs for packed raw formats when you can describe them in uniform way (i.e. read it as big/little-endian, use these shifts and masks to extract components etc.)? Even if you need to convert YUV with different subsampling for chroma planes (can happen in JPEG) into some special packed 10-bit RGB format you can simply pass those pixel format descriptors to the library and it will handle it despite encountering such formats for the first time.
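
To show what describing formats "in uniform way" might mean, here is a small sketch limited to packed formats: a descriptor holds the pixel size, endianness and a shift plus bit depth per component, and a generic routine converts between any two descriptors. The PackedFormat class and the two example formats are my assumptions for illustration, not the actual test code; planar and subsampled formats would need more fields.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PackedFormat:
    """Hypothetical descriptor of a packed pixel format."""
    bytes_per_pixel: int
    big_endian: bool
    components: Dict[str, Tuple[int, int]]  # name -> (shift, depth in bits)

    def _order(self) -> str:
        return "big" if self.big_endian else "little"

    def unpack(self, raw: bytes) -> Dict[str, int]:
        """Read one pixel and extract components with shifts and masks."""
        word = int.from_bytes(raw[:self.bytes_per_pixel], self._order())
        return {name: (word >> shift) & ((1 << depth) - 1)
                for name, (shift, depth) in self.components.items()}

    def pack(self, values: Dict[str, int]) -> bytes:
        """Write one pixel from component values."""
        word = 0
        for name, (shift, depth) in self.components.items():
            word |= (values[name] & ((1 << depth) - 1)) << shift
        return word.to_bytes(self.bytes_per_pixel, self._order())


def convert_pixel(raw: bytes, src: PackedFormat, dst: PackedFormat) -> bytes:
    """Convert one pixel between two formats the code has never seen before."""
    values = src.unpack(raw)
    scaled = {}
    for name, (_, dst_depth) in dst.components.items():
        src_max = (1 << src.components[name][1]) - 1
        dst_max = (1 << dst_depth) - 1
        # Rescale to the destination bit depth with rounding.
        scaled[name] = (values[name] * dst_max + src_max // 2) // src_max
    return dst.pack(scaled)


# Two example descriptors; neither needs its own ID or special-case code.
RGB565_LE = PackedFormat(2, False, {"r": (11, 5), "g": (5, 6), "b": (0, 5)})
RGB10_BE = PackedFormat(4, True, {"r": (20, 10), "g": (10, 10), "b": (0, 10)})
```

Adding yet another packed raw format is then one more line of data instead of one more enum value plus conversion functions to and from everything else.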

P.S. I actually wrote some test code to demonstrate that idea but no-one got interested in it.