Archive for the ‘Useless Rants’ Category

NihAV — A New Approach to Multimedia Pt. 1

Thursday, April 23rd, 2015

Foreword or Why?!?!

There are two curses in program design (among many others) — legacy and monolithic design.

Legacy means two things: first, there is such a thing as backward compatibility that you sometimes have to maintain (or the users will complain about broken APIs and ABIs); second, there's code legacy, i.e. decisions taken in the past that are kept for some reason (e.g. no one understands how they work). Like the AVI demuxer in libavformat containing special cases for handling specific files that no one has ever seen.

Monolithic design is yet another problem that creeps into many projects with time. I don't know why, but quite often code gathers itself into intangible chunks, and with time those chunks grow and get uglier. Anyone who has worked with FFmpeg might take pleasure looking at mpegvideo in libavcodec, libswscale and libpostproc (especially in the versions from about 2010).

So there are two ways to deal with it — evolution (slowly change interfaces in the hope they'll be better one day, deprecate stuff etc.) and revolution (simply forget it all and write new stuff from scratch).

In this and the following posts I'll describe a new framework (or whatever buzzword applies here) — NihAV (Not-Invented-Here Audio-Video). Maybe I'll even implement it for my own needs, and the name should hint how much I care about existing design decisions.

Decompilation Horror

Saturday, April 18th, 2015

In the old days I found the PackBits (also DxTory) decoding routine monstrous. That Japanese codec had a single decoding function 349549 bytes long (0x1003DFC0–0x1009352D), and that was bad style in my opinion.

Well, what do you know? Recently I've looked at the AMV3 codec. Its encode function is 445048 bytes long (0x10160C20–0x101CD698). And the decode function? 1439210 bytes (0x10001150–0x1016073A)! I've seen many decoders smaller than that function alone.

There's one thing common to those two codecs that might explain this — both DxTory/PackBits and AMV3 are Japanese codecs. It might be their programming practice (no, it's not that bad), but remember that other codecs have crappy code too, for other reasons. And some of them actually look better in compiled form than in source form (hello there, Ad*be and Micro$oft!). Yet I find it somewhat easier to deal with code that doesn't frighten IDA (it refuses to show those functions in graph form because of too many nodes; maybe I'll run the decompiler on the decode function in autumn – it will keep my apartment warm till spring).

Vector Quantisation Codecs are Still not (semi-kinda) Dead!

Thursday, April 16th, 2015

While the golden days of vector quantisation codecs seem to be over (Cinepak, Smacker and such), there's still one quite widespread use of vector quantisation in video — texture compression. And, surprisingly, there's a couple of codecs that employ texture compression methods (good for GPU acceleration, less stuff to invent etc.) like Vidvox Hap or Resolume DXV (which looks suspiciously similar in many aspects but with some features like LZ4, LZF or YCoCg10 compression added). I have not looked that closely at either of them, but it looks like they still operate on small blocks, e.g. compressing each 8×8 block of a plane with BC4 and combining them later.
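
Since both formats lean on standard texture compression, here's what decoding a single BC4 block looks like, going by the public BC4/DXT5-alpha rules (a sketch for illustration only; I haven't checked how exactly Hap or DXV wrap it):

    #include <stdint.h>

    /* One 4x4 block of a single plane, 8 bytes of input: two endpoint
     * bytes and 48 bits of 3-bit palette indices. Fixed-size blocks
     * decoded independently -- that's why it suits GPUs so well. */
    static void decode_bc4_block(const uint8_t *src, uint8_t dst[16])
    {
        uint8_t pal[8];
        uint64_t bits = 0;
        int i;

        pal[0] = src[0];
        pal[1] = src[1];
        if (pal[0] > pal[1]) { /* eight interpolated values */
            for (i = 2; i < 8; i++)
                pal[i] = ((8 - i) * pal[0] + (i - 1) * pal[1]) / 7;
        } else {               /* six values plus explicit 0 and 255 */
            for (i = 2; i < 6; i++)
                pal[i] = ((6 - i) * pal[0] + (i - 1) * pal[1]) / 5;
            pal[6] = 0;
            pal[7] = 255;
        }

        /* 16 three-bit indices packed little-endian into the last 6 bytes */
        for (i = 0; i < 6; i++)
            bits |= (uint64_t)src[2 + i] << (8 * i);
        for (i = 0; i < 16; i++)
            dst[i] = pal[(bits >> (3 * i)) & 7];
    }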

This does not seem that much interesting to me but I’m sure Vittorio will dig deeper. Good luck to him!

P.S. I forgot — which version of Firefox comes with ORBX.js support?

Some Notes on Lossless Video Codecs

Saturday, March 21st, 2015

While reading a nice PhD thesis from 2014 (in Russian) about a new lossless video compression method, I laughed hard at its lossless video codec descriptions (here's the link for unbelievers – http://gorkoff.ru/wp-content/uploads/articles/dissertation.pdf, translation below is mine):

To date various lossless videostream compression methods and algorithms have been developed. They are used in the following widespread codecs:

* CorePNG employs deflate algorithm for independent compression of every frame. Theoretically the codec supports delta frames but this option is not used.

* FFV1 employs prediction coding with following entropy coding of prediction error.

* Huffyuv, like FFV1 algorithm, employs predictive coding but prediction error is effectively coded with Huffman algorithm.

* MSU Lossless Video Codec has been developed for many years at Moscow State University labs.

And yet some real world tasks demand more effective compression and thus a challenge of developing new more effective lossless video compression methods still remains actual.

Readers are welcome to find inaccurate, erroneous and outright bullshit statements in this quote themselves. I’d rather talk about lossless video codecs as I know them.

Some notes on VP4

Sunday, March 1st, 2015

Well, this information should’ve been posted by someone else but those people seem to be lazier than me. In return I’m not going to use XViD or FLIC for encoding my content.

So, REing VP4 is rather easy – you just download the original VP3.2 decoder source (still available on the Xiph SVN servers) and compare it to the structures in vp4vfw.dll. There are differences in the structures and a bit in code layout, but mostly it's the same code with new additions.

So, VP4 is based on VP3 (surprise!) and introduces a new bitstream version (which is 3 for some reason). Here’s an incomplete list of differences I’ve spotted:

  • Base frame header has some additional fields (I didn’t care enough to decipher their meaning though);
  • Superblock coding uses a slightly different scheme with new universal codes resembling exp-Golomb but with a VP4 quirk (plain exp-Golomb is sketched after this list);
  • Frame data decoding differs depending on the frame type;
  • Motion vector component extraction uses Huffman tables and the sign from the previous block.
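
For reference, here's plain exp-Golomb decoding on top of a toy bit reader; VP4's universal codes merely resemble it, so treat this as an illustration, not as the actual VP4 bitstream reader:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        const uint8_t *buf;
        size_t pos;    /* bit position, MSB-first */
    } BitReader;

    static unsigned get_bit(BitReader *br)
    {
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        return bit;
    }

    static unsigned get_bits(BitReader *br, int n)
    {
        unsigned val = 0;
        while (n-- > 0)
            val = (val << 1) | get_bit(br);
        return val;
    }

    static unsigned read_exp_golomb(BitReader *br)
    {
        int zeroes = 0;
        while (!get_bit(br))   /* count leading zeroes */
            zeroes++;
        /* codeword 1 -> 0, 01x -> 1..2, 001xx -> 3..6 and so on */
        return (1u << zeroes) - 1 + get_bits(br, zeroes);
    }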

And yet it uses the same coding principles, and even token coding seems to be left untouched. It was suspected for a long time that even-numbered On2 codecs were simply improvements over the previous version while odd-numbered On2 codecs were more innovative, but not much was known about VP4 to prove it:

  1. Duck TrueMotion 1 — a new codec;
  2. Duck TrueMotion 2 — mostly like TrueMotion 1 but with Huffman encoding;
  3. Duck/On2 TrueMotion VP3 — DCT + static Huffman coding;
  4. On2 TrueMotion VP4 — VP3 with some bitstream coding changes;
  5. On2 TrueCast VP5 — DCT + arithmetic coder;
  6. On2 VP6 — VP5 with some bitstream changes;
  7. On2 VP7 — H.264 ripoff with their own arithmetic coder;
  8. On2 VP8 — VP7 with some small changes;
  9. Baidu VP9 — H.265 ripoff with their own arithmetic coder;
  10. rumoured Baidu VP10 — since there’s no H.266 in the works for now…

It's all kinda like Intel CPUs but without confusing codenames (and Xiph hasn't produced enough codecs to make one wonder whether Daalawell came before Theorabridge or after).

P.S. Many thanks to big G for releasing no information on that codec or any other codecs from On2. Oh, and is VP9 “specification” still under NDA?

P.P.S. I should really work on a game codec named after chemical warfare instead.

A Call for Modern Audio Codec

Wednesday, February 11th, 2015

We need a proper audio codec to accompany state-of-the-art video codecs, so here's an outline of the features such a codec should have:

  • an audio codec should make more use of its context; it should have a system of forward and backward reference frames like the B-pyramid in H.264 or H.265;
  • it should employ tonal compensation with that — track frequency changes from the references (e.g. it may be the same note continued or changing pitch);
  • time domain prediction via FIR or IIR filters (a sketch follows this list);
  • flexible subdivision into subframes, like a binary tree;
  • a raw (or at least non-transformed) coding mode for transients or noise;
  • an integer-only bit-exact transform that passes for MDCT in bad light;
  • high-bitdepth sound support (up to 64 bits per sample).
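
To illustrate the prediction item, here's a sketch of a fixed integer FIR predictor producing residuals; the coefficients below are the well-known order-4 fixed predictor (the same one FLAC uses), while a real codec would of course signal or adapt its own:

    #include <stdint.h>

    #define ORDER 4

    /* order-4 fixed predictor: pred = 4a - 6b + 4c - d */
    static const int32_t coef[ORDER] = { 4, -6, 4, -1 };

    static void predict_fir(const int32_t *samples, int32_t *residual, int count)
    {
        for (int i = 0; i < count; i++) {
            int64_t pred = 0;
            for (int j = 0; j < ORDER; j++)
                if (i - 1 - j >= 0)   /* missing history counts as zero */
                    pred += (int64_t)coef[j] * samples[i - 1 - j];
            residual[i] = samples[i] - (int32_t)pred;
        }
    }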

The project name is transGhost (hopefully no Monty will be hurt by this).

And if you point out that this is stupid — well, audio codecs should have the same rights as video codecs, including PTS/DTS differences and employing similar coding methods.

Why one should not be overexcited about new formats

Saturday, January 10th, 2015

Today I’ll talk about Opus and BPG and argue why they are not the silver bullets everyone was expecting.

Opus

I cannot say this is a bad codec: it has a modern design (hybrid speech+music coder) and impressive performance. What's wrong with it? Usage.

The codec is ideal for streaming, broadcasting and such. It does not have a special multichannel audio mode: you can combine mono and stereo Opus streams in whatever way you like, and you don't have to care about passing special configuration for it in a special way.

What's bad about that? When you try to apply it to stored media, all those advantages turn into drawbacks. There was no standard way to store it (IIRC the Opus-in-TS and Opus-in-MP4 specifications were developed by people who had little in common with the Opus developers, although some of the latter were present too). And there is still one big problem with an ugly hack as its "solution" — the lack of keyframes in Opus, the "solution" being preroll (i.e. "decode a certain number of audio frames before the needed one and discard them"). And not all containers support that feature.
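
Here's roughly what that hack looks like from the player's side (a sketch only; decode_frame() and the preroll amount are made-up stand-ins, not any real API):

    #define PREROLL_FRAMES 4   /* e.g. 4 * 20 ms frames; the real amount is codec-defined */

    static void decode_frame(int index, int discard)
    {
        (void)index;
        (void)discard;
        /* stand-in for the real decoder call */
    }

    static void seek_to(int target_frame)
    {
        int start = target_frame > PREROLL_FRAMES ? target_frame - PREROLL_FRAMES : 0;
        for (int i = start; i < target_frame; i++)
            decode_frame(i, 1);        /* decode and throw away: warming up decoder state */
        decode_frame(target_frame, 0); /* the first frame whose output is kept */
    }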

That reminds me of MusePack SV1–SV7. That was a project intended to improve on MPEG Audio Layer II compression and make a new codec out of it (yes, there's Layer III, but that was one of the reasons MusePack, Vorbis and other audio codecs were born). It enjoyed some limited popularity (I've implemented MPC decoding support for a reason) but it had two major drawbacks:

  • a very bare file format — IIRC it's just a header plus audio blocks, each prefixed by a 20-bit size, with no padding to byte boundaries either (if you've ever worked with raw FLAC streams you should have no problem imagining how good the MPC format was; see the sketch after this list);
  • no intra frames — again, IIRC their solution was to simply decode and discard 12 frames before the given one in the hope the sound would converge.
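
A quick sketch of why such framing hurts, assuming the 20-bit size counts bits (whether it's bits or bytes doesn't change the point): the only way to locate block N in a raw stream is to walk every size field before it, bit by bit, from the very start.

    #include <stddef.h>
    #include <stdint.h>

    static unsigned read_bits(const uint8_t *buf, size_t *bitpos, int n)
    {
        unsigned val = 0;
        while (n-- > 0) {
            val = (val << 1) | ((buf[*bitpos >> 3] >> (7 - (*bitpos & 7))) & 1);
            (*bitpos)++;
        }
        return val;
    }

    /* bit (not byte!) offset of block n: no index, no alignment,
     * so all preceding size fields have to be read one by one */
    static size_t find_block_start(const uint8_t *buf, int n)
    {
        size_t bitpos = 0;
        for (int i = 0; i < n; i++) {
            unsigned size = read_bits(buf, &bitpos, 20); /* 20-bit block size */
            bitpos += size;                              /* skip the block body */
        }
        return bitpos;
    }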

MusePack SV8 tried to address all those issues by introducing a new chunked format that could be easily embedded into other containers, and its audio blocks could be decoded independently because the first frame in each block was a keyframe. But it was too late, and I don't know who uses this format at all.

Opus is more advanced and performs better by offloading those problems to the container, but I still don't think Opus is an ideal codec for all cases. If you play it continuously it's fine; when you try to seek, problems start to occur.

BPG

This is a quite recent example of the idea "let's stick intraframe coding from some video codec into an image format".

Of course such an approach saves time, especially if you piggyback on a state-of-the-art codec, but it's not the optimal solution. Why? Because still image coding and video sequence coding have different goals and working conditions.

In video coding you have a large amount of data that you have to (de)compress efficiently, but mostly under specific constraints like framerate. While coding an individual frame well is important, it's much more convenient to spend effort on evening out the decoding load across all frames. After all, hardly anyone would like the first frame to take 0.8s to decode and the other 24 frames 0.1s each. That reminds me of ClearVideo, which had the inverse problem – intraframes were coded very simply (just IDCT+static Huffman) while interframes employed something fractal and took much more time.

Another difference is content. For video you usually have common frame sizes (like 1920×1080 or 1280×720), and modern video codecs are actually targeted at handling bigger and bigger resolutions. Images, on the other hand, come in various sizes, even ridiculous ones like 173×69, and they contain stuff you usually don't expect to see in video form — pixel art, synthetic images, line art etc. (Yes, some people care about monochrome FMV, but it's a very rare case.)

Another problem is efficient coding of palettised and monochrome images, lossily or losslessly. For lossless compression it's much better to operate on whole lines, while video coding standards nowadays are block-based, and specialised compression schemes beat generic ones. For instance, the same test page compresses to an 80kB PNG, a 56kB Group 4 TIFF or a 35kB JBIG image. JPEG-LS beats PNG too, and both are very simple compression standards compared to even H.261.

There's also alpha plane coding: not many video codecs support it because of its limited use in video. You have it mostly in intermediate codecs or game ones (hello, Indeo 4!). So if the selected video codec doesn't support alpha natively, you have to glue it on somehow (which is what BPG does).

Thus, we come to the following points:

  • images are coded individually while a video codec has to care about the whole sequence;
  • images come in all sizes, video sizes are usually one of a few standard ones;
  • images have content that's not always compressed well by a video coder, and a specialised compression scheme is always better and maybe faster;
  • images might need some additional features not required by video.

This should also explain why I have some respect for WebP LL but none for WebP.

I've omitted the obvious problems with adoption, low-power hardware and such, because hardly anything beats (M)JPEG there. So next time you choose a format for images, choose wisely.

A Codec Family Proposal

Monday, September 29th, 2014

There are enough general-use standardised codecs, there's even the VPx family for those who want more. But there are not enough niche codecs with free/open specifications.

One such niche codec would be an intermediate codec. It's suitable for capturing and quick editing of video material. The main requirements are a modest compression rate and fast processing (scalability is a plus too). Maybe SMPTE VC-5 will be the answer, maybe Ogg Chloe, maybe something completely different. Let's discuss it some other time.

Another niche codec that desperately needs an open standard is a screen video codec. Such a codec may also be used for recording webcasts, presentations and such. And here I'd like to discuss a whole family of such codecs based on the same coding principles.

It makes sense to make the codec fast by employing multithreading where possible. That's why the frame should be divided into tiles that are not too large and not too small, maybe 192×128 pixels or so.

Each tile should be coded independently, preferably with its distinct features coded separately too. It makes sense to separate tile data into smooth features (like gradients and real-life pictures) and sharp transitions (like text and UI elements). Let's call the former the natural layer and the latter the synthetic layer. We'll also need a mask to tell which layer to use for the current pixel. Using these main blocks and employing different coding methods, we can make a whole family of codecs.
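
To make the layer separation idea concrete, here's one naive heuristic for building the mask (pure speculation on my part, and this is exactly the part that needs research): count distinct values in a small window, since text and UI elements use few distinct colours while natural content uses many.

    #include <stdint.h>

    /* classify a pixel of one plane as synthetic (1) or natural (0)
     * by counting distinct values in a 5x5 window around it */
    static int is_synthetic(const uint8_t *pix, int stride, int x, int y,
                            int w, int h)
    {
        uint8_t seen[256] = { 0 };
        int distinct = 0;

        for (int dy = -2; dy <= 2; dy++) {
            for (int dx = -2; dx <= 2; dx++) {
                int px = x + dx, py = y + dy;
                if (px < 0 || py < 0 || px >= w || py >= h)
                    continue;
                uint8_t v = pix[py * stride + px];
                if (!seen[v]) {
                    seen[v] = 1;
                    distinct++;
                }
            }
        }
        return distinct <= 4; /* threshold is arbitrary, of course */
    }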

Here’s the list of example codecs (with a random FOURCC assigned):

  • J-B0 — employ JPEG for the natural layer and GIF/PNG for mask and synthetic layer coding;
  • J-B1 — employ Snow for natural layer coding and FFV1 for synthetic layer coding;
  • J-B2 — employ JPEG-2000 for natural layer coding, JBIG for mask coding and something like PPM modeller for synthetic layer;
  • J-BG — employ WebP for natural layer and WebP LL for synthetic layer.

As one can see, it's rather easy to build such a codec since all the coding blocks are there; only the natural/synthetic layer separation might need a bit of research. I see no reason why, say, VLC couldn't use it for recording and streaming a desktop for e.g. a virtual meeting.

On Railways Electrification

Sunday, September 21st, 2014

So, here's what I've discovered today.

There's the Schwarzwaldbahn going through the Schwarzwald from Offenburg to Konstanz, and there's a station on it — Villingen. That station bears a plaque commemorating that the 10,000th kilometre of electrification of the DB network was completed there in 1975 (DDR railways, on the other hoof, lost most of their electrification after the war because it was more important to electrify Soviet railways, but that's another story).

And there's a branch connecting Villingen (Baden) with Rottweil (Württemberg) — unelectrified. And that branch has its own subbranch to Trossingen Stadt. That subbranch is also served by a diesel railbus. But unlike the branch it connects to, it's electrified! And that electrification is used only by museum vehicles from the 1930s–1960s that are electric-only (or, in one case, a carriage with an electric locomotive).

On most such lines in Germany you usually have trains hauled by a steam locomotive or diesel railbuses while the main traffic is electrified, but in this case it's the other way round. I have only one possible explanation — Württemberg.

P.S. Still, it's hard to find a stupider situation with electrification than in Denmark. The only countries it has connections to had chosen the 15 kV 16⅔ Hz system. Denmark settled on 25 kV 50 Hz. But looking at their other railway-related decisions (i.e. IC4) it seems logical.

P.P.S. For Ukraine the situation is sadder — once I was on a Uzhgorod–Kharkiv train and it had to change locomotive twice because there are two electrification systems there (which makes for three areas). They claim it was done to better account for the relief, i.e. different electrification for the flatter and the mountainous regions. Hopefully there will be more dual-system trains in the future (and there will be a future too).

On Quack VPx

Tuesday, September 16th, 2014

I think most of you have read this piece of news about G**gle VPx plans already. After some thoughts I’ve decided to comment on it as well.

So, here’s a bit of history:

  1. Duck TrueMotion — an original codec;
  2. Duck TrueMotion 2 — a development of TrueMotion 1 (same coding principles but now Huffman coding is employed);
  3. On2 TrueMotion VP3 — something like TrueMotion 2 and MPEG-2 (aka H.262) mixed together;
  4. On2 TrueMotion VP4 — most likely some improvements over VP3 (shame on Mike and/or Peter for not REing it yet!);
  5. On2 TrueMotion (or was it TrueCast?) VP5 — MPEG-4 ASP/H.263 ripoff with On2-specific stuff (no B-frames, different coder etc.);
  6. On2 TrueMotion VP6 — minor improvements over VP5;
  7. On2 TrueMotion VP7 — H.264 ripoff with On2-specific stuff (no B-frames, different coder etc.);
  8. On2 TrueMotion VP8 — minor improvements over VP7;
  9. G**gle VP9 — H.265 ripoff with some On2-specific stuff (almost the same as in VP7/VP8);
  10. G**gle VP10 — not released yet, but I can predict it will be just VP9 with some minor improvements and no real specification available (you have the Chromium source, just look at the stable branch there).

It is easy to see that there's a huge issue to deal with if they want to release a new VPx every 18 months — they need a corresponding ITU H.26x standard (or at least some draft of it) available. The only alternatives are polishing VP9 and calling it a new version whenever some incompatible feature is added, or starting to rip off Daala, Dirac and Bink 3. Good luck.