A Call for Modern Audio Codec

February 11th, 2015

We need a proper audio codec to accompany state of the art video codecs, so here’s an outline of codec features that should be present:

  • audio codec should make more of its context, it should have a system of forward and backward reference frames like B-pyramid in H.264 or H.265;
  • it should employ tonal compensation with that — track the frequency changes from the references (e.g. it may be the same note continued or changing pitch);
  • time domain prediction via FIR or IIR filters;
  • flexible subdivision into subframes like binary tree;
  • raw (or non-transformed at least) coding mode for transients or noise;
  • integer only bitexact transform that passes for MDCT under bad light;
  • high-bitdepth sound support (up to 64 bits per sample).

The project name is transGhost (hopefully no Monty will be hurt by this).

And if you point out this is stupid — well, audio codecs should have the same rights as video codecs including PTS/DTS differences and employing similar coding methods.

Why one should not be overexcited about new formats

January 10th, 2015

Today I’ll talk about Opus and BPG and argue why they are not the silver bullets everyone was expecting.

Opus

I cannot say this is a bad codec, it has modern design (hybrid speech+music coder) and impressive performance. What’s wrong about it? Usage.

The codec is ideal for streaming, broadcasting and such. It does not have special multichannel audio, you can combine mono and stereo Opus streams in whatever way you like and you don’t have to care about passing special configuration for it in a special way.

What’s bad about that? When you try to apply it to stored media all those advantages turn into drawbacks. There was no standard way to store it (IIRC Opus-in-TS and Opus-in-MP4 specifications were developed by people that had little in common with Opus developers although some of the latter were present too). There is still one big problem with an ugly hack as “solution” — the lack of keyframes in Opus and the “solution” in form of preroll (i.e. “decode certain number of audio frames before the needed one and discard them”). And not all containers support that feature.

That reminds me of MoosePack SV1-SV7. That was a project intended to improve MPEG Audio Layer II compression and make it a new codec (yes, there’s Layer III but that was one of the reasons MoosePack, Vorbis and other audio codecs were born). It had enjoyed some limited popularity (I’ve implemented MPC decoding support for a reason) but it had two major drawbacks:

  • very brief file format — IIRC it’s just a header and audio blocks prefixed by 20-bit size and no padding to byte either (if you’ve ever worked with raw FLAC streams you should have no problems imagining how good MPC format was);
  • no intra frames — again, IIRC their solution was to simply decode and discard 12 frames before the given one in hope the sound will converge.

MusePack SV8 tried to address all those issues by making new chunked format that could be easily embedded into other containers, its audio blocks could be decoded independently because first frame in it was a keyframe. But it was too late and I don’t know who uses this format at all.

Opus is more advanced and performs better by offloading those problems to container but I still don’t think Opus is an ideal codec for all cases. If you play it continuously it’s fine, when you try to seek problems start to occur.

BPG

This is quite recent example of the idea “let’s stick intraframe coding from some video codec into image format”.

Of course such approach saves time especially if you piggyback state of the art codec but it’s not the optimal solution. Why? Because still image coding and video sequence coding have different goals and working conditions.

In video coding you have a large amount of data that you have to (de)compress efficiently but mostly under specific constraints like framerate. While coding an individual frame is important it’s much more convenient to spend efforts on evening load for decoding all frames. After all, hardly anyone would like to have first frame to be decoded in 0.8s and other 24 frames in 0.1s. That reminds me of ClearVideo which had the inverse problem – intraframes were coded very simply (just IDCT+static Huffman) and interframes employed something fractal and took much more time.

Another difference is content. For video you usually have common frame sizes (like 1920×1080 or 1280×768) and actually modern video codecs are targeted for handling bigger and bigger resolutions. Images on the other hand come in various sizes, even ridiculous ones like 173×69, and they contain stuff you usually don’t expect to be in video form — pixel art, synthetic images, line art etc. (Yes, some people care about monochrome FMV but it’s a very rare case).

Another problem is efficient coding of palettised and monochrome images, lossy or losslessly. For lossless compression it’s much better to operate on whole lines while video coding standards nowadays are block-based and specialised compression schemes beat generic ones. For instance, the same test page compresses to 80kB PNG, 56kB Group4 TIFF or 35kB JBIG image. JPEG-LS beats PNG too and both are very simple compression standards compared to even H.261.

There’s also alpha plane coding, not so many video codecs support it because of its limited use in video. You have it mostly in intermediate codecs or game ones (hello Indeo 4!). So if selected video codec doesn’t support alpha natively you have to glue it somehow (that’s what BPG does).

Thus, we come to the following points:

  • images are individually coded while video codec has to care about whole sequence;
  • images come in different sizes, video sizes are usually few standard ones;
  • images have different content that’s not always well compressed by video coder and specialised compression scheme is always better and maybe faster;
  • images might need some additional features not required by video.

This should also explain why I have some respect for WebPLL but none for WebP.

I’ve omitted obvious problems with adoption, small-power hardware and such because hardly anything beats (M)JPEG there. So next time you choose format for images choose wisely.

H.265: An Alternative History

December 6th, 2014

As you might remember, alternative history genre is modelling events based on real history but with something gone differently. Here’s what could’ve happened with H.265 but didn’t.

So, finally there’s a new standard released — ITU H.265. It promises twice as low bitrate for the same picture quality in H.264. Yet people do not care much about it since industry leaders offer their solutions:

China introduces their new standard for video coding — Hybrid Enhanced Video Standard or HEVS for short. It features quadtree representation of coding blocks, more than thirty spatial prediction modes, block transforms from 4×4 to 32×32 and has one unique feature — motion vectors that implicitly take mirror references from reference picture lists. This standard is nominated for main video coding standard on CUVRD (China ultraviolet ray disc) but gains little popularity outside China.

On2 makes a new codec named VP9 that has no open specifications. After tedious reverse engineering it turns out to employ coding scheme from VP5 times, spatial and motion prediction from H.265 with slightly altered coefficients and overall coding scheme from H.265 drafts.

Re… buffering… alNetworks releases NGV at last (fourcc RV50). Again, after long studies of binary specification, it turns out to be based on H.265 drafts with some in-house improvements: using context-specific codebooks for elements coding, ?-pel motion compensation (which is implemented as motion vectors pretended to be ?-pel but in reality several different positions are handled by the same function). The codec goes very popular in China for some reason.

Sorenson releases SVQ7. It is based on old H.265 draft and employs ?- and ?-pel motion compensation. It has some additional features like watermarking and quickly becomes the codec of choice for QuickTime.

P.S. Good thing nothing like this has really happened.

Blåtand-Passande-X

November 23rd, 2014

So, finally there’s a post about some codec.

It is a specialised codec from Oxford Germanium Television (all names are changed just in case) that has 4:1 compression ratio and very niche use. It’s hard to find even a decoder for it so this analysis was done on ARM version of encoder (maybe I’ll be able to RE something more useful next time like VX).

The codec itself is rather simple: you take 4 samples from one channel, compress them, output the 16-bit result and repeat the same for the second channel. Encoding is rather simple too:

  1. feed input to 4-band QMF (with filter looking a lot like D4 wavelet to me);
  2. perform ADPCM on each band (this varies a bit for each band but it’s the same approach);
  3. generate output word (7 bits for band 0, 4 bits for band 1, 2 bits for band 2 and 3 plus a parity bit for them all).

Since I have no samples of it don’t expect a decoder from me any time soon (and I don’t have enough motivation to hook Android encoder directly to make it produce data). Not that anyone cares about it either.

A Bit on Germany

November 21st, 2014

quote

An excerpt from a book that I have to refer sometimes (here’s the source, it really tells a lot about proper relationship. Ask a nearby German for a translation if you need it.

P.S. Next post will be about a codec technical description, I promise.

A Bit on Italy

November 21st, 2014

Italy is a good country (piecewise) — nice scenery, decent food (and pasta), nice architecture…
There’s but one thing that annoys me, I think you’ll be able to spot it on those two pictures.

Milan, a corner of via Vitruvio and via Benedetto Marcello.
Milan

Turin, largo Cassini.
Torino

I cannot say it for sure but I remember Ukrainian markets (right after closing time) being cleaner than that.

A Codec Family Proposal

September 29th, 2014

There are enough general use standardised codecs, there’s even VPx family for those who want more. But there are not enough niche codecs with free/open specifications.

One of such niche codecs would be an intermediate codec. It’s suitable for capturing and quick editing of video material. Main requirements are modest compression rate and fast processing (scalable is a plus too). Maybe SMPTE VC-5 will be the answer, maybe Ogg Chloe, maybe something completely different. Let’s discuss it some other time.

Another niche codec that desperately needs an open standard is screen video codec. Such codec may be also used for recording webcasts, presentations and such. And here I’d like to discuss a whole family of such codecs based on the same coding principles.

It makes sense to make codec fast by employing multithreading where possible. That’s why frame should be divided into tiles that should be not so large and not so small, maybe 192×128 pixels or so.

Each tile should be coded independently, preferably its distinct features coded separately too. It makes sense to separate tile data into smooth features (like gradients and real life pictures) and sharp transitions (like text and UI elements). Let’s call the former a natural layer and the latter a synthetic layer. We’ll need a mask to tell which layer to use for the current pixel too. And using these main blocks and employing different coding methods we can make a whole family of codecs.

Here’s the list of example codecs (with a random FOURCC assigned):

  • J-B0 — employ JPEG for natural layer and GIFPNG for mask and synthetic layer coding;
  • J-B1 — employ Snow for natural layer coding and FFV1 for synthetic layer coding;
  • J-B2 — employ JPEG-2000 for natural layer coding, JBIG for mask coding and something like PPM modeller for synthetic layer;
  • J-BG — employ WebP for natural layer and WebP LL for synthetic layer.

As one can see, it’s rather easy to build such codec since all coding blocks are there and only natural/synthetic layer separation might need a bit of research. I see no reasons why, say, VLC can’t use it for recording and streaming desktop for e.g. virtual meeting.

On Railways Electrification

September 21st, 2014

So what I’ve discovered today.

There’s a Schwarzwaldbahn going through Schwarzwald from Offenburg to Konstanz and there’s a station there — Villingen. That station bears a plaque that it had 10000th kilometre of electrification of DB network done in 1975 (DDR railways on the other hoof lost most of its electrification after the war because it was more important to electrify Soviet railways but that’s another story).

And there’s a branch connecting Villingen (Baden) with Rottweil (Württemberg) — unelectrified. And that branch has its own subbranch to Trossingen Stadt. That subbranch is also served by a diesel railbus. But unlike the branch it connects to it’s electrified! And that electrification is used only by museum vehicles from 1930s-1960s that are electric only (or in one case it’s a carriage with an electric locomotive).

On most such lines in Germany one usually has trains hauled by a steam locomotive or a diesel rail buses and the main traffic is electrified but in this case it’s the other way round. I have only one possible explanation — Württemberg.

P.S. Still it’s hard to find stupider situation with electrification than in Denmark. The only countries it has connections to had chosen 15 kV 16? Hz system. Denmark settled on 25 kV 50 Hz. But looking at their other railway-related decision (i.e. IC4) it seems logical.

P.P.S. For Ukraine the situation is sadder — once I was in Uzhgorod-Kharkiv train and it had to change locomotive twice because there are two electrification systems there (which make three areas). They claim it was done to better account for relief, i.e. different electrification for the flatter and mountainy regions. Hopefully there will be more two-system trains in the future (and there will be the future too).

On Anime

September 20th, 2014

I’ve finally bought a DVD with an anime I wanted to watch. And it was awesome. Here are some screenshots.

xine_snapshot-1

xine_snapshot-2

xine_snapshot-3

xine_snapshot-4

xine_snapshot-5

xine_snapshot-6

xine_snapshot-7

xine_snapshot-8

P.S. I should probably visit Småland next summer.

On Quack VPx

September 16th, 2014

I think most of you have read this piece of news about G**gle VPx plans already. After some thoughts I’ve decided to comment on it as well.

So, here’s a bit of history:

  1. Duck TrueMotion — an original codec;
  2. Duck TrueMotion 2 — a development of TrueMotion 1 (same coding principles but now Huffman coding is employed);
  3. On2 TrueMotion VP3 — something like TrueMotion 2 and MPEG-2(aka H.262) mixed together;
  4. On2 TrueMotion VP4 — most likely some improvements over VP3 (shame on Mike and/or Peter for not REing it yet!);
  5. On2 TrueMotion (or was it TrueCast?) VP5 — MPEG-4 ASP/H.263 ripoff with On2-specific stuff (no B-frames, different coder etc.);
  6. On2 TrueMotion VP6 — minor improvements over VP5;
  7. On2 TrueMotion VP7 — H.264 ripoff with On2-specific stuff (no B-frames, different coder etc.);
  8. On2 TrueMotion VP8 — minor improvements over VP7;
  9. G**gle VP9 — H.265 ripoff with some On2-specific stuff (almost the same as in VP7/VP8);
  10. G**gle VP10 — is not released yet but I can predict it will be just VP9 with some minor improvements and no real specification available (you have Chromium source, just look at the stable branch there).

It is easy to see that there’s a huge issue to deal with if they want to release a new VPx every 18 months — they should have a corresponding ITU H.26x standard (or at least some draft of it) available. The only alternatives are polishing VP9 and calling it a new version when some incompatible feature is added or start ripping off Daala, Dirac and Bink 3. Good luck.