Today I’ll talk about Opus and BPG and argue why they are not the silver bullets everyone was expecting.
I cannot say this is a bad codec, it has modern design (hybrid speech+music coder) and impressive performance. What’s wrong about it? Usage.
The codec is ideal for streaming, broadcasting and such. It does not have special multichannel audio, you can combine mono and stereo Opus streams in whatever way you like and you don’t have to care about passing special configuration for it in a special way.
What’s bad about that? When you try to apply it to stored media all those advantages turn into drawbacks. There was no standard way to store it (IIRC Opus-in-TS and Opus-in-MP4 specifications were developed by people that had little in common with Opus developers although some of the latter were present too). There is still one big problem with an ugly hack as “solution” — the lack of keyframes in Opus and the “solution” in form of preroll (i.e. “decode certain number of audio frames before the needed one and discard them”). And not all containers support that feature.
That reminds me of MoosePack SV1-SV7. That was a project intended to improve MPEG Audio Layer II compression and make it a new codec (yes, there’s Layer III but that was one of the reasons MoosePack, Vorbis and other audio codecs were born). It had enjoyed some limited popularity (I’ve implemented MPC decoding support for a reason) but it had two major drawbacks:
- very brief file format — IIRC it’s just a header and audio blocks prefixed by 20-bit size and no padding to byte either (if you’ve ever worked with raw FLAC streams you should have no problems imagining how good MPC format was);
- no intra frames — again, IIRC their solution was to simply decode and discard 12 frames before the given one in hope the sound will converge.
MusePack SV8 tried to address all those issues by making new chunked format that could be easily embedded into other containers, its audio blocks could be decoded independently because first frame in it was a keyframe. But it was too late and I don’t know who uses this format at all.
Opus is more advanced and performs better by offloading those problems to container but I still don’t think Opus is an ideal codec for all cases. If you play it continuously it’s fine, when you try to seek problems start to occur.
This is quite recent example of the idea “let’s stick intraframe coding from some video codec into image format”.
Of course such approach saves time especially if you piggyback state of the art codec but it’s not the optimal solution. Why? Because still image coding and video sequence coding have different goals and working conditions.
In video coding you have a large amount of data that you have to (de)compress efficiently but mostly under specific constraints like framerate. While coding an individual frame is important it’s much more convenient to spend efforts on evening load for decoding all frames. After all, hardly anyone would like to have first frame to be decoded in 0.8s and other 24 frames in 0.1s. That reminds me of ClearVideo which had the inverse problem – intraframes were coded very simply (just IDCT+static Huffman) and interframes employed something fractal and took much more time.
Another difference is content. For video you usually have common frame sizes (like 1920×1080 or 1280×768) and actually modern video codecs are targeted for handling bigger and bigger resolutions. Images on the other hand come in various sizes, even ridiculous ones like 173×69, and they contain stuff you usually don’t expect to be in video form — pixel art, synthetic images, line art etc. (Yes, some people care about monochrome FMV but it’s a very rare case).
Another problem is efficient coding of palettised and monochrome images, lossy or losslessly. For lossless compression it’s much better to operate on whole lines while video coding standards nowadays are block-based and specialised compression schemes beat generic ones. For instance, the same test page compresses to 80kB PNG, 56kB Group4 TIFF or 35kB JBIG image. JPEG-LS beats PNG too and both are very simple compression standards compared to even H.261.
There’s also alpha plane coding, not so many video codecs support it because of its limited use in video. You have it mostly in intermediate codecs or game ones (hello Indeo 4!). So if selected video codec doesn’t support alpha natively you have to glue it somehow (that’s what BPG does).
Thus, we come to the following points:
- images are individually coded while video codec has to care about whole sequence;
- images come in different sizes, video sizes are usually few standard ones;
- images have different content that’s not always well compressed by video coder and specialised compression scheme is always better and maybe faster;
- images might need some additional features not required by video.
This should also explain why I have some respect for WebPLL but none for WebP.
I’ve omitted obvious problems with adoption, small-power hardware and such because hardly anything beats (M)JPEG there. So next time you choose format for images choose wisely.