Why one should not be overexcited about new formats

Today I’ll talk about Opus and BPG and argue why they are not the silver bullets everyone was expecting.

Opus

I cannot say this is a bad codec: it has a modern design (a hybrid speech+music coder) and impressive performance. So what’s wrong with it? Usage.

The codec is ideal for streaming, broadcasting and the like. It has no dedicated multichannel mode: you can combine mono and stereo Opus streams in whatever way you like, and you don’t have to care about passing special configuration for it in a special way.

What’s bad about that? When you try to apply it to stored media, all those advantages turn into drawbacks. There was no standard way to store it (IIRC the Opus-in-TS and Opus-in-MP4 specifications were developed by people that had little in common with Opus developers, although some of the latter took part too). And one big problem remains, with an ugly hack as the “solution”: Opus has no keyframes, so instead there is pre-roll (i.e. “decode a certain number of audio frames before the needed one and discard them”). Not all containers support that feature either.
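
To make the pre-roll hack concrete, here is a minimal sketch of what every seek costs the player. The Decoder/Packet types and all the function names are hypothetical stand-ins for illustration, not the actual libopus or Ogg Opus API, and PREROLL_FRAMES is an arbitrary value (IIRC the Ogg Opus spec recommends pre-rolling at least 80 ms).

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical decoder/demuxer API, for illustration only. */
    typedef struct Decoder Decoder;
    typedef struct { const uint8_t *data; size_t size; } Packet;
    void decoder_reset(Decoder *dec);            /* drop all filter state */
    int  demuxer_seek(Decoder *dec, int frame);  /* position on a frame   */
    int  demuxer_read(Decoder *dec, Packet *pkt);
    void decode_discard(Decoder *dec, const Packet *pkt);

    #define PREROLL_FRAMES 4  /* e.g. 80 ms at 20 ms per frame */

    /* Seeking in a codec without keyframes: to get correct output at
     * frame `target` you must decode and throw away the frames just
     * before it until the decoder state converges. */
    int seek_to_frame(Decoder *dec, int target)
    {
        int start = target - PREROLL_FRAMES;
        if (start < 0)
            start = 0;

        decoder_reset(dec);
        if (demuxer_seek(dec, start) < 0)
            return -1;

        for (int i = start; i < target; i++) {
            Packet pkt;
            if (demuxer_read(dec, &pkt) < 0)
                return -1;
            decode_discard(dec, &pkt);  /* output thrown away */
        }
        return 0;  /* the next decoded frame is usable */
    }

Note that the demuxer has to cooperate too: the position it serves packets from is no longer the position the caller asked to seek to.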

That reminds me of MusePack SV1–SV7. It was a project intended to improve on MPEG Audio Layer II compression and turn it into a new codec (yes, there’s Layer III, but its deficiencies were among the reasons MusePack, Vorbis and other audio codecs were born). It enjoyed some limited popularity (I implemented MPC decoding support for a reason) but it had two major drawbacks:

  • a very bare-bones file format: IIRC it’s just a header plus audio blocks prefixed by a 20-bit size, with no padding to byte boundaries either (if you’ve ever worked with raw FLAC streams you should have no problems imagining how pleasant the MPC format was to parse; see the sketch after this list);
  • no intra frames: again, IIRC their solution was to simply decode and discard 12 frames before the requested one in the hope that the sound would converge.
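
For illustration, a minimal sketch of what parsing such a format involves: every block boundary hangs off an unaligned 20-bit size field, so you cannot jump into the middle of a file and resynchronise. This is a generic bit reader, not actual MPC code, and I treat the size as a bit count for simplicity.

    #include <stdint.h>
    #include <stddef.h>

    /* Generic MSB-first bit reader; not actual MPC code. */
    typedef struct {
        const uint8_t *buf;
        size_t         size;    /* in bytes */
        size_t         bitpos;  /* current position in bits */
    } BitReader;

    static uint32_t get_bits(BitReader *br, int n)
    {
        uint32_t val = 0;
        while (n-- > 0 && br->bitpos < br->size * 8) {
            size_t byte = br->bitpos >> 3;
            int    bit  = 7 - (br->bitpos & 7);
            val = (val << 1) | ((br->buf[byte] >> bit) & 1);
            br->bitpos++;
        }
        return val;
    }

    /* Finding block N means reading every size prefix before it:
     * nothing is byte-aligned, so there is no way to jump directly. */
    static void skip_blocks(BitReader *br, int count)
    {
        for (int i = 0; i < count; i++) {
            uint32_t block_size = get_bits(br, 20); /* size in bits here */
            br->bitpos += block_size;
        }
    }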

MusePack SV8 tried to address all those issues with a new chunked format that could be easily embedded into other containers, and its audio blocks could be decoded independently because the first frame in each was a keyframe. But it came too late and I don’t know who uses that format at all.

Opus is more advanced and performs better by offloading those problems to the container, but I still don’t think Opus is an ideal codec for all cases. If you play it continuously it’s fine; once you try to seek, problems start to occur.

BPG

This is a quite recent example of the idea “let’s stick the intra-frame coding from some video codec into an image format”.

Of course such an approach saves time, especially if you piggyback on a state-of-the-art codec, but it’s not the optimal solution. Why? Because still image coding and video sequence coding have different goals and working conditions.

In video coding you have a large amount of data that you have to (de)compress efficiently, mostly under specific constraints like framerate. While coding an individual frame well is important, it’s much more sensible to spend effort on evening out the decoding load across frames. After all, hardly anyone would like the first frame to decode in 0.8 s and the other 24 frames in 0.1 s each. That reminds me of ClearVideo, which had the inverse problem: intra frames were coded very simply (just IDCT plus static Huffman coding) while inter frames employed something fractal-based and took much more time.

Another difference is content. For video you usually have a few common frame sizes (like 1920×1080 or 1280×720), and modern video codecs are actually targeted at handling bigger and bigger resolutions. Images, on the other hand, come in various sizes, even ridiculous ones like 173×69, and they contain stuff you usually don’t expect to see in video form: pixel art, synthetic images, line art etc. (Yes, some people care about monochrome FMV, but it’s a very rare case.)

Another problem is efficient coding of palettised and monochrome images, lossily or losslessly. For lossless compression it’s much better to operate on whole lines, while video coding standards nowadays are block-based, and specialised compression schemes beat generic ones. For instance, the same test page compresses to an 80 kB PNG, a 56 kB Group 4 TIFF or a 35 kB JBIG image. JPEG-LS beats PNG too, and both are very simple compression standards compared even to H.261.
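
To show what “operating on whole lines” means in practice, here is a minimal sketch of the first step of fax-style coding: turning a scanline into alternating white/black run lengths. Real Group 4 (T.6) then codes each line’s transitions relative to the previous line with fixed codes; this is only the flavour of the approach, not T.6 itself.

    #include <stdint.h>

    /* Sketch: convert one monochrome scanline (pixels are 0 = white,
     * 1 = black, one per byte for simplicity) into alternating run
     * lengths starting with white, as 1-D fax coding does. */
    static int line_to_runs(const uint8_t *line, int width,
                            int *runs, int max_runs)
    {
        int n = 0, colour = 0, run = 0;  /* start with a white run */
        for (int x = 0; x < width; x++) {
            if (line[x] == colour) {
                run++;
            } else {
                if (n >= max_runs)
                    return -1;
                runs[n++] = run;   /* emit the finished run */
                colour = !colour;
                run = 1;
            }
        }
        if (n >= max_runs)
            return -1;
        runs[n++] = run;           /* emit the final run */
        return n;                  /* number of runs produced */
    }

On scanned text such runs are long and their lengths are extremely skewed, which is exactly what line-based coders exploit and what a block transform cannot.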

There’s also alpha plane coding; not many video codecs support it because of its limited use in video. You find it mostly in intermediate codecs or game ones (hello, Indeo 4!). So if the selected video codec doesn’t support alpha natively you have to glue it on somehow (which is what BPG does).

Thus, we come to the following points:

  • images are coded individually while a video codec has to care about the whole sequence;
  • images come in all kinds of sizes while video sizes are usually a few standard ones;
  • images have varied content that is not always compressed well by a video coder, and a specialised compression scheme is always better and maybe faster;
  • images might need some additional features not required by video.

This should also explain why I have some respect for WebPLL (the lossless WebP coder) but none for WebP.

I’ve omitted the obvious problems with adoption, low-power hardware and such because hardly anything beats (M)JPEG there. So next time you choose a format for images, choose wisely.

12 Responses to “Why one should not be overexcited about new formats”

  1. derf says:

    There was no standard way to store it

    https://tools.ietf.org/html/draft-ietf-codec-oggopus (this will be heading to IETF Last Call and RFC as soon as I can get my co-chair to finish his write-up on it).

    See also Opus in MKV/WebM, which has been deployed for some time now.

    IIRC Opus-in-TS and Opus-in-MP4 specifications were developed by people that had little in common with Opus developers

    I wrote the first draft of the Opus-in-TS specification myself (relying on advice from people who know much more about TS than I do, to be sure). Opus-in-MP4 is a similar collaborative effort that has been much discussed among #opus developers. Yes, the primary author had not worked on Opus before, but we’d have been foolish to turn away his help.

    …when you try to seek problems start to occur.

    Do you have examples of actual problems?

  2. Kostya says:

    There was no standard way to store it…

    That’s why I said “there was”, not “there is”. But I think that still happened after Opus 1.0, not along with it: weren’t you drafting Opus-in-TS at VideoLAN Developer Days 13 with the help of more broadcasting-oriented people? And the Opus-in-MP4 draft mostly mentions people who, as far as I know, have nothing to do with Opus development.

    …when you try to seek problems start to occur.

    I consider the pre-roll mechanism to be a problem. From a demuxer’s point of view it’s better when you seek to a certain position and can start serving frames immediately, not “if you want to play video seek there, but if you want to play audio seek to an earlier position to compensate for pre-roll, and don’t start serving video frames yet”. It’s an unnecessary complication IMO.

    Remember, I look at formats mostly from an independent reimplementation point of view. It kills a lot of fun but provides some rant material.

  3. kurosu says:

    For palettised content there’s an HEVC extension called “screen content coding” which has a palette mode and various other tools. But it is still block-based.

    I tested lossless coding of black-and-white images (actually, decompressed TIFF CCITT Group 4 scans of documents). The resulting files were often a quarter of the size of the original TIFF files. But CCITT Group 4 doesn’t use CABAC and is around 30 years old, so there’s that.

    As for BPG, my biggest concern is: “what a mess”. For now, I’d say it’s completely experimental, and most people should wait for the format to be somewhat frozen before bothering.

  4. Kostya says:

    @kurosu CCITT T.6 may be old, yet it works. For something more effective, with better coders and such, there are JBIG and JBIG2; the latter is specially oriented towards coding text pages (it can group common glyphs into a font and reference that instead of coding the same shape again; a pity it’s not popular).

    JPEG-LS also has a palettised mode in which the decoded RGB image is quantised back using a stored palette (that’s for the lossy mode of operation; for lossless it’s simply a reverse lookup). I can imagine HEVC getting something similar if required.
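
    A minimal sketch of that reverse mapping, just to illustrate the idea (not actual JPEG-LS code): after lossless decoding every sample matches a palette entry exactly, while after lossy decoding you settle for the nearest one.

        #include <stdint.h>
        #include <limits.h>

        /* Sketch: map a decoded RGB sample back to a palette index.
         * Not JPEG-LS code; just the principle. */
        static int palette_index(const uint8_t pal[][3], int pal_size,
                                 uint8_t r, uint8_t g, uint8_t b)
        {
            int best = 0, best_dist = INT_MAX;
            for (int i = 0; i < pal_size; i++) {
                int dr = r - pal[i][0];
                int dg = g - pal[i][1];
                int db = b - pal[i][2];
                int dist = dr * dr + dg * dg + db * db;
                if (dist == 0)
                    return i;        /* exact hit: the lossless case */
                if (dist < best_dist) {
                    best_dist = dist;
                    best = i;
                }
            }
            return best;             /* nearest entry: the lossy case */
        }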

    As for HEVC SCC (v2), I’m looking at it and so far it seems to be yet another rather useless hack.

  5. kurosu says:

    Regarding CCITT, it’s just that the experiment was kind of meaningless (I only tested using what Gimp offered) due to the format’s age.

    I didn’t know JBIG2 could do that glyph matching, though that’s probably restricted to monochrome. At least, there’s some deployment in hardware.

    Anyway, SCC was mostly aimed at the “efficient coding of palettised and monochrome images” problem with regard to BPG. It seems only natural that BPG may use SCC tools, and however suboptimal that solution is, it still solves the problem above, which you yourself don’t dismiss as useless.

  6. Kostya says:

    JBIG2 is used in PDF and DjVu (non-standard in the latter case) though.

    I’ve looked at the SCC proposal; their approach is interesting but somewhat limited (if you have more than 32 colours per CTB you are going to get a lot of escapes). I fully agree that it will work, but the emphasis is on the “however suboptimal” part.
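
    To see where the limit bites, here is a hypothetical little routine (not SCC syntax, just the principle): keep the most frequent colours of a block in the palette and count how many pixels fall outside it and would have to be escaped.

        #include <stdint.h>

        /* Sketch: with at most max_pal palette entries per block, every
         * pixel whose colour is not among the max_pal most frequent ones
         * has to be escaped (coded literally). Assumes at most a 64x64
         * block; illustration only, not SCC syntax. */
        static int count_escapes(const uint32_t *block, int num_pixels,
                                 int max_pal)
        {
            uint32_t pal[64 * 64];
            int      cnt[64 * 64];
            int      n = 0;

            /* Build a colour histogram of the block. */
            for (int i = 0; i < num_pixels; i++) {
                int j;
                for (j = 0; j < n; j++)
                    if (pal[j] == block[i]) {
                        cnt[j]++;
                        break;
                    }
                if (j == n) {
                    pal[n] = block[i];
                    cnt[n] = 1;
                    n++;
                }
            }
            if (n <= max_pal)
                return 0;  /* the whole block fits into the palette */

            /* Move the max_pal most frequent colours to the front... */
            for (int k = 0; k < max_pal; k++) {
                int best = k;
                for (int j = k + 1; j < n; j++)
                    if (cnt[j] > cnt[best])
                        best = j;
                int t = cnt[k]; cnt[k] = cnt[best]; cnt[best] = t;
                uint32_t tp = pal[k]; pal[k] = pal[best]; pal[best] = tp;
            }
            /* ...and count the pixels left outside it. */
            int escapes = 0;
            for (int j = max_pal; j < n; j++)
                escapes += cnt[j];
            return escapes;
        }

    On dithered or photographic content nearly every pixel of a CTB can end up escaped, which is why the mode helps synthetic screen content but little else.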

    Old venerable formats (like TIFF) had several compression methods because one scheme does not perform well in all cases.

    And I hope HEVC won’t end like MPEG-4 Audio with its insane variety of coding tools and extensions that would give nightmares to any implementor.

  7. derf says:

    But I think that still happened after Opus 1.0, not along with it…

    The first version of the Opus-in-Ogg spec was published at the IETF on July 3, 2012, before RFC 6716 and the libopus 1.0 release (and that spec was edited and reviewed on Xiph’s wiki before that). It hasn’t changed in any meaningful way since then (the bits on the wire certainly haven’t), and we were already deploying implementations in Firefox, VLC, libopusfile, etc. right around when 1.0 hit.

    Other containers have followed since, depending on the interest people have expressed in them. But it wouldn’t have made any sense to block the publication of the main Opus or Opus-in-Ogg specs on those.

    I consider pre-roll mechanism to be a problem.

    It’s no different in principle from any other MDCT-based codec, which needs at least one frame of pre-roll for overlap-add, except that it might require more than one packet. In practice, when you usually have ~1 second of audio per Ogg page, the actual place you need to start reading from the file is the same either way. I personally implemented seeking “independently” in the three places mentioned above [1], and didn’t have any real difficulty dealing with pre-roll. Making RTP people understand the need for “keyframes”, so that you could actually record live calls to a file and seek in them later, would have been a much bigger challenge.

    [1] Which I’ll claim because VLC and Firefox already had existing seeking implementations written by others to which I only wanted to make minimal modifications.
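
    For reference, the one-frame dependency mentioned above comes straight from overlap-add reconstruction: each output block is the saved second half of the previous inverse transform plus the first half of the current one. A minimal sketch (windowing omitted; N is an arbitrary frame length, and this is not libopus code):

        #include <string.h>

        #define N 1024  /* frame length; each inverse MDCT yields 2*N samples */

        /* Output for the current frame is the saved overlap from the
         * previous inverse transform added to the first half of the
         * current one, so one frame of pre-roll after a seek is enough
         * to fill `overlap` with valid data. */
        static void overlap_add(float *out, const float *imdct_buf,
                                float *overlap)
        {
            for (int i = 0; i < N; i++)
                out[i] = overlap[i] + imdct_buf[i];
            /* Save the second half for the next call. */
            memcpy(overlap, imdct_buf + N, N * sizeof(*overlap));
        }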

  8. Kostya says:

    I don’t know of MDCT codecs where frames are coded independently yet more than exactly one frame is required for overlap-add (the only exception I know is MusePack, but it’s PQMF-based and already mentioned in the post). But the confusion with encoder delay that various codecs enjoy (especially MP3) is not nice either.

    Making RTP people understand the need for “keyframes” so that you could actually record live calls to a file and be able to seek in them later would have been a much bigger challenge.

    Exactly.

  9. derf says:

    My point is that if we had audio keyframes, we would be in a worse situation. People would write software that could only seek to keyframes; those keyframes wouldn’t line up with video keyframes (except possibly at framerates of 25 or 50 fps, which are the exception, not the norm), so you’d still have to do some buffering even in the “ideal” case; and in practice people would make broken, unseekable streams with no keyframes anyway.

    Instead we have a design that’s resilient to such misbehavior on the part of implementers, based on solid knowledge of the decay rates of the (stable!) IIR filters the codec uses and extensive empirical testing for confirmation. We didn’t make this decision on a whim.

  10. Kostya says:

    I’d argue that in this case you could still make keyframes configurable (e.g. every 5 seconds), and it’s easy to produce a seek point when you tell both the video and the audio encoder to do so. But yeah, this breaks existing workflows even if only on the encoding side. And I fully agree that people would misuse and abuse it anyway.

    Now I just have to wait for someone to introduce an audio codec with B-frames; it should open a new can of worms and make this world even weirder.

  11. Fruit says:

    How about the inability of Opus to natively store 44.1 kHz audio due to it only allowing certain fixed sampling frequencies?
    Do you also find that objectionable, or do you consider it not to be a problem?

  12. Kostya says:

    Well, that does not bother me much. Either you resample or you simply pretend your 44.1 kHz signal is 48 kHz; both might work.

    Since I have no use for Opus myself and I’m still not convinced it’s a good storage format, the limited set of sampling frequencies is not an issue. And I guess the streaming people didn’t have anything against it either: most of their codecs are locked to one or two sampling rates anyway.