An upcoming image format war?

So this week libwebp2 appeared in a public repository. At a quick glance it looks like the lossy format is based on AV1 coding blocks and the lossless format is largely the same as the original WebP lossless, but both now use ANS coding. And (of course) there’s a hint of experimental lossy encoding using neural networks.

Let’s pretend that JPEG has finally died (again) and GIF and PNG are both gone. So what modern image formats intended for a general audience are out there?

Of course there’s Nokia HEIF, which is picture(s) split into tiles, coded with HEVC and stored in MP4. Because of the wonderful patent situation around it, it probably won’t be used outside the iEcosystem.

AVIF—same container, different codec (AV1 in this case).

WebP/WebP2—a Baidu image format with lossy compression based on a Baidu VPx codec (VP8 or VP10) and lossless compression from the French division of Baidu.

JPEG XL—a joint project between Cloudinary and the Swiss division of Baidu responsible for Baidu Chrömli (in case you did not know, that’s a Swiss word for various small sweet bits like Guetsli, Brunsli and such; Brötli/Gipfeli/Zöpfli/Grittibänzli are related to bread though, especially Brötli). Anyway, that’s a different format with a different set of features that includes lossless JPEG recompression (and hopefully the best practical lossless image compression, as one would expect from the creators of FLIF).

So my point is that if you had to choose between all those formats, you would essentially have to pick some format from Baidu (either made by it directly or using its codec). Somehow this future does not excite me much, so I’d rather stick to old formats for which a single programmer can write a standalone decoder in reasonable time.

Also for some reason this reminds me of the Soviet space program, where there were three main design bureaus (led by Korolyov, Chelomey and Yangel) producing different missiles and spaceships, many of which are still in use. But the competition also hurt general progress. As you may remember, three heavy rockets were proposed and none of them was really successful: Korolyov’s N1 kept failing because of its engines, Yangel’s R-56 was cancelled early in favour of the N1, and Chelomey’s UR-700 was never realized either. Glushko’s Energia had two launches (both successful), but it came too late and there was no payload for it besides the equally successful Buran program. So on one hand you have variety, and on the other hand you have a lot of wasted resources and effort.

I see parallels here, and with AV1 as well: why would the company controlling libaom develop libgav1 too?

And while speaking about AV1, I should mention that it reminds me of another kind of project, namely the Olympic Games.

Originally the Olympics were a competition between various people from various city-states, held for both religious and entertainment reasons. Later they were resurrected as a means to promote sports and unity, but just a couple of decades later the games became more of a political instrument promoting national teams instead of being just a competition of individuals from various places (partly because all the training becomes too costly for a non-professional sportsman, partly because countries want the prestige). And a bit later they became a business project, as the 2004 Summer Games in Athens demonstrated best.

So you have a committee that holds the rights to the symbols, logos, mascots and everything else. The host has to build large infrastructure for the various competitions and hope that the guests will bring enough money to compensate for at least some of those costs (and maybe those buildings will be useful later, but quite often they are not). Various companies pay a lot of money to become sponsors in the hope that such status will work as effective advertisement, and broadcasting companies pay a lot of money for broadcasting rights in the hope of getting more viewers (and money from ads). So before the games a lot of parties pay a lot of money, and afterwards they might make a profit or not. And the host country is left with huge expenses for constructing stadiums and such, plus rather useless structures that are too big for regular events or training. And of course the prestige. Where the money goes and which Olympics were profitable for the host country is left as an exercise for the reader.

In a similar way AV1 feels like such a project: it drew resources from different companies and people from different opensource multimedia projects to build something huge that is not really useful (I know that in theory it should trade bandwidth for CPU heat, but how many customers will be AV1-ready before AV2 is released and the cycle repeats?). The people involved in libaom, svt-av1, dav1d and rav1e would be better off doing something else, including better multimedia frameworks (I work on NihAV mostly because the alternatives are even worse), new codecs, or even a decent video editor, so that people making videos for BaidUTube won’t have to rely on expensive proprietary solutions that tend to crash anyway or on suspicious Chinese or Russian programs that rip off opensource libraries (I’ve seen one using mencoder compiled as a .dll).

Anyway, just as the Olympics were intended to promote sport and healthy living but became business projects that are a financial loss for most parties, AV1 looks like a project that, while being positioned as the saviour of opensource multimedia, essentially benefits just a small group of organisations. And as with many other things I say, I’d be happy to be proven wrong.

P.S. In case you say that I’m inconsistent for disliking both competing groups inside one company and united efforts (for the sake of the same company): well, I’d prefer different entities (companies or opensource projects or whatever) to produce a single solution each, as long as there’s more than just one entity doing it. To return to space analogies, I’d rather see many private companies each developing their own line of spaceships (for various purposes too) instead of ULA producing several kinds of radically different spaceships without any outside competition.

15 Responses to “An upcoming image format war?”

  1. vladowitz says:

    thanks for the nice post. your olympics comparison is gold. besides the competition aspects you highlight, there are interesting parallels on the PR/marketing side of things too. who would dare to oppose his own country’s noble efforts to win the olympic games for this or that year, while the reality is that the best funded (or most corrupt) campaign likely succeeds? noble motives sell best and make even those who have the least to do with the actual activity super vocal…

  2. Kostya says:

    Well, hopefully opensource multimedia is not that corrupt yet. There was MPEG, though, which got dissolved because of some internal bureaucratic games.

    Let’s see if things go better in 2021…

  3. Jonatas says:

    I don’t see a problem with AV2 (or even 3, 4, 5) being developed before AV1 is practically being deployed. Given the nature of media, where it is usually desirable to reach the biggest possible audience, we will always be using yesteryear’s technology. A new codec needs to be deployed in general hardware now for people to start making use of it in the distant future (likely at least a decade from now). I believe this will be the norm from now on: people will work on new codecs for their children to use. Crucially, we keep using technology developed before/at/for the last big tech disruptor, the smartphone. The same thing happened with the Internet, so we can expect it to happen again.

    IDK why I’m writing this unwarranted shit. Just wanted to say that I appreciate your writing, both blog and software.

  4. Kostya says:

    An interesting thought indeed. On one hand, evolution works the same way, by reusing already existing bits; on the other hand, this means a very slow reaction, since a lot of time passes between the creation of a technology and its adoption becoming wide enough to reveal its drawbacks and the cases where it can be improved.

    And there’s another sad thought from me: it takes 20 years for a patent to expire, so modern technologies might indeed be made for a future two decades away.

  5. Paul says:

    Your software violates several patents, please remove it or you will get a DMCA takedown request.

  6. Kostya says:

    You didn’t say please “cease and desist”.

  7. Ksec says:

    My problem with image formats is that none of them is good enough to displace JPEG.
    I think Dropbox made an update to JPEG’s entropy coding alone that gives a 20% size reduction instantly. JPEG-XL is optimised for 2-3 bits per pixel, which is large in my opinion. I saw a still image encoded with the VVC reference encoder at below 1 bpp and it was stunning.

    Both the EVC and VVC specs are out. Will you be doing a post (rant 😀) on them anytime soon?

    I am eagerly waiting for EVC: we should finally have a decent video codec that is (or should be) patent-free (at least the Baseline Profile).

  8. Kostya says:

    IIRC there’s some commercial EVC player app already. And from what I remember it’s mostly patent-free bits from AVC and HEVC, so it’s not that interesting from a technical point of view (and it was not its goal to be a state-of-the-art video codec either).

    As for VVC I said I’ll write about VVC when AV2 is ready so there’s something to compare and contrast.

    In general I still believe that still images and video codec frames have different sets of constraints and requirements, so IMO it’s not the best idea to put an I-frame into some container and call it a new image format.

  9. Hey Kostya, I shared this page with an interested Discord chat. There was confusion over your too-clever matter of ‘s/Google/Baidu/g’ throughout the piece. That doesn’t really land with everyone outside our clique.

    Good overview otherwise, though.

  10. Kostya says:

    I thought it was impossible to confuse them. One is a totalitarian search engine with arbitrary censorship that collects too much information about users, and the other one is a Chinese company.

    Still, I’ll keep muddling the search results because it’s a tradition already 😉 Or you can think about it as me being so respectful of a company that does no evil that I’d rather not mention its name in vain.

  11. […] passionate about image codecs. A “codec battle” is brewing, and I’m not the only one to have opinions about that. Obviously, as the chair of the JPEG XL ad hoc group in the JPEG Committee, I’m firmly […]

  12. Anon says:

    >As for VVC I said I’ll write about VVC when AV2 is ready so there’s something to compare and contrast.

    Well, VP8 is an H.264 ripoff, VP9 is a (slightly less) H.265 ripoff, and AV1 is a (slightly less still) H.266 ripoff, but AV1 at least has the excuse of being actually released before H.266.

    >I am eagerly waiting for EVC: we should finally have a decent video codec that is (or should be) patent-free (at least the Baseline Profile).

    aka “what H.264 should have been if it hadn’t been crippled by hardwaretards”. seriously, 16*16 2D DCT is not such a difficult thing; it could have been present in H.264. but muh hardware decoding!111

  13. Kostya says:

    >but AV1 at least has the excuse of being actually released before H.266.

    True, and I heard rumours that they were willing to rush AV2 out before H.266 as well, but that didn’t happen. Still, it’s fun to observe which ideas get in and where they actually originated.

    >16*16 2D DCT is not such a difficult thing; it could have been present in H.264. but muh hardware decoding!111

    Well, that depends. From what I’ve heard, AV1 hardware decoders are not so popular precisely because of their increased complexity. I understand why H.264 started with a 4×4 transform everywhere: it is simple and preserves fine details better. But when you add an 8×8 transform for the High profile later, why not add a 16×16 transform as well?

  14. Anon says:

    Funnily enough, one idea was in VP8 before it appeared in H.265: intra prediction as (top+left-topleft); I forgot the fancy name for it. I don’t know, maybe it was taken from some rejected H.264 proposals?
    Still, I think AV1 is the right codec to compare to VVC, for the reason in the “pre-previous” comment.
    The differences I can remember are:
    1. as with VP9, bipredicted frames are non-displayed, etc.
    2. different entropy coders
    3. different deringing filters
    4. IIRC block partitioning in VVC is a bit more fancy

  15. Kostya says:

    Well, I think T + L - TL is called gradient prediction (but I may be wrong), and it was employed by lossless JPEG, a specification as old as On2 itself (yes, they both appeared in 1992). It’s just that Paeth prediction often gives better results and thus became more popular.
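    To make the difference concrete, here is a small Python sketch (my own illustration, not taken from lossless JPEG, VP8 or any PNG library) of both predictors, where left, top and top_left are the neighbouring samples. The gradient predictor is clamped to 0..255 here, an assumption matching VP8’s TrueMotion mode for 8-bit samples; Paeth follows the definition in the PNG specification and uses the same T + L - TL estimate only to choose whichever neighbour is closest to it. The helper residual_cost and the two toy images are hypothetical.

```python
def gradient_predict(left: int, top: int, top_left: int) -> int:
    """T + L - TL, clamped to 0..255 (assumption: 8-bit samples, as in VP8 TrueMotion)."""
    return max(0, min(255, left + top - top_left))


def paeth_predict(left: int, top: int, top_left: int) -> int:
    """Paeth predictor as defined in the PNG specification."""
    p = left + top - top_left  # the same gradient estimate...
    pa, pb, pc = abs(p - left), abs(p - top), abs(p - top_left)
    # ...used only to pick the neighbour closest to it (ties: left, then top)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return top
    return top_left


def residual_cost(img, predictor) -> int:
    """Hypothetical helper: sum of |actual - predicted| over pixels that have all three neighbours."""
    total = 0
    for y in range(1, len(img)):
        for x in range(1, len(img[0])):
            pred = predictor(img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])
            total += abs(img[y][x] - pred)
    return total


ramp = [[10 * x + 5 * y for x in range(4)] for y in range(4)]  # smooth gradient
edge = [[0, 0, 200], [0, 200, 200], [200, 200, 200]]           # sharp staircase edge
print(residual_cost(ramp, gradient_predict), residual_cost(ramp, paeth_predict))  # 0 45
print(residual_cost(edge, gradient_predict), residual_cost(edge, paeth_predict))  # 310 200
```

    On the smooth ramp the gradient predictor is exact while Paeth is off by 5 on every pixel; on the edge the clamped gradient overshoots to 255 where Paeth picks the right neighbour. So which one wins depends on the content.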

    Let’s see if H.267 happens so it can be compared with and against AV2.