Yesterday Derek’s talk at Demuxed reached me for about the fourth time, and I was asked for my opinion on it as well. I can take a hint (eventually), so here’s what I think.
Unlike Derek, I’m a major nobody with some interest in how codecs work, to the point that I’m not afraid to look at their binary specifications and sometimes even implement a decoder. Anyway, I’ll try to give a short summary of the points he presents and what I think about them.
First he talks about HEVC adoption being driven by features rather than bandwidth savings. I can believe that, especially when there are companies like Nvidia with its notoriously bad H.264 decoding support (which on certain models could not decode videos over a certain size, let alone handle more advanced features) that are not going to update it for new features. And with integer-based transforms you want higher bitdepth for better coefficient representation: I heard from the x264 people that the scene used 10-bit H.264 to compress 8-bit content precisely for quality reasons. Similarly, my colleague has said more than once that he watches Netflix videos in 4K downscaled to FullHD to hide the compression artefacts.
Then he mentions the VPx series of codecs and how they did not enjoy that much adoption in the wild. I would probably have mentioned VP10 (aka AV1) there already, but other than that there’s nothing to add here.
After that he moves on to listing who is not eager to adopt new codecs (and why). Again, nothing much to add here.
He concludes the first part of the talk with the potential prospects of AV1 and VVC. I’d rant more about how patent extortionists spoil the whole landscape and how AV1 lacks support even from the members of AOM, but he had a time limit.
The second part of his talk starts with a tepid take on H.264 becoming the JPEG of video codecs (implying that it has become an old but well-used and familiar codec, with some extensions everybody prefers not to think about). This is probably true.
He goes on to mention the MPEG-2, MPEG-4 ASP and MPEG-4 AVC codecs and how they were implementable and reasonably documented. I’d say the only unreasonable thing about the documentation was (and still is) the ISO pricing of the standards.
Then he laments the lack of knowledge preserved from the older generations. I share his feelings, but some of his examples are not good. The last time I tried to search for some MPEG document, it turned out that some of them are open to MPEG members only. Other things were documented quite well in, e.g., textbooks on analogue television, but who reads those nowadays? And yet other things are obscure because they are taken for granted. For example, we know that the Earth is spherical (in the second approximation) and even the Ancient Greeks knew that, but do you know how they came to that conclusion? Can you name, without resorting to a web search, anything besides the shadows at noon in various places? And do you really need that knowledge in order to exploit the Earth’s shape? Another point is that some decisions are made for non-technical reasons: I heard that MPEG Audio Layer III was made such an ugly beast partly because a certain company had a patent on the QMF used in Layers I and II, so it forced it into the Layer III design as well. But of course such things won’t get into textbooks.
This is followed by a cursory remark about the same stuff being repeatedly proposed for newer codecs. I still believe that codec development is hindered by the computing resources available (e.g. the celebrated chroma-from-luma prediction mode in AV1 started as a rejected proposal for H.265, but now it is acceptable); a small sketch of that idea follows below. Back in the day there was a joke that the perfect encoder is one producing the MD5 hash of a frame and letting the decoder find suitable data for it. With AI-based approaches this might no longer be a joke (have you ever heard of Stable Diffusion?).
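A minimal illustration of the chroma-from-luma idea (not AV1’s exact scheme, where the encoder chooses the scaling factor, quantises it and signals it in the bitstream, and the prediction is applied to the mean-removed luma):

```python
# Toy chroma-from-luma prediction: model the chroma block as an affine
# function of the co-located reconstructed luma and code only the residual.
# In a real encoder the fitted scale would be quantised and transmitted (or
# derived from neighbouring reconstructed samples), since the decoder never
# sees the true chroma.

def cfl_predict(luma, chroma):
    """Fit chroma ~= alpha * luma + beta by least squares and return the prediction."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    var = sum((l - mean_l) ** 2 for l in luma)
    alpha = cov / var if var else 0.0
    beta = mean_c - alpha * mean_l          # DC offset
    return [alpha * l + beta for l in luma]

# Chroma that roughly follows luma leaves only tiny residuals to transform and code.
luma   = [52, 60, 61, 70, 85, 90, 96, 110]
chroma = [40, 44, 45, 49, 57, 60, 63, 70]
residual = [c - round(p) for c, p in zip(chroma, cfl_predict(luma, chroma))]
print(residual)   # all zeros for this (deliberately well-behaved) toy block
```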
And with this he comes to the logical conclusion that the field is stagnating under its sheer complexity: nobody understands the whole picture, the material is not taught well so hardly anybody can get into it (and those who do have to learn it by trial and error), and some things get reinvented again and again. I mostly agree with that, but I should point out that some codec authors still write books about their work (back at VDD15 I actually showed one of those VP9 guys the chapter describing AVS2 from such a book, and he managed to find a lot of similarities between those two codecs). But while codec overviews are rather common, books on writing good encoders for them are virtually non-existent. It’s exactly the same situation as with compilers: back in the day you were supposed to know how to write a parser and a lexer and perform some optimisations; nowadays it’s “take LLVM, implement the new syntax and let its huge swarm of optimisation rules that nobody comprehends in full do the work for you”.
Finally he thinks about possible ways to deal with it (beside some vector instructions to make codec operations easier, none of them look feasible to him, or to me). And he ends with an uplifting possible outcome of codecs becoming the next COBOL: not going anywhere but remaining largely irrelevant. It might go this way indeed.
As a bonus I’d like to present my own thoughts on this topic. Back in the day I ranted about H.264 reinventing the wheel, so things were going downhill even then.
I guess there are two processes at work here. Codecs by themselves do not bring as much money as before (partly because people expect stuff for free, partly because you can’t vendor-lock them, so you have to interoperate and compete with other parties). And when codec licensing started to bring serious money (probably starting with MP3 and H.264), it was not only research institutions willing to fund R&D: a lot of patent trolls got active too, ready to exploit a patent from some old bankrupt company to get their share of the royalties. That’s how we got several H.265 licensing pools and an Opus patent pool (from entities not related to the codec’s development in any way). This stifles the adoption of new codecs through patent concerns, and their development through the (equally reasonable) concern of not getting enough money back for the research. I’d refer you to Leonardo Chiariglione’s blog for a more detailed picture, but those who are aware of it have read it already and the others have no desire to do so anyway.
As for the education part, I guess there’s not that much interest in funding it either. I remember reading some texts by Dmitry Vatolin, the head of the MSU Graphics and Media Lab (known for their quality measurement tools and regular codec comparisons). He points out the problem they have with preparing students: there’s a lack of specialists, so students get offers before they have a chance to graduate, which leads to a lack of interest in training such students, so the lack of specialists grows further… Probably it’s not as drastic in the West, where companies understand the need for independent research entities and sponsored university labs (and see them not merely as a source of people, nor as a constant competitive threat), but I don’t know much about it. He also points to “digital dementia” as one of the problems with modern students, and this problem is more universal, so you probably should not expect any improvement in codec development to come from academia.
And there’s probably a ceiling on media format specifications beyond which it is not worth going. For audio it’s about 56 kHz at 20 bits (it’s ironic that Meridian Audio, responsible for that research, also helped to create the failed Pono Player). For video we should be near the limit as well, so video bandwidth should stop increasing while communication capabilities keep improving, making the need for better compression less important.
So we’ve got no men, we’ve got no money and we’ve got no reason to keep developing new formats (beside niche ones and ego projects like the very fresh Quite Okay Audio). As usual, I’ll be glad to be proven wrong.
I started to fall away from active multimedia hacking circa 2011 for a variety of reasons, one of which was that the whole world of multimedia was becoming quite standardized and therefore boring. I was very interested in codec technology — same as you — but all the codec action (at least for video) was heading into hardware. Nowadays, even the cheapest little ARM SoC performs 4K H.265 dec/enc in real time straight in the silicon without any software needing to break a sweat. When I started in 2000, things were still very much the “Wild West” in codec-ville, with no real idea how things were going to end up.
It’s interesting how your observation mirrors other treatises I’ve read regarding how no one really knows how an entire computer works anymore (as was possible in the early days of personal computing): no one knows how the compiler works or how the video codec works. Pretty soon, it’s going to be “no one knows how the code-generating AI works” (actually, we might already be there!).
It just dawned on me that maybe one of us should start making YouTube videos about how codecs work, and also create TikTok shorts that integrate dances somehow. That’ll get the kids interested! 🙂
I do not look enough like a bee to transmit complex information via dance.
As for the AI-generated stuff, it is distinct because for the other epochs you can say there was a time when people understood how it was done. Not in this case.
P.S. I’m also bored by the current multimedia landscape (just look at the blog title!) but at least the Cambrian explosion of codecs left enough interesting fossils to study (the problem is usually locating them or even learning that they exist).
I’m optimistic. Netflix and the other folks are spending a lot of money on transmission, and would prefer not to. Some useful techniques will trickle out of that. And it’s possible all this ML hoo-ha will figure out some interesting heuristics for less expensive encoding.
Current codecs are still missing features like multispectral or floating-point video for production work, so productions still use OpenEXR image sequences.
The FFV1 spec repository has pull requests for those, but they are not progressing:
https://github.com/FFmpeg/FFV1/pulls
The recently hot “satellite communications by smartphone” would need extremely low-bitrate codecs.
However, AI video codecs would make battery problems worse, so I bet they won’t use video codecs like NVIDIA Maxine’s AI Video Compression any time soon.
3D video codecs for AR/VR are the glorious future, but MPEG V-PCC reuses the existing 2D video codecs, so it won’t stop 2D codecs from becoming the COBOL.
The question is: do mainstream codecs need those features? I can understand that intermediate codecs (you can say that FFV1 somewhat belongs to that group) may require them, but it’s a rather small niche.
AI-based and 3D codecs give me MPEG-4 flashbacks—which is not a good thing. Hopefully I’ll have nothing to do with them.
Invent a new lossless coding method for audio and become a rock star. QOA is the path to follow.
I know the names of the creators of APE, FLAC, Monkey’s Audio, MPEG-4 ALS, TAK and WavPack. None of them became a star. So you offer me two activities I don’t care about.
But to become a real star, the lossless method needs to be both fast and compress much better than any currently available solution. I’m sure it’s doable.
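For what it is worth, here is a minimal, purely illustrative sketch of the usual predict-then-entropy-code structure such a codec would have to beat (a fixed second-order predictor plus Rice coding; not any particular codec’s actual format):

```python
# Skeleton of a typical lossless audio coder (in the spirit of FLAC, WavPack,
# TAK and friends): predict each sample from the previous ones, then
# entropy-code the small residuals, here with Rice codes. Real codecs choose
# the predictor order and Rice parameter adaptively per block and emit real
# bytes; this sketch only produces a bit string.

def residuals(samples):
    """Fixed second-order predictor: p[n] = 2*x[n-1] - x[n-2]."""
    return [samples[i] - (2 * samples[i - 1] - samples[i - 2])
            for i in range(2, len(samples))]

def zigzag(r):
    """Map signed residuals to non-negative integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1

def rice(value, k):
    """Rice code as a bit string: unary-coded quotient, then k remainder bits."""
    q, rem = value >> k, value & ((1 << k) - 1)
    return '1' * q + '0' + format(rem, '0{}b'.format(k))

def encode_block(samples, k=2):
    """Two warm-up samples stored verbatim, the rest as Rice-coded residuals."""
    bits = ''.join(format(s & 0xFFFF, '016b') for s in samples[:2])
    return bits + ''.join(rice(zigzag(r), k) for r in residuals(samples))

# A gently varying waveform is predicted well, so the residuals stay tiny.
samples = [100, 104, 109, 113, 116, 118, 119, 119, 118, 116]
bits = encode_block(samples)
print(len(bits), 'bits instead of', 16 * len(samples), 'for raw 16-bit PCM')  # 56 vs 160
```

The hard part, of course, is not this skeleton but the adaptive prediction and parameter selection that buy the last few percent of compression while staying fast.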
Can’t beat RealMedia, their rm utility is the fastest and has the best compression ratio. There’s just one small flaw in it…

So now you guys finally won’t be busy with the cool stuff and will have time to fix boring things like playing video on Linux with hwdec without tearing!
If only. Making sure that video is decoded is easy (even if it takes time); tearing during playback is caused by the mix of GPU driver and display system (e.g. X11 or Wayland). And while codec specifications are more or less stable, drivers keep changing outside your control. No wonder people are not willing to go into it (I remember there was a lengthy rant somewhere on the mpv wiki about that).

Over the last several years I have indeed started accumulating a few textbooks and other non-digitally-available and/or out-of-print books (weird paper things, they kinda smell funny) and they have, as you say, actually contained useful and informative info. Who knew. I even have a few shelves for them.
Of course, I have also ended up with a few better left to the recycling pile in the process, as it’s not always clear which are even worth hunting down, or which even contain the info in the first place (e.g. I hunted for ages for the origin of some specific number in the analogue TV world, and all the textbooks either stated it as a given or referenced other books which stated it as a given… eventually I figured out by dumb luck that it originated in some FCC proposal by RCA in the 40s, where it seems to have been pulled out of some engineer’s butt).
That’s all to say I can see why nobody can be bothered to dig. (One could argue that the foundations of what is now a large chunk of the digital age should be available digitally… but that seems like a lot of effort.)
I understand and respect your efforts. What saddens me, though, is that printed paper books are likely to outlive the digital ones (thanks to the copyrightmongers and their DRM schemes), so the information will probably live on in an obscure place called a library (I think I visited those back in my student years).
This is something similar to what I have been writing about for a long time. If we had infinite bandwidth, would we need a new video codec at all? If bandwidth got cheaper, what would we gain from serving AV1 rather than higher-bitrate 1080p H.264 files? And why pay the extra storage and compute cost of adding another video codec? I was proposing at the time that it makes much more sense to serve higher-bitrate H.264 files than to add another codec, and as we have now discovered, that is exactly what YouTube Premium is doing. (Although at some point a new codec would still cross the compute and bandwidth threshold.)
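The tradeoff described above can be put into rough back-of-the-envelope form; all numbers in the sketch below are invented placeholders, not real CDN or transcoding prices:

```python
# Back-of-the-envelope: adding a more efficient codec only pays off once the
# delivery savings outgrow the extra encoding and storage cost. Every number
# here is a made-up placeholder for illustration.

gb_per_hour_h264     = 2.5    # hypothetical 1080p H.264 size per hour of content
bitrate_saving       = 0.30   # assume the new codec saves ~30% bitrate
delivery_cost_per_gb = 0.01   # $ per GB delivered
extra_encode_cost    = 5.0    # $ of extra compute per hour of content
extra_storage_cost   = 0.5    # $ per hour of content per year for the extra copy

def break_even_views(years_stored=1.0):
    """Views per hour of content needed before the extra codec pays for itself."""
    saved_per_view = gb_per_hour_h264 * bitrate_saving * delivery_cost_per_gb
    fixed_cost = extra_encode_cost + extra_storage_cost * years_stored
    return fixed_cost / saved_per_view

print(round(break_even_views()))  # ~733 views with these made-up numbers
```

With inputs like these, only content watched at least a few hundred times earns back the extra encode and storage cost, which is roughly why a new codec crosses the threshold only for popular titles.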
On another note, H.267, the next version after VVC, has the same complexity increase on both the encoding and the decoding side, whereas all previous generations were around 10:2 or 10:1.
At this point in time I would much rather we had an improved H.264 that is patent-free.
I’d say the conditions are a bit unrealistic: for streaming services, bandwidth limits the number of users (and the laws of physics prevent it from growing fast; the same applies to storage). For video hosting, considering the sheer amount of video uploaded every second, storage is also a big problem.
Maybe we’ll eventually hit a plateau, with the number of connected users more or less stable, no further increases in resolution and framerate (the human visual system is limited, just as with audio, where the limit was reached earlier), and enough bandwidth to serve them all. Then we can argue there’s no need for a better codec and keep using patent-free H.264.