I’ve been wanting to write this post for a long time, with a focus on the difference between a hobby project and a product, and on NihAV only. But a recent FFdrama made me re-think both the structure and the conclusions.
Apparently there’s another surge of developers’ discontent in jbmpeg over being given the mushroom treatment (not for the first time and probably not for the last one). IMO they need to realize the project is as free and democratic as the Soviet Union, and you simply need to agree to the things proposed by the General Secretary (definitely not the leader); that would save time and nerves for everybody involved. As I wrote countless times before, I do not fear for the future of that project, as it can keep existing like this indefinitely, and here I’ll try to present my reasons why.
First of all, a revolution à la libav won’t work: Michael has learned the lesson and he won’t be kicked out again (not that it really worked in 2011, but now there are no chances for that at all).
Second, if you split off and form an alternative, it does not have many chances of replacing the original. And if you decide to write anything from scratch, your chances are next to zero. The rest of this post is dedicated to answering why.
Recently I re-read The Mythical Man-Month, which tells not only about the author’s experience designing RedHat OS/360 but also presents more general observations and ideas. Right at the beginning he talks about the difference between a program, a programming product, and a programming systems product. Essentially, a program is something a programmer writes that works for him on his system; a programming product is a program with documentation and support; and a programming systems product is one that works as a component in a larger system. Moving from one stage to another requires an effort several times larger than the previous one (I’m simplifying a lot and probably misremembering something, so you’d better read the original book; it’s worth reading anyway).
Here we have a similar situation: writing a tool just to do things for you is straightforward, even I have managed to do it with NihAV; making it into a product requires offering much wider support for different platform configurations (for example, my video player has VA-API hardware decoding enabled by default, while it’s not available on, say, Windows, so you need to switch that feature off there in order to build it) and different features (e.g. nihav-encoder works for testing encoding per se, but lacks the ability to encode input into a good intermediate format supported by other players and encoders). And it gets even worse if you try to make it into a library ready to be used by others: besides the usual things like documentation, you’re expected to guarantee some API stability and a certain level of quality. So while I may not care that my app panics/crashes in certain circumstances, it’s significantly less forgivable for a library. And of course achieving such a quality level requires a lot of unexciting work on small details. Debugging is even worse.
Suppose you decided to create a fork and work from that. Then you are still in a much worse position: you may have the same codebase, but there are no killer features you can offer and you don’t have the recognition. libav managed to succeed for a while since it was supported by some distribution maintainers, and even then users complained because the de facto brand name was replaced with some unknown thing. And I guesstimate that 40% of the current jbmpeg developers contribute to it in order to upstream the changes they make while using it in their employer’s product or pipeline. So how can you convince those companies to use your fork instead? And that’s without taking the patent situation into account, which makes substantial support from any large company for your project rather improbable.
Good thing I’ve never intended NihAV to be a competitor, but what about other projects? rust-av died because of lack of interest (Luca claims that he started it mostly to learn Rust and see how performant it can get; mission accomplished, no further development required). librempeg fares better, but I doubt that Paul wants to deal with all the demands other parties make for the honour of having your stuff included in their distribution (or used without even a credit).
Another thing that needs to be mentioned is that multimedia is no longer an attractive field. Back when I started to dabble in it, things were rather exciting: there were many different formats around, and in active use too, and people wanted to play them with something other than the proprietary players. There were libraries and players supporting only a specific subset of formats, like avifile or libquicktime, or DVD-only players. Nowadays it’s usually a combination of H.26x+AAC in MP4 or VP9/AV1+Opus in WebMKV, all formats have specifications (unless you lack the Swiss Francs to pay for the ones from ISO), and new formats are not introduced that often either. Of course, we might have H.267 standardised soon, but who uses even H.266? When was the last time you heard AV2 development news? The codec was supposed to be released a couple of years ago, did I miss it along with AV3? Do you remember the Ghost audio codec from Xiph? Of course Fraunhofer will keep extending the AAC patent lifetime by inventing new formats and naming them something like OMGWTFBBQ-AAC, but who really cares?
That is why I believe that no matter how dysfunctional jbmpeg is, it will keep existing in this undead state indefinitely, as it’s good enough for most users and there’s no compelling reason (e.g. new popular formats or radically different ways to process data) to switch to anything else. The only winning move is not to play.
Too much forking is not good if it causes excessive fragmentation, because then there is a bigger chance that projects will stall in their development.
PS. Now for something more useful, I hope: I’m working on a zlib encoder (deflate compression; currently I have the uncompressed mode working). I want to make it both fast and good at compression ratio (even at the fast speeds), so I’m wondering what fast and reliable algorithm can be used for this. (I have a general idea how to count symbol and run-length frequencies, but the issue is “disconnected” repeated patterns of variable length; I wonder how to detect them all and do it really fast at the same time. Using dictionary-like handling for all supported pattern lengths seems excessive and slow, like O(N*(1+3+..+258)).)
I suspect excessive fragmentation happens either when there is too little interest in a project or too much, so it’s easier to fork than to contribute.
As for your deflate-related things, you can look at e.g. https://glinscott.github.io/lz/index.html#toc4.2.2
I think modern implementations can use something like a suffix array instead, but the conventional encoders use hash chains just fine. And the speed variation comes mainly from the number of matches you check.
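To make that concrete, here is a minimal sketch (not taken from zlib or any other particular encoder) of a deflate-style hash-chain match finder; the names and hash constants are illustrative, and max_chain is exactly the knob that trades search time for compression.

```c
#include <stdint.h>
#include <string.h>

#define WIN_SIZE  32768                 /* deflate window size */
#define WIN_MASK  (WIN_SIZE - 1)
#define HASH_SIZE (1 << 15)
#define MIN_MATCH 3
#define MAX_MATCH 258

typedef struct {
    int32_t head[HASH_SIZE];            /* newest position for each hash value */
    int32_t prev[WIN_SIZE];             /* previous position with the same hash */
} MatchFinder;

static void mf_init(MatchFinder *mf)
{
    memset(mf->head, 0xFF, sizeof(mf->head));   /* -1 = empty chain */
}

static uint32_t hash3(const uint8_t *p)         /* hash of the next three bytes */
{
    return (p[0] * 506832829u + p[1] * 65599u + p[2]) & (HASH_SIZE - 1);
}

/* Add position pos to the chains; call this after searching at pos. */
static void mf_insert(MatchFinder *mf, const uint8_t *buf, int pos)
{
    uint32_t h = hash3(buf + pos);
    mf->prev[pos & WIN_MASK] = mf->head[h];
    mf->head[h] = pos;
}

/* Walk at most max_chain candidates and return the best match length
 * (0 if shorter than MIN_MATCH); *dist gets the distance of that match.
 * Positions are absolute here; a real encoder rebases them periodically. */
static int mf_find(const MatchFinder *mf, const uint8_t *buf, int pos,
                   int avail, int max_chain, int *dist)
{
    int best_len = 0;
    int limit    = avail < MAX_MATCH ? avail : MAX_MATCH;
    int cand     = mf->head[hash3(buf + pos)];

    while (cand >= 0 && pos - cand < WIN_SIZE && max_chain-- > 0) {
        int len = 0;
        while (len < limit && buf[cand + len] == buf[pos + len])
            len++;
        if (len > best_len) {
            best_len = len;
            *dist    = pos - cand;
            if (len >= limit)
                break;                  /* cannot do any better */
        }
        cand = mf->prev[cand & WIN_MASK];
    }
    return best_len >= MIN_MATCH ? best_len : 0;
}
```

A greedy encoder would call mf_find() at each position and then mf_insert() it (and every position skipped over by a match), with something like max_chain = 8 for the fast settings and 1024 or more for the thorough ones.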
That is exactly why I’m looking for an algorithm that will not get slower if I check all matches up to some maximum length within a 65535-byte chunk.
So something like a suffix automaton could be useful? And would it be better and faster than the hash approach?
You find all substrings (up to the maximum length allowed by deflate) within a big string, and at the same time you know their frequencies/occurrences and their first occurrence, all in near-linear time.
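For reference, here is a minimal sketch of the textbook online suffix automaton construction over a byte alphabet (generic code, not from any encoder discussed here); the statically sized per-state 256-entry transition tables already hint at the memory and cache cost that comes up later in the thread.

```c
#include <string.h>

#define MAX_BLOCK  65535
#define MAX_STATES (2 * MAX_BLOCK)      /* a suffix automaton has at most 2n-1 states */

typedef struct {
    int len;                            /* length of the longest string in this state */
    int link;                           /* suffix link */
    int next[256];                      /* transitions, -1 = none; ~1 KB per state,
                                           so ~135 MB total with these sizes; real
                                           implementations store transitions compactly */
} SamState;

static SamState st[MAX_STATES];
static int sam_size, sam_last;

static void sam_init(void)
{
    st[0].len  = 0;
    st[0].link = -1;
    memset(st[0].next, 0xFF, sizeof(st[0].next));
    sam_size = 1;
    sam_last = 0;
}

/* Extend the automaton by one input byte (amortised O(1), but it may walk
 * suffix links and clone states, touching memory all over the place). */
static void sam_extend(unsigned char c)
{
    int cur = sam_size++;
    st[cur].len  = st[sam_last].len + 1;
    st[cur].link = -1;
    memset(st[cur].next, 0xFF, sizeof(st[cur].next));

    int p = sam_last;
    while (p != -1 && st[p].next[c] == -1) {
        st[p].next[c] = cur;
        p = st[p].link;
    }
    if (p == -1) {
        st[cur].link = 0;
    } else {
        int q = st[p].next[c];
        if (st[p].len + 1 == st[q].len) {
            st[cur].link = q;
        } else {
            int clone = sam_size++;     /* split q to keep lengths consistent */
            st[clone] = st[q];
            st[clone].len = st[p].len + 1;
            while (p != -1 && st[p].next[c] == q) {
                st[p].next[c] = clone;
                p = st[p].link;
            }
            st[q].link   = clone;
            st[cur].link = clone;
        }
    }
    sam_last = cur;
}
```

Construction stays linear in the input size overall, but each extension can walk and clone several states with scattered memory accesses, which is presumably where the slowdown compared to plain hash chains comes from.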
Since you’ve managed to implement one of the fastest FFTs out there after reading the papers, you can probably find something for this case too. This topic is also rather exciting and thus a lot of papers are still being produced.
I did not bother with it because I prefer simplicity over speed. A hash chain is trivial to construct and to update, and what’s better, you do it in linear time (and probably lose on search time, but that depends on the input data). And maybe you can find more inspiration in the code here: https://github.com/powturbo/TurboBench
Even there it is slower at compression by a big margin.
Even building the suffix automaton drastically reduces compression speed.
Looking at more modern papers, there is not much except AI hysteria.
I’m pretty sure many of the proposed algorithms work extremely well on asymptotically large dictionaries (and with an appropriate amount of RAM).
Still, why not start with a hash chain while keeping the parts more or less abstracted? I made the match finder (i.e. the code that actually searches for the longest occurrence of a string), the tokeniser (the code that decides how exactly to split the input data into a combination of literals and string repeats) and the bitstream writer all independent from each other. So in theory I can improve one part without touching any other. Who knows, maybe studying the code of reportedly fast LZ77-based compressors will give you better ideas.
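As an illustration only (my actual code is in Rust and differs in the details), such a separation might look roughly like this in C: the tokeniser only sees an abstract match-finder callback, and its output tokens are what a separate bitstream writer later turns into deflate blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* One LZ77 token: either a literal byte or a (length, distance) repeat. */
typedef struct {
    uint16_t len;                       /* 0 for a literal, 3..258 for a match */
    uint16_t dist;                      /* 1..32768 for a match */
    uint8_t  literal;
} LzToken;

/* The match finder hides behind a callback, so a hash chain can later be
 * swapped for something fancier without touching the tokeniser. It returns
 * non-zero when it finds a match of at least the minimum length. */
typedef int (*find_match_fn)(void *ctx, const uint8_t *buf, size_t pos,
                             size_t avail, uint16_t *len, uint16_t *dist);

/* Greedy tokeniser: take the longest match at each position or emit a literal.
 * (Indexing the skipped positions in the match finder is omitted for brevity.) */
static size_t tokenise_greedy(find_match_fn find, void *mf_ctx,
                              const uint8_t *buf, size_t size, LzToken *out)
{
    size_t pos = 0, ntok = 0;

    while (pos < size) {
        uint16_t len, dist;
        if (size - pos >= 3 && find(mf_ctx, buf, pos, size - pos, &len, &dist)) {
            out[ntok++] = (LzToken){ .len = len, .dist = dist };
            pos += len;
        } else {
            out[ntok++] = (LzToken){ .len = 0, .literal = buf[pos] };
            pos++;
        }
    }
    /* A separate bitstream writer then turns these tokens into fixed- or
     * dynamic-Huffman deflate blocks. */
    return ntok;
}
```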
There is not a single incorrect word written here.
P.S. What the world needs is a new surge of Japanese lossless codecs.
I’m pretty sure those are still being developed, it’s just that hardly anybody outside the country cares. Have you heard of, say, SRLA?
No support for AC-4, Prores RAW, Bink 2, BRAW, and more and more.
It’s just that jbmpeg lost its way and is now fighting internally over the remaining relevance and power.
Gonna open betting opportunities so one can bet on which side will win.
You know, the russian führer said “we’ll let others develop the technologies and then *snap*”. What would prevent them from merging e.g. your work?
And on the other hand, they may be missing a bunch of formats that are in some demand, but nobody at large cares. IMO as long as they’re fixing potential security vulnerabilities there’s no way to get rid of them.
Thanks, I now have raw, static & dynamic block coding working, with only literal compression supported.
I wonder how to add the additional codes (up to 30) that encode longer patterns and distances without hurting performance by more than 5-10% of the current encoding speed, which is > 11x real-time (280 fps), single-threaded, for SVGA (800×600) resolution (bgr24 format with the zlib codec encoder).
SRLA seems to be audio, but we already peaked with OptimFROG anyway.
@Derek well, it’s still an example of a Japanese lossless codec you haven’t heard of, so there’s a chance for video codecs as well.
@Paul RLE is trivial to implement and it’s coded as a match with distance 1 (i.e. one byte back); just play with it using the fixed codes first. And for raw video data you can also check whether the data from the line above matches. In theory those additional checks may improve performance since you’ll have less output data to deal with.
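A minimal sketch of that idea, assuming bgr24 frames stored line by line; the emit_literal()/emit_match() helpers are hypothetical stand-ins for whatever feeds the deflate bitstream writer. A run of identical bytes becomes a match with distance 1, and a copy of the previous line becomes a match with distance equal to the line stride.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN_MATCH 3
#define MAX_MATCH 258

/* Hypothetical token sinks feeding the deflate bitstream writer. */
void emit_literal(uint8_t byte);
void emit_match(int len, int dist);

/* Length of the match at pos against the data dist bytes back (overlap allowed). */
static int run_length(const uint8_t *buf, size_t pos, size_t size, int dist)
{
    int len = 0, limit = (int)(size - pos);
    if (limit > MAX_MATCH)
        limit = MAX_MATCH;
    while (len < limit && buf[pos + len] == buf[pos + len - dist])
        len++;
    return len;
}

/* stride = bytes per line (3 * width for bgr24); it must stay <= 32768,
 * deflate's maximum distance. */
static void encode_simple(const uint8_t *buf, size_t size, int stride)
{
    size_t pos = 0;

    while (pos < size) {
        int best_len = 0, best_dist = 0;

        if (pos >= 1) {                         /* RLE: repeat the previous byte */
            int len = run_length(buf, pos, size, 1);
            if (len > best_len) { best_len = len; best_dist = 1; }
        }
        if (pos >= (size_t)stride) {            /* copy from the line above */
            int len = run_length(buf, pos, size, stride);
            if (len > best_len) { best_len = len; best_dist = stride; }
        }

        if (best_len >= MIN_MATCH) {
            emit_match(best_len, best_dist);
            pos += best_len;
        } else {
            emit_literal(buf[pos]);
            pos++;
        }
    }
}
```

For 800×600 bgr24 the stride is 2400 bytes, well within deflate’s 32768-byte distance limit, so the previous-line check needs no special handling.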
Oh, I managed to write a zlib deflate compressor that is always faster and (sometimes) more efficient than the reference code. And it only uses a subset of the distance coding and only does an RLE kind of thing backwards. Still, I think a suffix automaton/tree is the more complete solution, but I think its implementation is slow because it misses CPU caches a lot.
Nice! It’s a pity you don’t have a blog where you describe how you made various things faster.
As for performance in general, I suspect the reason is mostly that the overhead of constructing and updating the data structure is too high in this case, not the cache pressure. But you can always resort to a profiler 😉
And remember, here it depends a lot on the data: compressing pure image data and an already semi-compressed mix of MVs and image blocks may perform differently against the reference zlib. In either case I’m looking forward to finding out how you manage to improve your implementation even further.

Usually the lazy way to make code faster for free is to closely inspect the loops that do all the work and then remove excessive pointer dereferencing in the C code. This is usually enough to signal to the compiler where to do optimizations, and sometimes compilers are smart enough to do it themselves. Writing SIMD for every single busy loop is extremely time-consuming and not future-proof (instruction sets come and go), so I want to avoid it if I can.
More advanced stuff is refactoring the code to reduce cache misses and increase processing speed while avoiding pointless instructions and operations.
I’m not a fan of micro-optimizations at the expense of code readability, and I tend to look at a problem from multiple perspectives, something I learned while REing codecs.
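To illustrate the kind of “free” win meant here (a generic example, not code from any project mentioned above): hoisting repeated pointer dereferences out of a hot loop into locals, so the compiler no longer has to assume that every store may change them.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t       *dst;
    const uint8_t *src;
    size_t         len;
    int            bias;
} Ctx;

/* Before: every iteration re-reads ctx->dst, ctx->src, ctx->len and ctx->bias,
 * and the compiler has to assume the stores through dst may alias them. */
static void add_bias_slow(Ctx *ctx)
{
    for (size_t i = 0; i < ctx->len; i++)
        ctx->dst[i] = (uint8_t)(ctx->src[i] + ctx->bias);
}

/* After: load everything into locals once; the loop body now only touches
 * registers and the two arrays, which the compiler can vectorise more easily. */
static void add_bias_fast(Ctx *ctx)
{
    uint8_t       *dst  = ctx->dst;
    const uint8_t *src  = ctx->src;
    size_t         len  = ctx->len;
    int            bias = ctx->bias;

    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(src[i] + bias);
}
```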
A good sensible approach indeed. And the opposite of what zlib did, at least back in the day (because C compilers back then were not very good at optimising code). Or of what your favourite project does in certain parts.