In my efforts to have an independent player (one that relies on third-party libraries merely for input and output while demuxing and decoding are done purely by NihAV) I had to explore how to write a multi-threaded H.264 decoder. And while it’s not working perfectly, it’s a good proof of concept. Here I’ll describe how I hacked my existing decoder to support multi-threading.
The idea behind it is rather simple: have a central dispatcher that watches the status of worker threads and reports the progress to any involved party. Individual frame decoders report their progress on the frame by updating the currently decoded macroblock number (and/or the error status if an error has happened) and query whether a specific other decoder has finished decoding up to a certain macroblock so they can perform motion compensation from it. The high-level decoder sends new frames to be processed and queries whether there’s a decoded frame already (or a decoding error if you’re less lucky).
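The progress reporting described above can be sketched with a pair of atomics per frame. This is only an illustration and not the actual NihAV code: the FrameProgress type, its methods and the busy-wait loop are all hypothetical, and a real decoder would more likely sleep or use a condition variable instead of spinning.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical per-frame progress record (illustrative names, not NihAV's).
pub struct FrameProgress {
    /// Number of macroblocks fully decoded so far.
    mbs_done: AtomicUsize,
    /// Set when the worker decoding this frame hits an error.
    failed: AtomicBool,
}

impl FrameProgress {
    pub fn new() -> Self {
        Self { mbs_done: AtomicUsize::new(0), failed: AtomicBool::new(false) }
    }
    /// Called by the worker that decodes this frame.
    pub fn report(&self, mbs: usize) {
        // Release pairs with the Acquire loads below, so pixel data written
        // before this store is visible to a thread that observes the count.
        self.mbs_done.store(mbs, Ordering::Release);
    }
    pub fn report_error(&self) {
        self.failed.store(true, Ordering::Release);
    }
    pub fn decoded(&self) -> usize {
        self.mbs_done.load(Ordering::Acquire)
    }
    /// Called by another worker before motion compensation from this frame:
    /// waits until at least `mbs` macroblocks are decoded or decoding failed.
    pub fn wait_for(&self, mbs: usize) -> Result<(), ()> {
        loop {
            if self.failed.load(Ordering::Acquire) {
                return Err(());
            }
            if self.decoded() >= mbs {
                return Ok(());
            }
            thread::yield_now(); // simplistic spin, an assumption of this sketch
        }
    }
}

/// One thread "decodes" a reference frame while the caller waits for all of it.
pub fn demo() -> Result<usize, ()> {
    let total_mbs = 100;
    let reference = Arc::new(FrameProgress::new());
    let writer = Arc::clone(&reference);
    let worker = thread::spawn(move || {
        for mb in 1..=total_mbs {
            writer.report(mb);
        }
    });
    let res = reference.wait_for(total_mbs).map(|_| reference.decoded());
    worker.join().map_err(|_| ())?;
    res
}
```

The point of the design is that a dependent frame never needs a lock to check on its reference, only two atomic loads.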
The only tricky thing is frame management: you need to keep track of which frames can be dropped once they are no longer referenced, and H.264 frame management is pretty complicated. Oh well, I have something that conceptually works and I’ll fix it when the real need arises.
On the technical level there’s nothing complicated either: Rust provides enough building blocks in its standard library. So the dispatcher that watches over the threads is put inside Arc<RwLock<>> to allow shared access: the high-level decoder may occasionally want exclusive write access to it, but frame decoding threads are fine just peeking at and updating atomic variables for the particular frame state. Accessing other frame information this way would be too messy though, so I use my own NABufferRef<> that allows shared access (I know that it’s hacky and unsafe but I do not complain).
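To make the locking split concrete, here is a minimal sketch of a dispatcher behind Arc<RwLock<>>. Everything in it is an assumption made for illustration (the Dispatcher type, the frame ids, the HashMap storage); the real dispatcher tracks more state, and NABufferRef<> is not modelled at all. Workers only take the read lock and touch atomics, while adding or removing frames requires the write lock.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical dispatcher: maps a frame id to its atomic progress counter.
#[derive(Default)]
pub struct Dispatcher {
    frames: HashMap<u32, AtomicUsize>,
}

impl Dispatcher {
    /// Changes the set of tracked frames, so it needs `&mut self`
    /// (i.e. the write lock when the dispatcher sits inside an RwLock).
    pub fn add_frame(&mut self, id: u32) {
        self.frames.insert(id, AtomicUsize::new(0));
    }
    pub fn remove_frame(&mut self, id: u32) {
        self.frames.remove(&id);
    }
    /// Only touches an atomic, so shared access (the read lock) is enough.
    pub fn update(&self, id: u32, mbs: usize) {
        if let Some(p) = self.frames.get(&id) {
            p.store(mbs, Ordering::Release);
        }
    }
    pub fn progress(&self, id: u32) -> Option<usize> {
        self.frames.get(&id).map(|p| p.load(Ordering::Acquire))
    }
}

/// Spawns one worker per frame and returns the final progress of each frame.
pub fn run_workers(num_frames: u32, mbs_per_frame: usize) -> Vec<Option<usize>> {
    let disp = Arc::new(RwLock::new(Dispatcher::default()));
    {
        // High-level decoder: exclusive access to register new frames.
        let mut d = disp.write().unwrap();
        for id in 0..num_frames {
            d.add_frame(id);
        }
    }
    let handles: Vec<_> = (0..num_frames)
        .map(|id| {
            let disp = Arc::clone(&disp);
            thread::spawn(move || {
                for mb in 1..=mbs_per_frame {
                    // Workers share the read lock and never block each other.
                    disp.read().unwrap().update(id, mb);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let d = disp.read().unwrap();
    let result: Vec<Option<usize>> = (0..num_frames).map(|id| d.progress(id)).collect();
    result
}
```

Since many readers can hold an RwLock at once, the frame threads do not serialise on the dispatcher; only the comparatively rare frame registration and removal does.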
Overall, there’s nothing particularly complex about it. Though before integrating it all I probably need a new API for multi-threaded decoding similar to my current encoding API (i.e. one that first takes as many input frames as possible and then gives some output frames) plus some auxiliary code to make writing such decoders easier. And then I need to adapt the rest of the code to work with those decoders. In other words, a lot of tedious work, for another time. For now I’ll try to find something more interesting to work on.
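For the curious, the queue-style interface mentioned above might look roughly like the sketch below. It is purely hypothetical: the QueuedDecoder trait, its method names and the toy implementation (which "decodes" numbers by doubling them) are invented here to show the calling pattern and say nothing about what the eventual NihAV API will be.

```rust
use std::collections::VecDeque;

/// Hypothetical queue-style decoder interface (not an existing NihAV trait).
pub trait QueuedDecoder {
    type Input;
    type Output;
    /// Tries to queue one input; hands it back if the decoder is full.
    fn queue_input(&mut self, input: Self::Input) -> Result<(), Self::Input>;
    /// Returns the next finished output, if any is ready.
    fn get_output(&mut self) -> Option<Self::Output>;
}

/// Toy stand-in for a real decoder: the "decoded frame" is the input doubled.
pub struct ToyDecoder {
    pending: VecDeque<u32>,
    capacity: usize,
}

impl ToyDecoder {
    pub fn new(capacity: usize) -> Self {
        Self { pending: VecDeque::new(), capacity }
    }
}

impl QueuedDecoder for ToyDecoder {
    type Input = u32;
    type Output = u32;
    fn queue_input(&mut self, input: u32) -> Result<(), u32> {
        if self.pending.len() >= self.capacity {
            return Err(input);
        }
        self.pending.push_back(input);
        Ok(())
    }
    fn get_output(&mut self) -> Option<u32> {
        self.pending.pop_front().map(|v| v * 2)
    }
}

/// Feeds as many inputs as the decoder accepts, then drains what is ready,
/// and repeats until a round produces no output.
pub fn drive<D: QueuedDecoder>(dec: &mut D, inputs: Vec<D::Input>) -> Vec<D::Output> {
    let mut out = Vec::new();
    let mut rejected: Option<D::Input> = None;
    let mut iter = inputs.into_iter();
    loop {
        // Retry the previously rejected input first, then take new ones.
        while let Some(item) = rejected.take().or_else(|| iter.next()) {
            if let Err(back) = dec.queue_input(item) {
                rejected = Some(back);
                break;
            }
        }
        let before = out.len();
        while let Some(frame) = dec.get_output() {
            out.push(frame);
        }
        if out.len() == before {
            break; // no progress in this round, so stop
        }
    }
    out
}
```

A real multi-threaded decoder would also have to report errors through get_output() and could legitimately have nothing ready for a while, so the "stop when a round yields nothing" rule is a simplification that only holds for this toy.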