While there’s nothing much to write about the encoder itself (it should be released and forgotten soon), it’s worth recording how some magic numbers in the code (those not coming from the specification) were obtained. Spoiler: it’s mostly statistics and gut feeling.
Archive for the ‘rv4encfail’ Category
rv4enc: magic numbers
Tuesday, May 16th, 2023
rv4enc: probably done
Saturday, May 13th, 2023
In one of the previous posts I said that this encoder would likely keep me occupied for a long time. Considering how bad that estimate was, I must be a programmer.
Anyway, there were four main issues to be resolved: compatibility with the reference player, B-frame selection, motion estimation for interpolated macroblocks in B-frames, and rate control.
I gave up on the compatibility. The reference player is unwieldy and I’d rather not run it at all let alone debug it. Nowadays the majority of players use my decoder anyway and the produced videos seem to play fine with it.
The question of motion vector search for interpolated macroblocks was discussed in the previous post. The solution is there but it makes encoding several times slower. As a side note, by omitting the intra 4×4 mode in B-frames I’ve got a significant speed-up (ten to thirty percent depending on the quantiser) so I decided to keep it this way by default.
The remaining two issues (B-frame selection and rate control) were resolved with the same trick: estimating frame complexity. This is done in a relatively simple way: calculate the SATD (sum of absolute values of a Hadamard-transformed block) of the differences between the current frame and some previous frame with motion compensation applied. For speed reasons you can downsample those frames and use a simpler motion search (for example, with pixel precision only). You can then use the calculated value to estimate some frame properties.
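To make the SATD part concrete, here is a minimal sketch in Rust for a single 4×4 block. The function names and the 4×4 block size are my assumptions for illustration, not necessarily what the encoder uses, and motion compensation and downsampling are left out: `pred` is simply whatever prediction you already have for the block.

```rust
/// Unnormalised 4-point Hadamard transform (output order is not important for SATD).
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let (s0, s1) = (v[0] + v[1], v[0] - v[1]);
    let (s2, s3) = (v[2] + v[3], v[2] - v[3]);
    [s0 + s2, s1 + s3, s0 - s2, s1 - s3]
}

/// SATD of the difference between two 4x4 blocks stored in row-major order.
/// `cur` is the current block, `pred` is the (hypothetical) prediction for it.
pub fn satd_4x4(cur: &[u8; 16], pred: &[u8; 16]) -> u32 {
    let mut d = [0i32; 16];
    for i in 0..16 {
        d[i] = i32::from(cur[i]) - i32::from(pred[i]);
    }
    // Transform the rows of the difference block.
    for y in 0..4 {
        let r = hadamard4([d[y * 4], d[y * 4 + 1], d[y * 4 + 2], d[y * 4 + 3]]);
        d[y * 4..y * 4 + 4].copy_from_slice(&r);
    }
    // Transform the columns.
    for x in 0..4 {
        let c = hadamard4([d[x], d[x + 4], d[x + 8], d[x + 12]]);
        for y in 0..4 {
            d[x + y * 4] = c[y];
        }
    }
    // Sum of absolute values of all transformed coefficients.
    d.iter().map(|v| v.unsigned_abs()).sum()
}
```

Summing this value over all blocks of a (possibly downsampled) frame would give one number per frame, which is the kind of complexity estimate the rest of this post relies on.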
For example, if the difference between frames 0 and 1 is about the same as the difference between frames 1 and 2 then frame 1 should probably be coded as a B-frame. I’ve implemented it as a simple dynamic frame selector that allows one B-frame between reference frames (it can be extended to allow several B-frames but I didn’t bother) and it improved coding compared to the fixed frame order.
Additionally there seems to be a correlation between frame complexity and output frame size (also depending on the quantiser of course). So I reworked the rate control system to rely on those factors to select the quantiser for I- and P-frames (adjusting them if the predicted and the actual sizes differ too much). B-frames simply use the P-frame quantiser plus a constant offset. The system seems to work rather well except that it tends to assign too high quantisers for some frames, resulting in a rather crisp I-frame followed by more and more blurry frames.
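The "about the same" test can be sketched like this. The names and the percentage tolerance are hypothetical (the post does not say how close the two differences have to be), so treat it as an illustration of the idea rather than the actual selector.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameKind {
    P,
    B,
}

/// Decide how to code the middle frame of three consecutive frames.
/// `diff_prev` is the complexity of frame 1 against frame 0,
/// `diff_next` is the complexity of frame 2 against frame 1.
/// `tolerance_percent` is an assumed tuning knob, not a value from the encoder.
pub fn pick_middle_frame_kind(diff_prev: u64, diff_next: u64, tolerance_percent: u64) -> FrameKind {
    let larger = diff_prev.max(diff_next);
    let smaller = diff_prev.min(diff_next);
    // "About the same": the gap is within the tolerance of the larger value.
    if (larger - smaller) * 100 <= larger * tolerance_percent {
        FrameKind::B
    } else {
        FrameKind::P
    }
}
```

With a tolerance of 10 percent, differences of 1000 and 1050 would make the middle frame a B-frame while 1000 and 3000 would keep it a P-frame.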
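A rough sketch of such a scheme follows. Everything specific in it is an assumption made for illustration: the model (size proportional to complexity divided by quantiser), the 25% threshold for "differ too much", and all the names. The real relation between complexity, quantiser and size is exactly the kind of thing that has to be fitted from statistics.

```rust
/// Toy size predictor: size ~ scale * complexity / quantiser (an assumed model).
pub struct SizeModel {
    scale: f64,
}

impl SizeModel {
    pub fn new(scale: f64) -> Self {
        Self { scale }
    }

    /// Predicted frame size for the given complexity and quantiser.
    pub fn predict(&self, complexity: f64, quant: u8) -> f64 {
        self.scale * complexity / f64::from(quant.max(1))
    }

    /// Lowest quantiser in 1..=max_quant whose predicted size fits the target,
    /// or max_quant if nothing fits.
    pub fn pick_quant(&self, complexity: f64, target: f64, max_quant: u8) -> u8 {
        (1..=max_quant)
            .find(|&q| self.predict(complexity, q) <= target)
            .unwrap_or(max_quant)
    }

    /// Rescale the model when the prediction missed by more than 25%.
    pub fn update(&mut self, predicted: f64, actual: f64) {
        if predicted > 0.0 {
            let ratio = actual / predicted;
            if !(0.75..=1.25).contains(&ratio) {
                self.scale *= ratio;
            }
        }
    }
}

/// B-frames: P-frame quantiser plus a constant offset, clamped to the valid range.
pub fn b_frame_quant(p_quant: u8, offset: u8, max_quant: u8) -> u8 {
    p_quant.saturating_add(offset).min(max_quant)
}
```

A model this crude would quite plausibly show the same flaw described above: once it overestimates sizes it keeps picking quantisers that are too high until enough corrections accumulate.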
I suppose I’ll play with it for a week or two, hopefully improving it a bit, and then I shall commit it and move to something else.
P.S. the main goal of NihAV is to provide me with a playground for learning and testing new ideas. If it becomes useful beside that, that’s a bonus (for example, I’m mostly using nihav-sndplay to play audio nowadays). So the RealVideo 4 encoder has served its purpose by allowing me to play more with various concepts related to B-frames and rate control (plus there were some other tricks). Even if its output makes RealPlayer hang, even if it’s slow, that does not matter much as I’m not going to use it myself and nobody else is going to use it either (the VP6 encoder had some initial burst of interest from some people but none afterwards, and nobody has cared about RV4 from the start).
Now the challenge is to find myself an interesting task, because most of the tasks I can think of involve improving some encoder or decoder or (shudder) writing a MOV/MP4 muxer. Oh well, I hope I’ll come up with something regardless.
rv4enc: B-frame experiments
Saturday, May 6th, 2023
As I mentioned in the previous post, one of the problems is to find a good motion vector pair for an interpolated macroblock in a B-frame. Since I had nothing better to do I’ve decided to try motion vector search in the same style as the usual motion estimation: start from the candidate motion vector pair and try adjusting both vectors using a diamond pattern (since it’s the simplest one).
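In code the idea looks roughly like the sketch below. It is a simplified illustration and not the encoder's actual search: the names are made up, each candidate moves only one of the two vectors by one diamond step, and the block matching metric is passed in as a `cost` closure.

```rust
pub type Mv = (i32, i32);

/// Small diamond pattern: one step in each of the four directions.
const DIAMOND: [Mv; 4] = [(1, 0), (-1, 0), (0, 1), (0, -1)];

/// Greedy refinement of a forward/backward motion vector pair.
/// `cost` returns the distortion of the interpolated prediction for a pair;
/// `max_steps` limits the number of refinement iterations.
pub fn refine_mv_pair<F>(fwd: Mv, bwd: Mv, max_steps: usize, cost: F) -> (Mv, Mv, u32)
where
    F: Fn(Mv, Mv) -> u32,
{
    let mut best = (fwd, bwd, cost(fwd, bwd));
    for _ in 0..max_steps {
        // All candidates of this iteration are built around the same centre.
        let (cf, cb, _) = best;
        let mut improved = false;
        for &(dx, dy) in DIAMOND.iter() {
            // Move only the forward vector, then only the backward one.
            let candidates = [
                ((cf.0 + dx, cf.1 + dy), cb),
                (cf, (cb.0 + dx, cb.1 + dy)),
            ];
            for &(f, b) in candidates.iter() {
                let c = cost(f, b);
                if c < best.2 {
                    best = (f, b, c);
                    improved = true;
                }
            }
        }
        if !improved {
            break;
        }
    }
    best
}
```

Every iteration evaluates eight candidate pairs, and each evaluation means building an interpolated prediction from two references, so the cost adds up quickly.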
The results are not exciting: while it slightly improves PSNR and reduces file size (on lower quantisers), encoding time explodes. My 17-second 320×240 test clip encoded with quant=16 and two B-frames between I/P-frames takes 40 seconds without that option and 136 seconds with it (3.4 times longer). Meanwhile the average PSNR improves only from 38.0446 to 38.0521 and the size decreases from 1511843 bytes to 1507224 bytes (about 0.3%).
That’s the law of diminishing returns in action. Of course it can be made significantly faster by e.g. using a pre-interpolated set of reference frames but why bother in this case? I’ve put this under an option (i.e. be satisfied with the initial guess or try to search for a better pair of motion vectors) but I doubt anybody will ever use it (the same applies to the whole encoder as well).
rv4enc: somewhat working
Wednesday, May 3rd, 2023
I’ve finally managed to implement a more or less working RealVideo 4 encoder with all the main features (yeah, I’m also surprised that I’ve got to this stage this fast). As usual, it’s the small details that will take a lot of time to make decent, let alone good.
So, what can my encoder actually do now? It can encode video with I/P/B-frames using the provided order, it can encode all possible macroblock types and has some kind of rate control.
What does it not have, then? First of all, I don’t know yet how it would fare with the original RealPlayer (I also need to modify the RMVB muxer to output improper B-frame timestamps and maybe write the additional streaming-related information). Then there’s the question of having proper rate control. And finally there are a lot of questions related to B-frames.
Currently my rate control is implemented as a system that keeps statistics on how large an encoded frame is on average for a given frame type and quantiser and tries to find the best fitting quantiser. If there are still no statistics (or not enough of them) I resort to a simpler quantiser guessing, adjusting the quantiser depending on how different the projected and actual frame sizes are. Of course it can be tuned to behave better (the question is how though). And I’m not going to touch two-pass encoding (theoretically it’s rather simple: you log various encoder information in the first pass and use it to select quantisers better in the second pass; in practice it means messing with text data and doing additional guesstimates, so pass).
With B-frames there are two main issues to deal with: which frames to select and how to perform motion estimation. I have read that the first can be achieved by performing motion compensation against neighbouring frames and calculating SATD (often done on scaled-down frames to be faster). The second question is how to search for bidirectional block vectors. Currently I have a very simple approach: I search for forward and backward motion vectors independently and check which combination of them works the best. I suspect there may be an approach specifically for weighted bi-directional search but I could not find anything (and I’m not desperate enough to dive into the codebase of MPEG-4 ASP/AVC encoders).
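Here is a minimal sketch of that kind of rate control, with made-up names and thresholds. `SizeStats` keeps the average size per frame type and quantiser, and `guess_quant` is the fallback that nudges the quantiser when the actual size misses the projected one by more than a quarter (that 25% figure is my assumption, as is the one-step adjustment).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FrameType {
    I,
    P,
    B,
}

#[derive(Default)]
pub struct SizeStats {
    /// (frame type, quantiser) -> (sum of frame sizes, number of frames)
    sums: HashMap<(FrameType, u8), (u64, u32)>,
}

impl SizeStats {
    /// Remember the size of a frame coded with the given type and quantiser.
    pub fn record(&mut self, ftype: FrameType, quant: u8, size: u32) {
        let entry = self.sums.entry((ftype, quant)).or_insert((0, 0));
        entry.0 += u64::from(size);
        entry.1 += 1;
    }

    /// Quantiser whose average size is closest to the target, if any data exists.
    /// Ties are resolved in favour of the lower quantiser.
    pub fn best_quant(&self, ftype: FrameType, target: u32) -> Option<u8> {
        self.sums
            .iter()
            .filter(|((t, _), _)| *t == ftype)
            .map(|(&(_, q), &(sum, n))| (q, (sum / u64::from(n)).abs_diff(u64::from(target))))
            .min_by_key(|&(q, diff)| (diff, q))
            .map(|(q, _)| q)
    }
}

/// Fallback without statistics: move the quantiser one step when the actual
/// size is more than 25% away from the projected one.
pub fn guess_quant(prev_quant: u8, projected: u32, actual: u32, max_quant: u8) -> u8 {
    if actual > projected + projected / 4 {
        prev_quant.saturating_add(1).min(max_quant)
    } else if actual < projected - projected / 4 {
        prev_quant.saturating_sub(1)
    } else {
        prev_quant
    }
}
```

A real version would also want a minimum number of samples before trusting an average, which is the "not enough of them" case mentioned above.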
And finally there’s the whole question of quality. I suspect that my encoder is far from being good because it should not merely transform-quantise-code blocks but also perform some masking (i.e. set some higher-frequency coefficients to zero instead of hoping that they’ll be quantised to zero).
So this will be long and boring work…
rv4enc: limping forward
Tuesday, April 18th, 2023
So far there’s not much to write about: I’ve dealt with the second most annoying thing (writing coefficients) and now my encoder can produce a valid stream of I- and P-frames. Probably it’s a good place to define a road map though.
First of all, there’s still the most annoying thing pending: in-loop filtering. And I can’t reuse that much code from the decoder since it was written back in the day when my knowledge of Rust was even less than now, so I have to partly rewrite some of the parts to make them fit my current approaches to the interfaces (that means a lot of DSP functions among other things). At least it can be disabled for now but I’ll have to return to it sooner or later.
Then there’s the 4×4 intra prediction mode still waiting to be implemented. Again, I know how to pick good enough prediction modes, it’s the context-dependent coding of those modes that is going to be annoying.
Another thing missing is estimating the number of bits required to encode a single block. There are about five codebooks involved and those are selected depending on macroblock type, quantiser, coefficient block type (luma, chroma or luma DCs) and the global set index. I guess I’ll resort to gathering statistics for all possible coding modes and seeing if I can make some heuristic estimation out of it. And there’s still a task of selecting the best coding set for a slice…
After all that it will be mostly B-frame related things left to implement. I also need to make sure the muxer will write them out properly (for historical reasons in that case it mangles the timestamp to be last frame timestamp plus one). That’s not counting all possible enhancements like deciding the frame type for coding. Most likely I’ll be annoyed by it and keep the fixed coding order instead.
There are too many things to do and considering my pace it may keep me busy for a good part of the year (I mean a large part of a calendar year; the current date is still February 24).
Starting yet another failure of an encoder
Thursday, April 6th, 2023
As anybody could’ve guessed from the Cook encoder, I did not want to stop there and decided to write a video encoder to accompany it. So here I declare that I’m starting to work on a RealVideo 4 encoder (again, making it public should prevent me from chickening out).
I can salvage some parts from my VP7 encoder but there are several things that make this codec different enough from it (beside the bitstream coding): slices and B-frames. Historically RealMedia packets are limited to 64kB and they should not contain partial slices (grouping several slices or even frames in the packet is fine though), so the frame should be split during coding. And while Duck codecs re-invent B-frames to make them still be coded in sequence, RealVideo 4 has honest B-frames that should be reordered before encoding.
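The reordering itself is simple: a B-frame can be coded only after the reference frame that follows it in display order. A small sketch with hypothetical names (what to do with trailing B-frames that have no following reference is my own choice here, a real encoder would more likely code them as P-frames):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PicType {
    I,
    P,
    B,
}

/// Convert display order into coding order.
/// Returns indices into `display` in the order the frames should be coded.
pub fn coding_order(display: &[PicType]) -> Vec<usize> {
    let mut order = Vec::with_capacity(display.len());
    let mut pending_b = Vec::new();
    for (idx, &ptype) in display.iter().enumerate() {
        if ptype == PicType::B {
            // B-frames wait until the next reference frame has been coded.
            pending_b.push(idx);
        } else {
            order.push(idx);
            order.append(&mut pending_b);
        }
    }
    // Trailing B-frames without a following reference are simply coded last.
    order.append(&mut pending_b);
    order
}
```

For the display sequence I B B P B P this gives the coding order 0, 3, 1, 2, 5, 4.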
So while the core is pretty straightforward (try different coding modes for each macroblock, pick the best one, write bitstream), it gives me enough opportunity to try different aspects of H.264 encoding that I had no reason to care about previously. Maybe I’ll try to see if automatic frame type selection makes sense, maybe I’ll experiment with more advanced motion search algorithms, maybe I’ll try better heuristics for e.g. quantiser selection.
There should be a lot to keep me occupied (but I expect to spend even more time on evading that task because of a lack of inspiration or the sheer amount of work demotivating me).