Archive for the ‘RealVideo’ Category
rv4enc: magic numbers
Tuesday, May 16th, 2023
While there’s nothing much to write about the encoder itself (it should be released and forgotten soon), it’s worth recording how some magic numbers in the code (those not coming from the specification) were obtained. Spoiler: it’s mostly statistics and gut feeling.
(more…)
rv4enc: probably done
Saturday, May 13th, 2023
In one of the previous posts I said that this encoder would likely keep me occupied for a long time. Considering how bad that estimate was, I must be a programmer.
Anyway, there were four main issues to be resolved: compatibility with the reference player, B-frame selection and performing motion estimation for interpolated macroblocks in them, and rate control.
I gave up on the compatibility. The reference player is unwieldy and I’d rather not run it at all let alone debug it. Nowadays the majority of players use my decoder anyway and the produced videos seem to play fine with it.
The question of motion vector search for interpolated macroblocks was discussed in the previous post. The solution is there but it makes encoding several times slower. As a side note, by omitting intra 4×4 mode in B-frames I’ve got a significant speed-up (ten to thirty percent depending on quantiser) so I decided to keep it this way by default.
The last two issues were resolved with the same trick: estimating frame complexity. This is done in a relatively simple way: calculate SATD (sum of absolute values of a Hadamard-transformed block) of the differences between the current frame and some previous frame with motion compensation applied. For speed reasons you can downsample those frames and use a simpler motion search (for example, with pixel precision only). Then you can use the calculated value to estimate some frame properties.
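To make the SATD part concrete, here is a minimal sketch for a single 4×4 block. The function name `satd4x4`, the block size and the unnormalised transform are choices made for this illustration and are not taken from the encoder; a frame complexity value would be the sum of such block values over a (possibly downsampled) motion-compensated frame.

```rust
/// Sum of absolute values of the 4x4 Hadamard-transformed difference block.
/// `cur` and `pred` hold 16 pixels each in row-major order.
/// The transform is left unnormalised, which is fine for comparing frames.
fn satd4x4(cur: &[u8; 16], pred: &[u8; 16]) -> u32 {
    let mut d = [0i32; 16];
    for i in 0..16 {
        d[i] = i32::from(cur[i]) - i32::from(pred[i]);
    }
    // Horizontal pass: 4-point Hadamard butterfly on every row.
    for row in d.chunks_exact_mut(4) {
        let (a, b) = (row[0] + row[1], row[0] - row[1]);
        let (c, e) = (row[2] + row[3], row[2] - row[3]);
        row[0] = a + c;
        row[1] = b + e;
        row[2] = a - c;
        row[3] = b - e;
    }
    // Vertical pass: the same butterfly on every column.
    for x in 0..4 {
        let (a, b) = (d[x] + d[x + 4], d[x] - d[x + 4]);
        let (c, e) = (d[x + 8] + d[x + 12], d[x + 8] - d[x + 12]);
        d[x] = a + c;
        d[x + 4] = b + e;
        d[x + 8] = a - c;
        d[x + 12] = b - e;
    }
    d.iter().map(|v| v.unsigned_abs()).sum()
}
```

Identical blocks give zero, and a flat difference of one per pixel ends up entirely in the DC coefficient.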
For example, if the difference between frames 0 and 1 is about the same as the difference between frames 1 and 2 then frame 1 should probably be coded as a B-frame. I’ve implemented it as a simple dynamic frame selector that allows one B-frame between reference frames (it can be extended to allow several B-frames but I didn’t bother) and it improved coding compared to the fixed frame order.
Additionally there seems to be a correlation between frame complexity and output frame size (also depending on the quantiser of course). So I reworked the rate control system to rely on those factors to select the quantiser for I- and P-frames (adjusting them if the predicted and the actual sizes differ too much). B-frames simply use the P-frame quantiser plus a constant offset. The system seems to work rather well except that it tends to assign too high quantisers to some frames, resulting in a rather crisp I-frame followed by more and more blurry frames.
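The "about the same" test could look roughly like the sketch below. The function name `should_be_b_frame` and the percentage tolerance are made up for this illustration; the post does not say how close the two values have to be in the actual encoder.

```rust
/// Decide whether the middle frame of three should be coded as a B-frame.
/// `c01` and `c12` are complexity values (e.g. SATD sums) for the frame
/// pairs 0-1 and 1-2. `tolerance_pct` is a hypothetical tuning knob.
fn should_be_b_frame(c01: u64, c12: u64, tolerance_pct: u32) -> bool {
    let (lo, hi) = if c01 < c12 { (c01, c12) } else { (c12, c01) };
    // "About the same": the larger value exceeds the smaller one
    // by at most `tolerance_pct` percent (u128 avoids overflow).
    u128::from(hi) * 100 <= u128::from(lo) * (100 + u128::from(tolerance_pct))
}
```

With a 25% tolerance, complexities of 1000 and 1100 would mark the middle frame as a B-frame candidate while 1000 and 2000 would not.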
I suppose I’ll play with it for a week or two, hopefully improving it a bit, and then I shall commit it and move to something else.
P.S. the main goal of NihAV is to provide me with a playground for learning and testing new ideas. If it becomes useful besides that, that’s a bonus (for example, I’m mostly using nihav-sndplay to play audio nowadays). So the RealVideo 4 encoder has served its purpose by allowing me to play more with various concepts related to B-frames and rate control (plus there were some other tricks). Even if its output makes RealPlayer hang, even if it’s slow, that does not matter much as I’m not going to use it myself and nobody else is going to use it either (the VP6 encoder had some initial burst of interest from some people but none afterwards, and nobody has cared about RV4 from the start).
Now the challenge is to find myself an interesting task, because most of the tasks I can think of involve improving some encoder or decoder or (shudder) writing a MOV/MP4 muxer. Oh well, I hope I’ll come up with something regardless.
rv4enc: B-frame experiments
Saturday, May 6th, 2023
As I mentioned in the previous post, one of the problems is to find a good motion vector pair for an interpolated macroblock in a B-frame. Since I had nothing better to do I’ve decided to try motion vector search in the same style as the usual motion estimation: start from the candidate motion vector pair and try adjusting both vectors using a diamond pattern (since it’s the simplest one).
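A rough sketch of that kind of search is given below. Everything in it (the `Mv` type, the `refine_pair` name, alternating between the two vectors, taking the cost as a closure) is an assumption made for the illustration and not the encoder’s actual code. In a real encoder the cost closure would build the interpolated prediction from both references and measure its distortion.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Mv {
    x: i16,
    y: i16,
}

/// Small diamond: one step up, left, right and down.
const DIAMOND: [(i16, i16); 4] = [(0, -1), (-1, 0), (1, 0), (0, 1)];

/// Refine a (forward, backward) vector pair by nudging each vector along
/// the diamond pattern until no single step lowers the cost any more.
/// No clamping to the valid vector range is done in this sketch.
fn refine_pair<F: Fn(Mv, Mv) -> u32>(mut fwd: Mv, mut bwd: Mv, cost: F) -> (Mv, Mv, u32) {
    let mut best = cost(fwd, bwd);
    loop {
        let mut improved = false;
        for &(dx, dy) in DIAMOND.iter() {
            let cand = Mv { x: fwd.x + dx, y: fwd.y + dy };
            let c = cost(cand, bwd);
            if c < best {
                best = c;
                fwd = cand;
                improved = true;
            }
        }
        for &(dx, dy) in DIAMOND.iter() {
            let cand = Mv { x: bwd.x + dx, y: bwd.y + dy };
            let c = cost(fwd, cand);
            if c < best {
                best = c;
                bwd = cand;
                improved = true;
            }
        }
        if !improved {
            return (fwd, bwd, best);
        }
    }
}
```

Every step calls the cost function again, so with a cost that performs motion compensation for both references the number of evaluations adds up quickly.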
The results are not exciting: while it slightly improves PSNR and reduces file size (on lower quantisers), encoding time explodes. My 17-second 320×240 test clip encoded with quant=16 and two B-frames between I/P-frames takes 40 seconds without that option and 136 seconds with it. Average PSNR improves from 38.0446 to 38.0521 and the size decreases from 1511843 bytes to 1507224.
That’s the law of diminishing returns in action. Of course it can be made significantly faster by e.g. using a pre-interpolated set of reference frames but why bother in this case? I’ve put this under an option (i.e. be satisfied with the initial guess or try to search for a better pair of motion vectors) but I doubt anybody will ever use it (the same applies to the whole encoder as well).
rv4enc: somewhat working
Wednesday, May 3rd, 2023
I’ve finally managed to implement a more or less working RealVideo 4 encoder with all the main features (yeah, I’m also surprised that I’ve got to this stage this fast). As usual, it’s the small details that will take a lot of time to make decent, let alone good.
So, what can my encoder actually do now? It can encode video with I/P/B-frames using the provided order, it can encode all possible macroblock types and has some kind of rate control.
What does it not have then? First of all, I don’t know yet how it would fare with the original RealPlayer (I also need to modify the RMVB muxer to output improper B-frame timestamps and maybe write the additional streaming-related information). Then there’s the question of having a proper rate control. And finally there are a lot of questions related to B-frames.
Currently my rate control is implemented as a system that keeps statistics on how large an encoded frame is on average for a given frame type and quantiser and tries to find the best fitting quantiser. If there are no statistics yet (or not enough of them) I resort to a simpler quantiser guessing, adjusting the quantiser depending on how different the projected and actual frame sizes are. Of course it can be tuned to behave better (the question is how though). And I’m not going to touch two-pass encoding (theoretically it’s rather simple: you log various encoder information in the first pass and use it to select quantisers better in the second pass; in practice it means messing with text data and doing additional guesstimates, so pass).
With B-frames there are two main issues to deal with: which frames to select and how to perform motion estimation. I read that the first can be achieved by performing motion compensation against neighbouring frames and calculating SATD (often done on scaled-down frames to be faster). The second question is how to search for bidirectional block vectors. Currently I have a very simple approach: I search for forward and backward motion vectors independently and check which combination of them works the best. I suspect there may be an approach specifically for weighted bi-directional search but I could not find anything (and I’m not desperate enough to dive into the codebase of MPEG-4 ASP/AVC encoders).
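The statistics part can be pictured as a small table indexed by frame type and quantiser. The sketch below is only an illustration: the names `RateStats`, `FrameType` and `NUM_QUANTS`, the 32-entry quantiser range and the "closest average wins" rule are assumptions, not the encoder’s actual implementation.

```rust
/// Assumed number of quantiser values for this sketch.
const NUM_QUANTS: usize = 32;

#[derive(Clone, Copy)]
enum FrameType {
    I = 0,
    P = 1,
    B = 2,
}

#[derive(Clone, Copy, Default)]
struct SizeStat {
    total_bytes: u64,
    frames: u32,
}

/// Average encoded frame size per (frame type, quantiser).
struct RateStats {
    stats: [[SizeStat; NUM_QUANTS]; 3],
}

impl RateStats {
    fn new() -> Self {
        Self { stats: [[SizeStat::default(); NUM_QUANTS]; 3] }
    }
    /// Record the size of a frame that has just been encoded.
    fn record(&mut self, ftype: FrameType, quant: usize, size: u32) {
        let st = &mut self.stats[ftype as usize][quant];
        st.total_bytes += u64::from(size);
        st.frames += 1;
    }
    /// Average size seen so far, or None if nothing was recorded.
    fn average(&self, ftype: FrameType, quant: usize) -> Option<u64> {
        let st = &self.stats[ftype as usize][quant];
        if st.frames > 0 {
            Some(st.total_bytes / u64::from(st.frames))
        } else {
            None
        }
    }
    /// Quantiser whose average frame size is closest to `target` bytes.
    /// None means "no statistics yet", so the caller has to guess instead.
    fn pick_quant(&self, ftype: FrameType, target: u64) -> Option<usize> {
        (0..NUM_QUANTS)
            .filter_map(|q| self.average(ftype, q).map(|avg| (q, avg.abs_diff(target))))
            .min_by_key(|&(_, diff)| diff)
            .map(|(q, _)| q)
    }
}
```

A `None` from `pick_quant` corresponds to the fallback case described above, where the quantiser is adjusted from the mismatch between projected and actual frame sizes.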
And finally there’s the whole question of quality. I suspect that my encoder is far from being good because it should not merely transform-quantise-code blocks but also perform some masking (i.e. set some higher-frequency coefficients to zero instead of hoping that they’ll be quantised to zero).
So this will be long and boring work…
rv4enc: limping forward
Tuesday, April 18th, 2023
So far there’s not much to write about: I’ve dealt with the second most annoying thing (writing coefficients) and now my encoder can produce a valid stream of I- and P-frames. Probably it’s a good place to define a road map though.
First of all, there’s still the most annoying thing pending—in-loop filtering. And I can’t reuse that much code from the decoder since it was written back in the day when my knowledge of Rust was even less than now, so I have to partly rewrite some of the parts to make them fit my current approaches to the interfaces (that means a lot of DSP functions among other things). At least it can be disabled for now but I’ll have to return to it sooner or later.
Then there’s 4×4 intra prediction mode still waiting to be implemented. Again, I know how to pick the good enough prediction modes, it’s context-dependent coding of those modes that is going to be annoying.
Another thing missing is estimating the number of bits required to encode a single block. There are about five codebooks involved and those are selected depending on macroblock type, quantiser, coefficient block type (luma, chroma or luma DCs) and the global set index. I guess I’ll resort to gathering statistics for all possible coding modes and seeing if I can make some heuristic estimation out of it. And there’s still a task of selecting the best coding set for a slice…
After all that it will be mostly B-frame related things left to implement. I also need to make sure the muxer will write them out properly (for historical reasons in that case it mangles the timestamp to be last frame timestamp plus one). That’s not counting all possible enhancements like deciding the frame type for coding. Most likely I’ll be annoyed by it and keep the fixed coding order instead.
There are too many things to do and considering my pace it may keep me busy for a good part of the year (I mean a large part of a calendar year; the current date is still February 24).
Starting yet another failure of an encoder
Thursday, April 6th, 2023
As anybody could’ve guessed from the Cook encoder, I would not want to stop at that and would do some video encoder to accompany it. So here I declare that I’m starting work on a RealVideo 4 encoder (again, making it public should prevent me from chickening out).
I can salvage some parts from my VP7 encoder but there are several things that make it different enough from it (beside the bitstream coding): slices and B-frames. Historically RealMedia packets are limited to 64kB and they should not contain partial slices (grouping several slices or even frames in the packet is fine though), so the frame should be split during coding. And while Duck codecs re-invent B-frames to make them still be coded in sequence, RealVideo 4 has honest B-frames that should be reordered before encoding.
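The packet constraint boils down to a simple packing rule, sketched below. The `group_slices` helper and the exact `MAX_PACKET` value are hypothetical; a real muxer also has to account for packet headers, which this illustration ignores.

```rust
/// Assumed payload limit for this sketch (packet headers are ignored).
const MAX_PACKET: usize = 65535;

/// Group consecutive slices into packets without ever splitting a slice.
/// Returns the number of slices in each packet, or None when a single
/// slice is already too large, meaning the frame needs more slices.
fn group_slices(slice_sizes: &[usize]) -> Option<Vec<usize>> {
    let mut groups = Vec::new();
    let mut count = 0usize;
    let mut used = 0usize;
    for &sz in slice_sizes {
        if sz > MAX_PACKET {
            return None;
        }
        if used + sz > MAX_PACKET {
            // The current packet is full: close it and start a new one.
            groups.push(count);
            count = 0;
            used = 0;
        }
        count += 1;
        used += sz;
    }
    if count > 0 {
        groups.push(count);
    }
    Some(groups)
}
```

The `None` case is the reason the frame has to be split during coding: the encoder must keep every slice below the limit rather than leave it to the muxer.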
So while the core is pretty straightforward (try different coding modes for each macroblock, pick the best one, write bitstream), it gives me enough opportunity to try different aspects of H.264 encoding that I had no reason to care about previously. Maybe I’ll try to see if automatic frame type selection makes sense, maybe I’ll experiment with more advanced motion search algorithms, maybe I’ll try better heuristics for e.g. quantiser selection.
There should be a lot to keep me occupied (but I expect to spend even more time on evading that task for the lack of inspiration or a sheer amount of work to do demotivating me).
NihAV: towards RealMedia encoding support
Friday, March 10th, 2023
Since I have nothing better to do with my life, I’ve decided to add rudimentary support for encoding into RealMedia. I’ve written a somewhat working RMVB muxer (it lacks logical stream support that is used to describe the streaming capabilities, it also has some quirks in audio and video support but it seems to produce something that other players understand) and now I’ve made a Cook audio encoder as well.
Somebody who knows me knows that I fail spectacularly at writing non-trivial lossy audio encoders but luckily here we have a format which even I can’t botch much. It is based on parametric bit-allocation derived from band energies. So all it takes is to perform slightly modified MDCT, calculate band energies, convert them to scale factors, perform bit allocation based on them, pack band contents depending on band categories and adjust the categories until all bands fit the frame. All of these steps are well-defined (including the order in which bands should be adjusted) so making it all work is rather trivial. But what about determining the frame size and coupling mode?
As it turns out, RealMedia supports only certain codec profiles (or “flavors” in its terminology), so Cook has about 32 different flavours defined for different combinations of sample rates, number of channels and bitrates. And each flavour has internal bitrate parameters (frame size and which coupling parameters to use) for each channel pair so you just pick a fitting profile and go on with it. In theory it’s possible to add a custom profile but it’s not worth the hassle IMO.
And now here are some fun facts about it. Apparently there are three internal revisions of the codec: the original version for RealMedia G2, version 2 for RealMedia 8 and multichannel encoder (introduced in RealMedia 10, when they’ve switched to AAC already). Version 2 is the one supporting joint-stereo coding. The codec is based on G.722.1 (the predecessor of CELT even if Xiph folks keep denying that) but, because Cook frames are 256-1024 samples instead of fixed 320-sample frames, they’ve introduced gains to adjust better for volume changes inside the frame (but I haven’t bothered implementing that part). That is probably the biggest principal change in the format (not counting the different frame sizes and joint stereo support in the version 2). And finally I should probably mention that there are some flavours with the same bitrate that differ by the frequency range and where the joint stereo part starts (some of those are called “high response music” whatever that means).
Time to move to the video encoder…
RV6: a blast from the past
Tuesday, November 8th, 2022
So while various terrorist states try to prove they have better industry than russia, here’s yet another fun distraction.
It turned out that Peter Ross for some unfathomable reason decided to take a look at RealVideo 6. Since RealPlayer™ has RV11 support (which is FFmpeg with wrappers for calling their proprietary encoder and decoder for RV11), he played a bit with it. And since there’s one opensource decoder for RV6 (should be obvious which one), he contacted me with a list of changes to make it work fine not just on the only sample I had. Many thanks for that!
Beside various small mistakes there are several things that are rather curious:
- RMVB container version 2 supports 64-bit offsets. Now all that is left is to find a person willing to create RMVB files over 4GB in size;
- chroma motion compensation case (¾,¾) is the same as (¾,½);
- loop filtering formula was changed from |p0-q0|*lim1 < 512 to ((|p0-q0|*lim1) & ~0x7F) <= 384. They should be equivalent (even if I can't bother to prove it) but the change is still baffling;
- and it turns out that delta quantisers are once per CU instead of once per row. More about it below.
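For the loop filter item in the list above, a brute-force comparison is quicker than a proof. The sketch assumes the product |p0-q0|*lim1 is a non-negative integer (the range of lim1 is an assumption here), and the function names are made up for the illustration.

```rust
/// Old form of the check: |p0-q0|*lim1 < 512 (the product is passed in).
fn old_check(prod: u32) -> bool {
    prod < 512
}

/// New form of the check: ((|p0-q0|*lim1) & ~0x7F) <= 384.
fn new_check(prod: u32) -> bool {
    (prod & !0x7F) <= 384
}

/// Compare both forms for every product below `limit`.
fn checks_agree(limit: u32) -> bool {
    (0..limit).all(|p| old_check(p) == new_check(p))
}
```

Clearing the low seven bits rounds the product down to a multiple of 128, and that multiple is at most 384 exactly when the product is below 512, so for non-negative products the two forms agree.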
So what's the deal with delta quantisers? When I REd the code, the binary specification had all debug information left inside. So when you see a function named Decoder::parseBitStream_CUOneCULine() that calls decodeCUQPOffset() once before calling decodeAndReconstructCBTree() in a loop, you'd presume it reads DQP once per row of CUs, yet it turned out to be one DQP per CU (also reasonable but not what you expect from the function name). The thread management code that sets various parameters for slice decoding and calls the function is too messy to figure out parameters from it (I tried and got a number of 4 CUs per call unless it's at the end of row, which is obviously wrong).
It's nice to see that RealVideo 6 keeps up the traditions of previous RealVideo versions.
NihAV: now with full RealMedia support!
Saturday, December 15th, 2018
In late September 2017 I started to work on RealMedia support in NihAV with an intent to have full support for RealMedia. So more than a year later I’ve reached that goal.
In order to do that I had to reverse engineer one and a half codecs and one format. Here’s the full list.
Supported formats:
- RealAudio (i.e. just single audio stream);
- plain RealMedia (i.e. just a bunch of audio and video streams);
- RealMedia with multiple data chunks (i.e. one or several streams are stored in separate chunk, it’s nothing extraordinary but still needs to be accounted for);
- RealMedia multiple stream variants (i.e. single logical stream is represented by several substreams and you have to select one based on quality);
- IVR, their own recording format (I had to RE this one).
Supported audio codecs:
- RealAudio 1 aka 14.4;
- RealAudio 2 aka 28.8;
- RealAudio (AC)3 aka DNET;
- RealAudio 4/5 (licensed from Sipro);
- RealAudio G2 (cook);
- RealAudio ATRAC3;
- RealAudio AAC-LC (no SBR);
- RealAudio Lossless.
And video codecs:
- RealVideo 1;
- RealVideo Fractal aka ClearVideo (I had to finish REing P-frame format for that one);
- RealVideo 2;
- RealVideo 3;
- RealVideo 4;
- RealVideo 6 or HD (I had to RE this one and now it decodes the sample I have with only minor glitches).
And here are some words about IVR that I had to RE this week.
Update: it turns out Paul had reverse engineered the format before NihAV came into existence but his implementation is even sketchier than mine unfortunately.
There are actually two formats there, one with magic .REC that contains the actual recording and another one with magic .R1M that may contain several of those .rec embedded. Both formats internally reminded me more of Flash than RealMedia because both files are composed of records that can be distinguished by the first byte (yes, I still remember RTMP and how I had to parse it). R1M has two kinds of records: 0x01 (recording metadata, it seems) and 0x02, which contains the actual REC.
REC files (or sub-entries in R1M) have a defined number of global properties followed by stream specific properties followed by (optional) stream seek tables followed by actual packets. All numbers are big-endian and 32-bit (seek table offsets seem to be 64-bit). Strings are coded as string length (including terminating zero) followed by string data and zero terminator.
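As an illustration of the number and string coding just described, here is a small sketch. The helper names are hypothetical, and it assumes that the string length field is one of those 32-bit big-endian numbers and that the length counts the terminating zero.

```rust
/// Read a big-endian 32-bit number and advance the slice past it.
fn read_u32be(data: &mut &[u8]) -> Option<u32> {
    if data.len() < 4 {
        return None;
    }
    let (head, tail) = data.split_at(4);
    *data = tail;
    Some(u32::from_be_bytes([head[0], head[1], head[2], head[3]]))
}

/// Read a string: length (counting the terminating zero), then the bytes.
/// On failure the slice may already have been advanced past the length.
fn read_string(data: &mut &[u8]) -> Option<String> {
    let len = read_u32be(data)? as usize;
    if len == 0 || data.len() < len {
        return None;
    }
    let (raw, tail) = data.split_at(len);
    if raw[len - 1] != 0 {
        return None; // the zero terminator is missing
    }
    *data = tail;
    // Lossy conversion since nothing here guarantees UTF-8 content.
    Some(String::from_utf8_lossy(&raw[..len - 1]).into_owned())
}
```

A key-value record would then be parsed as two consecutive `read_string` calls after the record type byte.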
REC record types:
- stream properties start, has a number of properties coded after it;
- packet header, more about it below;
- key-number pair, has key value (string), a number property length (always 4) and actual number value;
- binary data, has key value (string), binary data length and actual data;
- key-value pair with both key and value being strings;
- end of header record with three numbers, first of which gives an absolute (from the beginning of REC data if embedded) offset for the seek tables or packets;
- packet data end, always followed by eight zeroes;
- packet data start, always followed by eight zeroes. This record seems to be present only when seek tables are present (to detect the end of those?), otherwise packets follow end-of-header record immediately.
There may be several RJMx chunks at the end of IVR with additional metadata but they posed no interest to me.
I had some trouble with IVR packets since I expected them to be exactly the same as in RM, but it turned out that they have the same payload format with a different header:
- 32-bit timestamp;
- 16-bit stream number;
- 32 bits of flags. I suspect this might code packet group for MLTI substreams, keyframe information and such but I could not find a proper pattern valid for all three samples (and demuxer works fine without it too);
- 32-bit payload length;
- 32-bit header checksum (most likely). I was not able to understand how it works but header checksum seems to be the most plausible explanation.
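Put together, the header fields listed above could be parsed as in the following sketch. The struct and function names are made up, and the code assumes the fields are stored back to back in the listed order (18 bytes in total) right after the record type byte.

```rust
#[derive(Debug, PartialEq)]
struct IvrPacketHeader {
    timestamp: u32,
    stream_no: u16,
    flags: u32,       // meaning unknown, simply carried along
    payload_len: u32,
    checksum: u32,    // "most likely" a header checksum, not verified here
}

/// 4 + 2 + 4 + 4 + 4 bytes, assuming no padding between the fields.
const IVR_PKT_HDR_SIZE: usize = 18;

fn be32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Parse the header from the start of `src`, None if there is too little data.
fn parse_packet_header(src: &[u8]) -> Option<IvrPacketHeader> {
    if src.len() < IVR_PKT_HDR_SIZE {
        return None;
    }
    Some(IvrPacketHeader {
        timestamp: be32(&src[0..4]),
        stream_no: u16::from_be_bytes([src[4], src[5]]),
        flags: be32(&src[6..10]),
        payload_len: be32(&src[10..14]),
        checksum: be32(&src[14..18]),
    })
}
```

The payload of `payload_len` bytes would follow the header and, as noted above, uses the same format as in RM.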
I am fully aware that my current implementation has bugs and flaws and might not decode all files perfectly but it decodes all kinds of files and that’s good for me. Also what to expect from software written by one lazy guy in his free time for himself?
Next is probably Duck type of codecs or totally RAD ones. Or maybe I’ll waste time on making NihAV conform to Rust 2018 edition. This seems to be a task about half as hard as porting code from K&R C to ANSI C (from a quick glance you have to change at least imports from inside the crate, traits now require word dyn and there may be more). Or it may be NAScale for all I care (and I don’t care at all). The time will show.
NihAV: The Fastest RealVideo 6 Decoder in Rust!
Thursday, November 29th, 2018
I guess the title shows how stupid marketing something can be if there’s just one contestant in the category, so in order to win you merely need to exist. Like in this case: NihAV can barely decode data and it’s not correct but the images are recognizable (some examples below) and that’s enough to justify this post title. Also I have just one sample clip to test my decoder on but at least RV6 is not a format with many features so only one feature of it is not tested (type 3 frames).
Anyway, here’s the first frame—and it is reconstructed perfectly:
The rest of the frames are of significantly worse quality but with more details.
(more…)