This week I’ve started reading the binary specification for RealVideo 6 again (yes, I had a post on RV6 technical description a year ago) since now I have a goal of implementing a decoder for it. So here I’ll try to document the format with as many details as possible.
Small update: obviously I’m working on a decoder so when it’s more or less ready I’ll document it on The Wiki.
Overall coding structure
RV6 seems to have four kinds of frames, probably corresponding to the conventional I-/P-/B-frames. Every frame is split into horizontal rows of 64×64 coding units (or CUs for short), each row being coded separately as a slice: there is a table with slice sizes right after the general frame header. Probably you can decode them in the same way as Wavefront Parallel Processing in ITU H.EVC. The codec does not use arithmetic coding but rather fixed codebooks and Elias Gamma codes.
Each coding unit can be recursively split into smaller square coding blocks, and each block is then predicted either spatially from its neighbours or from one of the reference frames. The difference between the prediction and the actual data (the residue) can then be added on top. It’s the same scheme employed in H.EVC, just simpler.
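To make the recursive split concrete, here is a minimal sketch of walking one 64×64 coding unit down to its leaf blocks. The `want_split` callback stands in for whatever split flag the bitstream carries, and both the minimum block size of 8 and the Z-order visiting of the four quarters are assumptions for illustration, not something taken from the specification.

```python
def split_cu(x, y, size, want_split, min_size=8):
    """Yield (x, y, size) for every leaf coding block of one coding unit.

    want_split(x, y, size) is a hypothetical callback replacing the split
    flag that a real decoder would read from the bitstream.
    """
    if size > min_size and want_split(x, y, size):
        half = size // 2
        # Visit the four quarters in Z-order (assumed, not confirmed).
        for dy in (0, half):
            for dx in (0, half):
                yield from split_cu(x + dx, y + dy, half, want_split, min_size)
    else:
        yield (x, y, size)


# Example: keep splitting only the block that touches the top-left corner.
leaves = list(split_cu(0, 0, 64, lambda x, y, size: x == 0 and y == 0))
```

Whatever the split decisions are, the leaves always tile the whole coding unit, which is a handy sanity check while debugging a decoder.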
The codec has deblocking, which can be performed after each CU is decoded, and there’s also optional post-processing.
Oh, and the codec still outputs only YUV420 8-bit frames.
Bitstream coding
As said above, only static codebooks and integer codes are used. This makes decoding faster and less painful to implement. Elias Gamma codes are used for motion vector differences and static codebooks are used to code coding block patterns and actual coefficients.
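For reference, a textbook Elias Gamma code is N zero bits, a one bit and then N more bits. The sketch below decodes that classic form with a tiny MSB-first bit reader; the `BitReader` class is an illustrative helper, and the exact bit layout RV6 uses (plain or interleaved, and how signed motion vector differences are mapped) is not covered here, so treat it as an assumption.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        val = 0
        for _ in range(n):
            val = (val << 1) | self.read_bit()
        return val


def read_elias_gamma(br):
    """Decode one textbook Elias Gamma code (values start at 1)."""
    zeros = 0
    while br.read_bit() == 0:
        zeros += 1
    return (1 << zeros) | br.read_bits(zeros)


# The bit string 1 010 011 00100 (padded with zeros) holds the values 1, 2, 3, 4.
br = BitReader(bytes([0xA6, 0x40]))
values = [read_elias_gamma(br) for _ in range(4)]
```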
Static codebooks are stored in encrypted form: XORed (or EORed) with a 32-byte key that contains the name and mail address of the person responsible for them (presumably). The codes are stored as length minus one in each nibble, exactly like in RALF and probably for the same reason: there are 5 intra codebook sets and 7 inter codebook sets with 7 or 23 tables each, some tables having 864 entries. In other words, a lot of information that has to be packed somehow.
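Unpacking such a table could look roughly like the sketch below. The key used here is a dummy placeholder, not the real one, and two details are assumptions: that the key simply repeats over the table data and that the high nibble holds the first code length.

```python
def unpack_code_lengths(packed, key):
    """Undo the XOR and expand each byte into two code lengths.

    Assumes the key repeats cyclically and the high nibble comes first;
    neither detail is confirmed here.
    """
    lengths = []
    for i, b in enumerate(packed):
        plain = b ^ key[i % len(key)]
        lengths.append((plain >> 4) + 1)   # stored as length minus one
        lengths.append((plain & 0xF) + 1)
    return lengths


DUMMY_KEY = bytes(range(32))  # placeholder, NOT the actual 32-byte key

# Lengths 1, 16, 3, 4 pack into 0x0F 0x23, which the dummy key turns into 0x0F 0x22.
code_lengths = unpack_code_lengths(bytes([0x0F, 0x22]), DUMMY_KEY)
```

One nibble per code caps the code length at 16 bits, and an 864-entry table shrinks to 432 bytes this way.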
A coding unit is split into 8×8 or 16×16 clusters that have their own CBPs, and there’s a CBP for such clusters too. The actual coefficient coding is done the same way as in RealVideo 4: in 4×4 blocks, with codebook sets selected depending on whether it’s an intra or inter block, whether it’s the first cluster of 4×4 blocks in the coding unit, and on the quantiser.
Also it’s worth noting that each slice starts with a delta for the frame quantiser, so the whole slice is coded with the same quantiser but different slices may have different quantisers.
Frame reconstruction
I have not got thoroughly through this part so only some notes for now.
Coding unit types:
- 0 — intra block;
- 1 — inter block with MV;
- 2 — unknown;
- 3 — unknown, probably skip.
Prediction unit types:
- 0 — the block is not split;
- 1 — the block is split 1/2 to 1/2 horizontally;
- 2 — the block is split 1/2 to 1/2 vertically;
- 3 — the block is split into four quarters;
- 4 — the block is split 1/4 to 3/4 horizontally;
- 5 — the block is split 1/4 to 3/4 vertically;
- 6 — the block is split 3/4 to 1/4 horizontally;
- 7 — the block is split 3/4 to 1/4 vertically.
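The prediction unit types above can be turned into rectangles with a small lookup. This is only a sketch: it assumes "horizontally" means the parts are stacked with a horizontal dividing line and "vertically" means they sit side by side, which is a guess at the wording rather than a confirmed detail.

```python
def pu_partitions(pu_type, size):
    """Return (x, y, w, h) rectangles for a square block of the given size."""
    half, quarter = size // 2, size // 4

    def rows(first):
        # Stacked parts, i.e. a horizontal dividing line (assumed meaning).
        return [(0, 0, size, first), (0, first, size, size - first)]

    def cols(first):
        # Side-by-side parts, i.e. a vertical dividing line (assumed meaning).
        return [(0, 0, first, size), (first, 0, size - first, size)]

    table = {
        0: [(0, 0, size, size)],
        1: rows(half),
        2: cols(half),
        3: [(x, y, half, half) for y in (0, half) for x in (0, half)],
        4: rows(quarter),
        5: cols(quarter),
        6: rows(size - quarter),
        7: cols(size - quarter),
    }
    return table[pu_type]


parts = pu_partitions(4, 16)  # 1/4 to 3/4 split of a 16x16 block
```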
Intra (aka spatial) block prediction can be done either from one of 35 angles or with plane prediction.
Motion compensation uses 1/4-pel interpolation with the filter 1, -5, 20, 20, -5, 1 for luma and 1/4-pel linear interpolation for chroma.
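As an illustration of the luma filter, the sketch below applies the six taps along one row of samples. The taps sum to 32, so it normalises with a rounded shift by 5 and clips to 8 bits; that rounding is an assumption borrowed from how such filters are usually applied. A symmetric filter like this yields the sample halfway between two pixels, and how the remaining quarter positions are derived is not covered here.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # luma filter taps, sum is 32


def clip8(v):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, v))


def halfpel_row(row):
    """Interpolate the samples halfway between row[i + 2] and row[i + 3].

    Needs six input samples per output, so the result is five shorter.
    """
    out = []
    for i in range(len(row) - 5):
        acc = sum(t * row[i + k] for k, t in enumerate(TAPS))
        out.append(clip8((acc + 16) >> 5))  # assumed rounding
    return out


ramp = halfpel_row([0, 10, 20, 30, 40, 50])  # halfway between 20 and 30
```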
There are 4×4, 8×8 and 16×16 transforms that seem to differ from the ones in H.EVC. Here’s the 8×8 transform matrix, for example:
37,  37,  37,  37,  37,  37,  37,  37,
51,  43,  29,  10, -10, -29, -43, -51,
48,  20, -20, -48, -48, -20,  20,  48,
43, -10, -51, -29,  29,  51,  10, -43,
37, -37, -37,  37,  37, -37, -37,  37,
29, -51,  10,  43, -43, -10,  51, -29,
20, -48,  48, -20, -20,  48, -48,  20,
10, -29,  43, -51,  51, -43,  29, -10
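One quick way to get a feel for this matrix is to check how close its rows are to being orthogonal, as you would expect from an integer approximation of a DCT. The sketch below reads the 64 values row by row (an assumption about the layout) and computes all pairwise dot products.

```python
RV6_T8 = [
    [37,  37,  37,  37,  37,  37,  37,  37],
    [51,  43,  29,  10, -10, -29, -43, -51],
    [48,  20, -20, -48, -48, -20,  20,  48],
    [43, -10, -51, -29,  29,  51,  10, -43],
    [37, -37, -37,  37,  37, -37, -37,  37],
    [29, -51,  10,  43, -43, -10,  51, -29],
    [20, -48,  48, -20, -20,  48, -48,  20],
    [10, -29,  43, -51,  51, -43,  29, -10],
]


def gram(m):
    """Return the matrix of dot products between all pairs of rows."""
    n = len(m)
    return [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


g = gram(RV6_T8)
diag = [g[i][i] for i in range(8)]  # squared row norms
off = max(abs(g[i][j]) for i in range(8) for j in range(8) if i != j)
```

The squared row norms land within about 2% of each other (between 10782 and 10952), while the largest dot product between two different rows is only 12, so the rows are almost but not exactly orthogonal.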
Quantisation still uses two quantisers: for DC and AC.
Complete frame reconstruction is obvious: decode a CU from the bitstream; if it’s intra then derive the intra prediction mode and apply it, otherwise derive the motion vector and apply motion compensation. After that dequantise the decoded coefficients, run the inverse transform and add the result to the prediction. Do deblocking if requested. Repeat for the rest of the coding units. Output the whole frame.
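The last step of that loop, adding the residue to the prediction, is simple enough to sketch. Blocks are plain lists of rows here, and clipping to 0..255 is an assumption that follows from the 8-bit output; the function name is made up for illustration.

```python
def add_residue(pred, residue):
    """Add a residue block to a prediction block, clipping to 8 bits."""
    return [[max(0, min(255, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residue)]


# Values that would overflow or underflow get clamped.
recon = add_residue([[250, 5], [128, 128]], [[10, -10], [3, -3]])
```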
Conclusion
Overall, RealVideo 6 (or 11; or HD) seems to be an evolution of RealVideo 4 with coding approaches ripped off from ITU H.EVC, but with a clear goal of keeping the codec simple enough (i.e. more like Thor and less like AV1). And while knowing the standard won’t hurt, knowing RealVideo 4 would be more useful for REing this codec; at least it was to me.
Having said that, I’m probably not going to have much to say about it until the decoder is ready (hopefully there will be more samples by then too), so this should be it for this year unless I finally write up my thoughts on AV1 and why it’s not a thing to be overexcited about.