In late September 2017 I started to work on RealMedia support in NihAV with the intent of supporting RealMedia in full. More than a year later, I've reached that goal.
In order to do that I had to reverse engineer one and a half codecs and one format. Here’s the full list.
Supported formats:
- RealAudio (i.e. just single audio stream);
- plain RealMedia (i.e. just a bunch of audio and video streams);
- RealMedia with multiple data chunks (i.e. one or several streams are stored in a separate chunk; it's nothing extraordinary but it still needs to be accounted for);
- RealMedia multiple stream variants (i.e. single logical stream is represented by several substreams and you have to select one based on quality);
- IVR, their own recording format (I had to RE this one).
Supported audio codecs:
- RealAudio 1 aka 14.4;
- RealAudio 2 aka 28.8;
- RealAudio (AC)3 aka DNET;
- RealAudio 4/5 (licensed from Sipro);
- RealAudio G2 (cook);
- RealAudio ATRAC3;
- RealAudio AAC-LC (no SBR);
- RealAudio Lossless.
And video codecs:
- RealVideo 1;
- RealVideo Fractal aka ClearVideo (I had to finish REing P-frame format for that one);
- RealVideo 2;
- RealVideo 3;
- RealVideo 4;
- RealVideo 6 or HD (I had to RE this one and now it decodes the sample I have with only minor glitches).
And here are some words about IVR that I had to RE this week.
Update: it turns out Paul had reverse engineered the format before NihAV came into existence, but his implementation is even sketchier than mine, unfortunately.
There are actually two formats there: one with magic .REC that contains the actual recording, and another with magic .R1M that may contain several of those .REC files embedded. Internally both formats reminded me more of Flash than RealMedia because both are composed of records that can be distinguished by the first byte (yes, I still remember RTMP and how I had to parse it). R1M has two kinds of records: 0x01 (recording metadata, it seems) and 0x02, which contains the actual REC.
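Telling the two flavours apart can be sketched in a few lines of Rust. This is only an illustration, not the actual NihAV code: the names `IvrKind` and `probe_ivr` are made up, and it assumes the magic is simply the first four bytes of the file.

```rust
/// Which of the two IVR flavours a file looks like (hypothetical names).
#[derive(Debug, PartialEq)]
enum IvrKind {
    /// A single recording, magic ".REC".
    Rec,
    /// A container with one or more embedded recordings, magic ".R1M".
    R1m,
}

/// Checks the first four bytes against the two known magics.
/// Returns None for short input or an unknown magic.
fn probe_ivr(data: &[u8]) -> Option<IvrKind> {
    match data.get(..4)? {
        b".REC" => Some(IvrKind::Rec),
        b".R1M" => Some(IvrKind::R1m),
        _ => None,
    }
}
```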
REC files (or sub-entries in R1M) have a defined number of global properties followed by stream-specific properties, followed by (optional) stream seek tables, followed by the actual packets. All numbers are big-endian and 32-bit (seek table offsets seem to be 64-bit). Strings are coded as the string length (including the terminating zero) followed by the string data and the zero terminator.
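The string coding is easier to see in code. Here is a minimal Rust sketch built on `std::io::Read`. It assumes that the length field is itself a 32-bit big-endian number and that it counts the terminator, so a four-character string is stored with length 5. The helper names and the `MAX_STRING_LEN` sanity cap are inventions of this sketch, not something taken from the format.

```rust
use std::io::{self, Read};

/// Arbitrary sanity limit for this sketch so a damaged length field
/// does not trigger a huge allocation (not part of the format).
const MAX_STRING_LEN: usize = 1 << 16;

/// Reads one big-endian 32-bit number.
fn read_u32be<R: Read>(src: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    src.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}

/// Reads a string stored as a length (assumed to count the zero
/// terminator), the string bytes and the zero terminator itself.
fn read_string<R: Read>(src: &mut R) -> io::Result<String> {
    let len = read_u32be(src)? as usize;
    if len == 0 || len > MAX_STRING_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad string length"));
    }
    let mut buf = vec![0u8; len];
    src.read_exact(&mut buf)?;
    // The last byte must be the terminator; drop it from the result.
    if buf.pop() != Some(0) {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "string is not zero-terminated"));
    }
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```

Since `&[u8]` implements `Read`, the same functions work on an in-memory buffer as well as on a file.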
REC record types:
- stream properties start, has a number of properties coded after it;
- packet header, more about it below;
- key-number pair, has key value (string), a number property length (always 4) and actual number value;
- binary data, has key value (string), binary data length and actual data;
- key-value pair with both key and value being strings;
- end of header record with three numbers, first of which gives an absolute (from the beginning of REC data if embedded) offset for the seek tables or packets;
- packet data end, always followed by eight zeroes;
- packet data start, always followed by eight zeroes. This record seems to be present only when seek tables are present (to detect the end of those?), otherwise packets follow end-of-header record immediately.
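The three property-carrying records from the list above could be parsed roughly like this. It is a sketch under assumptions: the record type byte has already been consumed by the caller (its numeric values are not given here, so the dispatch is left out), string and data lengths are 32-bit big-endian, and the string length counts the terminator. `Property` and all function names are hypothetical.

```rust
/// Hypothetical value of one property record.
#[derive(Debug, PartialEq)]
enum Property {
    Number(u32),
    Binary(Vec<u8>),
    Text(String),
}

/// Splits `n` bytes off the front of `data`; None on truncated input.
fn take<'a>(data: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let whole: &'a [u8] = *data;
    if whole.len() < n {
        return None;
    }
    let (head, tail) = whole.split_at(n);
    *data = tail;
    Some(head)
}

fn read_u32be(data: &mut &[u8]) -> Option<u32> {
    let b = take(data, 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Length (assumed to count the terminator), bytes, zero terminator.
fn read_string(data: &mut &[u8]) -> Option<String> {
    let len = read_u32be(data)? as usize;
    let raw = take(data, len)?;
    let (&last, body) = raw.split_last()?;
    if last != 0 {
        return None;
    }
    Some(String::from_utf8_lossy(body).into_owned())
}

/// Key-number pair: key string, value length (always 4), value.
fn read_number_property(data: &mut &[u8]) -> Option<(String, Property)> {
    let key = read_string(data)?;
    if read_u32be(data)? != 4 {
        return None;
    }
    Some((key, Property::Number(read_u32be(data)?)))
}

/// Binary data: key string, data length, raw bytes.
fn read_binary_property(data: &mut &[u8]) -> Option<(String, Property)> {
    let key = read_string(data)?;
    let len = read_u32be(data)? as usize;
    Some((key, Property::Binary(take(data, len)?.to_vec())))
}

/// Key-value pair with both parts coded as strings.
fn read_text_property(data: &mut &[u8]) -> Option<(String, Property)> {
    let key = read_string(data)?;
    Some((key, Property::Text(read_string(data)?)))
}
```

Each function advances the slice past what it consumed, so a demuxer loop can read the next record type byte right after a successful call.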
There may be several RJMx chunks at the end of IVR with additional metadata but they posed no interest to me.
I had some trouble with IVR packets since I expected them to be exactly the same as in RM, but it turned out they use the same payload format with a different header:
- 32-bit timestamp;
- 16-bit stream number;
- 32 bits of flags. I suspect this might code packet group for MLTI substreams, keyframe information and such, but I could not find a proper pattern valid for all three samples (and the demuxer works fine without it too);
- 32-bit payload length;
- 32-bit header checksum (most likely). I was not able to understand how it works but header checksum seems to be the most plausible explanation.
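Put together, the header layout above comes to 18 bytes. A small Rust sketch of reading it follows. It assumes the fields are stored in exactly the listed order with no padding, that the 16-bit stream number is big-endian like the other numbers, and that the record type byte has already been consumed. The struct and field names are made up, and the flags and checksum are only stored, not interpreted, since their meaning is not established.

```rust
/// IVR packet header fields in the listed order (hypothetical names).
#[derive(Debug, PartialEq)]
struct IvrPacketHeader {
    timestamp: u32,
    stream_no: u16,
    /// Meaning unknown; kept as-is.
    flags: u32,
    payload_len: u32,
    /// Most likely a header checksum; not verified here.
    checksum: u32,
}

/// Header size under the no-padding assumption: 4 + 2 + 4 + 4 + 4 bytes.
const IVR_PKT_HDR_SIZE: usize = 18;

/// Parses the header from the start of `buf`; None if `buf` is too short.
fn parse_packet_header(buf: &[u8]) -> Option<IvrPacketHeader> {
    if buf.len() < IVR_PKT_HDR_SIZE {
        return None;
    }
    // Reads a big-endian 32-bit number at the given offset.
    let u32_at = |off: usize| {
        u32::from_be_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
    };
    Some(IvrPacketHeader {
        timestamp: u32_at(0),
        stream_no: u16::from_be_bytes([buf[4], buf[5]]),
        flags: u32_at(6),
        payload_len: u32_at(10),
        checksum: u32_at(14),
    })
}
```

The payload of `payload_len` bytes would then follow the header and can be handed to the same code that handles RM packet payloads.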
I am fully aware that my current implementation has bugs and flaws and might not decode all files perfectly but it decodes all kinds of files and that’s good for me. Also what to expect from software written by one lazy guy in his free time for himself?
Next is probably Duck type of codecs or totally RAD ones. Or maybe I'll waste time on making NihAV conform to the Rust 2018 edition. This seems to be a task about half as hard as porting code from K&R C to ANSI C (from a quick glance you have to change at least imports from inside the crate, traits now require the word dyn, and there may be more). Or it may be NAScale for all I care (and I don't care at all). Time will tell.
I REed IVR format before you in FFmpeg.
Indeed, I’ve updated my post.