Three Problems in Supporting Multimedia Formats

As you probably don’t care, I’m working on RealMedia demuxer for NihAV. And it’s very straightforward: chunks without nesting, version field to guard against surprises, clean layout. The only peculiarities it has are audio data interleavers and so-called logical stream (the special entry that describes how to select streams for streaming depending on bitrate available). And yet the implementation for this format in libavformat is quite complex and baffling. This observation led me to playing Captain Obvious and stating these three problems:

  1. Following specification. Unless it’s ISO/IEC or ITU codec you usually have quite lacking specification either with details omitted (or as DT$ representatives put it, but we have it in the SDK!. Which helps a lot when you can download only ETSI paper). In some case the original implementation is the only specification you have. I’m no stranger to working with binary specifications but it still quite often doesn’t say what to expect in some cases (and then fuzzing happens…);
  2. Supporting hacks and abuses of specification. Two examples: MP3 and AVI. Or MP3 in AVI. For instance, MP3 has an optional CRC field (so if you don’t want CRC you simply don’t put it there) but I’ve seen samples that put zero checksum instead. Or in AVI you’re supposed to have 42db chunks for uncompressed video frames, 42dc for compressed video frames and 42wb for audio data. In reality you can have dc and db identifiers mixed in the same stream or even 0041 chunks put inside LIST rec chunk (that’s Indeo 4.1 in CivII clips). And of course there are many many more examples that everybody who has encountered them tries to forget.
  3. Seeking. You might wonder how seeking gets here and I’ll tell you how: most formats are not designed for random seeking and even if they are, users would still want to ignore indices, jump to a random position and find a start of the next chunk and timestamp as well. And in libavformat that is performed by a binary search that invokes special read_timestamp function of the demuxer (if present) which is supposed to do exactly that—searching for packet start and reporting a timestamp.

The moral of the story: if you can allow to ignore stupid user requests, do so and cherish the fact that you can. In NihAV I’m going to implement seeking only for formats that allow that (with reading index) or by more generic linear seeking that skips frames. This should be enough for my needs and it’ll keep code simple too.

Now that it’s become a bit colder I might actually resume my work on NihAV and even more important thing—describing Dingo Pictures art style and works.

Comments are closed.