Archive for the ‘NihAV’ Category

NihAV — A New Approach to Multimedia Pt. 3

Friday, April 24th, 2015

More on codecs handling

First of all, people are often AVI-centric and decide that you can always use a 4-character code to identify a codec. Well, technically it’s true because there are significantly fewer than 4 billion codecs in existence (I hope). The problem is uneven mapping: MPEG containers use integers for codec IDs, AVI uses a 4-character code for video and a 2-byte integer for audio, MOV uses a 4-character code for both audio and video, Matroska uses long strings like V_MPEG4/MS/V3, and so on. So in any case you have the problem of mapping codecs found by demuxers to internal decoders. In libavcodec it’s handled by having an insane enumeration of codec IDs and I’ve mentioned in part 2 that I’m not a fan of such an approach.
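To make the uneven mapping concrete, here is a minimal Python sketch of what every demuxer ends up carrying: its own lookup table, keyed by whatever identifier type the container uses. The container-side identifiers for H.264, MPEG-2 video and MP3 are the real ones; the table names, the `lookup` helper and the plain-string labels on the right-hand side are made up purely for illustration.

```python
# Each container identifies codecs with a different kind of key, so every
# demuxer needs its own table. The right-hand labels are illustrative only.

MPEG_TS_STREAM_TYPES = {0x1B: "h264", 0x02: "mpeg2video"}  # integer stream_type
AVI_VIDEO_FOURCCS = {b"H264": "h264"}                      # 4-character code
AVI_AUDIO_FORMAT_TAGS = {0x0055: "mp3"}                    # 2-byte integer
MOV_FOURCCS = {b"avc1": "h264", b".mp3": "mp3"}            # 4-character code
MATROSKA_CODEC_IDS = {                                     # long strings
    "V_MPEG4/ISO/AVC": "h264",
    "A_MPEG/L3": "mp3",
    "V_MPEG4/MS/V3": "msmpeg4v3",  # label invented for this sketch
}


def lookup(table, key):
    """Return the common label for a container-specific key, or None."""
    return table.get(key)
```

The same codec hides behind an integer in one container, four bytes in another and a long string in a third, which is why a single 4-character code cannot serve as the universal key.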

So what do I suggest instead? A global registry of codec names in string form. And splitting out the media information database explicitly. After all, why not provide some codec information even if we cannot support it? It takes less effort when you add a new decoder, and you can query some information about a codec even if it’s not supported. The demuxer maps its internal ID to a codec name (if it can), the codec database can be queried about that codec at any time to see what information is known about it, and a decoder can be requested for that codec as well.

Here’s an example:

  1. The Bink demuxer encounters KB2g;
  2. It reports the codec as binkvideo2;
  3. (optional) From the database one can retrieve its name, “Bink Video 2”;
  4. A decoder for binkvideo2 is requested but that request fails because no one has bothered to write such a decoder;
  5. Or a decoder implemented by a special plugin that calls TotallyRADVideo.dll is used instead.
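The steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: `CodecRegistry`, its method names, the `BINK_TAGS` table and `PluginBink2Decoder` are all hypothetical and not part of any existing library.

```python
class CodecRegistry:
    """Hypothetical global registry: codec info and decoders keyed by name."""

    def __init__(self):
        self._info = {}       # short name -> human-readable description
        self._decoders = {}   # short name -> decoder factory

    def register_info(self, name, description):
        self._info[name] = description

    def register_decoder(self, name, factory):
        self._decoders[name] = factory

    def describe(self, name):
        """Codec information is available even without a decoder."""
        return self._info.get(name)

    def find_decoder(self, name):
        """Return a new decoder instance, or None if nobody wrote one."""
        factory = self._decoders.get(name)
        return factory() if factory is not None else None


# Hypothetical demuxer-side table: container tag -> registry name.
BINK_TAGS = {b"KB2g": "binkvideo2"}

registry = CodecRegistry()
registry.register_info("binkvideo2", "Bink Video 2")

codec_name = BINK_TAGS.get(b"KB2g")           # steps 1 and 2
description = registry.describe(codec_name)   # step 3
decoder = registry.find_decoder(codec_name)   # step 4: None, no decoder yet


class PluginBink2Decoder:
    """Stand-in for a plugin wrapping an external library (step 5)."""


registry.register_decoder("binkvideo2", PluginBink2Decoder)
plugin_decoder = registry.find_decoder(codec_name)
```

Note that the information table and the decoder table are separate, so describing a codec never depends on being able to decode it.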

Just replace the enum with a string and you get better flexibility, and only VideoLAN won’t like it.

NihAV — A New Approach to Multimedia Pt. 2

Thursday, April 23rd, 2015

Common design principles

I’d been participating in FFmpeg and then Libav development for about ten years and I’ve touched many parts of its codebase except for libavfilter and libavresample, so I know what I dislike in its design.

Enumerations. Maybe people like them but I think it’s much better to have a list of string identifiers instead. You still specify a codec, format or protocol by name on the command line, so why should the code have that bulky and incompatible enumeration? It would be more convenient for a library user to use a string identifier: you try to find a format handler for a given name and if you don’t have it or its support is disabled then no luck (of course VideoLAN prefers enums but that’s their problem).
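As a rough sketch of that idea (all names here are hypothetical), the name typed on the command line can go straight into the lookup with no enumeration in between, and a disabled or unknown format simply yields nothing:

```python
import argparse

# Hypothetical handler table: name -> (handler, enabled flag).
_HANDLERS = {}


def register_format(name, handler, enabled=True):
    _HANDLERS[name] = (handler, enabled)


def find_format(name):
    """Return the handler for `name`, or None if unknown or disabled."""
    handler, enabled = _HANDLERS.get(name, (None, False))
    return handler if enabled else None


def handler_from_command_line(argv):
    """The user-supplied name is used as the lookup key as-is."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--format", required=True)
    args = parser.parse_args(argv)
    return find_format(args.format)


AVI_HANDLER = object()   # placeholder for a real format handler
BINK_HANDLER = object()  # registered, but support is switched off

register_format("avi", AVI_HANDLER)
register_format("bink", BINK_HANDLER, enabled=False)
```

Here `handler_from_command_line(["-f", "avi"])` returns the AVI handler, while asking for `bink` or for a name nobody registered gives `None`.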

Large pointless structures. AVCodecContext and AVFrame are good examples of that (especially the old versions). They lug around many members that are applicable only to a very limited subset of video codecs and nothing else. A much better approach IMO would be to have substructures with the minimal information needed for all audio/video/subtitle data (both in frame and context) and to put the rest into a dictionary (maybe as subobjects, like motion information or rate control information structures).
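One possible shape for this, sketched in Python with invented names (`VideoInfo`, `AudioInfo`, `Frame`, `MotionInfo` and their fields are assumptions, not an existing API): small typed substructures hold what every stream needs, and anything codec-specific lives in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VideoInfo:
    width: int
    height: int
    pixel_format: str


@dataclass
class AudioInfo:
    sample_rate: int
    channels: int
    sample_format: str


@dataclass
class Frame:
    """Only what every frame needs; everything else goes into `extra`."""
    timestamp: int
    data: bytes
    video: Optional[VideoInfo] = None
    audio: Optional[AudioInfo] = None
    extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MotionInfo:
    """Example of a codec-specific subobject stored in the dictionary."""
    vectors: list


frame = Frame(timestamp=0, data=b"", video=VideoInfo(320, 240, "yuv420p"))
frame.extra["motion"] = MotionInfo(vectors=[(0, 0), (1, -2)])
```

A decoder that exports motion vectors attaches them under a key; every other codec pays nothing for the field.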

API variations. The current approach is to shoehorn everything into a specific structure. My opinion is that public functions should take as flexible (or simple) an input as possible and do the same with output. For example, why have avcodec_decode_video2(), avcodec_decode_audio4() and avcodec_decode_subtitle2() if a single function is enough? You feed input bytes and you obtain output bytes, no matter what you actually do (encode, decode, filter or pass through). Anything optional should be passed as optional, in a dictionary for example.
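A minimal sketch of such a uniform entry point, assuming a bytes-in, bytes-out signature with an optional dictionary (the `Stage` alias, `pass_through`, `toy_gain_filter` and `run_chain` are hypothetical names, and the "filter" is a toy, not real audio processing):

```python
from typing import Callable, Dict, Optional

# One signature for every processing stage: bytes in, bytes out,
# with anything optional passed in a dictionary.
Stage = Callable[[bytes, Optional[Dict[str, object]]], bytes]


def pass_through(data, options=None):
    """Stage that returns its input unchanged."""
    return data


def toy_gain_filter(data, options=None):
    """Toy 'filter' on unsigned 8-bit samples; `gain` is optional."""
    gain = (options or {}).get("gain", 1)
    return bytes(min(255, b * gain) for b in data)


def run_chain(stages, data, options=None):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data, options)
    return data
```

Because every stage has the same shape, a decoder, an encoder and a filter can be chained without the caller knowing which is which.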

Various stuff. Parsing, probing, timestamp handling. All these things need to be reinvented because it’s hard to imagine them being much worse than they are or were a couple years ago.

I’d also like to have some small building blocks for codecs. In libavcodec many video decoders were forced to be built around MpegEncContext and no one likes that structure (except one guy who even named a video player after it but then again he doesn’t want to disclose his real name…). I prefer to have more independent decoders reusing the same methods somehow (e.g. this codec needs this frame management, this motion compensation). How to implement it, whether with boost::codec::video::block_decoder-style templating and macros or with function pointers for codec-specific functions (like block decoding), is yet to be decided.
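The function-pointer variant could look roughly like the sketch below, where a shared frame loop is handed a codec-specific block decoder. Everything here is an assumption made for illustration: the `decode_frame` signature, the toy one-byte-per-block "codec", and the restriction to dimensions that are multiples of the block size.

```python
def decode_frame(width, height, block_size, decode_block, bitstream):
    """Generic frame loop; the codec supplies only `decode_block`.

    `decode_block(bitstream, bx, by)` must return `block_size` rows of
    `block_size` pixel values. Width and height are assumed to be
    multiples of `block_size` to keep the sketch short.
    """
    frame = [[0] * width for _ in range(height)]
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            block = decode_block(bitstream, bx, by)
            for row in range(block_size):
                frame[by + row][bx:bx + block_size] = block[row]
    return frame


def make_solid_fill_decoder(block_size):
    """Toy 'codec': one byte per block, filling the block with that value."""
    def decode_block(bitstream, bx, by):
        value = next(bitstream)
        return [[value] * block_size for _ in range(block_size)]
    return decode_block


picture = decode_frame(4, 2, 2, make_solid_fill_decoder(2), iter(b"\x07\x09"))
```

The frame traversal is written once, and a different codec only swaps in another `decode_block`; motion compensation or frame management could be passed in the same way.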

To be continued eventually…

NihAV — A New Approach to Multimedia Pt. 1

Thursday, April 23rd, 2015

Foreword or Why?!?!

There are two curses in program design (among many others) — legacy and monolithic design.

Legacy means two things: first, there is such a thing as backward compatibility that you (sometimes) have to maintain or the users will complain about broken APIs and ABIs; second, there’s code legacy, i.e. decisions taken in the past that are kept for some reason (e.g. no one understands how it works). Like the AVI demuxer in libavformat containing special cases for handling specific files that no one has ever seen.

Monolithic design is yet another problem that creeps into many projects with time. I don’t know why but quite often code gathers itself into tangled chunks and with time those chunks grow and get uglier. Anyone who has worked with FFmpeg might take pleasure in looking at mpegvideo in libavcodec, libswscale and libpostproc (especially if you look at the versions from about 2010).

So there are two ways to deal with it: evolution (slowly change interfaces in the hope that they’ll be better one day, deprecate stuff etc.) and revolution (simply forget it and write new stuff from scratch).

In this and the following posts I’ll describe a new framework (or whatever buzzword applies here), NihAV (Not-Invented-Here Audio-Video). Maybe I’ll even implement it for my own needs and the name should hint how much I care about existing design decisions.