Archive for April, 2015

NihAV — A New Approach to Multimedia Pt. 5

Saturday, April 25th, 2015

Structures and functions

The problem with structures in libav* is that they quite often contain a lot of useless information and easily break ABI whenever someone needs to add yet another crucial field like grandmother’s birthday. My idea to solve some of those problems was adding side data — something that is passed along with the main data (e.g. a packet) and that decoders don’t have to care about. It would be even better to make it more generic, so you don’t have to care about enums for it either. For instance, most codecs don’t have to care about broadcast-grade metadata (though some containers and codecs like ATSC A/52 provide a lot of it) or stupid DVD shit (pan&scan anyone?). So if a demuxer or decoder wants to provide it — fine, just don’t clutter existing structures with it; add it to the metadata, and if the consumer (encoder/muxer/application) cares it can check whether such non-standard information is present and use it.

That’s the general approach I want, quite similar to the FCC certification rule: producers (any code that outputs data) may attach any kind of additional data, but consumers (code that takes that data as input) do not have to care about it and can ignore it freely. It’s also easy to mark some entries as essential (like PNG chunks — their names tell you which chunks can be safely ignored and which must be handled in any case) to ensure such an entry won’t be silently dropped, and the input handler can error out when it doesn’t understand it.
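To illustrate, here is a minimal sketch with hypothetical names (not an actual NihAV API): each side data entry carries a key, an opaque payload and an “essential” flag, and a consumer only errors out on essential entries it does not understand.

    #include <stddef.h>

    /* hypothetical side data entry attached to a packet or frame */
    typedef struct SideDataEntry {
        const char *key;       /* e.g. "a52.broadcast_metadata" or "dvd.pan_scan" */
        const void *data;      /* opaque payload, producer-defined */
        size_t      size;
        int         essential; /* must be understood by the consumer */
    } SideDataEntry;

    /* consumer side: ignore unknown entries unless they are marked essential */
    static int check_side_data(const SideDataEntry *entries, size_t count,
                               int (*understands)(const char *key))
    {
        for (size_t i = 0; i < count; i++) {
            if (entries[i].essential && !understands(entries[i].key))
                return -1; /* refuse input we cannot interpret correctly */
        }
        return 0;
    }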

As for proper function calls — Luca has described it quite well here (pity no one reads his blog).

NihAV — A New Approach to Multimedia Pt. 4

Friday, April 24th, 2015

On colourspaces and such

I think the current situation with pixel formats is brain-damaged as well. You have a list of pixel formats longer than two arms and yet it’s insufficient for many use cases (e.g. Canopus HQX needs 12-bit YUVA422 but no such format is supported, so 16-bit had to be used instead; or ProRes with an 8- or 16-bit alpha channel alongside 10-bit YUV). In this case it’s much better to have a pixel format descriptor with all the essential properties covered and all the exotic stuff (e.g. Bayer-to-RGB conversion coefficients) in options. Why introduce a dozen IDs for packed raw formats when you can describe them in a uniform way (i.e. read the value as big- or little-endian, use these shifts and masks to extract the components, etc.)? Even if you need to convert YUV with different subsampling for the chroma planes (which can happen in JPEG) into some special packed 10-bit RGB format, you can simply pass those pixel format descriptors to the library and it will handle the conversion despite encountering such formats for the first time.
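A rough sketch of what such a descriptor could contain (the field names here are mine, purely illustrative): enough plane, shift, depth and subsampling information to unpack any packed raw format without minting a new format ID for it.

    #include <stdint.h>

    /* hypothetical per-component description */
    typedef struct ComponentDesc {
        uint8_t plane;      /* plane the component is stored in */
        uint8_t shift;      /* bit offset inside the packed value */
        uint8_t depth;      /* bits per component, e.g. 10 or 12 */
        uint8_t hsub, vsub; /* horizontal/vertical subsampling shifts */
    } ComponentDesc;

    /* hypothetical pixel format descriptor */
    typedef struct PixelFormatDesc {
        uint8_t       num_components;  /* e.g. 4 for YUVA */
        uint8_t       bytes_per_pixel; /* for packed formats */
        uint8_t       big_endian;      /* how to read the packed value */
        ComponentDesc comp[4];
    } PixelFormatDesc;

    /* 12-bit YUVA422 then becomes just another descriptor value
     * instead of yet another entry in a global enum */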

P.S. I actually wrote some test code to demonstrate that idea but no one got interested in it.

NihAV — A New Approach to Multimedia Pt. 3

Friday, April 24th, 2015

More on codecs handling

First of all, people are often AVI-centric and decide that you can always use a 4-character code to identify a codec. Well, technically it’s true because there are significantly fewer than 4 billion codecs in existence (I hope). The problem is the uneven mapping — MPEG containers use integers for codec IDs, AVI uses a 4-character code for video and a 2-byte integer for audio, MOV uses a 4-character code for both audio and video, Matroska uses long strings like V_MPEG4/MS/V3, and so on. So in any case you have the problem of mapping codecs found by demuxers to internal decoders. In libavcodec it’s handled by an insane enumeration of codec IDs, and I’ve mentioned in part 2 that I’m not a fan of such an approach.

So what do I suggest instead? A global registry of codec names in string form. And splitting out the media information database explicitly. After all, why not provide some codec information even if we cannot support it? It means less effort when you add a new decoder, and you can query some information about a codec even if it’s not supported. The demuxer maps its internal ID to a codec name (if it can), the codec database can be queried about that codec at any time to see what information is known about it, and a decoder for that codec can be requested as well.

Here’s an example:

  1. The Bink demuxer encounters KB2g;
  2. It reports the codec name binkvideo2;
  3. (optional) From the database one can retrieve its human-readable name — “Bink Video 2”;
  4. A decoder for binkvideo2 is requested, but that request fails because no one has bothered to write such a decoder;
  5. Or a decoder implemented by a special plugin that calls TotallyRADVideo.dll is invoked.

Just replace the enum with a string and you get better flexibility, and only VideoLAN won’t like it.
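A sketch of the lookup chain from the example above (all the names here are illustrative, not a real API): the demuxer hands out a string, and both the codec database and the decoder registry are queried with it.

    #include <stdio.h>
    #include <string.h>

    typedef struct CodecInfo {
        const char *name;        /* registry key, e.g. "binkvideo2" */
        const char *description; /* e.g. "Bink Video 2" */
    } CodecInfo;

    /* hypothetical codec database; a real one could be extended by plugins */
    static const CodecInfo codec_db[] = {
        { "binkvideo2", "Bink Video 2" },
        { "binkaudio",  "Bink Audio"   },
    };

    static const CodecInfo *codec_db_find(const char *name)
    {
        for (size_t i = 0; i < sizeof(codec_db) / sizeof(codec_db[0]); i++)
            if (!strcmp(codec_db[i].name, name))
                return &codec_db[i];
        return NULL;
    }

    int main(void)
    {
        /* the Bink demuxer mapped the fourcc KB2g to the string "binkvideo2" */
        const CodecInfo *info = codec_db_find("binkvideo2");
        if (info)
            printf("%s\n", info->description); /* known codec, decoder may still be missing */
        return 0;
    }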

NihAV — A New Approach to Multimedia Pt. 2

Thursday, April 23rd, 2015

Common design principles

I had been participating in FFmpeg and then Libav development for about ten years and I’ve touched many parts of its codebase (except for libavfilter and libavresample), so I know what I dislike in its design.

Enumerations. Maybe people like them but I think it’s much better to have a list of string identifiers instead. You still specify a codec, format or protocol by name on the command line, so why should the code have that bulky and incompatible enumeration? It would be more convenient for a library user to use a string identifier — you try to find the format handler for a given name, and if you don’t have it or its support is disabled then no luck (of course VideoLAN prefers enums but that’s their problem).

Large pointless structures. AVCodecContext and AVFrame are good examples of that (especially the old versions). They lug around many members that are applicable only to a very limited subset of video codecs and nothing else. A much better approach IMO would be to have substructures with the minimal information needed for all audio/video/subtitle data (both in frame and context) and put the rest into a dictionary (maybe as subobjects, like motion information or rate control information structures).
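Purely for illustration (not a proposed layout), the split could look like a small fixed part that every frame needs plus a dictionary for everything codec- or application-specific:

    #include <stdint.h>

    typedef struct Dictionary Dictionary; /* hypothetical key/value store */

    /* the minimal information every video frame needs */
    typedef struct VideoFrameInfo {
        int     width, height;
        int     pixfmt;  /* or a full pixel format descriptor as in Pt. 4 */
        int64_t pts;
    } VideoFrameInfo;

    typedef struct Frame {
        VideoFrameInfo info;
        uint8_t       *data[4];
        int            linesize[4];
        Dictionary    *extra; /* pan&scan, motion info, rate control stats, ... */
    } Frame;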

API variations. The current approach is to shoehorn everything into a specific structure. My opinion is that public functions should take as flexible (or simple) input as possible and do the same with output. For example, why have avcodec_decode_video2(), avcodec_decode_audio4() and avcodec_decode_subtitle2() if a single function is enough? You feed it input bytes and you obtain output bytes — no matter what you actually do (encode, decode, filter or pass through). Anything optional should be passed as optional — in a dictionary, for example.
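A hedged sketch of what such a single entry point might look like (the name and signature are invented for illustration): bytes in, bytes out, with anything optional passed through a dictionary.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct Dictionary Dictionary; /* hypothetical key/value store */

    /* one entry point instead of per-media-type variants; the handler may
     * encode, decode, filter or just pass the data through */
    int na_process(void *handler,
                   const uint8_t *input, size_t input_size,
                   uint8_t **output, size_t *output_size,
                   Dictionary *options);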

Various stuff. Parsing, probing, timestamp handling. All these things need to be reinvented because it’s hard to imagine them being much worse than they are or were a couple of years ago.

I’d also like to have some small building blocks for codecs. In libavcodec many video decoders were forced to be built around MpegEncContext and no one likes that structure (except one guy who even named a video player after it, but then again he doesn’t want to disclose his real name…). I prefer to have more independent decoders reusing the same methods somehow (e.g. this codec needs this frame management, this motion compensation). How to implement it — boost::codec::video::block_decoder-style templating and macros, or function pointers for codec-specific functions (like block decoding) — is yet to be conceived.
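Of those options, the function-pointer variant might look roughly like this (names invented for the sketch): a shared block-decoding loop where each codec plugs in its own block and motion compensation routines instead of inheriting a giant shared context.

    /* codec-specific pieces plugged into a shared decoding loop */
    typedef struct BlockDecoder {
        int  (*decode_block)(void *ctx, int mb_x, int mb_y);
        void (*mc_block)(void *ctx, int mb_x, int mb_y, int mv_x, int mv_y);
        void *priv; /* codec-private context instead of one giant struct */
    } BlockDecoder;

    /* reusable frame decoding skeleton */
    static int decode_frame_blocks(BlockDecoder *dec, int mb_width, int mb_height)
    {
        for (int y = 0; y < mb_height; y++)
            for (int x = 0; x < mb_width; x++)
                if (dec->decode_block(dec->priv, x, y) < 0)
                    return -1;
        return 0;
    }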

To be continued eventually…

NihAV — A New Approach to Multimedia Pt. 1

Thursday, April 23rd, 2015

Foreword or Why?!?!

There are two curses in program design (among many others) — legacy and monolithic design.

Legacy means two things: first, there is such a thing as backward compatibility that you sometimes have to maintain, or the users will complain about broken APIs and ABIs; second, there’s code legacy, i.e. decisions taken in the past that are kept for some reason (e.g. no one understands how they work anymore). Like the AVI demuxer in libavformat containing special cases for handling specific files that no one has ever seen.

Monolithic design is yet another problem that creeps into many projects with time. I don’t know why, but quite often code gathers itself into intangible chunks, and with time those chunks grow and get uglier. Anyone who has worked with FFmpeg might take pleasure in looking at mpegvideo in libavcodec, libswscale and libpostproc (especially if you look at the versions from about 2010).

So there are two ways to deal with it — evolution (slowly change interfaces in the hope of them being better one day, deprecate stuff, etc.) and revolution (simply forget it and write new stuff from scratch).

In this and the following posts I’ll describe a new framework (or whatever buzzword applies here) called NihAV (Not-Invented-Here Audio-Video). Maybe I’ll even implement it for my own needs, and the name should hint at how much I care about existing design decisions.

Decompilation Horror

Saturday, April 18th, 2015

In the old days I found the PackBits (also DxTory) decoding routine monstrous. That Japanese codec had a single decoding function 349549 bytes long (0x1003DFC0–0x1009352D) and that was bad style in my opinion.

Well, what do you know? Recently I’ve looked at the AMV3 codec. Its encode function is 445048 bytes long (0x10160C20–0x101CD698). And the decode function? 1439210 bytes (0x10001150–0x1016073A)! I’ve seen many decoders smaller than that function alone.

There’s one thing common to those two codecs that might explain this — both DxTory/PackBits and AMV3 are Japanese codecs. It might be their programming practice (no, it’s not that bad), but remember that other codecs have crappy code too, for other reasons. And some of them actually look better in compiled form than in source form (hello there, Ad*be and Micro$oft!). Yet I find it somewhat easier to deal with code that doesn’t frighten IDA (it refuses to show those functions in graph form because of too many nodes), and maybe I’ll run the decompiler on the decode function in autumn — because it will keep my apartment warm till spring.

Vector Quantisation Codecs are Still not (semi-kinda) Dead!

Thursday, April 16th, 2015

While the golden days of vector quantisation codecs seem to be over (Cinepak, Smacker and such), there’s still one quite widespread use of vector quantisation in video — texture compression. And, surprisingly, there’s a couple of codecs that employ texture compression methods (good for GPU acceleration, less stuff to invent, etc.) like Vidvox Hap or Resolume DXV (which looks suspiciously similar in many aspects but with some features like LZ4, LZF or YCoCg10 compression added). I have not looked that closely at either of them, but it looks like they still operate on small blocks — e.g. compressing each plane’s 8×8 block with BC4 and combining them later.

This does not seem that interesting to me, but I’m sure Vittorio will dig deeper. Good luck to him!

P.S. I forgot — which version of Firefox comes with ORBX.js support?

A Bit about Actimagine Nerve Agent

Sunday, April 12th, 2015

Codecs are sometimes named after really ridiculous things. Actimagine has named theirs after a nerve agent. Or after a Panasonic VCR tape format that only Wickedpedia has heard about. But I bet on the nerve agent (if you didn’t have to study chemical warfare agents at school then you weren’t born in the USSR — be thankful for that).

First of all, I don’t know much about VX except that it was used on game consoles. Also, judging by the code, it was intended to be used with really low resolutions because it has the stride hardcoded to 256 from what I’ve seen.

It reminds me of Mobiclip HD somewhat. I’m too lazy to investigate all the details (because I only have an ARM disassembly of some binary with some helpful comments) but here’s what I could find after spending an hour or two on it.

The video codec employs exp-Golomb codes for most things — both signed and unsigned ones; the bitreader limits them to 16 bits at most. Again, there’s not much I can say about the overall structure except that it looks like simplified H.264/H.265 (though it obviously predates H.265) — there is spatial prediction (four modes only: horizontal, vertical, DC and plane) and there’s macroblock subdivision too (into square blocks only, with possible sizes of 4×4, 8×8 or 16×16). It still looks like there’s one motion vector per macroblock, with motion vector deltas for the subblocks.
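For reference, unsigned and signed exp-Golomb decoding with a 16-bit cap would look something like the sketch below — a generic version with hypothetical bitreader helpers, not the actual VX code.

    /* generic exp-Golomb reading; BitReader, read_bit() and read_bits()
     * are hypothetical helpers, and the 16-bit cap is my guess at how
     * the bitreader limit mentioned above is applied */
    static unsigned get_ue_golomb(BitReader *br)
    {
        int zeros = 0;
        while (zeros < 16 && read_bit(br) == 0)
            zeros++;
        return (1u << zeros) - 1 + read_bits(br, zeros);
    }

    /* signed variant maps 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ... */
    static int get_se_golomb(BitReader *br)
    {
        unsigned v = get_ue_golomb(br);
        return (v & 1) ? (int)((v + 1) >> 1) : -(int)(v >> 1);
    }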

Again, no one cares.

A Short Guide to Julmust/Påskmust

Saturday, April 11th, 2015

Unfortunately I was not able to visit Sweden properly this Easter season — it was merely 6 days in Stockholm. Yet I’ve managed to try one of the reasons I come to Sweden — påskmust. For those who don’t know what it is — shame on you! For the rest, here’s my incomplete and biased guide.

Some old julmust photo. Left to right: Nygårda, Eldorado (Hemköp), ICA, Coop, Wasa, Apotekarnes. Lying down is the Lidl julmust.

Some old påskmust photo (probably from 2011). Left to right: Mora, Nygårda, Apotekarnes, ICA, probably Lidl, Eldorado (Hemköp), Coop. Lying down are the ordinary and special Wasa påskmust. The front bottle is from Guttsta Källa.

This year’s catch. Back row: Wasa special, ICA, Apotekarnes. Front row: Nyckelbryggeri, Zeunerts, Grebbestads bryggeri, Mora, Nygårda, Danish abomination.

So one can divide julmust/påskmust into four categories:

  1. Widespread must from large producers or supermarket chains. That includes Apotekarnes, Nygårda and must made for Coop, Hemköp, ICA and Lidl. But not for Netto, see category four for that.
  2. Must from Norrland breweries. Nyckelbryggeri, Wasa and Zeunerts are the best known. And maybe Mora.
  3. Must from non-Norrland breweries. Guttsta Källa, Grebbestads, Hammars (I have yet to try that one).
  4. Abominations from people who don’t know how to make proper must. That includes Bjäre must from C*ca-cola, Harboe must from Netto (made in Denmark) and whatever Danish stuff I tried this year. Concentrate for making must at home probably belongs here too.

The taste is hard to describe but it’s really nice and makes me think of liquid bread for some reason. The main difference is between Norrland and non-Norrland must. Julmust and påskmust in the Norrland style are less sweet and usually have a hint of coffee. Must from large producers is usually sweeter than the rest. Wasa bryggeri produces two kinds of must — special, available only in Norrland and made in the Norrland tradition, and ordinary, available in Svealand and with a taste closer to the more widespread varieties.

Danish must is either bland or plainly wrong. The one I tried this year is not actually bad, it’s just completely wrong — it contains e.g. raspberry juice and cola extract. If I drink påskmust I want it to be påskmust, not a weird mix of Pommac and Trocadero that probably has only water and sugar in common with the other påskmust recipes.

And now for the actual guide. If you want to try it, start with the widespread påskmust you can find in any Swedish supermarket; it should be fine. If you like it that way then be happy; if you want something less sweet, try smaller breweries and especially the Norrland ones (their must is hard to find outside Norrland though). And if you are not in season then you can still try something similar — bordsdricka from Wasa or sommarmust from Nyckelbryggeri should be available (in Norrland).

P.S. You can extrapolate this to Trocadero as well, except there’s less variation in taste and there’s no supermarket or Danish version.