October 3rd, 2015

Since I’ve been asked the same questions over and over again I’ve decided to make a short (for now) FAQ page.

  • How many years does it take to get a citizenship in Germany? 7-8 years.
  • How long have you been living in Germany? Since Spring 2010, do the math yourself.
  • So you’ll get your German citizenship in a couple of years, right? Maybe. It’s the same kind of maybe as in ‘Berlin-Brandenburg airport will be open in a couple of years.’ And it does not depend on me much.
  • Can you help me with ProRes issue … I can, but I have neither the desire nor the obligation. All Trocadero I got for writing an encoder is long gone and I don’t participate in projects that offer any ProRes support; inquire there.
  • Can you look at this codec … I can but no promises — I rarely have a desire to do anything these days.
  • Is NihAV real? More or less, it still lacks a lot of design and code but there are some bits implemented already. Design is described in this blog when it appears, code is developed as who-cares-source.
  • Why do you blame lu_zero? Oh, there are so many reasons for that and new ones keep appearing almost every day. Mostly it’s for the things he was supposed to do but still hasn’t done (and is unlikely to do in the foreseeable future): AVScale design and implementation, writing blog posts on certain topics (often I end up writing them, which is yet another reason to blame him), not doing much about ASF or RealMedia demuxers and related delayed work, for personal stuff (like preventing me from trying Torino trams and underground), for missing technical stuff in a wiki. Oh, and for being at least two different persons. There’s more that I can’t remember right now.
  • When will you visit Pelhřimov? Dunno, maybe when I have more than three free days.

NihAV: Data Delivery Channels

September 27th, 2015

Disclaimer: this should’ve been written by a certain Italian (and discussed during VDD too) but since he is even lazier than me (and stopped after writing only this), I ended up writing this.

One of the main concerns in framework design is how data will be passed from sources to destinations. There are two ways that are mistakenly called synchronous and asynchronous; the main difference is not the lack of a CLK signal but how the producing function passes data further: on return, when it has done whatever it wanted to do, or by passing control to some external procedure as soon as data is available.

In the worst case of synchronous data passing you have a fixed pipeline: one unit of input goes in, one unit of output is expected to be produced (and maybe one more on final flush), repeat for the next stage. In the worst case of asynchronous data passing you have one thread per stage waiting for another thread to signal that it has some data ready to process. Or in single-threaded mode you simply have nested callbacks, again one per stage, calling the next callback in a chain when it has something to deliver.

Anyway, how should this all apply to NihAV?

In the simplest case we have one chain:

[Demuxer] --> (optional stream splitter) --> Packets --> [Decoder] --> Frames --> [output]

In a real-world scenario all the complexity starts at the stage marked [output]. There you usually feed frames to some filter chain, then to an encoder, and combine several streams to push all that stuff to some muxer(s).

As for data passing, you can have it in several modes: main processing loop, synchronous processing graph, asynchronous processing graph or an intricate maze of callbacks all alike.

With the main processing loop you have something like this:

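while (has_input()) {
   pkt = get_input();
   send_to_decoder(pkt);        /* hypothetical helper: feed the next stage */
   while (has_output()) {
      frame = get_output();
      /* ...pass the frame on to filters, encoder and muxer by hand... */
   }
}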

It’s ideal if you like micromanagement and want to save on memory but with many stages it might get rather hairy.

The synchronous processing graph (if you think it’s not called so look at the project name again) adds queues and operates on stages:

while (can_process(graph)) {
   for (i = 0; i < num_stages; i++) {
      graph->stages[i]->process(graph->inqueues[i], graph->outqueues[i]);
   }
}

In this situation you have not a single [input] -> [output] connection but rather [input] -> (queue) -> [output], every stage is connected by some queues with another stage and can consume/produce several (or none) elements in any of the queues.
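The queue-based loop above can be sketched as a tiny self-contained program. Everything here is an assumption made for illustration (the `Queue`, `Graph`, `split_stage` and `merge_stage` names, plain ints standing in for packets and frames); it only shows that stages may consume and produce a different number of elements per call, and that the loop can stop once a full pass changes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP  16
#define NUM_STAGES 2

/* Hypothetical fixed-size FIFO; ints stand in for packets or frames. */
typedef struct Queue {
    int    items[QUEUE_CAP];
    size_t head, count;
} Queue;

static int queue_push(Queue *q, int v)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[(q->head + q->count) % QUEUE_CAP] = v;
    q->count++;
    return 0;
}

static int queue_pop(Queue *q, int *v)
{
    assert(q != NULL && v != NULL);
    if (!q->count)
        return -1;
    *v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

typedef void (*StageFunc)(Queue *in, Queue *out);

/* One element in, two out (think of a block holding several packets). */
static void split_stage(Queue *in, Queue *out)
{
    int v;
    if (out->count + 2 > QUEUE_CAP || queue_pop(in, &v) < 0)
        return;
    queue_push(out, v);
    queue_push(out, v + 100);
}

/* Two elements in, one out; does nothing until both inputs are queued. */
static void merge_stage(Queue *in, Queue *out)
{
    int a, b;
    if (in->count < 2 || out->count == QUEUE_CAP)
        return;
    queue_pop(in, &a);
    queue_pop(in, &b);
    queue_push(out, a + b);
}

typedef struct Graph {
    StageFunc stages[NUM_STAGES];
    Queue     queues[NUM_STAGES + 1]; /* queues[i] feeds stage i */
} Graph;

static void graph_init(Graph *g)
{
    memset(g, 0, sizeof(*g));
    g->stages[0] = split_stage;
    g->stages[1] = merge_stage;
}

/* Run passes over all stages until a full pass changes no queue. */
static void graph_run(Graph *g)
{
    int progress;
    do {
        progress = 0;
        for (size_t i = 0; i < NUM_STAGES; i++) {
            size_t in_before  = g->queues[i].count;
            size_t out_before = g->queues[i + 1].count;
            g->stages[i](&g->queues[i], &g->queues[i + 1]);
            if (g->queues[i].count != in_before ||
                g->queues[i + 1].count != out_before)
                progress = 1;
        }
    } while (progress);
}
```

Pushing 1, 2 and 3 into `queues[0]` and calling `graph_run()` leaves 102, 104 and 106 in the last queue. Stopping on "no progress" rather than on "queues empty" is a deliberate choice in this sketch: a stage that waits for more input than it will ever get cannot hang the loop.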

The asynchronous processing graph relies on the output stages pulling data from all previous stages or input stages pushing data to the following stages.

int filter_process(Filter *f, void *data)
{
  do_something(f->context, data);
  /* push the result to the following stage */
  return f->next->filter(f->next, data);
}

And the callback variant is:

int demux(void (*consumer)(...)){
   while (!eof) {
     if (has_packet)
       consumer(ctx->consumer_id, packet);
   }
   return 0;
}

Callbacks are nice but they move a lot of the burden onto the callback writer and they are hard to get right, especially if you do something nontrivial.

So far I’m inclined to have the second approach. E.g. Bink demuxer has packets for several streams stored together in one block — so demuxer will create them all and put into its output queue. Hardware accelerated codec may send several frames for decoding at once — so it will read as many as possible or needed from the input queue and add all decoded frames into output queue when they’re done. Also all shitty tasks like proper synchronisation can be made into filters too.

Of course that will create several new problems especially with resource management but they should be solvable.

For NihAV implementation that means adding NAQueue and several new datatypes (packet, frame, whatever) plus a mechanism in NAClass to recognize them because you would not want a wrong thing going to the wrong pipeline (e.g. decoder expects NAPacket and gets NAFrame instead).
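A minimal sketch of that type check, assuming every queued object starts with a small common header carrying a type tag. The `NADataType` enum, the `NAData` header, the field layout and the error values are all invented for illustration; only the names NAQueue, NAPacket and NAFrame come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags for things that can travel through a queue. */
enum NADataType { NA_DATA_PACKET = 1, NA_DATA_FRAME };

/* Hypothetical common header placed first in every queued object. */
typedef struct NAData { enum NADataType type; } NAData;

typedef struct NAPacket { NAData hdr; size_t size; }       NAPacket;
typedef struct NAFrame  { NAData hdr; int width, height; } NAFrame;

#define NA_QUEUE_CAP 8

typedef struct NAQueue {
    enum NADataType accepts;              /* the only type this queue takes */
    NAData         *items[NA_QUEUE_CAP];
    size_t          count;
} NAQueue;

/* Returns 0 on success, -1 on bad arguments, -2 on wrong type, -3 if full. */
static int na_queue_push(NAQueue *q, NAData *d)
{
    if (!q || !d)
        return -1;
    assert(q->count <= NA_QUEUE_CAP);
    if (d->type != q->accepts)
        return -2;
    if (q->count == NA_QUEUE_CAP)
        return -3;
    q->items[q->count++] = d;
    return 0;
}
```

With a queue declared as accepting `NA_DATA_PACKET`, pushing the header of an `NAPacket` succeeds while pushing the header of an `NAFrame` is rejected with -2, so the mistake is caught at the queue boundary instead of inside the decoder.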

NihAV — I/O

September 3rd, 2015

And now let’s talk about probably the hairiest part of multimedia framework — input-output layer.

Current libavformat design looks too messy to me and thus my design will be different. I know that some Lucas prefer the old way but I’d rather split protocol handling into several layers.

First, there’s base I/O handler used solely for I/O operations. NAIOHandler contains functions for reading data, reading data asynchronously, writing data, seeking and ioctl() for some I/O-specific operations. There will be very few I/O handlers — just file, network (TCP/UDP) and null.

On top of that there’s protocol handler. Protocol handler employs I/O handler to perform I/O operations and provides only reading, writing, seeking and flushing functions. There may be buffered and unbuffered wrappers over I/O handler or something less trivial like HTTP handler.

On top of that there may be a layer or several of other protocol handlers if they need to relate to other protocols (i.e. Some-Streaming-Protocol-over-HTTP).

And on the very top there are I/O functions using protocol handlers for reading and writing data (bytes, X-bit integers, buffers). Those are used by (de)muxers.
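The layering can be illustrated with a reduced, read-only sketch. None of these names or signatures are the planned NihAV API: `IOHandler`, `MemSource`, `BufferedProto` and the reader functions are assumptions, and a memory block stands in for a file or a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layer 1: raw I/O handler (only reading is sketched). */
typedef struct IOHandler {
    size_t (*read)(void *opaque, uint8_t *dst, size_t size);
    void   *opaque;
} IOHandler;

/* Memory-backed source standing in for file or network I/O. */
typedef struct MemSource {
    const uint8_t *data;
    size_t         size, pos;
} MemSource;

static size_t mem_read(void *opaque, uint8_t *dst, size_t size)
{
    MemSource *m = opaque;
    size_t left = m->size - m->pos;
    if (size > left)
        size = left;
    memcpy(dst, m->data + m->pos, size);
    m->pos += size;
    return size;
}

/* Layer 2: buffered protocol handler sitting on top of an I/O handler. */
typedef struct BufferedProto {
    IOHandler *io;
    uint8_t    buf[4];   /* tiny on purpose so that refills really happen */
    size_t     pos, fill;
} BufferedProto;

static size_t proto_read(BufferedProto *p, uint8_t *dst, size_t size)
{
    size_t done = 0;
    assert(p != NULL && p->io != NULL);
    while (done < size) {
        if (p->pos == p->fill) {          /* buffer drained: refill it */
            p->fill = p->io->read(p->io->opaque, p->buf, sizeof(p->buf));
            p->pos  = 0;
            if (!p->fill)
                break;                    /* end of data */
        }
        dst[done++] = p->buf[p->pos++];
    }
    return done;
}

/* Layer 3: typed readers of the kind a demuxer would call. */
static int read_u32le(BufferedProto *p, uint32_t *v)
{
    uint8_t b[4];
    if (proto_read(p, b, 4) != 4)
        return -1;
    *v = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
         ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

static int read_u16be(BufferedProto *p, uint16_t *v)
{
    uint8_t b[2];
    if (proto_read(p, b, 2) != 2)
        return -1;
    *v = (uint16_t)((b[0] << 8) | b[1]);
    return 0;
}
```

The demuxer-facing readers never touch the I/O handler directly, so swapping the memory source for a file or a TCP handler would not change them. An HTTP-style handler would slot in as another layer between `BufferedProto` and `IOHandler` in the same way.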

That’s the plan, insert usual rant about lack of time, interest and such here.

Random Thoughts on Format Design Process

August 12th, 2015

From my experience a lot of codecs have some wrong things in them and those things are usually introduced during codec creation. As for containers, I’ve expressed my opinion before.

It is very bad when some codec is being developed and then suddenly it’s declared released. You’re left with a pile of code that has somehow evolved into its current shape and probably even the author has no idea how it works. Two examples: Snow and Speex. The first one is a wavelet-based codec that performed quite well back in the DivX 3/4 days, the other one is a speech codec that also gained some popularity and was even included as one of the Flash audio codecs. So the codecs by themselves should not be that bad but there’s only one implementation and no specification. There were several attempts to make the Snow developer write a specification for it (for money!) but he always refused. FFV1 is faring somewhat better since it has some rudimentary specification and hopefully standardisation efforts will bring us independent implementations and a full specification (yes, I’m an idiot optimist). What would be a proper way to design a codec in my opinion? Create a test version, play with it till you achieve a good result or release a known beta, write a specification, throw away the old code and reimplement version 1 from scratch. Repeat for version 2 etc.

I think I’ve complained before that this situation is very common with proprietary codecs too. They have inhouse encoder and decoder implementation with encoder bugs compensated in the decoder. Stupid motion compensation in RealVideo 4 is one of those “features”. Or pre-RTM WMV9 with its block pattern coding though it’s supposed to be beta anyway.

There is an even worse case: when the codec author decides to embed all development history into the decoder, maintaining backward compatibility. The worst offender is Monkey Audio with its subtle bitstream changes at every version and having two dozen versions. Another “good” example is HEVC with its ever-changing bitstream format. Different major versions of the reference software introduce serious bitstream changes, e.g. the HM8 -> HM10 transition remapped all NAL IDs. IIRC the superseded version of ITU H.265 was for 4:2:0 subsampling only. Honestly, I shan’t cry if this codec dies because of idiotic licensing terms (and maybe it should really be contained only in FLV). Speaking of HEVC idiocies, VP9 got new features in new profiles including 4:4:0 subsampling. In my opinion one should kill this creeping featurism, especially if you don’t have a proper profiling/versioning system, and even then introduce new features sparingly.

At least there’s still hope for Daala to be developed properly.

NihAV — Guidelines

August 8th, 2015

The weather here remains hellish so there’s little I can do besides suffering from it.

And yet I’ve spent two hours this morning implementing the main part of NihAV: the set implementation. The other crucial part is options handling but I’ll postpone it for later since I can write proof-of-concept code without it.

Here’s a list of NihAV design guidelines I plan to follow:

  • Naming: component functions should be called na_component_create(), na_component_destroy(), na_component_do_something().
  • More generally, prefixing: public functions are prefixed with na_, public headers start with na as well e.g. libnadata/naset.h. The rest is private to the library. Even other NihAV libraries should use only public interfaces, otherwise you get ff_something and avpriv_that called from outside and in result you have MPlayer.
  • NihAV-specific error codes to be used everywhere. Having AVERROR(EWHATEVER) and AVERROR_WHATEVER together is ridiculous. Especially when you have to deal with some error codes being missing from some platform and other being nonportable (what if on nihOS they decided to map ENOMEM to 42? Can you trust error code returned by service run on some remote platform then?).

And here’s what the actual set interface looks like:

#ifndef NA_DATA_SET_H
#define NA_DATA_SET_H

#include "nacommon.h"

struct NASet *na_set_create(NALibrary *lib);
void          na_set_destroy(struct NASet *set);
int           na_set_add(struct NASet *set, const char *key, void *data);
void*         na_set_get(struct NASet *set, const char *key);
void          na_set_del(struct NASet *set, const char *key);

struct NASetIterator;

typedef struct NASetEntry {
    const char *key;
    void *data;
} NASetEntry;

struct NASetIterator* na_set_iterator_get(struct NASet *set);
void na_set_iterator_destroy(struct NASetIterator* it);
int na_set_iterator_next(struct NASetIterator* it, NASetEntry *entry);
void na_set_iterator_reset(struct NASetIterator* it);

#endif

As you can see, it’s nothing special, just basic set (well, it’s really a dictionary but NIH terminology applies to this project) manipulation functions plus an iterator to scan through it, which is quite useful for e.g. showing all options or invoking all registered parsers. Implementation-wise it’s a simple hash table with the djb2 hash.
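For reference, djb2 is the classic "start at 5381, multiply by 33, add the next byte" string hash. Below is a minimal sketch of a chained hash table built on it. This is not the actual NihAV code: the `Dict`/`dict_*` names, the fixed bucket count and the choice to store key pointers without copying them are assumptions made for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 64

typedef struct Entry {
    const char   *key;   /* not copied: the caller keeps the string alive */
    void         *data;
    struct Entry *next;  /* collision chain */
} Entry;

typedef struct Dict {
    Entry *buckets[NUM_BUCKETS];
} Dict;

/* djb2: hash = hash * 33 + c, starting from 5381. */
static uint32_t djb2_hash(const char *key)
{
    uint32_t hash = 5381;
    assert(key != NULL);
    while (*key)
        hash = hash * 33 + (unsigned char)*key++;
    return hash;
}

static Entry *dict_find(Dict *d, const char *key)
{
    Entry *e = d->buckets[djb2_hash(key) % NUM_BUCKETS];
    while (e && strcmp(e->key, key))
        e = e->next;
    return e;
}

/* Adds a new key or replaces the data of an existing one. */
static int dict_add(Dict *d, const char *key, void *data)
{
    Entry   *e = dict_find(d, key);
    uint32_t idx;

    if (e) {
        e->data = data;
        return 0;
    }
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    idx     = djb2_hash(key) % NUM_BUCKETS;
    e->key  = key;
    e->data = data;
    e->next = d->buckets[idx];
    d->buckets[idx] = e;
    return 0;
}

static void *dict_get(Dict *d, const char *key)
{
    Entry *e = dict_find(d, key);
    return e ? e->data : NULL;
}

static void dict_free(Dict *d)
{
    for (size_t i = 0; i < NUM_BUCKETS; i++) {
        while (d->buckets[i]) {
            Entry *next = d->buckets[i]->next;
            free(d->buckets[i]);
            d->buckets[i] = next;
        }
    }
}
```

An iterator like the one in the header would then simply walk the buckets in order and follow each chain.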

Predicting NGV^W RMHD

July 19th, 2015

Here’s an occasional prediction of what RMHD should look like, knowing nothing about it besides press release claims.

  • Base standard — for RV 1 and 2 it was H.263, for RV 3 and 4 it was H.264. Obviously, RMHD and RMUHD should be based on H.265;
  • MV precision — RV 2 had ½-pel MV, RV 3 had ⅓-pel MV, RV 4 had ¼-pel MV. Obviously, RMHD will have ⅕-pel MV. Or still ¼-pel because H.265 has not improved MV precision compared to H.264;
  • Bitstream coding — usually that one is kept from previous generation of ripoff codec. Thus, H.265 keeps decoding VLCs further compressed with CABAC, AVS2 (aka HEVS) keeps doing the same with its own coder, VPx using range coder from VP<x-1> and static probabilities Huffman codes. RMHD is supposed to have context-dependent Huffman tables with some bitcoder following it. I.e. determine bitcode from element neighbours and then code each bit of it using some context-adaptive coder (and add some context-dependency somewhere too).
  • Special features — probably none, it will just follow the standard in codec design and the main difference will be in coefficients coding. There’s a chance they’ll build in some scalability feature though.

Let’s live and see what RMHD will really be. It would be nice if none of these predictions is correct.

Springtime for H.265 clones!

July 15th, 2015

Previously I feared there won’t be any H.265 clones beside VP<git-experimental> codec but luckily I was proved wrong.

There’s the second announcement of Really?Networks RMHD, intended for China (RealMedia was popular there after all). Either it’s their completely new codec (NGV) that has finally buffered 100% based on some original ideas or it’s H.265 ripoff. I’d bet on the latter.

Second, I’ve finally read a book describing upcoming AVS2 (again, intended for China and being a Chinese standard). Well, if the first paragraph describing it has such abbreviations as CU, PU and TU you may be sure it’s an original codec that has nothing to do with H.265. Coding concepts like variable block transform, splitting motion compensating block unevenly and having 34 intra prediction modes — those concepts are completely original and are not used anywhere else for sure. Of course there’s some Chinese logic involved in some decisions and thus codec has such gems ripped off HEVC like coding motion vectors in integer precision instead of quarterpel if they exceed certain limit or coding coefficients in zigzags of 4×4 blocks or having special treating for 64×64 blocks (this block is downscaled first and then transformed with conventional 32×32 transform — and they call it Logical Transform BTW) or special motion vector prediction mode for F-frames.

But that’s not all — they’ve introduced special “scene coding”. It relies on G-frames or GB-frames that contain scene background and it may be not displayed (who said VPx?!), and S-frames contain foreground motion. Though I’m pretty sure one can emulate it using H.265 features too, maybe longrefs plus no_display flag. I’m also pretty sure that if HEVC lacks some coding approach for now it will be added soon as a special extension (at least what I’ve read in screen coding extension looked completely logical — like a saddle as one of car seats).

Now I can be sure at last that codec future is looking good.

UPD: And there’s Cisco Thor now as well (simplified HEVC with VLC instead of CABAC). It does two things simultaneously — expands H.265 ripoffs family and borrows more from H.264. Now the only thing missing is Sorenson SVQ5 (or Double Spark or whatever name they want to give it).

On Greece

July 12th, 2015

I see too much bullshit about Greece in Internet these days, so much of it that I could not refrain from writing this post.

First of all, I come from a country with an even worse economic situation (fun fact: the former Ukrainian ostrich supported president complained how hard it is to repay debts on his visit to Greece during the first Greek debt crisis). Unlike in Greece, most people got no money from the government, companies had a large tax burden (in the latter years the government decided to press companies to pay taxes in advance and in an amount decided by the tax inspection, with tax returns working only for selected companies), lots of debts that went to no good purpose…

But enough about similarities between countries (certain Italians are not happy about similarities between Ukraine and Italy either), let’s get to the bullshit statements.

It’s not their fault. Of course it is, they had to forge their financial statistics under gunpoint in order to join and remain in Eurozone. Of course they share blame with Eurobureaucracy that wanted to extend EU even with a Greece and was willing to overlook their faults in order to keep it. Yet active part had been done by Greek government — it’s easy to buy voters with borrowed money that somebody else has to return in the future (in other words — not our problem). Another point of tension is Schengen area membership: because of good border control they have a lot of illegal immigrants and that’s what EU needs, hopefully when some neighbouring lands will connect Greece to the rest of Schengen area it will bring joy to everyone, especially to the UK.

The whole world is in debt to Greece for their achievements in culture and science. First of all, that sounds like typical copyright. “My grandfather once wrote a song that was played on a radio, I deserve not to work ever in my life.” (some Slashdot comment as I remember it from a decade ago or so). Second, most of the current countries have nothing to do with the nations that were on that territory a thousand or two thousand years ago. Look at Arab Republic Egypt — there was nothing Arabic in the people who built pyramids, temples and sphinxes. If you believe David Ben-Gurion’s thesis, then Palestinians are true Israeli people who lost their culture because of Arab conquests — they seem to oppose their original religion even to this day. Same story with Balkan nations and Ottoman Empire: modern Greece has nothing to do with the ancient Greece except in territory (say hello to Macedonia) and similar language. So, nice knowing you but don’t claim the old history to yourself; and while I’m grateful for those past achievements, they are not yours. I’d been living in a country that tried to exploit that (mostly in form of Soviet legacy and what colloquial “they” did for everyone), no thanks.

NihAV: core

June 14th, 2015

Here’s how the main NihAV header looks and it should remain the same (maybe I’ll add error codes there as well but that’s it):

#ifndef NA_COMMON_H
#define NA_COMMON_H

#include <stddef.h>
#include <stdint.h>

struct NASet;

enum NAOptionType {
    NA_OPT_NULL = 0,
    NA_OPT_FLAGS,
    NA_OPT_INT,
    NA_OPT_DOUBLE,
    NA_OPT_STRING,
    NA_OPT_BINARY,
    NA_OPT_POINTER,
};

typedef union NAOptionValue {
    int64_t     i64;
    uint64_t    u64;
    double      dbl;
    const char *str;
    struct bin {
        const char *ptr;
        size_t      size;
    } bin;
    const void *ptr;
} NAOptionValue;

typedef struct NAOption {
    const char        *name;
    enum NAOptionType  type;
    NAOptionValue      value;
} NAOption;

enum NAOptionInterfaceType {
    NA_OPT_IF_ANY,
    NA_OPT_IF_MINMAX,
};

typedef struct NAOptionInterface {
    const char        *name;
    const char        *explanation;
    enum NAOptionType  type;
    enum NAOptionInterfaceType if_type;
    NAOptionValue      min_val, max_val;
    NAOptionValue     *enums;
} NAOptionInterface;

typedef struct NALibrary {
    void* (*malloc)(size_t size);
    void* (*realloc)(void *ptr, size_t new_size);
    void  (*free)(void *ptr);

    struct NASet *components;
} NALibrary;

#define NA_CLASS_MAGIC 0x11AC1A55

typedef struct NAClass {
    uint32_t                 magic;
    const char              *name;
    const NAOptionInterface *opt_if;
    struct NASet            *options;
    NALibrary               *library;

    /* the NAClass typedef is not complete yet here, hence the struct tag */
    void                   (*cleanup)(struct NAClass *c);
} NAClass;

void na_init_library(NALibrary *lib);
void na_init_library_custom_alloc(NALibrary *lib,
                                  void* (*new_malloc)(size_t size),
                                  void* (*new_realloc)(void *ptr, size_t new_size),
                                  void  (*new_free)(void *ptr));
int  na_lib_add_component(NALibrary *lib, const char *cname, void *component);
void *na_lib_query_component(NALibrary *lib, const char *cname);
void na_clean_library(NALibrary *lib);

int na_class_set_option(NAClass *c, NAOption *opt);
const NAOption* na_class_query_option(NAClass *c, const char *name);
void na_class_unset_option(NAClass *c, const char *name);
void na_class_destroy(NAClass *c);

#endif

So what we have here is essentially three main entities NihAV will use for everything: NALibrary, NAClass and NAOption.

NALibrary is the core that manages the rest. As you can see it has a collection of components that, as discussed in the previous post, will contain the set of instances implementing tasks (e.g. codecs, de/compressors, hashes, de/muxers etc.) and this library object also contains allocator for memory management. This way it can be all pinned to the needed instance, e.g. once I’ve seen a code that had used libavcodec in two separate modules — for video and audio of course — and those two modules didn’t know a thing about each other (and were dynamically loaded too). Note to self: implement filtered loading for components, e.g. when initialising libnacodec only audio decoders will be registered or when initialising libnacompr only decoders are registered etc. etc.

The second component is NAClass. Every public component of NihAV beside NALibrary will be an instance of NAClass. Users are not supposed to construct one themselves, there will be utility functions for doing that behind the scenes (after all, you don’t need this object directly, you need a component in NALibrary doing what you want).

And the third component is what makes it more extensible without adding many public fields — NAOption for storing parameters in a class and NAOptionInterface for defining what options that class accepts.
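How an option might be checked against the interface can be sketched as follows. The types are trimmed copies of the ones in the header, while `check_option()`, its return values, the NULL-name terminator for the interface array and the rule that min/max apply to `i64` for `NA_OPT_INT` are all assumptions, not something the header specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trimmed copies of the header types, just enough for the sketch. */
enum NAOptionType { NA_OPT_NULL = 0, NA_OPT_INT, NA_OPT_STRING };

typedef union NAOptionValue {
    int64_t     i64;
    const char *str;
} NAOptionValue;

typedef struct NAOption {
    const char        *name;
    enum NAOptionType  type;
    NAOptionValue      value;
} NAOption;

enum NAOptionInterfaceType { NA_OPT_IF_ANY, NA_OPT_IF_MINMAX };

typedef struct NAOptionInterface {
    const char                *name;
    enum NAOptionType          type;
    enum NAOptionInterfaceType if_type;
    NAOptionValue              min_val, max_val;
} NAOptionInterface;

/*
 * Hypothetical validator. opt_if is assumed to be an array terminated
 * by an entry with a NULL name.
 * Returns 0 if accepted, -1 for an unknown option, -2 for a type
 * mismatch and -3 for a value outside [min_val, max_val].
 */
static int check_option(const NAOptionInterface *opt_if, const NAOption *opt)
{
    assert(opt_if != NULL && opt != NULL && opt->name != NULL);
    for (; opt_if->name; opt_if++) {
        if (strcmp(opt_if->name, opt->name))
            continue;
        if (opt_if->type != opt->type)
            return -2;
        if (opt_if->if_type == NA_OPT_IF_MINMAX && opt->type == NA_OPT_INT &&
            (opt->value.i64 < opt_if->min_val.i64 ||
             opt->value.i64 > opt_if->max_val.i64))
            return -3;
        return 0;
    }
    return -1;
}
```

Something like this would presumably run inside `na_class_set_option()` before the option is stored in the class, so that a component only ever sees options it has declared.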

Expected flow is like this:

  1. NALibrary instance is created;
  2. needed components are registered there (by creating copies inside the library tied to it; see the next to last field in NAClass);
  3. when an instance is queried, a copy is created for that operation (the definition is quite small and you should not do it often so it should not be a complete murder);
  4. user sets the options on the obtained instance;
  5. user uses aforementioned instance to do work (coding/decoding, muxing, whatever);
  6. user invokes destructor for the instance;
  7. NALibrary instance is destroyed.

There will be some exceptions, e.g. probing should be done statelessly by simply walking over the set of probe objects and invoking probe() there without creating new instances. And something similar for decoder detection too: the current libavcodec way of registering and initialising all decoders is overkill.

This is how it should be. Volunteers to implement? None? Not even myself?! Yes, I thought so.

NihAV: base

June 4th, 2015

As you might have noticed, NihAV development is not going very fast (or at all, thanks to certain people and companies (where I’d never worked and have no desire to work) that made me lose the desire to program anything) but at least I think somewhat about NihAV design.

So, here’s how the base should look:

NALibrary
   -> <named collection of NihAV components>
     -> NAClass instance that does some task

So, first you create NALibrary that is used to hold everything else. The main content of this library is a set of named collections corresponding to the tasks (e.g. “io” for I/O handlers, “demux” for demuxers, “compr” for compressors etc. etc.). Each collection holds objects based on NAClass that do some specific task — implement file or network I/O, demux AVI or Bink, compress into deflate format etc. All of this is supposed to be hidden from the final user though — it’s NihAV libraries that do all the interaction with NALibrary, they know their collection name and what type of components is stored there. So when you ask for ASF demuxer, the function na_demuxer_find() will access "demux" collection in the provided NALibrary and then it will try to find a demuxer with name "asf" there. NAClass provides common interface for object manipulation — name querying, options setting, etc.
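The two-level lookup can be sketched like this. Plain arrays and linear search replace the real sets to keep it short, and `Library`, `Collection`, `Component`, `find_component()` and `demuxer_find()` are stand-in names invented for the sketch, not the real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an NAClass-based object: only the name matters here. */
typedef struct Component {
    const char *name;
} Component;

/* A named collection such as "io", "demux" or "compr". */
typedef struct Collection {
    const char      *name;
    const Component *items;
    size_t           count;
} Collection;

/* Stand-in for NALibrary: a list of named collections. */
typedef struct Library {
    const Collection *collections;
    size_t            count;
} Library;

/* Generic lookup: find the collection first, then the component in it. */
static const Component *find_component(const Library *lib,
                                       const char *collection,
                                       const char *name)
{
    assert(lib != NULL && collection != NULL && name != NULL);
    for (size_t i = 0; i < lib->count; i++) {
        const Collection *c = &lib->collections[i];
        if (strcmp(c->name, collection))
            continue;
        for (size_t j = 0; j < c->count; j++)
            if (!strcmp(c->items[j].name, name))
                return &c->items[j];
        return NULL;
    }
    return NULL;
}

/* Task-specific wrapper: the demuxer library knows its collection name. */
static const Component *demuxer_find(const Library *lib, const char *name)
{
    return find_component(lib, "demux", name);
}
```

The user only ever calls the task-specific wrapper; the collection name stays an internal detail of the library that owns it.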

And a word about demuxers — the more I think about it the more I’m convinced that they should output both packets and streams. This is not just for user inconvenience, it also helps chaining demuxers (nothing prevents people from putting raw DV into ASF and then muxing that into MOV with ASF packets containing DV packets — nothing except common sense but it’s too rare to rely upon).

Moral: if you want to implement a multimedia framework, start with a hash table implementation.

P.S. As for implementation language I’ll still stick to C. Newer programming languages like Rust or Swift or that one with retarded gopher have the problem of being not well-widespread, i.e. what if I’m using somewhat outdated Ubuntu or Debian — especially on ARM — where I don’t want to mess with compiler (cross)compilation? Plus it’s likely I’ll make mistakes that will be hard for me to debug and constructions to work around (I doubt modern languages like passing void* on public interface that’s cast to something else inside the function implementation). Of course it’s a matter of experience but I’d rather train on something smaller scale first for a new language.