Archive for the ‘NihAV’ Category

NihAV — NAScale

Saturday, November 21st, 2015

First, some history. If you don’t like reading about it just skip to the ruler below.

So, NAScale was born after yet another attempt to design a good colourspace conversion and scaling library. A long time ago FFmpeg didn’t have any proper solution for that and used the rather rudimentary imgconvert; later it was replaced with libswscale lifted from MPlayer. Unfortunately that was designed for rather specific player tasks (mostly converting YUV to RGB for displaying with the X11 DGA driver) rather than as a generic utility library, and some of its original design still shows to this day. Actually, libswscale should have a warm place in every true FFmpeg developer’s heart next to MpegEncContext. Still, while being far from ideal it has SIMD optimisations and it works, so it’s still being used.

And yet some people unsatisfied with it decided to write a replacement from scratch. Originally AVScale (a Libav™ project) was supposed to be designed during a coding sprint in Summer 2014. And of course nothing substantial came out of it.

Then I proposed my vision of how it should work and even wrote proof-of-concept (i.e. throwaway) code to demonstrate it back in Autumn 2014. I made an update to it in March 2015 to show how to work with high-bitdepth formats, but nobody has touched it since then (and hardly before that either). Thus I’m reusing that failed effort as NAScale for NihAV.


----

And now about the NAScale design.

The main guiding rule was: “See libswscale? Don’t do it like this.”

First, I really hate long enums and dislike API/ABI breaks. So NAScale should have a stable interface and no enumeration of known pixel formats. What should it have instead? A pixel format description good enough to let NAScale convert even formats it has no idea about (like BARG5156 to YUV412).

So what should such a description have? Colourspace information (RGB/YUV/XYZ/whatever, gamma, transfer function etc.), the size of a whole packed pixel where applicable (i.e. not for planar formats) and individual component information. That component information covers how to find and/or extract the component (i.e. on which plane it is located, what shift and mask are needed to extract it from a packed bitfield, how many bytes to skip to find the first and the next component etc.) and subsampling information. The names chosen for those descriptors were Formaton and Chromaton (for rather obvious reasons).
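To make it more concrete, here’s a rough sketch of what those descriptors could look like in C (only the names Formaton and Chromaton come from the design above; the exact fields are my guesses):

[sourcecode language="c"]
/* Guessed layout, for illustration only. */
typedef struct Chromaton {
    int plane;          /* plane on which the component is located */
    int shift, depth;   /* shift and size for extracting it from a packed bitfield */
    int h_ss, v_ss;     /* horizontal and vertical subsampling */
    int offset;         /* bytes to skip to find the first sample */
    int next_elem;      /* bytes to the next sample of this component */
} Chromaton;

typedef struct Formaton {
    int       model;      /* RGB/YUV/XYZ/whatever */
    /* gamma, transfer function etc. would live here too */
    int       components; /* number of components */
    int       pixel_size; /* size of a whole packed pixel, 0 for planar formats */
    Chromaton comp[5];
} Formaton;
[/sourcecode]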

Second, the way NAScale processes data. As I remember it, libswscale converted input into fixed-precision YUV before scaling and then back into the destination format, unless it was a common-case format conversion without scaling (and then some bypass hacks were employed, like a plane repacking function and such).

NAScale prefers to build a filter chain in stages. Each stage has either one function processing all components or a function processing only one component that is applied to each component — that allows you to execute e.g. scaling in parallel. It also allows building a proper conversion+scaling process without horrible hacks. Each stage might have its own temporary buffers that will be used for output (and fed to the next stage).

You need to convert XYZ to YUV? First you unpack XYZ into planar RGB (stage 1), then scale it (stage 2) and then convert it to YUV (stage 3). NAScale constructs the chain by searching for kernels that can do the work (e.g. convert input into some intermediate format or pack planes into the output format), provides each kernel with a Formaton and dimensions, and that kernel sets the stage processing functions. For example, the first stage of RGB to YUV is unpacking RGB data, thus NAScale searches for the kernel called rgbunp, which sets the stage processing function and allocates RGB plane buffers, then the kernel called rgb2yuv will convert and pack RGB data from the planes into YUV.
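A minimal sketch of how a stage and a kernel might fit together under this scheme (signatures and field names are assumptions, not the actual proof-of-concept code; it reuses the guessed Formaton from the sketch above):

[sourcecode language="c"]
#include <stdint.h>

/* One link of the filter chain: either a whole-picture function or a
 * per-component one (the latter can be run in parallel). */
typedef struct ScaleStage {
    void (*process)(struct ScaleStage *s, uint8_t *src[], uint8_t *dst[]);
    void (*process_comp)(struct ScaleStage *s, int comp,
                         const uint8_t *src, uint8_t *dst);
    uint8_t *tmp[5];              /* per-stage buffers fed to the next stage */
    struct ScaleStage *next;
} ScaleStage;

/* A kernel is what NAScale searches for by name when building the chain;
 * it configures a stage for the given formats and dimensions. */
typedef struct ScaleKernel {
    const char *name;             /* e.g. "rgbunp" or "rgb2yuv" */
    int (*init)(ScaleStage *s, const Formaton *in, const Formaton *out,
                int width, int height);
} ScaleKernel;
[/sourcecode]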

And last, implementation. I’ve written some sample code that can take RGB input (high bitdepth was supported too), scale it if needed and pack it back into RGB or convert it into YUV depending on what was requested. So the test program converted a raw r210 frame into r10k, or an input PPM into PPM or PGMYUV with scaling. I think it’s enough to demonstrate how the concept works. Sadly nobody has picked up this work (and you know whom I blame for that; oh, and koda — he wanted to be mentioned too).

NihAV — Processing Graph Notes

Saturday, October 10th, 2015

I’m giving only a short overview for now, more to come later.

Basically, you have NAGraph that connects different workers (processing units with queues).
Lock-free NAQueue should accept only objects of certain type (side note: introduce libnaarch/naatomics.h).
Also those objects should have a common base (NAGraphObject) that extends NAClass by adding side data and signaling subtype (NAPacket, NARawData or NAFrame for now).
Multiple object types with a common base allow having the same processing interface (after all, both encoders and decoders simply take some input data and output something else).
An elementary stream from a demuxer can either be fed to a parser filter that will produce proper packets, or be accepted directly by a decoder or muxer.
Later I should make those parser filters autoinserted too.
A uniform interface should allow easier integration of third-party components, even in binary form (if there’s somebody willing to use a not-yet-existing library like this).
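Here’s a rough sketch of how those pieces could relate to each other (only the type names come from the notes above; the fields are guesses):

[sourcecode language="c"]
#include "nacommon.h"   /* NAClass, struct NASet */

enum NAGraphObjectType { NA_OBJ_PACKET, NA_OBJ_RAWDATA, NA_OBJ_FRAME };

/* Common base for everything travelling through the graph. */
typedef struct NAGraphObject {
    NAClass                base;       /* extends NAClass */
    enum NAGraphObjectType subtype;    /* NAPacket, NARawData or NAFrame */
    struct NASet          *side_data;
} NAGraphObject;

/* Queue that accepts only objects of one subtype; the real one should be
 * lock-free (hence the libnaarch/naatomics.h note). */
typedef struct NAQueue {
    enum NAGraphObjectType accepted;
    NAGraphObject        **entries;
    size_t                 head, tail, len;
} NAQueue;

/* A worker is a processing unit with input/output queues. */
typedef struct NAWorker {
    NAQueue *in, *out;
    int (*process)(struct NAWorker *w);
} NAWorker;

typedef struct NAGraph {
    NAWorker **workers;
    size_t     num_workers;
} NAGraph;
[/sourcecode]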


Zonal partitioning of the graph (inputs, processing, outputs) should maybe include generic filters (e.g. de/noise) too, and maybe a flag should be set on NAFrame telling whether it can be modified or should be kept intact.
Errors should be caught right at the graph processing stage (do that with callbacks and have fun).
Roughly, this should be the architecture for NihAV till the end of its days, since it should be future-proof.

Of course, all things described here should be implemented too eventually (sigh).

NihAV: Data Delivery Channels

Sunday, September 27th, 2015

Disclaimer: this should’ve been written by a certain Italian (and discussed during VDD too) but since he is even lazier than me (and stopped short of writing even this), I ended up writing it.

One of the main concerns in framework design is how data will be passed from sources to destinations. There are two ways that are mistakenly called synchronous and asynchronous, the main difference (lack of a CLK signal aside) being how the producing function passes data further — on return, when it’s done with whatever it wanted to do, or by passing control to some external procedure immediately when data is available.

In the worst case of synchronous data passing you have a fixed pipeline: one unit of input goes in, one unit of output is expected to be produced (and maybe one more on final flush), repeat for the next stage. In the worst case of asynchronous data passing you have one thread per stage waiting for another thread to signal that it has some data ready to process. Or in single-threaded mode you simply have nested callbacks, again one per stage, calling the next callback in a chain when it has something to deliver.

Anyway, how should this all apply to NihAV?

In the simplest case we have one chain:

[Demuxer] --> (optional stream splitter) --> Packets --> [Decoder] --> Frames --> [output]

In a real-world scenario all the complexity starts at the stage marked [output]. There you usually feed frames to some filter chain, then to an encoder, and combine several streams to push all that stuff to some muxer(s).

As for data passing, you can have it in several modes: main processing loop, synchronous processing graph, asynchronous processing graph or an intricate maze of callbacks all alike.

With the main processing loop you have something like this:

while (has_input()) {
    pkt = get_input();
    send_output(pkt);
    while (has_output()) {
        write_output();
    }
}
while (has_output()) {
    write_output();
}

It’s ideal if you like micromanagement and want to save on memory but with many stages it might get rather hairy.

The synchronous processing graph (if you think it’s not called so look at the project name again) adds queues and operates on stages:

while (can_process(graph)) {
    for (i = 0; i < num_stages; i++) {
        graph->stages[i]->process(graph->inqueues[i], graph->outqueues[i]);
    }
}

In this situation you have not a single [input] -> [output] connection but rather [input] -> (queue) -> [output]; every stage is connected to another stage by some queues and can consume/produce several elements (or none) in any of the queues.

The asynchronous processing graph relies on the output stages pulling data from all previous stages or input stages pushing data to the following stages.

int filter_process(Filter *f, void *data)
{
    do_something(f->context, data);
    f->next->filter(f->next, data);
    return 0;
}

And the callback variant is:

int demux(void (*consumer)(...))
{
    while (!eof) {
        read_something();
        if (has_packet)
            consumer(ctx->consumer_id, packet);
    }
    return 0;
}

Callbacks are nice but they move a lot of the burden onto the callback writer and they are hard to get right, especially if you do something nontrivial.

So far I’m inclined to go with the second approach. E.g. the Bink demuxer has packets for several streams stored together in one block — so the demuxer will create them all and put them into its output queue. A hardware-accelerated codec may send several frames for decoding at once — so it will read as many as possible (or as needed) from the input queue and add all decoded frames into the output queue when they’re done. Also, all shitty tasks like proper synchronisation can be made into filters too.

Of course that will create several new problems especially with resource management but they should be solvable.

For the NihAV implementation that means adding NAQueue and several new datatypes (packet, frame, whatever) plus a mechanism in NAClass to recognise them, because you would not want the wrong thing going into the wrong pipeline (e.g. a decoder expects NAPacket and gets NAFrame instead).
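In pseudocode, such a type check on queue insertion could be as simple as this (the function, the error code and the queue fields are all assumptions, reusing the guessed shapes from the processing graph notes):

int na_queue_push(NAQueue *q, NAGraphObject *obj)
{
    /* refuse objects whose subtype does not match what this queue carries */
    if (obj->subtype != q->accepted)
        return NA_ERR_INVALID_TYPE;   /* hypothetical NihAV error code */
    return queue_insert(q, obj);      /* the actual (lock-free) insertion */
}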

NihAV — I/O

Thursday, September 3rd, 2015

And now let’s talk about probably the hairiest part of a multimedia framework — the input-output layer.

The current libavformat design looks too messy to me and thus my design will be different. I know that some Lucas prefer the old way, but I’d rather split protocol handling into several layers.

First, there’s the base I/O handler used solely for I/O operations. NAIOHandler contains functions for reading data, reading data asynchronously, writing data, seeking, and an ioctl() for some I/O-specific operations. There will be very few I/O handlers — just file, network (TCP/UDP) and null.

On top of that there’s the protocol handler. The protocol handler employs an I/O handler to perform I/O operations and provides only reading, writing, seeking and flushing functions. There may be buffered and unbuffered wrappers over the I/O handler or something less trivial like an HTTP handler.

On top of that there may be a layer or several of other protocol handlers if they need to be layered over other protocols (e.g. Some-Streaming-Protocol-over-HTTP).

And on the very top there are I/O functions using protocol handlers for reading and writing data (bytes, X-bit integers, buffers). Those are used by (de)muxers.
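A rough sketch of how the two lower layers could be declared (everything here is my guess at the shape, not an actual NihAV header):

[sourcecode language="c"]
#include <stdint.h>
#include <sys/types.h>   /* ssize_t; POSIX only, as stated above */

/* Base I/O handler: raw I/O operations only. */
typedef struct NAIOHandler {
    const char *name;   /* "file", "tcp", "udp", "null" */
    ssize_t (*read)      (void *ctx, uint8_t *buf, size_t size);
    int     (*read_async)(void *ctx, uint8_t *buf, size_t size,
                          void (*done)(void *opaque), void *opaque);
    ssize_t (*write)     (void *ctx, const uint8_t *buf, size_t size);
    int64_t (*seek)      (void *ctx, int64_t offset, int whence);
    int     (*ioctl)     (void *ctx, int request, void *arg);
} NAIOHandler;

/* Protocol handler: built on top of an I/O handler, exposes only
 * read/write/seek/flush to the layer above. */
typedef struct NAProtocolHandler {
    const char *name;           /* "buffered", "http", ... */
    const struct NAIOHandler *io;
    ssize_t (*read) (void *ctx, uint8_t *buf, size_t size);
    ssize_t (*write)(void *ctx, const uint8_t *buf, size_t size);
    int64_t (*seek) (void *ctx, int64_t offset, int whence);
    int     (*flush)(void *ctx);
} NAProtocolHandler;
[/sourcecode]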

That’s the plan, insert usual rant about lack of time, interest and such here.

NihAV — Guidelines

Saturday, August 8th, 2015

The weather here remains hellish so there’s little I can do besides suffering from it.

And yet I’ve spent two hours this morning to implement the main part of NihAV — set implementation. The other crucial part is options handling but I’ll postpone it for later since I can write proof of concept code without it.

Here’s a list of NihAV design guidelines I plan to follow:

  • Naming: component functions should be called na_component_create(), na_component_destroy(), na_component_do_something().
  • More generally, prefixing: public functions are prefixed with na_, public headers start with na as well, e.g. libnadata/naset.h. The rest is private to the library. Even other NihAV libraries should use only public interfaces, otherwise you get ff_something and avpriv_that called from outside and as a result you have MPlayer.
  • NihAV-specific error codes to be used everywhere. Having AVERROR(EWHATEVER) and AVERROR_WHATEVER together is ridiculous. Especially when you have to deal with some error codes being missing from some platforms and others being nonportable (what if on nihOS they decided to map ENOMEM to 42? Can you trust an error code returned by a service running on some remote platform then?).

And here’s what the actual set interface looks like:
[sourcecode language="c"]
#ifndef NA_DATA_SET_H
#define NA_DATA_SET_H

#include "nacommon.h"

struct NASet *na_set_create(NALibrary *lib);
void na_set_destroy(struct NASet *set);
int na_set_add(struct NASet *set, const char *key, void *data);
void* na_set_get(struct NASet *set, const char *key);
void na_set_del(struct NASet *set, const char *key);

struct NASetIterator;

typedef struct NASetEntry {
    const char *key;
    void *data;
} NASetEntry;

struct NASetIterator* na_set_iterator_get(struct NASet *set);
void na_set_iterator_destroy(struct NASetIterator* it);
int na_set_iterator_next(struct NASetIterator* it, NASetEntry *entry);
void na_set_iterator_reset(struct NASetIterator* it);

#endif
[/sourcecode]

As you can see, it’s nothing special, just basic set manipulation functions (well, it’s really a dictionary, but NIH terminology applies to this project) plus an iterator to scan through it — quite useful for e.g. showing all options or invoking all registered parsers. Implementation-wise it’s a simple hash table with the djb2 hash.
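For reference, djb2 is just this (the textbook version; the one in NihAV may differ in details):

[sourcecode language="c"]
#include <stdint.h>

/* The classic djb2 string hash by Dan Bernstein: hash = hash * 33 + c. */
static uint32_t djb2_hash(const char *key)
{
    uint32_t hash = 5381;
    while (*key)
        hash = hash * 33 + (unsigned char)*key++;
    return hash;
}
[/sourcecode]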

NihAV: core

Sunday, June 14th, 2015

Here’s how the main NihAV header looks and it should remain the same (maybe I’ll add error codes there as well but that’s it):

[sourcecode language="c"]
#ifndef NA_COMMON_H
#define NA_COMMON_H

#include <stdint.h>
#include <stddef.h>

struct NASet;

enum NAOptionType {
    NA_OPT_NULL = 0,
    NA_OPT_FLAGS,
    NA_OPT_INT,
    NA_OPT_DOUBLE,
    NA_OPT_STRING,
    NA_OPT_BINARY,
    NA_OPT_POINTER,
};

typedef union NAOptionValue {
    int64_t i64;
    uint64_t u64;
    double dbl;
    const char *str;
    struct bin {
        const char *ptr;
        size_t size;
    } bin;
    const void *ptr;
} NAOptionValue;

typedef struct NAOption {
    const char *name;
    enum NAOptionType type;
    NAOptionValue value;
} NAOption;

enum NAOptionInterfaceType {
    NA_OPT_IF_ANY,
    NA_OPT_IF_MINMAX,
    NA_OPT_IF_ENUMERATED,
};

typedef struct NAOptionInterface {
    const char *name;
    const char *explanation;
    enum NAOptionType type;
    enum NAOptionInterfaceType if_type;
    NAOptionValue min_val, max_val;
    NAOptionValue *enums;
} NAOptionInterface;

typedef struct NALibrary {
    void* (*malloc)(size_t size);
    void* (*realloc)(void *ptr, size_t new_size);
    void (*free)(void *ptr);

    struct NASet *components;
} NALibrary;

#define NA_CLASS_MAGIC 0x11AC1A55

typedef struct NAClass {
    uint32_t magic;
    const char *name;
    const NAOptionInterface *opt_if;
    struct NASet *options;
    NALibrary *library;

    void (*cleanup)(struct NAClass *c);
} NAClass;

void na_init_library(NALibrary *lib);
void na_init_library_custom_alloc(NALibrary *lib,
                                  void* (*new_malloc)(size_t size),
                                  void* (*new_realloc)(void *ptr, size_t new_size),
                                  void (*new_free)(void *ptr));
int na_lib_add_component(NALibrary *lib, const char *cname, void *component);
void *na_lib_query_component(NALibrary *lib, const char *cname);
void na_clean_library(NALibrary *lib);

int na_class_set_option(NAClass *c, NAOption *opt);
const NAOption* na_class_query_option(NAClass *c, const char *name);
void na_class_unset_option(NAClass *c, const char *name);
void na_class_destroy(NAClass *c);

#endif
[/sourcecode]

So what we have here is essentially three main entities NihAV will use for everything: NALibrary, NAClass and NAOption.

NALibrary is the core that manages the rest. As you can see, it has a collection of components that, as discussed in the previous post, will contain the sets of instances implementing tasks (e.g. codecs, de/compressors, hashes, de/muxers etc.), and this library object also contains an allocator for memory management. This way it can all be pinned to the needed instance, e.g. once I’ve seen code that used libavcodec in two separate modules — for video and audio of course — and those two modules didn’t know a thing about each other (and were dynamically loaded too). Note to self: implement filtered loading for components, e.g. when initialising libnacodec only audio decoders will be registered, or when initialising libnacompr only decoders are registered etc. etc.

The second component is NAClass. Every public component of NihAV besides NALibrary will be an instance of NAClass. Users are not supposed to construct one themselves; there will be utility functions for doing that behind the scenes (after all, you don’t need this object directly, you need a component in NALibrary doing what you want).

And the third component is what makes it more extensible without adding many public fields — NAOption for storing parameters in a class and NAOptionInterface for defining what options that class accepts.

Expected flow is like this (see the sketch after the list):

  1. NALibrary instance is created;
  2. needed components are registered there (by creating copies inside the library tied to it — see the next-to-last field in NAClass);
  3. when an instance is queried, a copy is created for that operation (the definition is quite small and you should not do it often so it should not be a complete murder);
  4. user sets the options on the obtained instance;
  5. user uses aforementioned instance to do work (coding/decoding, muxing, whatever);
  6. user invokes destructor for the instance;
  7. NALibrary instance is destroyed.
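
Put into hypothetical code, using only the functions from the header above (the component name "demux.avi" and the "strict" option are invented for illustration), it looks roughly like this:

[sourcecode language="c"]
#include "nacommon.h"

int example(void)
{
    NALibrary lib;
    NAClass *dmx;
    NAOption opt = { "strict", NA_OPT_INT, { .i64 = 1 } };

    na_init_library(&lib);                                /* 1 */
    /* 2: normally the NihAV libraries register their components here,
     *    e.g. na_lib_add_component(&lib, "demux.avi", &avi_demuxer); */
    dmx = na_lib_query_component(&lib, "demux.avi");      /* 3 */
    if (!dmx)
        return -1;
    na_class_set_option(dmx, &opt);                       /* 4 */
    /* 5: do the actual work with dmx here */
    na_class_destroy(dmx);                                /* 6 */
    na_clean_library(&lib);                               /* 7 */
    return 0;
}
[/sourcecode]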

There will be some exceptions, i.e. probing should be done statelessly by simply walking over the set of probe objects and invoking probe() there without creating new instances. And something similar for decoder detection too — the current libavcodec way of registering and initialising all decoders is overkill.

This is how it should be. Volunteers to implement? None? Not even myself?! Yes, I thought so.

NihAV: base

Thursday, June 4th, 2015

As you might have noticed, NihAV development is not going very fast (or at all — thanks to certain people and companies (where I’d never worked and have no desire to work at) that made me lose the desire to program anything), but at least I’m thinking somewhat about NihAV design.

So, here’s how the base should look:

NALibrary
   -> <named collection of NihAV components>
     -> NAClass instance that does some task

So, first you create an NALibrary that is used to hold everything else. The main content of this library is a set of named collections corresponding to the tasks (e.g. “io” for I/O handlers, “demux” for demuxers, “compr” for compressors etc. etc.). Each collection holds objects based on NAClass that do some specific task — implement file or network I/O, demux AVI or Bink, compress into deflate format etc. All of this is supposed to be hidden from the final user though — it’s the NihAV libraries that do all the interaction with NALibrary; they know their collection name and what type of components is stored there. So when you ask for an ASF demuxer, the function na_demuxer_find() will access the "demux" collection in the provided NALibrary and then it will try to find a demuxer with the name "asf" there. NAClass provides a common interface for object manipulation — name querying, options setting, etc.
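In hypothetical code that lookup boils down to something like this (NADemuxer does not exist yet, so the types here are placeholders):

[sourcecode language="c"]
#include "nacommon.h"
#include "naset.h"

typedef struct NADemuxer NADemuxer;   /* placeholder type */

NADemuxer *na_demuxer_find(NALibrary *lib, const char *name)
{
    /* fetch the "demux" collection, then look the demuxer up by name */
    struct NASet *demuxers = na_lib_query_component(lib, "demux");
    if (!demuxers)
        return NULL;
    return na_set_get(demuxers, name);   /* e.g. name = "asf" */
}
[/sourcecode]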

And a word about demuxers — the more I think about it the more I’m convinced that they should output both packets and streams. This is not just for user convenience, it also helps chaining demuxers (nothing prevents people from putting raw DV into ASF and then muxing that into MOV with ASF packets containing DV packets — nothing except common sense, but that’s too rare to rely upon).

Morale: if you want to implement multimedia framework start with hash table implementation.

P.S. As for the implementation language I’ll still stick to C. Newer programming languages like Rust or Swift or that one with the retarded gopher have the problem of not being widespread enough, i.e. what if I’m using a somewhat outdated Ubuntu or Debian — especially on ARM — where I don’t want to mess with compiler (cross)compilation? Plus it’s likely I’ll make mistakes that will be hard for me to debug and hit constructions I’ll have to work around (I doubt modern languages would like passing void* on a public interface that’s cast to something else inside the function implementation). Of course it’s a matter of experience but I’d rather train on something smaller scale first for a new language.

NihAV: implementation start

Thursday, May 14th, 2015

Before people reading this blog (all 0 of them) start asking about it — yes, I’ve started implementing NihAV, it will take a lot of time so don’t expect it to be finished or even usable this decade at least (too little free time, even less interest and too much work needed to be done to have it at least somewhat usable for anything).

Here’s the intended structure:

  • libnaarch — arch-specific stuff here, like little/big endian handling, specific speedup tricks etc. Do not confuse with libnaosstuff — I’m not interested in non-POSIX systems.
  • libnacodec — codecs will belong here.
  • libnacompr — decompression and compression routines belong here.
  • libnacrypto — cryptographic stuff (hashes, cyphers, ROT-13) belongs here.
  • libnadata — data structures implementations.
  • libnaformat — muxers and demuxers.
  • libnamath — mathematics-related stuff (fixedpoint calculations, fractional math etc).
  • libnaregistry — codecs registry. Codec information is stored here — both overall codec information (name-to-description mapping) and codec name search by tag. That means that e.g. the FOURCC-to-codec-name mappings from AVI or MOV are a part of this library instead of remaining demuxer-specific.
  • libnautil — utility stuff not belonging to any other library.

Remark to myself: maybe it’s worth splitting out libnadsp so it won’t clutter libnacodec.

Probably I’ll start with the simple stuff — implement dictionary support (for options), an AVI demuxer and some simple decoder.

NihAV: Logo proposal

Monday, May 11th, 2015

Originally it should’ve been Bender Bending Rodriguez on the Moon (implying that he’ll build his own NihAV with …) but since I lack drawing skills (at school I’ve managed to draw something more or less realistic only once), here’s the alternative logo, drawn by a professional coder in Inkscape in a couple of minutes.

[NihAV logo]

Somehow I believe that building a castle on a swamp is a good metaphor for this enterprise as well (and much easier to draw too).

NihAV — A New Approach to Multimedia Pt. 8

Monday, May 11th, 2015

Demuxers

First of all, as I’ve mentioned in the beginning, codecs should have string IDs. Of course if a codec tag is present it can be passed too (or the stream handler from AVI); it’s just optional. This way the AVI demuxer can report codec {NIH_TYPE_VIDEO, "go2meeting", "G2M6"} or {NIH_TYPE_VIDEO, "unknown", "COL0"} and an external program can guess what that codec is and handle it specially.
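Such a triplet could be expressed with something as simple as this (the struct and field names are my guesses, only the values come from the examples above):

[sourcecode language="c"]
/* Guessed shape of the codec identifier used above. */
typedef struct NihCodecID {
    int         type;  /* NIH_TYPE_VIDEO, NIH_TYPE_AUDIO, ... */
    const char *name;  /* string ID, e.g. "go2meeting" or "unknown" */
    const char *tag;   /* optional codec tag, e.g. "G2M6" or "COL0" */
} NihCodecID;
[/sourcecode]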

Second, demuxers should return two types of data — packets and streams. E.g. MPEG-TS (the best container ever by popular vote) does not care about frame boundaries, so its demuxer should not try to be smart; it should just return a stream that can be fed to a parser, and that parser will produce proper packets.

Third, parsers. There are two kinds of them — ones that split a stream into frames and ones that parse frame data to set some properties on the packet. They should be two separate entities and be invoked differently, one after another if needed.

Something similar goes for muxers — everybody knows that one can mux absolutely any codec into AVI. Or put H.264 into MPEG-PS (hi Luca!). Muxers should just allow callers to do that when possible instead of failing because the codec is unrecognised.

P.S. If I’m ever to implement this all it will take a lot of time and Trocadero.