Predicting NGV^W RMHD

July 19th, 2015

Here’s an occasional prediction of what RMHD should look like, knowing nothing about it besides the press release claims.

  • Base standard — for RV 1 and 2 it was H.263, for RV 3 and 4 it was H.264. Obviously, RMHD and RMUHD should be based on H.265;
  • MV precision — RV 2 had ½-pel MV, RV 3 had ⅓-pel MV, RV 4 had ¼-pel MV. Obviously, RMHD will have ⅕-pel MV. Or still ¼-pel because H.265 has not improved MV precision compared to H.264;
  • Bitstream coding — usually that one is kept from the previous generation of the ripoff codec. Thus, H.265 keeps decoding VLCs further compressed with CABAC, AVS2 (aka HEVS) keeps doing the same with its own coder, and VPx keeps using the range coder from VP<x-1> and static-probability Huffman codes. RMHD is supposed to have context-dependent Huffman tables with some bit coder following them, i.e. determine the bit code from the element’s neighbours and then code each bit of it using some context-adaptive coder (and add some context dependency somewhere too).
  • Special features — probably none, it will just follow the standard in codec design and the main difference will be in coefficients coding. There’s a chance they’ll build in some scalability feature though.

Let’s wait and see what RMHD will really be. It would be nice if none of these predictions is correct.

Springtime for H.265 clones!

July 15th, 2015

Previously I feared there wouldn’t be any H.265 clones besides the VP<git-experimental> codec but luckily I was proved wrong.

There’s the second announcement of Really?Networks RMHD, intended for China (RealMedia was popular there after all). Either it’s their completely new codec (NGV) that has finally buffered 100% based on some original ideas or it’s H.265 ripoff. I’d bet on the latter.

Second, I’ve finally read a book describing the upcoming AVS2 (again, intended for China and being a Chinese standard). Well, if the first paragraph describing it has such abbreviations as CU, PU and TU you may be sure it’s an original codec that has nothing to do with H.265. Coding concepts like variable block transform, splitting motion compensating block unevenly and having 34 intra prediction modes — those concepts are completely original and are not used anywhere else for sure. Of course there’s some Chinese logic involved in some decisions and thus the codec has such gems ripped off HEVC like coding motion vectors in integer precision instead of quarterpel if they exceed a certain limit, or coding coefficients in zigzags of 4×4 blocks, or having special treatment for 64×64 blocks (this block is downscaled first and then transformed with a conventional 32×32 transform — and they call it Logical Transform BTW), or a special motion vector prediction mode for F-frames.

But that’s not all — they’ve introduced special “scene coding”. It relies on G-frames or GB-frames that contain scene background and it may be not displayed (who said VPx?!), and S-frames contain foreground motion. Though I’m pretty sure one can emulate it using H.265 features too, maybe longrefs plus no_display flag. I’m also pretty sure that if HEVC lacks some coding approach for now it will be added soon as a special extension (at least what I’ve read in screen coding extension looked completely logical — like a saddle as one of car seats).


Now I can be sure at last that codec future is looking good.

UPD: And there’s Cisco Thor now as well (simplified HEVC with VLC instead of CABAC). It does two things simultaneously — expands H.265 ripoffs family and borrows more from H.264. Now the only thing missing is Sorenson SVQ5 (or Double Spark or whatever name they want to give it).

On Greece

July 12th, 2015

I see too much bullshit about Greece on the Internet these days, so much of it that I could not refrain from writing this post.

First of all, I come from a country with an even worse economic situation (fun fact — the former Ukrainian ostrich supported president complained how hard it is to repay debts on his visit to Greece during the first Greek debt crisis). Unlike Greece, most people got no money from the government, companies had a large tax burden (in the latter years the government decided to press companies to pay taxes in advance and in an amount decided by the tax inspection, with tax returns working only for selected companies), lots of debts that went to no good purpose…

But enough about similarities between countries (certain Italians are not happy about similarities between Ukraine and Italy either), let’s get to the bullshit statements.

It’s not their fault. Of course it is, they had to forge their financial statistics under gunpoint in order to join and remain in Eurozone. Of course they share blame with Eurobureaucracy that wanted to extend EU even with a Greece and was willing to overlook their faults in order to keep it. Yet active part had been done by Greek government — it’s easy to buy voters with borrowed money that somebody else has to return in the future (in other words — not our problem). Another point of tension is Schengen area membership: because of good border control they have a lot of illegal immigrants and that’s what EU needs, hopefully when some neighbouring lands will connect Greece to the rest of Schengen area it will bring joy to everyone, especially to the UK.

The whole world is in debt to Greece for their achievements in culture and science. First of all, that sounds like typical copyright. “My grandfather once wrote a song that was played on a radio, I deserve not to work ever in my life.” (some Slashdot comment as I remember it from a decade ago or so). Second, most of the current countries have nothing to do with the nations that were on that territory a thousand or two thousand years ago. Look at Arab Republic Egypt — there was nothing Arabic in the people who built pyramids, temples and sphinxes. If you believe David Ben-Gurion’s thesis, then Palestinians are true Israeli people who lost their culture because of Arab conquests — they seem to oppose their original religion even to this day. Same story with Balkan nations and Ottoman Empire: modern Greece has nothing to do with the ancient Greece except in territory (say hello to Macedonia) and similar language. So, nice knowing you but don’t claim the old history to yourself; and while I’m grateful for those past achievements, they are not yours. I’d been living in a country that tried to exploit that (mostly in form of Soviet legacy and what colloquial “they” did for everyone), no thanks.

NihAV: core

June 14th, 2015

Here’s how the main NihAV header looks and it should remain the same (maybe I’ll add error codes there as well but that’s it):

#ifndef NA_COMMON_H
#define NA_COMMON_H

#include <stddef.h>
#include <stdint.h>

struct NASet;

/* type of the value stored in an option */
enum NAOptionType {
    NA_OPT_NULL = 0,
    NA_OPT_FLAGS,
    NA_OPT_INT,
    NA_OPT_DOUBLE,
    NA_OPT_STRING,
    NA_OPT_BINARY,
    NA_OPT_POINTER,
};

typedef union NAOptionValue {
    int64_t     i64;
    uint64_t    u64;
    double      dbl;
    const char *str;
    struct bin {
        const char *ptr;
        size_t      size;
    } bin;
    const void *ptr;
} NAOptionValue;

typedef struct NAOption {
    const char        *name;
    enum NAOptionType  type;
    NAOptionValue      value;
} NAOption;

/* how the accepted values of an option are constrained */
enum NAOptionInterfaceType {
    NA_OPT_IF_ANY,
    NA_OPT_IF_MINMAX,
    NA_OPT_IF_ENUMERATED,
};

typedef struct NAOptionInterface {
    const char        *name;
    const char        *explanation;
    enum NAOptionType  type;
    enum NAOptionInterfaceType if_type;
    NAOptionValue      min_val, max_val;
    NAOptionValue     *enums;
} NAOptionInterface;

typedef struct NALibrary {
    void* (*malloc)(size_t size);
    void* (*realloc)(void *ptr, size_t new_size);
    void  (*free)(void *ptr);

    struct NASet *components;
} NALibrary;

#define NA_CLASS_MAGIC 0x11AC1A55

typedef struct NAClass {
    uint32_t                 magic;
    const char              *name;
    const NAOptionInterface *opt_if;
    struct NASet            *options;
    NALibrary               *library;

    /* the NAClass typedef is not complete yet here, hence the struct tag */
    void                   (*cleanup)(struct NAClass *c);
} NAClass;

void na_init_library(NALibrary *lib);
void na_init_library_custom_alloc(NALibrary *lib,
                                  void* (*new_malloc)(size_t size),
                                  void* (*new_realloc)(void *ptr, size_t new_size),
                                  void  (*new_free)(void *ptr));
int  na_lib_add_component(NALibrary *lib, const char *cname, void *component);
void *na_lib_query_component(NALibrary *lib, const char *cname);
void na_clean_library(NALibrary *lib);

int na_class_set_option(NAClass *c, NAOption *opt);
const NAOption* na_class_query_option(NAClass *c, const char *name);
void na_class_unset_option(NAClass *c, const char *name);
void na_class_destroy(NAClass *c);

#endif

So what we have here is essentially three main entities NihAV will use for everything: NALibrary, NAClass and NAOption.

NALibrary is the core that manages the rest. As you can see it has a collection of components that, as discussed in the previous post, will contain the set of instances implementing tasks (e.g. codecs, de/compressors, hashes, de/muxers etc.), and this library object also contains the allocator for memory management. This way it can all be pinned to the needed instance, e.g. I once saw code that used libavcodec in two separate modules — for video and audio of course — and those two modules didn’t know a thing about each other (and were dynamically loaded too). Note to self: implement filtered loading for components, e.g. when initialising libnacodec only audio decoders will be registered or when initialising libnacompr only decoders are registered etc. etc.

The second component is NAClass. Every public component of NihAV besides NALibrary will be an instance of NAClass. Users are not supposed to construct one themselves; there will be utility functions for doing that behind the scenes (after all, you don’t need this object directly, you need a component in NALibrary doing what you want).

And the third component is what makes it more extensible without adding many public fields — NAOption for storing parameters in a class and NAOptionInterface for defining what options that class accepts.

Expected flow is like this:

  1. NALibrary instance is created;
  2. needed components are registered there (by creating copies inside the library tied to it — see the next to last field in NAClass);
  3. when an instance is queried, a copy is created for that operation (the definition is quite small and you should not do it often so it should not be a complete murder);
  4. user sets the options on the obtained instance;
  5. user uses aforementioned instance to do work (coding/decoding, muxing, whatever);
  6. user invokes destructor for the instance;
  7. NALibrary instance is destroyed.
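
The flow above could be exercised roughly like this. It is only a minimal sketch under loud assumptions: the header merely declares the API, so every type and function here (Library, Component, lib_init, lib_add_component, lib_query_component, example_flow) is a hypothetical, simplified stand-in, with a fixed-size array in place of struct NASet and a single int in place of the option set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the structures in the header. */
#define MAX_COMPONENTS 16

typedef struct Component {
    const char *name;
    int         option;   /* stands in for the per-instance option set */
} Component;

typedef struct Library {
    void *(*malloc)(size_t size);
    void  (*free)(void *ptr);
    Component components[MAX_COMPONENTS]; /* stands in for struct NASet */
    size_t    nb_components;
} Library;

/* Step 1: create the library with the default allocator. */
static void lib_init(Library *lib)
{
    assert(lib);
    memset(lib, 0, sizeof(*lib));
    lib->malloc = malloc;
    lib->free   = free;
}

/* Step 2: register a component by storing a copy inside the library. */
static int lib_add_component(Library *lib, const Component *c)
{
    if (lib->nb_components == MAX_COMPONENTS)
        return -1;
    lib->components[lib->nb_components++] = *c;
    return 0;
}

/* Step 3: a query hands out a fresh copy made with the library allocator. */
static Component *lib_query_component(Library *lib, const char *name)
{
    for (size_t i = 0; i < lib->nb_components; i++) {
        if (!strcmp(lib->components[i].name, name)) {
            Component *copy = lib->malloc(sizeof(*copy));
            if (copy)
                *copy = lib->components[i];
            return copy;
        }
    }
    return NULL;
}

/* Steps 4-7 from the user's side; returns the option value the instance saw. */
static int example_flow(void)
{
    Library   lib;
    Component avi = { "avi", 0 };
    Component *inst;
    int seen;

    lib_init(&lib);
    if (lib_add_component(&lib, &avi) < 0)
        return -1;
    inst = lib_query_component(&lib, "avi");
    if (!inst)
        return -1;
    inst->option = 42;     /* step 4: options are set on the copy only */
    seen = inst->option;   /* step 5: "work" with the instance */
    lib.free(inst);        /* step 6: destroy the instance */
    return seen;           /* step 7: nothing else to release in this sketch */
}
```

Since the registered definition and the queried instance are separate copies, setting an option on the instance leaves the definition inside the library untouched, which is the point of steps 2 and 3.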

There will be some exceptions, e.g. probing should be done statelessly by simply walking over the set of probe objects and invoking probe() there without creating new instances. And something similar for decoder detection too — the current libavcodec way of registering and initialising all decoders is overkill.

This is how it should be. Volunteers to implement? None? Not even myself?! Yes, I thought so.

NihAV: base

June 4th, 2015

As you might have noticed, NihAV development is not going very fast (or at all — thanks to certain people and companies (where I’d never worked and have no desire to work at) that made me lose the desire to program anything) but at least I think somewhat about NihAV design.

So, here’s how the base should look:

NALibrary
   -> <named collection of NihAV components>
     -> NAClass instance that does some task

So, first you create NALibrary that is used to hold everything else. The main content of this library is a set of named collections corresponding to the tasks (e.g. “io” for I/O handlers, “demux” for demuxers, “compr” for compressors etc. etc.). Each collection holds objects based on NAClass that do some specific task — implement file or network I/O, demux AVI or Bink, compress into deflate format etc. All of this is supposed to be hidden from the final user though — it’s NihAV libraries that do all the interaction with NALibrary, they know their collection name and what type of components is stored there. So when you ask for ASF demuxer, the function na_demuxer_find() will access "demux" collection in the provided NALibrary and then it will try to find a demuxer with name "asf" there. NAClass provides common interface for object manipulation — name querying, options setting, etc.
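
The two-level lookup described above might look like the following. This is a sketch only: Lib, Collection, Entry and demuxer_find are hypothetical stand-ins (not the real na_demuxer_find or NALibrary), and plain linear search replaces whatever hash table the real thing would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types: a named component and a named collection of them. */
typedef struct Entry {
    const char *name;            /* e.g. "asf", "avi" */
} Entry;

typedef struct Collection {
    const char  *name;           /* e.g. "demux", "io", "compr" */
    const Entry *entries;
    size_t       nb_entries;
} Collection;

typedef struct Lib {
    const Collection *collections;
    size_t            nb_collections;
} Lib;

/* First level: find the collection that corresponds to a task. */
static const Collection *find_collection(const Lib *lib, const char *name)
{
    assert(lib && name);
    for (size_t i = 0; i < lib->nb_collections; i++)
        if (!strcmp(lib->collections[i].name, name))
            return &lib->collections[i];
    return NULL;
}

/* Second level: find a component inside that collection. */
static const Entry *find_entry(const Collection *col, const char *name)
{
    for (size_t i = 0; i < col->nb_entries; i++)
        if (!strcmp(col->entries[i].name, name))
            return &col->entries[i];
    return NULL;
}

/* The task-specific wrapper knows its own collection name,
 * so the caller only has to supply the demuxer name. */
static const Entry *demuxer_find(const Lib *lib, const char *name)
{
    const Collection *col = find_collection(lib, "demux");
    return col ? find_entry(col, name) : NULL;
}
```

The caller never mentions "demux" itself; a name that exists only in another collection (say an I/O handler called "file") is simply not found by demuxer_find.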

And a word about demuxers — the more I think about it the more I’m convinced that they should output both packets and streams. This is not just for user inconvenience, it also helps chaining demuxers (nothing prevents people from putting raw DV into ASF and then muxing that into MOV with ASF packets containing DV packets — nothing except common sense but it’s too rare to rely upon).

Moral: if you want to implement a multimedia framework, start with a hash table implementation.

P.S. As for the implementation language, I’ll still stick to C. Newer programming languages like Rust or Swift or that one with the gopher have the problem of not being widespread enough, i.e. what if I’m using a somewhat outdated Ubuntu or Debian — especially on ARM — where I don’t want to mess with compiler (cross)compilation? Plus it’s likely I’ll make mistakes that will be hard for me to debug and constructions to work around (I doubt modern languages like passing void* on a public interface that’s cast to something else inside the function implementation). Of course it’s a matter of experience but I’d rather train on something smaller scale first for a new language.

NihAV: implementation start

May 14th, 2015

Before people reading this blog (all 0 of them) start asking about it — yes, I’ve started implementing NihAV, it will take a lot of time so don’t expect it to be finished or even usable this decade at least (too little free time, even less interest and too much work needed to be done to have it at least somewhat usable for anything).

Here’s the intended structure:

  • libnaarch — arch-specific stuff here, like little/big endian handling, specific speedup tricks etc. Do not confuse with libnaosstuff — I’m not interested in non-POSIX systems.
  • libnacodec — codecs will belong here.
  • libnacompr — decompression and compression routines belong here.
  • libnacrypto — cryptographic stuff (hashes, cyphers, ROT-13) belongs here.
  • libnadata — data structures implementations.
  • libnaformat — muxers and demuxers.
  • libnamath — mathematics-related stuff (fixedpoint calculations, fractional math etc).
  • libnaregistry — codecs registry. Codec information is stored here — both overall codec information (name to description mapping) and codec name search by tag. That means that e.g. FOURCC to codec name mapping from AVI or MOV is a part of this library instead of remaining demuxer-specific.
  • libnautil — utility stuff not belonging to any other library.
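
The tag lookup that libnaregistry is meant to centralise could be sketched as below. Everything here is illustrative: TagMap, MKTAG, avi_video_tags and codec_name_by_tag are hypothetical names, and the codec name strings are made up for the example (only the FOURCCs themselves are the usual AVI ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into a 32-bit tag, first character in the low byte. */
#define MKTAG(a, b, c, d) ((uint32_t)(a)         | ((uint32_t)(b) << 8) | \
                           ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

typedef struct TagMap {
    uint32_t    tag;
    const char *codec_name;      /* string ID, hypothetical naming */
} TagMap;

/* A tiny example table; a real registry would be much longer. */
static const TagMap avi_video_tags[] = {
    { MKTAG('M', 'J', 'P', 'G'), "mjpeg"   },
    { MKTAG('c', 'v', 'i', 'd'), "cinepak" },
    { MKTAG('I', 'V', '5', '0'), "indeo5"  },
};

/* Returns the codec name for a tag, or NULL when the tag is not known. */
static const char *codec_name_by_tag(const TagMap *map, size_t nb_entries,
                                     uint32_t tag)
{
    assert(map);
    for (size_t i = 0; i < nb_entries; i++)
        if (map[i].tag == tag)
            return map[i].codec_name;
    return NULL;
}
```

With the table living in the registry, both an AVI and a MOV demuxer could call the same lookup and report the raw tag alongside the result when nothing matches.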

Remark to myself: maybe it’s worth splitting out libnadsp so it won’t clutter libnacodec.

Probably I’ll start with simple stuff — implement dictionary support (for options), an AVI demuxer and some simple decoder.

LZ77-based compressors — a story similar to lossless codecs

May 12th, 2015

What do LZ77 compressors and lossless codecs have in common? They both perform lossless compression and there are too many of them because everyone tries to invent their own. And like lossless audio codecs — quite often in their own container too.

In case you don’t know (shame on you!) LZ77 scheme parses input into pieces like <literal> <copy> <literal> ... Literal means “copy these input bytes verbatim”, copy is “we had that substring some time ago, copy N bytes from the history at offset M”.
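
To make the literal/copy idea concrete, here is a small decoder working on already parsed tokens. It is a sketch, not any particular format: the Token structure and lz77_decode are hypothetical, and a real codec would of course read the tokens from a bitstream instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed LZ77 element: either a literal run or a copy from history. */
typedef struct Token {
    int         is_copy;  /* 0 = literal, 1 = copy */
    const char *literal;  /* bytes to emit for a literal */
    size_t      length;   /* number of bytes to emit (N) */
    size_t      offset;   /* distance back into the output (M), copies only */
} Token;

/* Decodes tokens into out; returns bytes written or (size_t)-1 on error. */
static size_t lz77_decode(const Token *tok, size_t nb_tok,
                          char *out, size_t out_size)
{
    size_t pos = 0;

    assert(out);
    for (size_t i = 0; i < nb_tok; i++) {
        const Token *t = &tok[i];

        if (t->length > out_size - pos)
            return (size_t)-1;              /* output buffer too small */
        if (!t->is_copy) {
            memcpy(out + pos, t->literal, t->length);
            pos += t->length;
        } else {
            if (t->offset == 0 || t->offset > pos)
                return (size_t)-1;          /* points before the history */
            /* Byte by byte on purpose: when offset < length the copy
             * overlaps its own output and repeats the pattern. */
            for (size_t j = 0; j < t->length; j++, pos++)
                out[pos] = out[pos - t->offset];
        }
    }
    return pos;
}
```

For instance the literal "abc", a copy of 6 bytes from offset 3 and the literal "d" decode to "abcabcabcd"; the overlapping copy is what lets a short history produce a long run.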

The idea by itself is rather simple and thus it’s easy to implement some LZ77 parsing with the following coding, slap your name on it and present as some new algorithm. There are three branches of implementation goals there — fast (but somewhat decent) compression, high (but not so fast) compression and experimental research that may lead to implementations in the first two branches.

Fast compression schemes usually pack everything into bytes so no time is wasted on bit reading. Usually format is like this — if top three bits of the next byte are something, then read literal copy length, otherwise determine offset size, read it and copy string from the dictionary. Quite often there are small tweaks to make compression faster (like using hashes) or slightly better (using escape values to code long values and coding small offsets/lengths into opcode etc.). There are so many implementations like that and they still keep appearing. LZO, LZF, FastLZ, snappy, chameleon… And lots of old games used such compression for their resources (including video) too.
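
A byte-oriented format of that kind can be decoded with a loop like the one below. The format is made up for illustration (it is in the spirit of such compressors but is not LZO, LZF or any other real one): if the top three bits of the control byte are zero, the low five bits plus one give a literal run length; otherwise the top three bits plus two give the match length and the low five bits together with the next byte give the offset minus one. The name bytelz_decode is hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decodes the made-up byte-oriented format described above.
 * Returns the number of output bytes or (size_t)-1 on malformed input. */
static size_t bytelz_decode(const uint8_t *in, size_t in_size,
                            uint8_t *out, size_t out_size)
{
    size_t ip = 0, op = 0;

    assert(in && out);
    while (ip < in_size) {
        uint8_t ctrl = in[ip++];

        if ((ctrl >> 5) == 0) {
            /* literal run: 1..32 bytes copied verbatim from the input */
            size_t run = (size_t)ctrl + 1;
            if (run > in_size - ip || run > out_size - op)
                return (size_t)-1;
            memcpy(out + op, in + ip, run);
            ip += run;
            op += run;
        } else {
            /* match: 3..9 bytes copied from the already decoded output */
            size_t len = (size_t)(ctrl >> 5) + 2;
            size_t offset;

            if (ip >= in_size)
                return (size_t)-1;          /* missing offset byte */
            offset = (((size_t)(ctrl & 0x1F) << 8) | in[ip++]) + 1;
            if (offset > op || len > out_size - op)
                return (size_t)-1;
            while (len--) {
                out[op] = out[op - offset];
                op++;
            }
        }
    }
    return op;
}
```

No bit reader is involved anywhere, which is where the speed comes from; the escape values and opcode-packed short offsets mentioned above would be extra branches on the control byte.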

High compression schemes compress the data produced by LZ77 parsing much better and spend more cycles on finding the best parsing of the input. It all started essentially with LZHUF when someone decided to employ Huffman codes instead of writing values in a fixed amount of bits. If you’ve never heard about LHA/LZH you need your Amiga box confiscated. This approach reached its peak with Deflate — by modern standards it’s not the best format to compress (i.e. not fast enough, does not compress high enough etc etc.) but it’s the standard available everywhere and in any form. Deflate uses custom per-block Huffman codes with their definition stored in compressed form as well so there’s hardly anything to improve there radically. And thus (patent expiration helped greatly too) another form of LZ77-based compression started to bloom — LZA (using modelling and arithmetic coding on LZ77 parsing results). Current favourite LZMA (and main RAR compression scheme) uses this approach too albeit in very sophisticated form — preprocessors to increase compression ratio on some kinds of known data, Markov models, you name it.

And here’s my rant — leave Deflate alone! It’s like the JPEG of data compression — old and seemingly not very effective but it’s ubiquitous, well-supported and still has some improvement potential (as demonstrated by e.g. 7-zip and zopfli). I’d hate to have as many compression schemes to support as video codecs. Deflate and LZMA are enough for now and I doubt there will be something significantly more effective appearing soon. Work on something lossy — like H.265 encoder optimisations — instead.

NihAV: Logo proposal

May 11th, 2015

Originally it should’ve been Bender Bending Rodriguez on the Moon (implying that he’ll build his own NihAV with …) but since I lack drawing skills (at school I managed to draw something more or less realistic only once), here’s the alternative logo drawn by a professional coder in Inkscape in a couple of minutes.

[Image: NihAV logo]

Somehow I believe that building a castle on a swamp is a good metaphor for this enterprise as well (and much easier to draw too).

NihAV — A New Approach to Multimedia Pt. 8

May 11th, 2015

Demuxers

First of all, as I’ve mentioned in the beginning, codecs should have string IDs. Of course if a codec tag is present it can be passed too. Or the stream handler in AVI, it’s just optional. This way an AVI demuxer can report codec {NIH_TYPE_VIDEO, "go2meeting", "G2M6"} or {NIH_TYPE_VIDEO, "unknown", "COL0"} and an external program can guess what codec that is and handle it specially.

Second, demuxers should return two types of data — packets and streams. E.g. MPEG-TS (the best container ever by popular vote) does not care about frame boundaries, so its demuxer should not try to be smart: it should return a stream that can be fed to a parser, and that parser will produce proper packets.

Third, parsers. There are two kinds of them — ones that split stream into frames and ones that parse frame data to set some properties to the packet. They should be two separate entities and invoked differently, one after another if needed.
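
The first kind of parser (splitting a stream into pieces) might start out like this sketch. It only cuts a buffer at 00 00 01 start codes, as used by MPEG-style elementary streams; grouping those units into whole frames and keeping state across buffers is left out, and the names Chunk, find_start_code and split_stream are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Chunk {
    size_t start;   /* offset of the start code in the buffer */
    size_t size;    /* bytes up to the next start code or the buffer end */
} Chunk;

/* Returns the offset of the next 00 00 01 at or after from, or size if none. */
static size_t find_start_code(const uint8_t *buf, size_t size, size_t from)
{
    for (size_t i = from; i + 3 <= size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    return size;
}

/* Splits buf into chunks that each begin with a start code.
 * Bytes before the first start code are skipped.
 * Returns the number of chunks stored (at most max_chunks). */
static size_t split_stream(const uint8_t *buf, size_t size,
                           Chunk *chunks, size_t max_chunks)
{
    size_t n   = 0;
    size_t pos;

    assert(buf && chunks);
    pos = find_start_code(buf, size, 0);
    while (pos < size && n < max_chunks) {
        size_t next = find_start_code(buf, size, pos + 3);

        chunks[n].start = pos;
        chunks[n].size  = next - pos;
        n++;
        pos = next;
    }
    return n;
}
```

The second kind of parser would then take each chunk and only read its header to fill in packet properties, which is why keeping the two as separate entities costs nothing.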

Something similar for muxers — everybody knows that one can mux absolutely any codec into AVI. Or put H.264 into MPEG-PS (hi Luca!). Muxers just should allow callers to do that when possible instead of failing because codec is unrecognised.

P.S. If I’m ever to implement this all it will take a lot of time and Trocadero.

NihAV — A New Approach to Multimedia Pt. 7

May 9th, 2015

Modularity — codec level

FFmpeg, obviously, was made to transcode MPEG video (the initial commit had support for JPEG, MPEG-1/2 video, some H.263-based formats like M$MPEG-4, MPEG-4 and RV10, MPEG audio layers I-III and AC3). It was expanded to handle other formats but the misdirection in the initial design has grown into MpegEncContext, which makes up the ugliest part of libavcodec to date.

It is easy to start with an abstraction that all codecs consist of I/P/B-frames split into 16×16 macroblocks that have 8×8 DCT blocks. You just need to have some codec-specific decoding (or coding) for picture header or block codes, that’s all. And since they all are very similar why not unite them into single decoding function. I encourage everybody to look at mpv_decode_mb_internal in libavcodec/mpegvideo.c to see how this can go wrong.

Even among the codecs that should fit this simple model I can still name two off the top of my head that don’t fit that well. H.263+ (or was it H.263++?) — it has packed PB-frames that have blocks for both P- and B-frame. IIRC it sends an empty frame just after that so reordering can take place. VC-1 has BI-frames that should be coded as I-frames but treated as B-frames; also it has block subdivision into 8×4, 4×8 or 4×4 subblocks. And there’s On2 VP3. This gets even better with the new generation of codecs — more reference frames and more complex relations between them — B-pyramid in H.264 and H.265 frame management. And there’s On2 VPx. Indeo 4/5 had complex frame management too — droppable references, B-frames, null frames etc.

So, let’s look at video codec decoding stages to see why it’s unwise to try to use the single context to bind them all.

  1. Sequence header — whatever defines codec parameters like frame dimensions, various features used in the bitstream etc. May be as simple as frame dimensions provided by the container; it may be codec extradata from the container as well; it may be as complex as H.265 having multiple PPSes referencing multiple SPSes referencing multiple VPSes.
  2. Picture header — whatever defines frame parameters. Usually it’s frame type, sometimes frame dimensions, sometimes quantiser, whatever vendor decides to put into it.
  3. Slice header — if codec has slices; if codec has separate plane coding or scalable coding it can be considered slices too. Or fields (though they can have slices too). Usually it has information related to slice coding parameters — quantiser, bitstream features enabled etc.
  4. Macroblock header — macroblock type, coded block pattern and other information is stored here.
  5. Spatial prediction information — not present for old codecs but is an essential part of intra blocks compression in the newer codecs.
  6. Motion vectors — usually a part of macroblock header but separated here to denote they can be coded in different ways too (e.g. newer codecs have to include reference frame index, for older codecs it’s obvious from the frame type).
  7. Block coefficients.
  8. Trailer information — whatever vendor decides to put at the end of the frame like CRC (or codec version for Indeo 4 I-frames).

And yet there are many features that complicate implementing this scheme in the same framework — frame management (altref frames in VPx, two frames fused together as in Indeo 4 or H.263), sprites, scalable coding features, interlacing, varying block sizes (especially in H.265 and ripoffs). Do you still think it’s a good idea to fit it all into the same mpegvideo?

That is why I believe the best approach in this case is to have small reusable blocks that can be combined to make a decoder. For starters, decoder should have more freedom to where it can decode to — that should be handy in decoding those fused frames, also quite often one decoder is used inside another to decode a part of the frame, especially JPEG and WMV9/VC-1. Second, decoder should be able to pick whatever components it needs — e.g. RealVideo 3/4 used H.264 spatial prediction and chroma motion compensation but the standard I/P/B frame management and its own bitstream decoding. WMV2 was mostly M$MPEG-4 with new motion compensation and special I-frame decoder. AVS (Chinese one) has 8×8 integer DCT coding but also spatial coding from H.264 and its frame management is almost standard I/P/B but P frame references two previous pictures and they’ve added S-frame that is B-frame with only forward references.

Hence I proposed a long time ago to split out at least frame management in order to reduce decoder dependencies on mpv (it sank into the swamp; but again, no-one cared). Then block management functions (the utility functions that update and provide pointers to the current block on output frame planes). That sank into the swamp. I’d propose anything else in that direction but it will burn down, fall over, then sink into the swamp; no-one cares about my proposals.

Still, here’s how I see it.

#include "block_stuff.h"
#include "frame_mgmt.h"
#include "h264/intra_pred.h"

Since this is not intended for the user it can have multiple smaller headers with only related stuff. Also large codec data should’ve been moved into separate subdirectories ages ago. It’s more than a thousand files in libavcodec already.

decode_frame()
{
    frame_type = get_bits(gb, 2);
    cur_frm    = ipb_frame_get_cur(ctx->ipb, frame_type);
    init_block_pos(ctx->blk, cur_frm);
    for (blocks) {
        update_block_pos(ctx->blk);
        decode_mb(ctx, gb, ctx->blk, mb);
        if (mb->type == INTRA)
            h264_pred_spatial(ctx->blk, mb);
        else
            idct_put_mb420(ctx->blk, mb);
    }
    ipb_frame_update_refs(ctx->ipb, frame_type);
}

We have a lot of smaller blocks here encapsulating needed information — frame management, macroblock position and decoded macroblock information. Many chunks of code are the same between codecs, you often don’t need a full context for a small function that can be reused everywhere. Like spatial prediction — you just need to know if you can have neighbouring pixels, what prediction method to apply and what coefficients to add afterwards — be it RealVideo 3, H.264, or VP5. Similarly after motion vectors are reconstructed you do the same thing in most codecs — copy a rectangular area to the current frame using motion compensation functions. So write it once and reuse everywhere — and you need just a couple of small structures with essential information (where to copy to and what functions to use), not MpegEncContext.
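
As an illustration of how small such a reusable block can be, here is a sketch of standalone I/P/B frame management. It is an assumption-laden toy, not the API proposed above: IPBFrames, ipb_init, ipb_get_cur and ipb_update_refs are hypothetical names, frames are just buffer indices, and three buffers are assumed (two references plus one scratch buffer for B-frames).

```c
#include <assert.h>

#define IPB_NB_BUFS 3

enum FrameType { FRAME_I, FRAME_P, FRAME_B };

typedef struct IPBFrames {
    int last_ref;   /* older reference buffer, -1 if none yet */
    int next_ref;   /* newest reference buffer, -1 if none yet */
    int cur;        /* buffer being decoded into, -1 if none */
} IPBFrames;

static void ipb_init(IPBFrames *f)
{
    assert(f);
    f->last_ref = f->next_ref = f->cur = -1;
}

/* Picks the output buffer for a frame of the given type.
 * Returns its index, or -1 when the needed references are missing. */
static int ipb_get_cur(IPBFrames *f, enum FrameType type)
{
    f->cur = -1;
    if (type == FRAME_P && f->next_ref < 0)
        return -1;
    if (type == FRAME_B && (f->last_ref < 0 || f->next_ref < 0))
        return -1;
    /* any buffer not currently used as a reference will do */
    for (int i = 0; i < IPB_NB_BUFS; i++) {
        if (i != f->last_ref && i != f->next_ref) {
            f->cur = i;
            break;
        }
    }
    return f->cur;
}

/* After decoding: I- and P-frames become the newest reference,
 * B-frames are never referenced so nothing changes for them. */
static void ipb_update_refs(IPBFrames *f, enum FrameType type)
{
    if (type == FRAME_B || f->cur < 0)
        return;
    f->last_ref = f->next_ref;
    f->next_ref = f->cur;
}
```

Nothing in it knows about macroblocks, bitstreams or a particular codec, so a decoder with plain I/P/B management could use it as is, while one with altrefs or fused frames would plug in a different manager behind the same two calls.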

Sigh, I really doubt I’ll see it implemented ever.