NihAV: Aten’t Dead

March 3rd, 2018

Surprisingly, there’s still some life in NihAV and some progress time from time.

So I’ve debugged RealVideo 2 decoding and verified B-part of PB-frame reconstruction in Intel.263 decoder against the binary specification. Mind you, the latter is not likely to be seen supported by libavcodec ever. First, it’s a fringe feature for extremely old video codecs nobody cares about any more and, second, unlike later codecs, B-part is stored along with P-frame data (i.e. first you have macroblock header for P- or I-macroblock, then macroblock header for B-macroblock, then macroblock coefficients for P-part and then macroblock coefficients for B-part). Other codecs simply pack B-frame along with reference frame but here data is interleaved. I’ve added some support for skipping B-part in libavcodec H.263 decoder (exactly nine years ago!) but decoding two frames in parallel would require some serious hacking of infamous MpegEncContext-using core so it’s very unlikely to happen.

And directions for near future still include RealVideo 3/4 and all RealAudio codecs. Fun fact: two of those are patent-free now—ATSC A/52 aka DNET and AAC-LC (but probably not SBR extension used in racp version). So if you implement them now you can flip a middle finger to both D*lby and Ferkel-herzen-Gesellschaft since new decoders can’t be covered by patent licenses. Not that I cared about it before.

Chiariglione Is Right

February 7th, 2018

I guess everybody else has reacted on his post about MPEG crisis so I can do that as well. So, $postname—most people just don’t understand his outlook. If you interpret his words from his point of view it’s clear he’s right for most of the things.
Read the rest of this entry »

ClearVideo: Somewhat Working!

February 3rd, 2018

So I’ve finally written a decoder for ClearVideo in NihAV and it works semi-decently.

Here’s a twentieth frame of basketball.avi from the usual sample repository. Only the first frame was intra-frame, the rest are coded with just the transforms (aka “copy block from elsewhere and change its brightness level if needed too”).

As you can see there are still serious glitches in decoding, especially on bottom and right edges but it’s moving scene and most of it is still good. And the standard “talking head” sample from the same place decodes perfectly. And RealMedia sample is decoded acceptably too.

Many samples are decoded quite fine and it’s amazing how such simple method (it does not code residue unlike other video codecs with interframes!) still achieves good results at reasonable (for that time) bitrate.

Hopefully there are not so many bugs in my implementation to fix so I can finally move to RealVideo 3 and 4. And then probably to audio codecs before RealVideo 6 (aka RealMedia HD) because it needs REing work for details (and maybe wider acceptance). So much stuff to procrastinate!

Update: I did MV clipping wrong, now it works just fine except for some rare glitches in one RealMedia file.

ClearVideo: Some Progress!

January 21st, 2018

I don’t know whether it’s Sweden in general or just proper Swedish Trocadero but I’ve managed to clarify some things in ClearVideo codec.

One of the main problems is that binary specifications are full of cruft: thunks for (almost) every function in newer versions (it’s annoying) and generic containers with all stuff included (so you have lists with elements that have actual payload which are different kinds of classes—it was so annoying that I’ve managed to figure it all out just this week). Anyway, complaining about obscure and annoying binary specifications is fun but it does not give any gain, so let’s move to the actual new and clarified old information. Plus it has several different ways of coding information depending on various flags in extradata.

The codec has two modes: intra frames coded a la JPEG and inter frames that are coded with fractal transforms (and nothing else). Fractal frame is split into tiles of predefined size (that information is stored in extradata) and those tiles may be split into smaller blocks recursively. The information for one block is plane number, flags (most likely to show whether it should be split further), bias value (that should be added to the transformed block) and motion vector (a byte per component). The information is coded with static codebooks and it depends on the coding version and context (it’s one set for version 1, another for version 2 and completely different single codebook for version 6). Codebooks are stored in the resources of decoder wrapper, the same as with DCT coefficients tables.

Now, the extradata. After the copywrong string it actually has the information used in the decoding: picture size (again), flags, version, tile sizes and such. Fun thing is that this information is stored in 32-bit little-endian words for AVI but it uses big-endian words for RealMedia and probably MOV.

And the tables. There are two tables: CVLHUFF (single codebook definition) and HUFF (many codebooks). Both have similar format: first you have byte array for code lengths, then you have 16-bit array of actual codewords (or you can reconstruct them from code lengths the usual way—the shortest code is all zeroes and after that they increase) and finally you have 16-bit array of symbols (just bytes for case of 0x53 chunks in HUFF). The multiple codebook definition has 8-byte header and then codebook chunks in form [id byte][32-bit length in symbols][actual data], there are only 4 possible ID bytes (0xFF—empty table, 0x53—single byte for symbol, the rest is as described above). Those IDs correspond to the tables used to code 16-bit bias value, motion values (as a pair of bytes with possible escape value) and 8-bit flags value.

So, overall structure is more or less clear, underlying details can be verified with some debugging, and I hope to make ClearVideo decoder for NihAV this year. RMHD is still waiting 😉

Popular Swedish Bus Routes

January 9th, 2018

Sweden has a lot of local bus routes and every region (or län) has its own most popular bus route:

  • for Stockholm and Örebro län it’s “Ej i trafik” (something like “not participating in public transit service”, “trafik” in Swedish often means both [car] traffic and public transport service);
  • for Södermanland it’s “Är ej i trafik” (“Is not in service”);
  • in Östergötland it’s “Tyvärr, ej i tjänst” (“Sorry, not in service”).

The joke is that while there are many numbered bus routes (hundreds in Stockholm län), the regulations make bus drivers rest after completing a route so quite often a bus arrives to the end station, unloads all passengers, changes its route number to the one above and goes away; then, obviously, another bus (or the same one after the driver has rested) comes to pick passengers. Since I almost never travel by bus in Germany (we have trams here after all), most of my bus trips happened in Ukraine and Sweden—and those countries differ in approach to drivers indeed.

Another interesting thing is the variety of buses: in Stockholm län you have buses going on trunk lines—quite often those are articulated buses and they’re always painted blue—and ordinary buses (always red); some buses are double-deckers, like on bus route 676 (Stockholm-Norrtälje) and some coaches are double-deckers too (I still fondly remember travelling on top floor of one from Luleå to Sundsvall—no fond memories about Ukrainian bus trips though). And in Norrland they still have skvaders (aka buses with additional cargo departments). Also buses in Stockholm län quite often have USB chargers for every seat and even WiFi—everything for passenger comfort.

It’s quite interesting that some bus routes are operated by two buses: for example, if I want to get from Bromma to Portugal (a place on Adelsö island near Stockholm) I’d take bus 312 which goes to Sjöangen, there I’d step out, get into new bus 312 waiting there while the previous bus goes to the rest. Also it’d travel on a ferry which I also like for some reason.

So there’s something interesting about Swedish buses after all. But railways are still much better (more comfort, higher speeds, less problems from car traffic etc etc) and definitely more awesome (I’ve witnessed rail bus pushing a fallen fir from the tracks less than a week ago—try finding an ordinary bus doing that). But it’s still nice to know that Sweden has good things beside people, trains, food, drinks and nature.

P.S. This seems to have gone a bit further than just describing how popular bus routes differ in various Swedish regions. Hopefully my upcoming NowABitClearerVideo post would go the same way.

Dingo Pictures Works: Early Years

December 8th, 2017

Well, I intended to end my review but I was reminded that there are even more Dingo Pictures works that I’ve missed. So let’s look at those.
Read the rest of this entry »

Rust: Annoyance-Driven Design

December 3rd, 2017

I’ve finally made NihAV decode RealVideo 2 content, including B-frames (there are still 4 video codecs to support (and I don’t have any samples for RMHD) and all audio codecs too so it’s a long way) and so I have some more words to say about Rust and my experience with it.

To me it looks like the most decisions on decompositions in Rust are the consequences of annoyance of making it other way? Too large structures mean you have to either pass too many arguments into new() or fill it with some defaults (and I’m pretty sure that #derive[Default] won’t save you with more complex types) and initialise to sane values later. In result it’s easier to split everything into smaller structures which are (at least) subjectively are much easier to handle, especially if you reference them as Option<YourStruct>. Modules and imports, on the other hoof, are more annoying to manage since you have to take care of proper dependencies, visibility and imports—in result I find it easier to import all stuff from all modules and just keep comment out currently unused imports (because I still can’t bring myself to make it all a single mega-module). And now for the even higher level: crates. Yes, I’m going to beat that undead horse again.

First of all, I’m aware of incremental building enabled in nocturnal Rust but I’m not going to use nightly for rather obvious reasons (mostly because I’m not here to experiment with the all potential bells and whistles of the language but rather what it can offer right out of the box and how it suits my needs). So, the compilation times are horrible: when I change a single non-public function it rebuilds the whole crate (which is supposed behaviour, I know) and it takes 15 seconds to do that. Obviously it’s laughable for people doing “serious” projects but it’s basic fact that humans expect response (any response) in about five seconds after the action or they get impatient. In result instead of one crate with optional features (in my case decoders and demuxers) I’d rather have several smaller crates and that creates new issues too. There’s this obvious npm.js kind of issue of making packages for every small thing so your programs ends with more package dependencies than modern Linux distribution. But there’s also the issue with package splitting: I’d like to split my code into packages that encompass certain family of features—e.g. nihav-core for common stuff, nihav-avi for AVI demuxer, nihav-indeo for all Indeo codecs (audio and video) and nihav-realmedia for RealMedia demuxer and related codecs—then some of them may depend on some common package (like H.263 common core for Intel I.263 and RealVideo 1 and 2 decoders) but probably with different features requested (one of them does not need B-frame support, another one does not need PB-frame support). Since I don’t know quantum cargodynamics I don’t know how it will all be resolved. So it will either end in dead code or code duplication (in an additional crate too, I suppose).

My theory is that people behind Rust are biased by their development environment. In other words you don’t care much about compilation times when you have to build browsers (or compilers) on daily basis. While my main development machine is a laptop I bought in 2010 with 8GB of RAM (which I believed to be future-proof). So the Rust language designers might either have beefy machines to deal with fast compilation or be conditioned to long development cycles. I know that back in the day “start compiling Linux kernel and go make some coffee to pass 45 minutes of compilation time” was quite common but I guess it’s Jevons’ paradox all over again: the more computing power is there the more it’s wasted on compilation times. Like modern C++ or single-header libraries: you actually have to compile a very large corpus of code as single file. Back in the days my laptop with 64MB RAM was spending most of the time compiling libavcodec/dsputil.c (a monstrous file full of templates that old FFmpeg developers might remember even today) so I had to install more RAM in order to make compilation time reasonable. The solution was to split the file instead of upgrading the machines for every developer but nowadays it’d be seen as a ridiculous solution.

And now documentation. I find it rather poor (but that’s common with programming languages). If I know more or less what feature I want I can find it in the standard documentation (if I don’t I would complain about non-overlapping multiple &mut [range] borrows not working instead of using slice.split_at_mut()—and I did) but it does not really tell me what I should be looking for in the first place. I call it Excel complexity. In Excel there’s probably a function that does anything you want but it’s much easier to reimplement it yourself than to look up in the documentation how it’s called and what are its less obvious parameters. And even if you combine both The Rust Programming Language Second Edition and Rust By Example you still won’t get it right. Now that Rust aspires to be a JavaScript replacement it should take an example from it too: provide extensive overview how to do things in it instead of showcasing features. IMO in TRPLv2 there are two chapters—11 and 12—that are close to that ideal: they talk about testing and how to make a console program. In other words, good practical tasks that one would like to achieve with Rust (in other words, not so many people care about features per se, they want something done with a language: build multi-threaded application, parse Web server reply, make an efficient number cruncher etc etc). I can rant more about how it should be organised but nobody reads documentation including me.

There’s still this annoyance with tuples as such too: why I can’t declare let foo, bar; if baz { foo = 4; bar = 2; } else { foo = bar = 0; } and have to use two separate lets? why I can’t have let (foo, bar); if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0,0); } either? In result while named tuples are there I end up using only unnamed tuples.

So while Rust offers some nice things it has not a very nice way to shape development. And this also explains why C was so popular and still is: it does not enforce any particular behaviour on you (except in recent editions when the standard and compilers suddenly started to care about arithmetic and bit operations being non-portable—you might make your own CPU that does not use two’s complement arithmetic after all), no enforced coding style, you can compile code in any order you like and interface almost anything without special tools or wrappers. And the freedom it offered along with effectiveness is what is often lacking in more modern languages (the saddest thing is that it’s traded not for memory security but rather for sacks of syntactic sugar).

Anyway, I’ll keep experimenting and we’ll see how things will turn out. In either case I should start thinking about splitting NihAV into several crates, registering codecs and such. Too much work, too many opportunities to procrastinate!

Some Notes on VivoActive Video

November 21st, 2017

When you refactor code (even if your own one) any other activity looks better. So I decided to look at VivoActive Video instead of refactoring H.263-based decoders in NihAV.

In case you don’t know, Vivo was a company that created own formats (container and video, no idea about audio) that seems to that old that its beard rivals the beard of its users. Also it’s some MPlayer-related joke but I never got it.

Anyway, it’s two H.263-based video codecs, one being vanilla H.263+ decoder will all exciting stuff like PB-frames (but no B-frames) and another one is an upgrade over it that’s still H.263+ but with different coding scheme.

Actually, how the codec handles coding is the only interesting thing there. First, codebooks. They are stored in semi-readable way: first entry may be an optional FLC marker, last entry is always End marker, the rest of entries are human-readable codes (e.g. 00 1101 11 — the codebook parser actually parses those ones and zeroes and skips white spaces) with some binary data (the number of trailing bits, symbol start value, something else too). The way how bitstream is handled reminds me of VPx somewhat: you have a set of 49 codebooks, you start decoding tokens from certain codebook and then if needed you switch to secondary codebook. In result you get a stream of tokens that may need to be parsed further (skip syncword prevention codes that decode to 0xB3, validate the decoded block—mind you, escape values are handled as normal codes there too, assign codes to proper fields etc etc). In result while it’s easy to figure out which part is H.263 picture/GOB/MB header decoding because of the familiar structure and get_bits() calls, Vivo v2 decoding looks like “decode a stream of tokens, save first ones to certain fields in context, interpret the rest of them depending on them”. For example, macroblock decoding starts with tokens for MB type, CBP and quantiser, those may be followed up by 1 or 4 motion vector deltas and then you have block coefficients (and don’t forget to skip stuffing codes when you get them).

Overall, not a very interesting codec with some crazy internal design (another fun fact: it has another set of codebooks in slightly different format but they seem to be completely unused). I’m not sure if it’s worth implementing but it was interesting to look at.


November 9th, 2017

Dedicated to all young werehedgehogs. — one URL worth thousand words

So, let’s talk about colour in multimedia. To summarise it so you can skip the rest: proper colour representation hardly matters at all.

What is colour from physical point of view? It’s a property of light in visible range (i.e. between infrared and ultraviolet though some people are born without proper UV filters). Even better, you can clearly define it via spectroscopy because it’s a mix of certain wavelengths with certain energies. Another approach is to have reference colours printed on some surface (aka Pantone sets)—and that is the very thing you use to make sure you get what you want when taking a photo (especially on other celestial body) or ensuring consistency of production at typography.

The problem is that either approach is too bulky for use outside certain specific areas, for example it’s too expensive to store the whole spectrum for each pixel even in palette form (also image or video compression would be extremely inconvenient). Good thing is that our eye has its own variant of psychoacoustic masking and you can use several basic colours to achieve the mix. And from this most colour models (or spaces) were born where the range of real (aka present in spectrum) and perceivable colours (like purple or white, which are a mix of several colours) are represented as a composition of some primary colours like red+green+blue or cyan-magenta-yellow. And of course there is famous CIE 1931 model with basis being theoretical components corresponding to sensitivity of cone cells in human eye.

And there came the other problem: most colourspaces (XYZ, HSV and such) are as good as π-based computing system—it’s incredibly convenient for certain kinds of calculations but it’s next to impossible to convert results from and into decimal with good precision. Even RGB with its primary colours widely available has a problem: for instance, the colour of sky outside Britain (in case you didn’t know the etymology of word ‘sky’, it comes from Scandinavian word for cloud) can be represented only with IIRC red component being negative.

So how to deal with it? By mostly not caring as humans usually do. In places where higher colour reproduction fidelity is required (mostly typography) they simply use more primary colours. But overall humans don’t care much if the colours are slightly wrong. On one side, human brain has an internal auto-correction scheme for colour tint and white auto-balance (you might remember that optical illusion with seemingly red strawberries covered by green or blue tint with no pixels being actually red); on the other side each pair of human eyes is unique and perceives colours and shades differently. So if most people won’t agree about actual shade and would recognize picture anyway why bother at all (again, some specific areas excluded)?

So all those TV-related standards that define fine details of colour models are good only for mastering stuff (i.e. to keep consistency for the final product because you might not care about colour being slightly wrong but you’ll spot slight shade mismatch for sure). And speaking about TV-related standards, so-called TV-range (i.e. having component values fit into 16-240 range instead of 0-255 as you’d expect) is an archaism that should’ve been buried long time ago along with analogue TV broadcasting. But it still exists in digital world standards along with interlacing and KROPPING! not fully purged yet.

And speaking about shade differences, some of you might remember the era of VGA where each component actually had only 64 possible values and yet it was enough to create very convincing moving pictures. You may argue that the underlying issue was masked by palette mode I should point out that for rather long time after that people had to live with laptops and displays that had cheap LCDs with actual 18-bit colour depth (i.e. the same 6 bits per component as on VGA) as well (and let’s not talk about black colour representation there). So people didn’t care much about that and all this high-bitdepth stuff seems to be more of marketing creation than actual technical necessity (again, I understand that it’s needed somewhere like medical imaging, but common people don’t care about quality).

In the conclusion I want to say that the main reasons for introducing higher bitdepth wherever possible are: because we can (I understand and respect that), because it keeps many engineers and marketers employed (I understand that but don’t agree much) and because it helps fixing some other problems introduced elsewhere (like TV-range helped to deal with filtering artefacts—that I understand as well and try to respect but fail). Now be a good hedgehog and set proper colour profile in IMF metadata.

Dingo Pictures Works: Classics pt. 2 and Final Thoughts

November 5th, 2017

Sadly, all good things come to an end and this series review is no different. Let’s look at the last three entries before I give my opinion on all of them.
Read the rest of this entry »