ClearVideo: Some Progress!

January 21st, 2018

I don’t know whether it’s Sweden in general or just proper Swedish Trocadero, but I’ve managed to clarify some things in the ClearVideo codec.

One of the main problems is that the binary specifications are full of cruft: thunks for (almost) every function in newer versions (which is annoying) and generic containers with everything included (so you have lists whose elements carry the actual payload as different kinds of classes—it was so annoying that I managed to figure it all out only this week). On top of that, the codec has several different ways of coding information depending on various flags in the extradata. Anyway, complaining about obscure and annoying binary specifications is fun but it does not gain anything, so let’s move on to the actual new information and the clarified old information.

The codec has two modes: intra frames coded à la JPEG, and inter frames coded with fractal transforms (and nothing else). A fractal frame is split into tiles of a predefined size (that information is stored in the extradata) and those tiles may be split into smaller blocks recursively. The information for one block consists of a plane number, flags (most likely showing whether the block should be split further), a bias value (to be added to the transformed block) and a motion vector (a byte per component). This information is coded with static codebooks, and the choice depends on the coding version and the context (one set for version 1, another for version 2 and a completely different single codebook for version 6). The codebooks are stored in the resources of the decoder wrapper, the same as the DCT coefficient tables.
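
Just to make the layout above more concrete, here is a minimal sketch in Rust (the language NihAV is written in) of how the per-block information and the recursive tile split could be modelled; all names, types and field widths are my assumptions based on the description above, not anything taken from the binary specification:

```rust
// A rough model of ClearVideo inter-frame data as described above.
// Every name and type here is an assumption, not the actual NihAV code.
#[derive(Clone, Copy, Debug, Default)]
struct CLVBlockInfo {
    plane: u8,        // plane number the block belongs to
    flags: u8,        // most likely tells whether the block is split further
    bias:  i16,       // value added to the transformed block
    mv:    (i8, i8),  // motion vector, one byte per component
}

// Tiles of a predefined size may be split into smaller blocks recursively,
// so a tile can be represented as a small quadtree.
enum CLVTile {
    Leaf(CLVBlockInfo),
    Split(Box<[CLVTile; 4]>),
}

fn count_blocks(tile: &CLVTile) -> usize {
    match tile {
        CLVTile::Leaf(_)         => 1,
        CLVTile::Split(children) => children.iter().map(count_blocks).sum(),
    }
}

fn main() {
    let tile = CLVTile::Split(Box::new([
        CLVTile::Leaf(CLVBlockInfo { plane: 0, flags: 0, bias: -3, mv: (2, -1) }),
        CLVTile::Leaf(CLVBlockInfo::default()),
        CLVTile::Leaf(CLVBlockInfo::default()),
        CLVTile::Leaf(CLVBlockInfo::default()),
    ]));
    println!("blocks in tile: {}", count_blocks(&tile));
}
```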

Now, the extradata. After the copywrong string it actually contains the information used in decoding: picture size (again), flags, version, tile sizes and such. The fun thing is that this information is stored as 32-bit little-endian words for AVI but as big-endian words for RealMedia and probably MOV.
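
A tiny sketch of how that container-dependent byte order could be handled when reading those words (purely illustrative; the function and the sample value are made up):

```rust
// Read a 32-bit extradata word: little-endian for AVI, big-endian for
// RealMedia (and probably MOV), as described above. Illustrative only.
fn read_word(src: &[u8], pos: usize, is_avi: bool) -> u32 {
    let b = [src[pos], src[pos + 1], src[pos + 2], src[pos + 3]];
    if is_avi { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) }
}

fn main() {
    let data = [0x40, 0x01, 0x00, 0x00]; // a made-up word holding, say, picture width 320
    assert_eq!(read_word(&data, 0, true),  0x0000_0140); // AVI
    assert_eq!(read_word(&data, 0, false), 0x4001_0000); // RealMedia/MOV
    println!("ok");
}
```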

And the tables. There are two tables: CVLHUFF (a single codebook definition) and HUFF (many codebooks). Both have a similar format: first comes a byte array of code lengths, then a 16-bit array of the actual codewords (or you can reconstruct them from the code lengths the usual way—the shortest code is all zeroes and the codes increase after that), and finally a 16-bit array of symbols (just bytes in the case of 0x53 chunks in HUFF). The multiple-codebook definition has an 8-byte header followed by codebook chunks of the form [id byte][32-bit length in symbols][actual data]; there are only four possible ID bytes (0xFF—empty table, 0x53—single byte per symbol, the rest as described above). Those IDs correspond to the tables used to code the 16-bit bias value, the motion values (as a pair of bytes with a possible escape value) and the 8-bit flags value.
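
And here is a rough sketch of walking that multiple-codebook resource the way I currently understand it; the per-symbol sizes and the endianness of the length field are my guesses, so treat it as a note to self rather than a specification:

```rust
// Walk the HUFF resource: 8-byte header, then chunks of the form
// [id byte][32-bit length in symbols][actual data]. Per-symbol sizes
// below (lengths + codewords + symbols) are assumptions from the text above.
fn parse_huff_chunks(data: &[u8]) {
    let mut pos = 8; // skip the 8-byte header
    while pos + 5 <= data.len() {
        let id  = data[pos];
        let len = u32::from_le_bytes([data[pos + 1], data[pos + 2],
                                      data[pos + 3], data[pos + 4]]) as usize;
        pos += 5;
        match id {
            0xFF => println!("empty table"),
            0x53 => { // single byte per symbol
                println!("byte-symbol codebook, {} symbols", len);
                pos += len * (1 + 2 + 1); // lengths + 16-bit codes + 8-bit symbols
            },
            _ => {
                println!("codebook id {:02X}, {} symbols", id, len);
                pos += len * (1 + 2 + 2); // lengths + 16-bit codes + 16-bit symbols
            },
        }
    }
}

fn main() {
    let mut fake = vec![0u8; 8];                 // dummy 8-byte header
    fake.extend_from_slice(&[0xFF, 0, 0, 0, 0]); // one empty table
    parse_huff_chunks(&fake);
}
```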

So, the overall structure is more or less clear, the underlying details can be verified with some debugging, and I hope to make a ClearVideo decoder for NihAV this year. RMHD is still waiting 😉

Popular Swedish Bus Routes

January 9th, 2018

Sweden has a lot of local bus routes and every region (or län) has its own most popular bus route:

  • for Stockholm and Örebro län it’s “Ej i trafik” (something like “not participating in public transit service”, “trafik” in Swedish often means both [car] traffic and public transport service);
  • for Södermanland it’s “Är ej i trafik” (“Is not in service”);
  • in Östergötland it’s “Tyvärr, ej i tjänst” (“Sorry, not in service”).

The joke is that while there are many numbered bus routes (hundreds in Stockholm län), the regulations make bus drivers rest after completing a route, so quite often a bus arrives at the end station, unloads all passengers, changes its route number to the one above and goes away; then, obviously, another bus (or the same one after the driver has rested) comes to pick up passengers. Since I almost never travel by bus in Germany (we have trams here after all), most of my bus trips have happened in Ukraine and Sweden—and those countries certainly differ in their approach to drivers.

Another interesting thing is the variety of buses: in Stockholm län you have buses on trunk lines—quite often those are articulated buses and they’re always painted blue—and ordinary buses (always red); some buses are double-deckers, like on bus route 676 (Stockholm-Norrtälje), and some coaches are double-deckers too (I still fondly remember travelling on the top floor of one from Luleå to Sundsvall—no fond memories about Ukrainian bus trips though). And in Norrland they still have skvaders (aka buses with an additional cargo compartment). Also buses in Stockholm län quite often have USB chargers for every seat and even WiFi—everything for passenger comfort.

It’s quite interesting that some bus routes are operated by two buses: for example, if I want to get from Bromma to Portugal (a place on Adelsö island near Stockholm) I’d take bus 312, which goes to Sjöangen; there I’d step out and get into the new bus 312 waiting there while the previous one goes off to rest. The route also includes a ferry crossing, which I also like for some reason.

So there’s something interesting about Swedish buses after all. But railways are still much better (more comfort, higher speeds, fewer problems from car traffic etc. etc.) and definitely more awesome (I’ve witnessed a rail bus pushing a fallen fir off the tracks less than a week ago—try finding an ordinary bus doing that). But it’s still nice to know that Sweden has good things besides people, trains, food, drinks and nature.

P.S. This seems to have gone a bit further than just describing how popular bus routes differ in various Swedish regions. Hopefully my upcoming NowABitClearerVideo post will go the same way.

Dingo Pictures Works: Early Years

December 8th, 2017

Well, I intended to end my review but I was reminded that there are even more Dingo Pictures works that I’ve missed. So let’s look at those.

Rust: Annoyance-Driven Design

December 3rd, 2017

I’ve finally made NihAV decode RealVideo 2 content, including B-frames (there are still four video codecs to support, plus all the audio codecs, and I don’t have any samples for RMHD, so it’s a long way), and so I have some more words to say about Rust and my experience with it.

To me it looks like most of the decisions on decomposition in Rust are driven by the annoyance of doing it any other way. Too-large structures mean you have to either pass too many arguments into new() or fill the structure with some defaults (and I’m pretty sure that #[derive(Default)] won’t save you with more complex types) and initialise it to sane values later. As a result it’s easier to split everything into smaller structures which are (at least subjectively) much easier to handle, especially if you reference them as Option<YourStruct>. Modules and imports, on the other hoof, are more annoying to manage since you have to take care of proper dependencies, visibility and imports—as a result I find it easier to import all stuff from all modules and just comment out currently unused imports (because I still can’t bring myself to make it all a single mega-module). And now for the even higher level: crates. Yes, I’m going to beat that undead horse again.
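
A contrived example of what I mean by the Option<YourStruct> approach (the structures are made up for illustration, not actual NihAV code):

```rust
// Optional sub-structures are created only when a feature is actually used,
// so new()/Default for the main context stays trivial.
#[derive(Default)]
struct MVData { mvs: Vec<(i16, i16)> }

#[derive(Default)]
struct BFrameState { fwd_ref: Vec<u8>, bwd_ref: Vec<u8> }

#[derive(Default)]
struct DecoderContext {
    width:   usize,
    height:  usize,
    mv_data: Option<MVData>,      // filled in once motion data shows up
    b_state: Option<BFrameState>, // allocated only when B-frames show up
}

fn main() {
    let mut ctx = DecoderContext { width: 320, height: 240, ..Default::default() };
    ctx.b_state = Some(BFrameState::default());
    println!("{}x{}, mv data: {}, B-frames ready: {}",
             ctx.width, ctx.height, ctx.mv_data.is_some(), ctx.b_state.is_some());
}
```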

First of all, I’m aware of the incremental building enabled in nocturnal Rust but I’m not going to use nightly for rather obvious reasons (mostly because I’m not here to experiment with all the potential bells and whistles of the language but rather with what it can offer right out of the box and how it suits my needs). So, the compilation times are horrible: when I change a single non-public function it rebuilds the whole crate (which is the supposed behaviour, I know) and it takes 15 seconds to do that. Obviously that’s laughable for people doing “serious” projects, but it’s a basic fact that humans expect a response (any response) within about five seconds of an action or they get impatient. As a result, instead of one crate with optional features (in my case decoders and demuxers) I’d rather have several smaller crates, and that creates new issues too. There’s the obvious npm.js kind of issue of making packages for every small thing so your program ends up with more package dependencies than a modern Linux distribution. But there’s also the issue of package splitting: I’d like to split my code into packages that encompass a certain family of features—e.g. nihav-core for common stuff, nihav-avi for the AVI demuxer, nihav-indeo for all Indeo codecs (audio and video) and nihav-realmedia for the RealMedia demuxer and related codecs—then some of them may depend on a common package (like the H.263 common core for the Intel I.263 and RealVideo 1 and 2 decoders) but probably with different features requested (one of them does not need B-frame support, another one does not need PB-frame support). Since I don’t know quantum cargodynamics I don’t know how it will all be resolved. So it will end either in dead code or in code duplication (in an additional crate too, I suppose).
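
For reference, the single-crate-with-optional-features approach I use now boils down to conditional compilation like this (the feature names are invented for the example and would be declared under [features] in Cargo.toml):

```rust
// Feature-gated modules inside one crate; "decoder_indeo" and "demuxer_avi"
// are made-up feature names for illustration only.
#[cfg(feature = "decoder_indeo")]
mod indeo {
    pub fn register() { println!("Indeo decoders registered"); }
}

#[cfg(feature = "demuxer_avi")]
mod avi {
    pub fn register() { println!("AVI demuxer registered"); }
}

fn main() {
    #[cfg(feature = "decoder_indeo")]
    indeo::register();
    #[cfg(feature = "demuxer_avi")]
    avi::register();
    println!("registration done");
}
```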

My theory is that the people behind Rust are biased by their development environment. In other words, you don’t care much about compilation times when you have to build browsers (or compilers) on a daily basis. Meanwhile my main development machine is a laptop I bought in 2010 with 8GB of RAM (which I believed to be future-proof). So the Rust language designers might either have beefy machines to deal with fast compilation or be conditioned to long development cycles. I know that back in the day “start compiling the Linux kernel and go make some coffee to pass the 45 minutes of compilation time” was quite common, but I guess it’s Jevons’ paradox all over again: the more computing power there is, the more of it is wasted on compilation. Like modern C++ or single-header libraries: you actually have to compile a very large corpus of code as a single file. Back in the days when my laptop had 64MB of RAM it was spending most of its time compiling libavcodec/dsputil.c (a monstrous file full of templates that old FFmpeg developers might remember even today), so I had to install more RAM to make the compilation time reasonable. The solution back then was to split the file instead of upgrading the machines of every developer, but nowadays that would be seen as a ridiculous solution.

And now documentation. I find it rather poor (but that’s common with programming languages). If I know more or less what feature I want, I can find it in the standard documentation (if I don’t, I’ll complain about non-overlapping multiple &mut [range] borrows not working instead of using slice.split_at_mut()—and I did), but it does not really tell me what I should be looking for in the first place. I call it Excel complexity. In Excel there’s probably a function that does anything you want, but it’s much easier to reimplement it yourself than to look up in the documentation what it’s called and what its less obvious parameters are. And even if you combine both The Rust Programming Language Second Edition and Rust By Example you still won’t get it right. Now that Rust aspires to be a JavaScript replacement it should take an example from it too: provide an extensive overview of how to do things in it instead of showcasing features. IMO in TRPLv2 there are two chapters—11 and 12—that are close to that ideal: they talk about testing and about making a console program. In other words, good practical tasks that one would like to achieve with Rust (not so many people care about features per se, they want something done with a language: build a multi-threaded application, parse a Web server reply, make an efficient number cruncher etc. etc.). I can rant more about how it should be organised but nobody reads documentation, including me.
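
Since I brought it up, this is the slice.split_at_mut() thing the documentation eventually led me to: a trivial example of getting two non-overlapping mutable slices of one buffer without fighting the borrow checker.

```rust
// Two non-overlapping mutable views into the same buffer via split_at_mut(),
// e.g. a previous area and a current area of one plane.
fn main() {
    let mut plane = vec![0u8; 8];
    let (prev, cur) = plane.split_at_mut(4);
    prev.copy_from_slice(&[1, 2, 3, 4]);
    cur[0] = prev[3] + 1; // reading one half while writing the other is fine now
    println!("{:?}", plane);
}
```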

There’s still this annoyance with tuples as such too: why can’t I declare let foo, bar; if baz { foo = 4; bar = 2; } else { foo = bar = 0; } and instead have to use two separate lets? Why can’t I have let (foo, bar); if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0, 0); } either? As a result, while named tuples are there, I end up using only unnamed tuples.
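
For the record, the form that does work is binding the tuple to the result of the whole if expression; whether that counts as an improvement is another question:

```rust
// The workaround: let the `if` expression produce the tuple.
fn main() {
    let baz = true;
    let (foo, bar) = if baz { (4, 2) } else { (0, 0) };
    println!("foo = {}, bar = {}", foo, bar);
}
```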

So while Rust offers some nice things, the way it shapes development is not very nice. And this also explains why C was so popular and still is: it does not enforce any particular behaviour on you (except in recent editions, when the standard and compilers suddenly started to care about arithmetic and bit operations being non-portable—you might make your own CPU that does not use two’s complement arithmetic after all), there is no enforced coding style, you can compile code in any order you like and interface with almost anything without special tools or wrappers. And the freedom it offered, along with its effectiveness, is what is often lacking in more modern languages (the saddest thing is that it’s traded not for memory safety but rather for sacks of syntactic sugar).

Anyway, I’ll keep experimenting and we’ll see how things will turn out. In either case I should start thinking about splitting NihAV into several crates, registering codecs and such. Too much work, too many opportunities to procrastinate!

Some Notes on VivoActive Video

November 21st, 2017

When you refactor code (even your own) any other activity looks better. So I decided to look at VivoActive Video instead of refactoring the H.263-based decoders in NihAV.

In case you don’t know, Vivo was a company that created its own formats (container and video, no idea about audio) which seem to be so old that their beard rivals the beards of their users. There’s also some MPlayer-related joke about it but I never got it.

Anyway, it’s two H.263-based video codecs: one is a vanilla H.263+ decoder with all the exciting stuff like PB-frames (but no B-frames), and the other is an upgrade over it that’s still H.263+ but with a different coding scheme.

Actually, how the codec handles coding is the only interesting thing there. First, the codebooks. They are stored in a semi-readable way: the first entry may be an optional FLC marker, the last entry is always an End marker, and the rest of the entries are human-readable codes (e.g. 00 1101 11 — the codebook parser actually parses those ones and zeroes and skips whitespace) with some binary data (the number of trailing bits, symbol start value, something else too). The way the bitstream is handled reminds me of VPx somewhat: you have a set of 49 codebooks, you start decoding tokens from a certain codebook and then, if needed, you switch to a secondary codebook. As a result you get a stream of tokens that may need to be parsed further (skip syncword prevention codes that decode to 0xB3, validate the decoded block (mind you, escape values are handled as normal codes there too), assign codes to proper fields etc. etc.). So while it’s easy to figure out which part is H.263 picture/GOB/MB header decoding because of the familiar structure and get_bits() calls, Vivo v2 decoding looks like “decode a stream of tokens, save the first ones to certain fields in the context, interpret the rest of them depending on those”. For example, macroblock decoding starts with tokens for MB type, CBP and quantiser; those may be followed by 1 or 4 motion vector deltas, and then you have block coefficients (and don’t forget to skip the stuffing codes when you get them).
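
To make the semi-readable codebook part clearer, here is a rough sketch of turning one such human-readable code into bits; the trailing binary fields of the record are left out since I described them only loosely above:

```rust
// Parse a human-readable code entry like "00 1101 11" into (codeword, length),
// skipping whitespace like the real parser seems to do. Anything that is not
// ones and zeroes (e.g. the "End" marker) terminates parsing here.
fn parse_code(entry: &str) -> Option<(u32, u8)> {
    let mut code = 0u32;
    let mut len  = 0u8;
    for ch in entry.chars() {
        match ch {
            '0' => { code <<= 1;             len += 1; },
            '1' => { code = (code << 1) | 1; len += 1; },
            c if c.is_whitespace() => {},
            _   => return None,
        }
    }
    Some((code, len))
}

fn main() {
    assert_eq!(parse_code("00 1101 11"), Some((0b0011_0111, 8)));
    assert_eq!(parse_code("End"), None);
    println!("ok");
}
```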

Overall, it’s not a very interesting codec with a somewhat crazy internal design (another fun fact: it has another set of codebooks in a slightly different format but they seem to be completely unused). I’m not sure it’s worth implementing but it was interesting to look at.

koda

November 9th, 2017

Dedicated to all young werehedgehogs.

xkcd.com/1882/ — one URL worth a thousand words

So, let’s talk about colour in multimedia. To summarise it so you can skip the rest: proper colour representation hardly matters at all.

What is colour from a physical point of view? It’s a property of light in the visible range (i.e. between infrared and ultraviolet, though some people are born without proper UV filters). Even better, you can clearly define it via spectroscopy, because it’s a mix of certain wavelengths with certain energies. Another approach is to have reference colours printed on some surface (aka Pantone sets)—and that is the very thing you use to make sure you get what you want when taking a photo (especially on another celestial body) or ensuring consistency of production in typography.

The problem is that either approach is too bulky for use outside certain specific areas; for example, it’s too expensive to store the whole spectrum for each pixel even in palette form (also image or video compression would be extremely inconvenient). The good thing is that our eye has its own variant of psychoacoustic masking and you can use several basic colours to achieve the mix. And from this most colour models (or spaces) were born, where the range of real (i.e. present in the spectrum) and perceivable colours (like purple or white, which are a mix of several colours) is represented as a composition of some primary colours like red+green+blue or cyan-magenta-yellow. And of course there is the famous CIE 1931 model, with its basis being theoretical components corresponding to the sensitivity of the cone cells in the human eye.

And there came the other problem: most colourspaces (XYZ, HSV and such) are about as good as a number system with base π—incredibly convenient for certain kinds of calculations but next to impossible to convert from and into decimal with good precision. Even RGB, with its primary colours widely available, has a problem: for instance, the colour of the sky outside Britain (in case you didn’t know the etymology of the word ‘sky’, it comes from a Scandinavian word for cloud) can be represented only with, IIRC, a negative red component.

So how to deal with it? Mostly by not caring, as humans usually do. In places where higher colour reproduction fidelity is required (mostly typography) they simply use more primary colours. But overall humans don’t care much if the colours are slightly wrong. On one hand, the human brain has an internal auto-correction scheme for colour tint and white auto-balance (you might remember that optical illusion with seemingly red strawberries covered by a green or blue tint with no pixels being actually red); on the other hand, each pair of human eyes is unique and perceives colours and shades differently. So if most people won’t agree about the actual shade and would recognise the picture anyway, why bother at all (again, some specific areas excluded)?

So all those TV-related standards that define the fine details of colour models are good only for mastering (i.e. for keeping consistency in the final product, because you might not care about a colour being slightly wrong but you’ll spot a slight shade mismatch for sure). And speaking of TV-related standards, the so-called TV range (i.e. having component values fit into 16–235 for luma and 16–240 for chroma instead of 0–255 as you’d expect) is an archaism that should’ve been buried a long time ago along with analogue TV broadcasting. But it still exists in digital world standards, along with interlacing and KROPPING! not fully purged yet.
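
And since undoing that legacy is trivial, here is a quick sketch of expanding TV-range luma back to full range (the usual 16/219 scaling; chroma would use 224 instead of 219):

```rust
// Expand TV-range ("limited range") luma to full range: [16..235] -> [0..255].
fn tv_to_full_luma(y: u8) -> u8 {
    let v = (y as i32 - 16) * 255 / 219;
    v.clamp(0, 255) as u8
}

fn main() {
    assert_eq!(tv_to_full_luma(16),  0);
    assert_eq!(tv_to_full_luma(235), 255);
    println!("TV-range 126 becomes {}", tv_to_full_luma(126));
}
```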

And speaking of shade differences, some of you might remember the era of VGA where each component actually had only 64 possible values, and yet it was enough to create very convincing moving pictures. You may argue that the underlying issue was masked by the palette mode, but I should point out that for a rather long time after that people had to live with laptops and displays that had cheap LCDs with an actual 18-bit colour depth (i.e. the same 6 bits per component as on VGA) as well (and let’s not talk about black colour representation there). So people didn’t care much about that, and all this high-bitdepth stuff seems to be more of a marketing creation than an actual technical necessity (again, I understand that it’s needed somewhere like medical imaging, but common people don’t care about quality).

In conclusion I want to say that the main reasons for introducing higher bitdepth wherever possible are: because we can (I understand and respect that), because it keeps many engineers and marketers employed (I understand that but don’t agree much) and because it helps fix some other problems introduced elsewhere (like TV range helped to deal with filtering artefacts—that I understand as well and try to respect but fail). Now be a good hedgehog and set a proper colour profile in the IMF metadata.

Dingo Pictures Works: Classics pt. 2 and Final Thoughts

November 5th, 2017

Sadly, all good things come to an end and this series review is no different. Let’s look at the last three entries before I give my opinion on all of them.

Some Impressions on Czech Railways

November 5th, 2017

I’ve finally travelled enough on Czech railways (mostly in the south-western part of the country) to form some impressions about them.

First, they have somewhat funny train terminology there: R means “rychlík”, an express train, while R-egional trains are marked as Os, “osobní” (passenger train), but in reality they all move at a speed of around 50 km/h.

Second, the rolling stock.
(photo: a typical locomotive)
The trains are usually two to four carriages pulled by a locomotive, most often like the one in the picture above. It brings nostalgia to me because it looks like the Škoda locomotive from the 1960s that was one of the best locomotives in the USSR; it was also nicknamed Cheburashka because it both looked a bit like the titular hero of that anime (formerly a Soviet cartoon) and featured there as well. You can also see rail buses, double-decker regional trains (the same as the InterCity trains in Ukraine) and some other types, but they are very rare.

Speaking of locomotives, I had a brief visit to Austria and saw their main locomotive, ÖBB 1044. And what do you know, it looks like a replica of the Rc locomotive from Sweden. And then you read that the Austrian Railways actually bought ten Rc2s from Sweden and designated them as ÖBB 1043 locomotives. Since the Rc2 was the best locomotive in Austria it’s no wonder they designed the next model after it.

Third, tickets. Outside Prague you can usually buy tickets just at the ticket office at the station or maybe from conductors (but I’ve never tried that); ticket offices accept euros and sometimes you can pay by card too (mind the signs there). Another funny thing is that tickets usually list the stations you should pass on your route, and they’re a lot like German tickets for regional trains—you just buy a ticket for a route and which train you take is up to you. Even better, in most cases you can buy tickets outside the country; for example, I’ve bought a Praha-Tábor ticket in Dresden.

Fourth, infrastructure in general. And that’s where it sucks.
(photo: a station somewhere between Jihlava and České Budějovice)

Station houses look like they were built either in the XIXth century under Austrian rule or in the 1970s under Soviet rule (those look like featureless boxes essentially), and many of them are unfortunately not very well maintained. Another thing is the platforms. You can see a typical Czech platform in the first picture. They are often just about twice as high as the rails and not particularly wide either; you can meet high platforms only at big stations and in very random places (IIRC I’ve seen one at Velesín Město and there’s just a single track there).

And now for the tracks themselves. Rail connectivity is very good there, so you can get from one place to another without going through Prague; the downside is that it usually takes two hours to get from one node to another since, as I’ve mentioned above, all trains travel at around 50 km/h. I’ve travelled on the routes Dresden-Praha, Linz-Prag, Praha-Schwandorf, Tábor-Jihlava and Jihlava-Plzeň, and it looks like only the routes from Prague to important places like České Budějovice, Plzeň and such are double-track (and to Dresden for some reason); the rest are single-track and often as curvy as if they were drawn with the tail of a stubborn mule, as we say here. Also the Tábor-Horní Cerekev track is quite bumpy and reminds me more of a typical Ukrainian road than of a railway.

In general, Czech railways leave an impression of railways in a rural area and thus they have their own inimitable charm. Throw in a nostalgic feeling from the locomotives and you can say I liked it despite all the downsides.

Dingo Pictures Works: Classics pt. 1

November 4th, 2017

Let’s look at the last category of Dingo Pictures cartoons. Because it’s rather large I’ve split it into two parts.

H.263 And MPEG-4 ASP—The Root of Some Evil

November 4th, 2017

As you might know (but still not care), I’m working on adding full RealMedia support to NihAV, starting with video. So I’ve made it to decoding RealVideo 2 and I have some not so nice words to say about H.263 and MPEG-4 ASP.

First, the creeping featuritis in the standards: MPEG-4 part 2 from 2001 has annexes A-O (the version from 2004 has only annexes A-M for some reason) while ITU H.263 (the version from 2005) has annexes A-X plus two appendices. For comparison, ITU H.264 from 2017 has annexes A-J, same for MPEG-4 part 10 😉 Mind you, some annexes are for informative stuff (e.g. how an encoder should work or a list of patent claims) but others add new coding features. So, for MPEG-4 part 2 (2001) we have 15 annexes, 7 of them informative and only a couple of normative annexes adding new features. For ITU H.263, out of 24 annexes about 15 introduce new coding modes and other enhancements (different treatment of motion vectors, a loop filter, an alternative macroblock coding mode, the PB-frame type and a lot more). The features are actually grouped into baseline(-ish) H.263 and H.263+.

Second, neither of them is really suitable for video coding. I know, that might sound strange, but each of these standards makes for an unholy mix of various codecs. H.263 mixes several codecs from different generations together (the initial H.263 did not have B-frames, later they added PB-frames and finally B-frames too, there are at least two different ways to code macroblocks etc. etc.), while MPEG-4 part 2 is nominally for coding 3D video and it merely also specifies a method to code video texture on those 3D shapes (there are no actual frames there, just VOPs—Video Object Planes). And yet, because the compression methods there provided an improvement over H.262 (aka MPEG-2 Video), they were used in various forms with various hacks in many multimedia formats. There we have a very wide gamut, from RealVideo 1 and Sorenson Spark (aka FLV1) with just I- and P-frames, to Intel I.263 that had PB-frames, to RealVideo 2 with many features of H.263+ (including B-frames), to M$ MPEG-4 decoders, to WMV2.

And here we have the problem: both formats grew from the joint effort known as H.262 or MPEG-2 Video, so obviously it was a good idea to abuse the same decoder structure to handle all possible variations of H.263 and the video texture coding from MPEG-4 part 2 and then add all the decoder-specific hacks. As a result you get a mess that’s hard to comprehend because it usually depends on many context variables set in a specific manner for a specific codec. Hence the post title.

To demonstrate this I’ll show how the same feature is handled in different H.263/MP4p2-based codecs.

Sequence and frame headers

Obviously it differs for every codec. Some rely on container-provided width and height, some have dimensions coded for GOP or for individual frames, some codecs have only meaningful bits in the frame header, others store all feature bits and error out on unsupported configurations.

Frame types

  • Intel I.263: I, P, PB
  • RealVideo 1: I, P
  • RealVideo 2: I, P, B
  • Sorenson Spark: I, P, droppable P
  • WMV1: I, P
  • WMV2: I, P, X8 (alternative I-frame coding)
  • H.263 in general: I, P, PB, B, EI, EP (last two are enhancement layer picture types for scalable coding)
  • MPEG-4: I, P, B and S (last one is sprite-coded picture)

Block coding

  • Intel I.263: H.263 codes
  • RealVideo 1: H.263 codes with special codes for I-frame DCs
  • RealVideo 2: H.263+ AIC mode (advanced I-frame coding) plus H.263 P- or B-frames
  • Sorenson Spark: H.263 codes with a custom handling of AC escapes
  • WMV1/2: M$MPEG-4 codes

Motion vectors reconstruction

  • H.263: simply add predictor vector
  • H.263 UMV: depending on predictor value and difference range wrap it or not (see ITU H.263 D.2 for proper explanation)
  • MPEG-4: if (mv < low) mv += range; if (mv > high) mv -= range;
  • M$MPEG-4: if (mv <= -64) mv += 64; if (mv >= 64) mv -= 64;

(And there are different ways to predict motion vectors too!)
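
Put into code, this kind of per-codec difference ends up as yet another dispatch. Here is a rough Rust sketch of the wrapping variants from the list above (the H.263 UMV case is left out since it needs more context, and the ranges are illustrative rather than copied from any particular decoder):

```rust
// Motion vector reconstruction variants as listed above (H.263 UMV omitted).
#[derive(Clone, Copy)]
enum MVMode {
    Plain,                                      // H.263: simply add the predictor
    Mpeg4 { low: i32, high: i32, range: i32 },  // wrap into [low..high]
    MsMpeg4,                                    // wrap at +/-64
}

fn reconstruct_mv(pred: i32, diff: i32, mode: MVMode) -> i32 {
    let mut mv = pred + diff;
    match mode {
        MVMode::Plain => {},
        MVMode::Mpeg4 { low, high, range } => {
            if mv < low  { mv += range; }
            if mv > high { mv -= range; }
        },
        MVMode::MsMpeg4 => {
            if mv <= -64 { mv += 64; }
            if mv >=  64 { mv -= 64; }
        },
    }
    mv
}

fn main() {
    println!("{}", reconstruct_mv(10, 5, MVMode::Plain));
    println!("{}", reconstruct_mv(60, 10, MVMode::MsMpeg4));
    println!("{}", reconstruct_mv(30, 40, MVMode::Mpeg4 { low: -32, high: 31, range: 64 }));
}
```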

There are even more quirks than I’ve listed here, but it should give you an idea what a fine mess these formats are and why the code that supports them all tends to turn into a huge mess itself. I’ve tried to solve this in NihAV by having a template decoder for H.263 that calls a bitstream parser for the actual codec-specific parsing and by keeping some quirks inside specific structures (like an MV type that adds vectors differently depending on the current mode). I still have more features to take into account (like slices, AC prediction and B-frames), so I’ll have to redesign it before I can support RealVideo 2 properly.
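
The shape of that design is roughly the following (heavily simplified, with made-up trait and method names rather than the actual NihAV ones):

```rust
// A simplified sketch of the "template decoder + codec-specific bitstream
// parser" split described above. Names are illustrative, not the real API.
struct FrameHeader { is_intra: bool, quant: u8 }

trait BlockDecoder {
    fn decode_frame_header(&mut self) -> FrameHeader;
    fn decode_macroblock(&mut self, hdr: &FrameHeader, mb_idx: usize);
}

// The shared H.263-style driver: it walks the macroblocks and lets the
// codec-specific parser deal with the actual bitstream quirks.
fn decode_frame<D: BlockDecoder>(dec: &mut D, num_mbs: usize) {
    let hdr = dec.decode_frame_header();
    for mb in 0..num_mbs {
        dec.decode_macroblock(&hdr, mb);
    }
}

struct DummyRV2Parser;

impl BlockDecoder for DummyRV2Parser {
    fn decode_frame_header(&mut self) -> FrameHeader {
        FrameHeader { is_intra: true, quant: 8 }
    }
    fn decode_macroblock(&mut self, hdr: &FrameHeader, mb_idx: usize) {
        println!("MB {}: intra={} q={}", mb_idx, hdr.is_intra, hdr.quant);
    }
}

fn main() {
    let mut parser = DummyRV2Parser;
    decode_frame(&mut parser, 4);
}
```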

But then maybe I’ll add Vivo Media format support for old times’ sake (it’s the funniest one, with codebooks stored as strings of ones and zeroes like “0000 0011 110” inside the binary, with “End” signalling the codebook end).