Archive for November, 2017

Some Notes on VivoActive Video

Tuesday, November 21st, 2017

When you refactor code (even your own) any other activity looks better. So I decided to look at VivoActive Video instead of refactoring H.263-based decoders in NihAV.

In case you don’t know, Vivo was a company that created its own formats (container and video, no idea about audio) that seem to be so old that their beard rivals the beard of their users. There’s also some MPlayer-related joke about it but I never got it.

Anyway, it’s two H.263-based video codecs: one is a vanilla H.263+ decoder with all the exciting stuff like PB-frames (but no B-frames) and the other is an upgrade over it that’s still H.263+ but with a different coding scheme.

Actually, how the codec handles coding is the only interesting thing there. First, codebooks. They are stored in a semi-readable way: the first entry may be an optional FLC marker, the last entry is always an End marker, and the rest of the entries are human-readable codes (e.g. 00 1101 11; the codebook parser actually parses those ones and zeroes and skips whitespace) with some binary data (the number of trailing bits, symbol start value, something else too). The way the bitstream is handled reminds me of VPx somewhat: you have a set of 49 codebooks, you start decoding tokens from a certain codebook and then, if needed, you switch to a secondary codebook.

As a result you get a stream of tokens that may need to be parsed further (skip syncword prevention codes that decode to 0xB3, validate the decoded block (mind you, escape values are handled as normal codes there too), assign codes to proper fields etc etc). So while it’s easy to figure out which part is H.263 picture/GOB/MB header decoding because of the familiar structure and get_bits() calls, Vivo v2 decoding looks like “decode a stream of tokens, save the first ones to certain fields in the context, interpret the rest of them depending on those”. For example, macroblock decoding starts with tokens for MB type, CBP and quantiser, those may be followed by 1 or 4 motion vector deltas and then you have block coefficients (and don’t forget to skip stuffing codes when you get them).
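To make the codebook part more concrete, here is a rough Rust sketch of turning such human-readable entries into code values and lengths. It is not the actual Vivo parser: the names `Code`, `parse_code` and `parse_codebook` are made up for illustration, the exact spelling of the "FLC" marker is an assumption, and the binary data attached to each entry is ignored completely.

```rust
/// One parsed codebook entry: the code bits and how many of them there are.
/// (Hypothetical structure; the real entries also carry binary data.)
#[derive(Debug, PartialEq, Eq)]
pub struct Code {
    pub bits: u32,
    pub len: u8,
}

/// Parses a code written as ones and zeroes, e.g. "00 1101 11".
/// Whitespace is skipped; any other character makes the entry invalid.
pub fn parse_code(s: &str) -> Option<Code> {
    let mut bits = 0u32;
    let mut len = 0u8;
    for c in s.chars() {
        match c {
            '0' | '1' => {
                if len == 32 {
                    return None; // would not fit into u32
                }
                bits = (bits << 1) | (c as u32 - '0' as u32);
                len += 1;
            }
            c if c.is_whitespace() => {}
            _ => return None,
        }
    }
    if len == 0 { None } else { Some(Code { bits, len }) }
}

/// Walks a list of entries: skips an optional leading "FLC" marker
/// (spelling assumed) and stops at the mandatory "End" marker.
pub fn parse_codebook(entries: &[&str]) -> Option<Vec<Code>> {
    let mut codes = Vec::new();
    for (idx, entry) in entries.iter().enumerate() {
        let entry = entry.trim();
        if idx == 0 && entry == "FLC" {
            continue;
        }
        if entry == "End" {
            return Some(codes);
        }
        codes.push(parse_code(entry)?);
    }
    None // ran out of entries without seeing the End marker
}
```

With this sketch, "00 1101 11" becomes an 8-bit code with value 0x37, and a list without the End marker is rejected.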

Overall, not a very interesting codec with some crazy internal design (another fun fact: it has another set of codebooks in slightly different format but they seem to be completely unused). I’m not sure if it’s worth implementing but it was interesting to look at.


Thursday, November 9th, 2017

Dedicated to all young werehedgehogs. — one URL worth thousand words

So, let’s talk about colour in multimedia. To summarise it so you can skip the rest: proper colour representation hardly matters at all.

What is colour from the physical point of view? It’s a property of light in the visible range (i.e. between infrared and ultraviolet though some people are born without proper UV filters). Even better, you can clearly define it via spectroscopy because it’s a mix of certain wavelengths with certain energies. Another approach is to have reference colours printed on some surface (aka Pantone sets), and that is the very thing you use to make sure you get what you want when taking a photo (especially on another celestial body) or ensuring consistent output at a printing house.

The problem is that either approach is too bulky for use outside certain specific areas: for example, it’s too expensive to store the whole spectrum for each pixel even in palette form (also image or video compression would be extremely inconvenient). The good thing is that our eye has its own variant of psychoacoustic masking and you can use several basic colours to achieve the mix. And from this most colour models (or spaces) were born, where the range of real (aka present in the spectrum) and perceivable colours (like purple or white, which are a mix of several colours) is represented as a composition of some primary colours like red+green+blue or cyan-magenta-yellow. And of course there is the famous CIE 1931 model whose basis is theoretical components corresponding to the sensitivity of cone cells in the human eye.

And there came the other problem: most colourspaces (XYZ, HSV and such) are as good as a π-based computing system. Such a system is incredibly convenient for certain kinds of calculations but it’s next to impossible to convert results from and into decimal with good precision. Even RGB with its widely available primary colours has a problem: for instance, the colour of the sky outside Britain (in case you didn’t know the etymology of the word ‘sky’, it comes from the Scandinavian word for cloud) can be represented only with, IIRC, the red component being negative.

So how to deal with it? By mostly not caring, as humans usually do. In places where higher colour reproduction fidelity is required (mostly printing) they simply use more primary colours. But overall humans don’t care much if the colours are slightly wrong. On one side, the human brain has an internal auto-correction scheme for colour tint and white auto-balance (you might remember that optical illusion with seemingly red strawberries covered by a green or blue tint with no pixels being actually red); on the other side, each pair of human eyes is unique and perceives colours and shades differently. So if most people won’t agree about the actual shade and would recognise the picture anyway, why bother at all (again, some specific areas excluded)?

So all those TV-related standards that define fine details of colour models are good only for mastering stuff (i.e. to keep consistency for the final product, because you might not care about a colour being slightly wrong but you’ll spot a slight shade mismatch for sure). And speaking about TV-related standards, so-called TV-range (i.e. having 8-bit component values fit into the 16-235 range for luma and 16-240 for chroma instead of 0-255 as you’d expect) is an archaism that should’ve been buried a long time ago along with analogue TV broadcasting. But it still exists in digital world standards along with interlacing and KROPPING!, not fully purged yet.

And speaking about shade differences, some of you might remember the era of VGA where each component actually had only 64 possible values and yet it was enough to create very convincing moving pictures. You may argue that the underlying issue was masked by palette mode, but I should point out that for a rather long time after that people had to live with laptops and displays that had cheap LCDs with actual 18-bit colour depth (i.e. the same 6 bits per component as on VGA) as well (and let’s not talk about black colour representation there). So people didn’t care much about that, and all this high-bitdepth stuff seems to be more of a marketing creation than an actual technical necessity (again, I understand that it’s needed somewhere like medical imaging, but common people don’t care about quality).

In conclusion I want to say that the main reasons for introducing higher bitdepth wherever possible are: because we can (I understand and respect that), because it keeps many engineers and marketers employed (I understand that but don’t agree much) and because it helps fixing some other problems introduced elsewhere (like TV-range helped to deal with filtering artefacts; that I understand as well and try to respect but fail). Now be a good hedgehog and set a proper colour profile in IMF metadata.

Dingo Pictures Works: Classics pt. 2 and Final Thoughts

Sunday, November 5th, 2017

Sadly, all good things come to an end and this series review is no different. Let’s look at the last three entries before I give my opinion on all of them.

Some Impressions on Czech Railways

Sunday, November 5th, 2017

I’ve finally travelled enough Czech railways (mostly in the South-western part of the country) to form some impressions about them.

First, they have somewhat funny train terminology there: R means “rychlik” or express train while R-egional trains are marked as Os or “osobni”, but in reality they all move at a speed of around 50 km/h.

Second, the rolling stock.
Typical locomotive
The trains are usually two to four carriages pulled by a locomotive, most often like the one in the picture above. It brings nostalgia to me because it looks like a Škoda train from the 1960s that was one of the best locomotives in the USSR, and it was also nicknamed Cheburashka because it both looked a bit like the titular hero of that anime (formerly Soviet cartoon) and featured there as well. You can also see rail buses, double-decker regional trains (the same as InterCity trains in Ukraine) and some other types but they are very rare.

Speaking of locomotives, I had a brief visit to Austria and saw their main locomotive ÖBB 1044. And what do you know, it looks like a replica of Rc-locomotive from Sweden. And then you read that Austrian Railways actually bought ten Rc2 from Sweden and designated them as ÖBB 1043 locomotives. Since Rc2 was the best locomotive in Austria it’s no wonder they’ve designed the next model after it.

Third, tickets. Outside Prague you can usually buy tickets just at the ticket office at the station or maybe from conductors (but I’ve never tried that); ticket offices accept Euros and sometimes you can pay with a card too (mind the signs there). Another funny thing is that tickets usually list the stations you should pass on your route and they’re a lot like German tickets for regional trains: you just buy a ticket for a route, and which train you choose is up to you. Even better, in most cases you can buy tickets outside the country, like I’ve bought a Praha-Tábor ticket in Dresden.

Fourth, infrastructure in general. And that’s where it sucks.
A station somewhere between Jihlava and České Budějovice

Station houses look like they were built either in the XIXth century under Austrian rule or in the 1970s under Soviet rule (those look like featureless boxes essentially) and many of them are not very well maintained, unfortunately. Another thing is platforms. You can see a typical Czech platform in the first picture. They are often just about twice as high as the rails and not particularly wide either; you can meet high platforms only at big stations and in very random places (IIRC I’ve seen one at Velesín Město and there’s just a single track there).

And now for the tracks themselves. Rail connectivity is very good there so you can get from one place to another without going through Prague; the downside is that it usually takes two hours to get from one node to another since, as I’ve mentioned above, all trains travel at a speed of around 50 km/h. I’ve travelled on the routes Dresden-Praha, Linz-Prag, Praha-Schwandorf, Tábor-Jihlava and Jihlava-Plzeň and it looks like only routes from Prague to important places like České Budějovice, Plzeň and such are double-track (and to Dresden for some reason); the rest are single-track and often as curvy as if they were drawn with the tail of a stubborn mule, as we say here. Also the Tábor-Horní Cerekev track is quite bumpy and reminds me more of a typical Ukrainian road than a railway.

In general, Czech railways leave an impression of railways in rural area and thus they have their inimitable charm. Throw in a nostalgic feeling from the locomotives and you can say I liked it despite all downsides.

Dingo Pictures Works: Classics pt. 1

Saturday, November 4th, 2017

Let’s look at the last category of Dingo Pictures cartoons. Because it’s rather large I’ve split it into two parts.

H.263 And MPEG-4 ASP—The Root of Some Evil

Saturday, November 4th, 2017

As you might know (but still not care), I’m working on adding full RealMedia support for NihAV starting with video. So I’ve made it to decoding RealVideo 2 and I have some not so nice words to say about H.263 and MPEG-4 ASP.

First, the creeping featuritis in the standards: MPEG-4 part 2 from 2001 has annexes A-O (the version from 2004 has only annexes A-M for some reason) while ITU H.263 (version from 2005) has annexes A-X plus two appendices. For comparison, ITU H.264 from 2017 has annexes A-J, same for MPEG-4 part 10 😉 Mind you, some annexes are for informative stuff (e.g. how an encoder should work or a list of patent claims) but others add new coding features. So, for MPEG-4 part 2 (2001) we have 15 annexes, 7 of them are informative and only a couple of normative annexes add new features. For ITU H.263, out of 24 annexes about 15 introduce new coding modes and other enhancements (different treatment of motion vectors, loop filter, an alternative macroblock coding mode, PB-frame type and a lot more). The features are actually grouped into baseline(-ish) H.263 and H.263+.

Second, neither of them is really suitable for video coding. I know, it might sound strange, but each of these standards is an unholy mix of various codecs. H.263 mixes several codecs from different generations together (initial H.263 did not have B-frames, later they’ve added PB-frames and finally B-frames too, there are at least two different ways to code macroblocks etc etc), while MPEG-4 part 2 is for coding 3D video and also happens to specify a method to code video texture on those 3D shapes (there are no actual frames there, just VOPs, i.e. Video Object Planes). And yet, because the compression methods there provided an improvement over H.262 (aka MPEG-2 Video), they were used in various forms with various hacks in many multimedia formats. There we have a very wide gamut from RealVideo 1 and Sorenson Spark (aka FLV1) with just I- and P-frames to Intel I.263 that had PB-frames to RealVideo 2 with many features of H.263+ (including B-frames) to M$ MPEG-4 decoders to WMV2.

And here we have the problem: both formats grew from the joint effort known as H.262 or MPEG-2 Video so obviously it was a good idea to abuse the same decoder structure to handle all possible variations of H.263 and video texture coding from MPEG-4 part 2 and then add all decoder-specific hacks. And as a result you get a mess that’s hard to comprehend because it usually depends on many various context variables set in a specific manner for a specific codec. Hence the post title.

To demonstrate this I’ll show how the same feature is handled in different H.263/MP4p2-based codecs.

Sequence and frame headers

Obviously it differs for every codec. Some rely on container-provided width and height, some have dimensions coded for GOP or for individual frames, some codecs have only meaningful bits in the frame header, others store all feature bits and error out on unsupported configurations.

Frame types

  • Intel I.263: I, P, PB
  • RealVideo 1: I, P
  • RealVideo 2: I, P, B
  • Sorenson Spark: I, P, droppable P
  • WMV1: I, P
  • WMV2: I, P, X8 (alternative I-frame coding)
  • H.263 in general: I, P, PB, B, EI, EP (last two are enhancement layer picture types for scalable coding)
  • MPEG-4: I, P, B and S (last one is sprite-coded picture)

Block coding

  • Intel I.263: H.263 codes
  • RealVideo 1: H.263 codes with special codes for I-frame DCs
  • RealVideo 2: H.263+ AIC mode (advanced I-frame coding) plus H.263 P- or B-frames
  • Sorenson Spark: H.263 codes with a custom handling of AC escapes
  • WMV1/2: M$MPEG-4 codes

Motion vectors reconstruction

  • H.263: simply add predictor vector
  • H.263 UMV: depending on predictor value and difference range wrap it or not (see ITU H.263 D.2 for proper explanation)
  • MPEG-4: if (mv < low) mv += range; if (mv > high) mv -= range;
  • M$MPEG-4: if (mv <= -64) mv += 64; if (mv >= 64) mv -= 64;

(And there are different ways to predict motion vectors too!)
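The differences in the list above are easier to see side by side. Below is a small Rust sketch of three of the variants for a single vector component (the UMV variant is left out since it needs the rules from ITU H.263 D.2). The function names are made up for this example, and `low`, `high` and `range` are passed in as plain parameters instead of being derived from the bitstream.

```rust
/// Plain H.263: the vector component is simply predictor plus delta.
pub fn add_mv_h263(pred: i16, diff: i16) -> i16 {
    pred + diff
}

/// MPEG-4 style: if the sum leaves [low, high], wrap it back by `range`.
/// The caller supplies the limits; they are not computed here.
pub fn add_mv_mpeg4(pred: i16, diff: i16, low: i16, high: i16, range: i16) -> i16 {
    let mut mv = pred + diff;
    if mv < low {
        mv += range;
    }
    if mv > high {
        mv -= range;
    }
    mv
}

/// M$MPEG-4 style: same idea but with hard-coded limits of -64 and 64.
pub fn add_mv_msmpeg4(pred: i16, diff: i16) -> i16 {
    let mut mv = pred + diff;
    if mv <= -64 {
        mv += 64;
    }
    if mv >= 64 {
        mv -= 64;
    }
    mv
}
```

For example, with illustrative limits of -32..31 and a range of 64, a predictor of 30 plus a delta of 5 wraps around to -29 in the MPEG-4 variant, while plain H.263 would just give 35.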

There are even more quirks than I listed here but it should give you an idea what a fine mess these formats are and why the code that supports them all tends to turn into a huge mess. I tried to solve it in NihAV by having a template decoder for H.263 that calls a bitstream parser for the actual codec-specific parsing and keeps some quirks inside specific structures (like MV that adds vectors differently depending on the current mode). I still have more features to take into account (like slices, AC prediction and B-frames) so I’ll have to redesign it before I can support RealVideo 2 properly.
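The general shape of such a split could look roughly like the Rust sketch below. This is not the actual NihAV code: `BitstreamParser`, `MvMode`, `decode_mvs` and `ToyParser` are names invented for the illustration, only motion vector deltas are handled, and each vector is predicted from the previous one, which is a deliberate simplification of real H.263 prediction.

```rust
/// How predictor and delta are combined; the codec quirk lives in one place.
#[derive(Clone, Copy)]
pub enum MvMode {
    Plain,
    Wrap64,
}

impl MvMode {
    pub fn add(self, pred: i16, diff: i16) -> i16 {
        let mv = pred + diff;
        match self {
            MvMode::Plain => mv,
            MvMode::Wrap64 if mv <= -64 => mv + 64,
            MvMode::Wrap64 if mv >= 64 => mv - 64,
            MvMode::Wrap64 => mv,
        }
    }
}

/// Codec-specific parsing hides behind this (hypothetical) trait.
pub trait BitstreamParser {
    fn mv_mode(&self) -> MvMode;
    fn next_mv_delta(&mut self) -> Option<i16>;
}

/// The shared "template" part: it knows nothing about the actual codec.
/// Prediction from the previous vector is a simplification for the sketch.
pub fn decode_mvs<P: BitstreamParser>(parser: &mut P, count: usize) -> Option<Vec<i16>> {
    let mode = parser.mv_mode();
    let mut pred = 0i16;
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        let diff = parser.next_mv_delta()?;
        pred = mode.add(pred, diff);
        out.push(pred);
    }
    Some(out)
}

/// Toy parser that replays pre-made deltas instead of reading real bits.
pub struct ToyParser {
    pub mode: MvMode,
    pub deltas: Vec<i16>,
    pub pos: usize,
}

impl BitstreamParser for ToyParser {
    fn mv_mode(&self) -> MvMode {
        self.mode
    }
    fn next_mv_delta(&mut self) -> Option<i16> {
        let d = self.deltas.get(self.pos).copied()?;
        self.pos += 1;
        Some(d)
    }
}
```

The point of the split is that the shared loop stays the same for every codec while the parser and the vector mode carry the differences: feeding the deltas 40, 30, -10 through `MvMode::Plain` gives 40, 70, 60, and through `MvMode::Wrap64` gives 40, 6, -4.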

But then maybe I’ll add Vivo Media format support for old times’ sake (it’s the funniest one with codebooks stored as strings of ones and zeroes like “0000 0011 110” inside the binary with “End” signalling the codebook end).