KBS 743

August 1st, 2014

I’ve not written anything about one of the crucial topics of this blog since ages, so here’s the long-awaited update.

Today I’d like to talk about probably the most interesting railway in Germany — Wutachtalbahn or Kursbuchstrecke 743 (Waldshut-Immendingen). It was build as a route to the South border of Germany that does not go on Swiss territory (the line along the Rhine it connects to goes through Basel and canton Schaffhausen).

Now, what makes it so interesting?

Despite being rather unimportant line nowadays and being about only 60km long (and there are no branches either!), it is operated by three different rail companies:

  1. northern part (Immendingen — Blumberg-Zollhaus) is operated by SWEG
  2. central part (Blumberg-Zollhaus — Weizen) is operated by WTB
  3. southern part (Weizen — Lauchringen — Waldshut) is operated by DB

Plan of the central part from Wickedpedia
(Image shamelessly stolen from Wickedpedia)

So you have three different companies running trains on approximately 20km tracks. Is it the same rolling stock? Of course not!

SWEG runs class 650 (aka Stadler RS1) diesel unit, Deutsche Bahn employs class 641 diesel unit and WTB runs a steam locomotive (Württembergische T.14 or class 52.80 or something similar) with bunch of outdated carriages from various places (like Switzerland).

And for unknown reason it’s nicknamed “Pig’s Tail Railway” (see the map above, I have no clue why) and the name somehow appeals to me.

I’ve visited it in three parts: one year I saw the middle part, next year I saw the north parth and later I saw the last part too. Curiously, while DB runs the most modern train the route itself seems the most outdated: the rails are uneven so you can get a bit seasick, the signal system is implemented by driver’s assistant with a red flag who stands on the crossing while the train passes it and it does not stand on the Weizen station for long because it has to give room to the WTB train (in result it comes to the station, waits a bit and cowardly retreats back to the track and waits there till the WTB train is gone).

In general I’d recommend visiting it if you happen to be there. If you want to see something better — go to Sweden and try Uppsala-Lenna railway, it’s the best (now I want to visit it again — oh wait, I wanted that before too).

On Some Annoying Audio Codecs Family

July 29th, 2014

For the reasons I can’t disclose I really hate DTS codecs. For those who don’t know there are about three and a half codecs in this family:

  • DTS Core
  • DTS Core extensions (bitrate extension, two extensions for more channels and an extension for upsampling e.g. 48 kHz -> 96 kHz)
  • DTS Lossless (which might depend on core and extend/replace its channels)
  • DTS LBR (aka Express profile)

You need to be Jean-Baptiste Kempf to love these formats: DTS Core uses annoyingly large tables, DTS Lossless relies on DTS Core part being decoded properly for it, DTS LBR is a special beast that I’ll describe below. And the best part — all those formats are poorly documented (tables are missing for DTS Core, something was missing for DTS Core X96k extension too, bitexact core reconstruction and some other things needed for real lossless decoder implementation are not documented, LBR is not much better either).

So, what makes DTS LBR special? Its coding mode of course. This is a weird codec that employs MDCT (nothing special so far), codes tones separately (that’s not so common) and spreads it all among many chunks for different resolutions that make it “scalable” or whatever.

Nevertheless this post is not about how horrible are all those codecs (if you have ever worked with them it’s obvious and Jean-Baptiste Kempf won’t believe anyway), it’s about obscure relations with other codecs.

When I looked at QDesign Music codec (unsupported by Libav currently) I found that it has suspiciously familiar coding scheme for tones (QDesign Music 1/2 also use tone detection in MDCT frames) — I’ve seen it in DTS LBR. And indeed, it seems the same guy created some codec called LBpack that was first to use that approach, then he was employed by QDesign and then by DTS. No wonder it looked similar.

Another piece of trivia — there was one guy working on so-called adaptive prediction and transform scheme. Later the prototype known as APT100 was turned into DTS Core. But looks like the same work gave birth to lesser-known codec APT-X (that I’m currently REing but that’s beside the point). And it’s not just the name — one codec employs QMF and ADPCM on subbands, another one employs QMF and optional ADPCM on subbands.

All that makes one wonder whether DTS Lossless is related to some lossless codec outside DTS (not necessarily APT Lossless but might be, no details are known about that one). Currently I cannot name any other lossless codec that employs the same coding approach (block coding with different coding for large and small coefficients plus non-adaptive filter). Of course such knowledge won’t change anything but it would be still interesting to know.

P.S. There are rumours that DTS LBR will be made scalable for adaptive streaming, what a fun that will be!
P.P.S. This post was written mainly to test how well new Mike’s setup works.

Bink2: Giving Up

June 17th, 2014

Short story: boring.

As I wrote more than a year ago the codec itself is rather simple and not that interesting (rather simple VLC coding plus DCT). Version KB2f was floating point, KB2g­—KB2i have switched to 16-bit integers wherever possible (including DCT, see links below). And I don’t really have content that I’d like to watch and unlikely to ever have it since my favourite games were released mostly before 2000 (that reminds me of Discworld Noir BMV format to RE…).

So here’s the list of what I’ve not done so far:

  • there are bugs in bitstream parsing, especially for KB2f;
  • I was too lazy to add all integer tables for newer versions, so it still uses floating point code for older version;
  • DC prediction (it uses median prediction with weird neighbours selection);
  • MV prediction;
  • motion compensation functions;
  • loop filtering;
  • definitely more of issues to resolve.

I’ll put known details to MultimediaWiki later.

All essential details about Bink 2g DCT are given here and here.

For those interested compressed patch is here.

P.S. I reserve right to look at newer versions if they appear and complain about them being equally boring.

On Pop Music

June 13th, 2014

Here’s the only pop song in the last couple of years. It has it all—catchy rhythm and meaningful words. And a bit of nostalgia too.

I hesitated for a long time if it’s worthy blogging about and finally decided it is.

Why I Shan’t Design a New Format

May 16th, 2014

Time from time I’m asked that question and since people can’t see why I’m not going to design a new format (even though the reasons are obvious) here’s the answer. Format in this context means both codec and container.

There are too many of them already. And they suck in different ways. And I believe it’s impossible to make format that will appeal to everybody so it will suck in some aspect. Either it will lack some features or will be too extensible that it will impose too much complexity on implementation. Lossless codecs are often written in such way that they require a special container because not even Matroska can encapsulate them properly. Lossless video codecs all offer about the same compression level and it’s law of diminishing returns in action (exponentially more time on compression yields only single percent of compression gain at best). Intermediate codecs sacrifice compression gains to speed. Advanced codecs are often some standard ripoffs (e.g. if progression keeps, VP11 will be based on H.266 but with multiple alternative reference frames and their peculiar binary coder). Containers suck either at complexity, compliance or flexibility. And there’s Ogg.

It is hard to write good tools for it. I have written some encoders and what I have:

  1. Zip Motion Blocks Video encoder (palettised) — I might be the only user;
  2. IMA ADPCM QT encoder — noone cares;
  3. M$ Video 1 encoder — got a nice review in 2009 and was merged as is into FFmpeg in 2011 just because. Probably noone cares about it either;
  4. AAC bitstream writer — it sucked so much that many talented people who tried to improve it afterwards just gave up and never returned to it again;
  5. ProRes encoder — for some reason it become popular and made me realize that noone caring about your encoder is a good thing.

Writing an encoder for a new format requires a lot of testing and tuning (especially for audio) and that requires both hardware and time which I lack. I had enough fun with AAC.

It is very hard to get adoption for the format. See previous two items. You should have good tools to interest users and there are too many different formats already to compete with. These are not the times when people were so desperate that they’d accept anything that was opensource and somewhat fulfilled their wishes (like Vorbis despite it being not hardware decoders-friendly and bundled with Ogg, or Matroska despite it being Matroska).

So, I shan’t develop a new format because it will take a lot of time from me with extremely little chances that results of that work will be ever used. Pity that lossless codecs creators didn’t think about it.

Bink2 Again

May 13th, 2014

So in some previous posts I’ve described Bink2 bitstream. What’s the catch? It applies only to KB2f files, newer versions (KB2g­—KB2i) use different bitstream format:

  • Block types are coded as an index in LRU list with an unary code;
  • One quantiser for the whole macroblock instead of individual per-component quantisers;
  • New CBP coding method for luma (it depends on number of bits set in the previous CBP and such)
  • DCs are coded as unary codes + additional bits instead of fixed bits + additional bits
  • AC coding now has skips coded before coefficients (previously first coefficient was always coded).

Implementations welcome.

Utilite

April 13th, 2014

Finally I’ve found some time to play with i.MX6-based Utilite which I intended to use as a home box for various stuff (like running fetchmail, irssi, simple web server etc. — in other words not desktop). So here’s a quick review:

  • does not work with my display (1920×1200, DVI input)
  • does not allow logging in via SSH (it refuses passwords) and the same problem with sudo later
  • does not have IPv6 enabled (not a grave problem but my provider has moved to IPv6 already)
  • serial port works as good as telegraph in magnetic storm (honestly, it gives all type of characters on terminal except the ones you can read let alone want, typing one character per minute is somewhat better)

I might be really old but this is not a development board (at least it’s positioned by desktop) so I expect it to work. And unlike previous product by the same company one cannot blame it on hardware — it’s i.MX6, not Tegra2.

So I’ve ordered Cubietruck already (I have Cubieboard2 at work and it has been running fine right from the start).

P.S. Raspberry Pi can go to hell.

A Bit More about UnclearVideo

April 12th, 2014

So, in the course of cleaning-up program I’ve looked at ClearVideo again.

Intraframes decoding has been REd long time ago—it’s merely DCT-coded blocks. They happen quite often and it’s possible to watch it somehow.

Interframes coding is what makes this codec unique. And if you expected real fractal coding then you’re wrong (and that’s expected—processing power needed to restore even 320×240 frame from seed by applying iterations might be too much even for today). Instead you have quadtree with square blocks. Do you know what information it stores there? Subtree flags (i.e. if you need to divide it further), motion vector and brightness control. So the block is copied and its brightness might be adjusted too, that’s all. There’s no residue coding. And tree information can be coded in four different ways (I’ve seen four out of about six different tree decoding functions being employed), properties are coded with variable-length codes that I fear may be autogenerated.

So what prevents me from finishing it? A very horrible design. It’s written in Very Objective C++ dialect. Every piece of code is wrapped into its own class with possible overloading, so instead of calling a function you retrieve class pointer from some subclass of current context and invoke function from vtable there. Which will not be the real function but rather a thunk that will jump to the real function (even better, some functions calling themselves do it through that thunk too). And yet they use global variables! This is impossible to analyse with static analysis (like I mostly do), it’s nightmare for debugger as well (but a hack to MPlayer loader that displayed calls and indirect jumps was very useful here) and I suspect it won’t get much better even if I get original source code (which I’ll never get of course).

A small example of design: tree decoding uses several GetBitContexts stored in more generic TreeDecodingContext, which is a subclass to FrameType2 context, which is a subclass of DecoderContext. And I’m pretty sure I forgot some levels of indirections in-between.

Time to give up? Time to give up.

A Bit More on Security

March 27th, 2014

This is a translation of this page by unknown author. It’s rather old but recently I remembered it for some reason and decided to share.


Day One

A hacker comes to a diner and sees that salt shaker can be opened by anyone and anything can be put inside. The hacker comes home and writes a letter to the diner manager: “I, meG@Duc, have found a salt shaker vulnerability in your diner. A malicious person can open it and put poison there! Please fix it!”

Day Two

Diner manager gets that mail along with other correspondence and shrugs: “What an imagination”

Day Five

The hacker comes to diner and puts some poison into every salt shaker. Three hundred people are dead, a criminal case against manager is closed after three months because there was no crime from his side. The hacker writes a letter “see now?”.

Day Ninety Six

Manager orders special salt shakers with a combination lock. Diner guests feel that they don’t get something.

Day Ninety Seven

Hacker discovers that holes in salt shaker pass salt in both direction (and other substances too). He writes a letter and pisses into ever salt shaker. Three hundred people don’t come there ever again, thirty people went to hospital with poisoning. Hacker sends manager a SMS “How d’ya like it?”. Manager spends three months being interrogated and a year on probation.

Day 188

Manager vows never to work at any diner ever again and be a lumberjack instead. Engineers are working on one-way salt shaker design. Meanwhile waiters remove all old salt shakers and give salt on demand.

Day 190

The hacker steals a salt shaker and researches it at home. Then he writes a new letter to the manager: “I, meG@Duc, stole a salt shaker and find this outraging! Anyone can steal a salt shaker from your diner!” So far abstinent manager goes home and drinks vodka.

Day 193

The hacker discovers that all salt shakers are chained to the tables. He talks about his achievements at the next hacker conference and receives an award for protecting society and customers’ needs. At least manages doesn’t find this out.

Day 194

All hackers from the conference make a devious plan. They go to the diner and take all salt from shakers. meG@Duc then writes another complaint about low customer service and that anyone can deprive everyone else of salt.

Thus a new salt shaker design is needed. Engineers are working on it while waiters still give salt on demand. Manages goes abroad and uses room service only — no cafes, bars or restaurants.

Day 200

Customers discover that in order to get salt they have to call waiter, show their ID and get special 8-digit one-time code for a salt shaker. Repeat the same for pepper.

All Containers Suck

March 25th, 2014

It’s pretty obvious but I got requests to write this nevertheless.

All known containers suck, some of them suck gloriously, some of them plainly suck. And there’s Ogg Matroska Ogg.

There are several features that distinguish container usefulness:

  • flexibility (supporting various codecs and number of streams);
  • easy to parse;
  • well-defined specification (there must be a format with such thing);
  • metadata support;
  • low overhead (bytes needed to define frame size and other properties);
  • advanced features for insane people.

Now let’s review containers grouped by design.

Raw or raw with header. Those are the simplest and codec-specific. Besides being designed (usually) for only one stream and one codec, they often decide to save bits on frames and in result you have hard time implementing seeking (say hello to FLAC or Moosepack SV7). Some have seek table at least (old Monkey’s Audio has two — for byte and bit position).

Your favourite FLV belongs to this category — it has one audio and one video stream with no headers (and that’s why it has its own flavour of VP6 with frame dimensions stored at every frame) though one can abuse it to add a data stream. And of course some Chinese used it to store HEVC too in the stupidest way possible (for starters they have introduced half a dozen of different video codec IDs for it).

Chunk-based. The most popular category that refuses to go away. The best representative is RIFF (M$ ripoff of EA IFF format, there are many specific RIFF variants known — AVI, RMF, WAV, WebP) and runner-up is MOV/MP4. AVI is verily the pinnacle — flexible, extensible, every frame is its own chunk. What can go wrong with it? The usual thing: abuse. Too many idiots implemented their own AVI writers with whatever bugs they could introduce and it got even worse when codecs started to employ B-frames. Intel worked around by adding combined I+B-frame and dummy frame afterwards so decoder would handle it internally (you can see it both in Indeo 4 and their I.263). DiVX on the other tentacle… And variable framerate is not for AVI either (unless you simply use zero frames to define skips).

As for MOV/MP4 there seems to be a problem with parsing custom atoms (there are too many atom types around). And of course you have nice abuse like ASF packets stored inside MOV packets if you use Flip4Mac.

And if you replace chunks with an unholy mix of tags and UIDs you get MXF. That format doesn’t have a specification but rather a swarm of them so you don’t know which ones you’ll need to demux some file.

There’s NUT — probably the only format out there with two specifications and three or four implementations, each not agreeing with all other.

MPEG-TS inspired. MPEG-TS is one of overengineered container formats that nothing in this world would be able to demux a TS file with all possible features. And forget about seeking (unless you have an external index or build index yourself).

Of course such design inspired a lot of other formats that have some features of it but often those features are used without understanding why they are there. But result is good for streaming!!!1one

There’s ASF with crazy GUIDs for everything and fixed packet size (which means there’s no direct correspondence between ASF packet and stream packet anymore).

And there’s Ogg. Read this if you still haven’t.

Matroska. That’s a cancer — when you design container that should be able to contain everything and support any feature possible and it gets out of control you get Matroska. It’s based on binary XML, it can have any feature. And it stores every codec in its unique way — see what they call codec specs. So they save bytes here and there and demuxer should put them back, which is not nice, especially if you believe that demuxers and decoders do not need to know about each other.


If you wonder why I haven’t mentioned RealMedia, it’s because this format is an unholy mix of all categories:

  • Old RealAudio is rather simple raw + header;
  • RealMedia in general is chunk-based format (with a hack for B-frames even).
  • Video frame can be split into several packets or several frames can be merged into single packet, a lot like MPEG-TS inspired formats.
  • And they had mangled audio streams long before Matroska was here. Actually only some audio codecs data is stored as is, the rest is XORed or has permuted subpackets.