X8intra is there!

November 10th, 2007

Now we have X8intra frame support in ffmpeg! Mike has already expressed his joy in his blog. I think anime fans who tried to play WMV3 in AVI would also be glad.

Why hasn't this been done earlier? Well, sheer disgust. The person who invented this scheme should be either fired or promoted to M$ CTO. Here is a list of reasons:

  • It is used for coding some keyframes, so it cannot be skipped
  • Design is totally unlike anything standard
  • You must perform bit-exact decoding. If your DCT produces slightly different results or you forget about loop filtering, you won't be able to decode the picture properly. Hey, that's utter crap by _any_ standard
  • It has made it to WMV3 too.

To put it mildly, X8Intra is an illegal offspring of JPEG and some early H.264 draft. It is mainly Huffman-coded 8×8 DCT-transformed blocks with spatial prediction and loop filtering. It does not have macroblocks like decent codecs. Spatial prediction has several directions, relies on previously decoded blocks, and the bits read from the stream depend on that. I have not seen anything else with such a bad bitstream-transform dependency (in other codecs you can decode all the coefficients first and then perform image reconstruction, but not here). X8Intra excrementum est (pardon my Latin).
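To illustrate that dependency, here is a minimal sketch. All the names and the context rule below are invented for illustration, not taken from the actual X8Intra bitstream: the point is only that the parsing context for the next block is selected from already reconstructed neighbour pixels, so parsing and reconstruction cannot be split into two separate passes.

```c
#include <stddef.h>

/* Invented illustration of a bitstream-transform dependency:
 * the table used to parse a block depends on *reconstructed*
 * neighbour pixels, so you cannot decode all coefficients first
 * and reconstruct the image later. */

enum { BLOCK = 8 };

/* Pick a parsing context from reconstructed neighbour pixels
 * (the rule here is made up). */
static int select_vlc_context(const unsigned char *top_row)
{
    int sum = 0;
    for (int i = 0; i < BLOCK; i++)
        sum += top_row[i];
    return sum / BLOCK > 128;      /* bright vs. dark neighbourhood */
}

/* "Decode" one block; the number of bits consumed depends on the
 * context, i.e. on pixels reconstructed just before this call. */
static int decode_block(int context, unsigned char *dst)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = context ? 200 : 50;   /* stand-in for parse+IDCT+filter */
    return context ? 12 : 7;           /* bit count differs per context */
}
```

Since the bit count returned by the (fake) parser changes with the context, the decoder can never know where the next block starts without fully reconstructing the previous one.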

Still, I am very grateful to the person named “someone” who did this. Otherwise I would have had to clean up this cesspool myself.

Multimedia-unrelated news

November 9th, 2007

I just had to post this – our Philharmonic presented its own harpsichord. Several years ago, when I first visited it, I listened to a concert featuring a harpsichord, but that one was borrowed, and there has been nothing comparable since.

The presentation went well and we enjoyed varied music – from sonatas by Handel and Telemann and concertos by Johann Sebastian Bach to Mozart, jazz improvisations and modern Ukrainian music (well, when the composer plays one of the instruments himself, I consider it modern).

Looking forward to further listening (with the hope that it will take less than a couple of years of waiting).

A Book on Multimedia

November 8th, 2007

I presume those interested in multimedia coding have heard of “Data Compression: The Complete Reference” by David Salomon. Personally I consider this book very good, but maybe we should write our own book concentrating on multimedia only. Why? I have not seen a book where video (and audio) compression is more than merely outlined (as in most books on general data compression) and which is not dedicated solely to some standard (usually MPEG).

I remember that book fondly: it's quite outdated, but at least it covers many codec, container and even implementation issues (unfortunately, sound only)!

My proposal for book outline:

  • General multimedia concept (pixels, samples, PCM, DCT)
  • Audio compression
    • Simple time-domain codecs (DPCM, ADPCM)
    • Complex time-domain codecs (lossless mostly)
    • Speech codecs
    • MDCT-based codecs and friends
    • How to write a fast decoder and good encoder (or otherwise)
  • Image Compression
  • Video compression
    • Lossless coding
    • Game video codecs (who will write this?)
    • Modern standard and non-standard codecs
    • Implementation tips and tricks
    • Known codecs (implementation-wise) overview
  • Containers
    • Why making codecs dependent on a custom container is idiotic 🙂
    • File-based containers
    • Streaming containers

And I know where to get information ;-). Well, let’s see if this catches up.

Audiovisual debugger

November 4th, 2007

I have never thought about FFplay that way, but it struck me today that its visual waveform display is one of the best ways to debug audio decoding.
Why?


[Screenshot: FFplay rendering the waveform of one of C.P.E. Bach’s Württemberg sonatas (a small excerpt, really)]

Because it gives you these advantages:

  1. Noise hurts your eyes less than ears
  2. Some inaudible artifacts (like DC bias) are easily spottable
  3. Clipping and volume changes are easily spottable too
  4. Stereo differences are easy to find
  5. It may give you some aesthetic pleasure 😉
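Point 2 deserves a tiny illustration. Here is a hedged sketch (function name mine): DC bias is just a non-zero mean of the PCM samples, which shifts the whole waveform vertically – inaudible at small magnitudes, but obvious on screen.

```c
#include <stddef.h>

/* Sketch of the DC-bias check from point 2: a properly centred PCM
 * stream has a mean near zero; a constant offset shifts the whole
 * waveform up or down, which the eye catches immediately. */
static double dc_bias(const short *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}
```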

I must also add that most audio players have visualizers, but they lack the simplicity and usability of this clean 640×480 waveform rendering.

Another game format

October 17th, 2007

I remember playing the game “Lost Vikings” by Silicon & Synapse (which was later renamed Blizzard).


[Screenshot: Lost Vikings 2 – the Vikings (again) and additional creatures. Endgame scene.]

Who knew it had a sequel? It was done by Beam Software and looks less appealing to me than the original. But it has cutscenes and audio in its own format, which makes it a bit more interesting. A contributor, who wishes to remain anonymous, sent me his description of this format, so I was able to implement a demuxer and decoder for its video, which will be committed soon (I hope). Details of the format are already here.

Flowers

October 9th, 2007

As a follow-up to the theme set by Michael, here are a few random shots of flowers (taken in Western Ukraine, several thousand kilometers from my home).


Musepack SV8 is almost ready

September 26th, 2007

Judging from this post, there is an eighth stream version (SV8) of Musepack in beta testing (but the spec is not frozen yet).

What distinguishes it from previous versions? It is now container-aware. Previous versions stored just audio frames in a continuous bitstream, with no defined behavior for seeking or demuxing. It was almost as fun as the Monkey’s Audio container.

Now Musepack has a chance to spread in the wild (i.e. in containers other than .MPC). That mostly depends on whether there will be a standard way to store it in other containers (that’s where Ogg Vorbis failed). Well, good luck.

Some details on RV30/40

September 12th, 2007

RV8 (or RV30) is close to early H.264 drafts and Sorenson Video 3 in that they all use a single variable-length code (Golomb codes in the case of SVQ3, something special in RV30/RV40) to represent macroblock information (type, intra prediction modes). RV40 uses this code only to represent motion vectors and the number of macroblocks to skip; macroblock types and intra prediction modes are coded with variable-length codes chosen from a set of them depending on context.

Here are the main differences from H.264:

  1. Different bitstream: special codes for the elements and its own substitute for the Golomb code.
  2. Intra prediction types are specified only for luma subblocks; chroma subblocks reuse some of them for prediction.
  3. Intra prediction modes are slightly different and some of them require additional context (down left neighbour).
  4. Bidirectionally predicted blocks in B-frames do not use motion vectors from the previous/next frame in motion vector prediction while using them in motion compensation.
  5. Intra prediction for intra blocks in interframes is performed even when neighbouring blocks are not intra blocks.

Main differences between RV30 and RV40:

  1. Bitstream syntax and different codes (RV30 is easier in this matter).
  2. RV30 does not have 8×16 and 16×8 motion compensation modes (while 8×8 mode exists).
  3. Different motion vector prediction algorithm.
  4. B-frames in RV30 lack a variant of bidirectionally predicted blocks that RV40 has.

And now here is the list of things I didn’t like in RV30/40:

  1. The slice header does not contain the number of macroblocks coded in this slice. While this saves a whopping 6–13 bits per slice (and frames usually have fewer than a dozen slices, usually two or three), it causes unnecessary pain for the implementor.
  2. The vertical left intra prediction method uses down-left neighbour pixels in the calculation of one insignificant pixel (insignificant because it does not affect further intra prediction).
  3. Motion vector prediction is a bit complicated too.
  4. And, of course, the lack of good documentation.

What I did during my summer

September 12th, 2007

It was a very hot summer – both in terms of temperature and of the development process. I was writing a decoder for a certain codec (look at the category of this post if you don’t know which codec family it was – the fourth incarnation of it).

The main difficulty was the lack of any documentation – everything had to be reverse-engineered from the binary reference decoder. With the help of this tool written by Mike, both he and I were able to capture some dynamic data during execution of the reference decoder and, based on that information, reconstruct the execution flow, watch some variables and guess how some functions work. Still, there were many functions that required an assembly-to-human dictionary and took plenty of time.

What has surprised me during this work:

  • This is the first time I have worked with PIC code. Now I understand why some people believe it’s an evil thing. Constantly calculating real addresses from offsets into the GOT (global offset table) is no fun, and the switch(){} construct becomes extremely complicated. But now I know the GOT address of that decoder by heart.
  • Passing parameters partly via the stack and partly via registers (for example, source and stride passed in registers and the destination as a stack argument) is also fun to watch
  • It’s quite unexciting when one function does virtually nothing except call another function (with the same arguments).
  • There are several VLC reading techniques employed in the reference decoder:
    1. Peeking whether the next bits match some codeword in a table (i.e. iterating over the table of codewords, looking at the next k bits of the bitstream and comparing them with each codeword)
    2. Using the next 16 bits of the bitstream to find the code length (by finding the prefix corresponding to a given length); the difference between the prefix and the actual code (minus unused bits, of course) is then used to determine the actual code value
    3. Using a 256-element lookup table on the next 8 bits of the bitstream to determine code length and value
  • And that’s not all – the codeset for the first case may be stored in two ways: as pairs (code, length), or as one value using the formula (1 << length) | code. A function using the latter has to determine the position of the most significant set bit, remove it, and then call show_bits().
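That last trick can be sketched in a few lines (function name mine; the reference decoder's actual code certainly differs): find the marker bit, derive the length from its position, and mask it off to recover the codeword.

```c
/* Sketch of the (1 << length) | code packing described above: the
 * most significant set bit is a marker whose position gives the code
 * length; masking it off leaves the codeword itself. */
static void unpack_codeword(unsigned packed, unsigned *code, unsigned *length)
{
    unsigned len = 0;
    for (unsigned v = packed >> 1; v; v >>= 1)
        len++;                          /* position of the marker bit */
    *length = len;
    *code = packed & ((1u << len) - 1); /* drop the marker bit */
}
```

For example, a 3-bit codeword 101 packs to (1 << 3) | 5 = 13; unpacking recovers length 3 and code 5, which can then be compared against show_bits(3).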

I plan to give codec architecture review in my next post.

Samples, and lots of them

August 20th, 2007

Extremely big thanks to Mike, who sent me a dozen DVDs with samples that eventually got delivered (Roberto Togni had also sent me some disks with samples, but they disappeared somewhere). It took about a week for them to reach Ukraine – and a whole month to pass customs control.

But nevertheless they’re here, and now I have enough samples to test my current and future decoders.