Archive for the ‘RealVideo’ Category

RV30/40 – status

Tuesday, March 4th, 2008

This is for curious people who really want to know what’s happening with the RV30/40 decoder implementation for libavcodec.

I have implemented all the main parts of the decoder, including the loop filters, but some of the finer details are still missing, like the parameters that should be passed to the loop filters or motion vector prediction. This results in a jerky picture when B-frames are present (and they often are) and in dirty trails behind moving objects. See for yourself.

Screenshot of decoder performance

An example of my RV40 decoder’s output

Currently the work on this decoder is stalled. In order to fix bugs I have to verify decoded data against the reference decoder, and that’s not easy. It takes a whole night to get the needed debug data for 70 frames of a 320×240 video on my ThinkPad 390. It also takes a lot of space, considering that I have only about a hundred megabytes of free disk space there.

I want to obtain a small (I don’t have enough space to fit a standard desktop), low-power (less than 20-30 W of power consumption; power blackouts are quite common here) x86-based computer. I know they exist in many variations, but it’s next to impossible to buy one here.

Well, I will finish both decoders. Eventually. Especially if I have enough content to test them with: most files I’ve met (including those on samples.mplayerhq.hu) are either Japanese TV recordings (anime, often with Chinese subtitles) or Simpsons episodes with a crappy translation into Russian (for example, “you rock” was translated as if it were “you are a rock”). Oh, there are also some movie trailers, but I fear the need to watch them won’t motivate even Mike.

In case you are curious why I chose that shot: I believe it features the character the main MPlayer server is named after.

Some details on RV30/40

Wednesday, September 12th, 2007

RV8, or RV30, is close to the earlier H.264 drafts and to Sorenson Video 3: like SVQ3, it uses a single variable-length code (Golomb codes in the case of SVQ3, something special in RV30/RV40) to represent macroblock information (type, intra prediction modes). RV40 uses this code only to represent motion vectors and the number of macroblocks to skip; macroblock type and intra prediction modes are coded with variable-length codes chosen from a set of them depending on context.
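To show what “one variable-length code for everything” looks like in practice, here is a minimal sketch of the standard Exp-Golomb code as used in H.264. It is not the RV30/RV40 code, which is a different scheme, and the `BitReader` class and function names are made up for this illustration.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def get_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def get_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.get_bit()
        return value


def read_ue_golomb(br: BitReader) -> int:
    """Unsigned Exp-Golomb: N zero bits, a one bit, then N more bits."""
    zeros = 0
    while br.get_bit() == 0:
        zeros += 1
    return (1 << zeros) - 1 + br.get_bits(zeros)


def read_se_golomb(br: BitReader) -> int:
    """Signed variant: code numbers 0, 1, 2, 3, 4 map to 0, 1, -1, 2, -2."""
    k = read_ue_golomb(br)
    return (k + 1) >> 1 if k & 1 else -(k >> 1)


# The bit string 1 010 011 00100 00101 (zero-padded to three bytes)
# holds the code numbers 0, 1, 2, 3, 4.
stream = bytes([0xA6, 0x42, 0x80])
unsigned_values = [read_ue_golomb(BitReader(stream)) for _ in range(1)]
```

The same code can carry any syntax element (a macroblock type index, a skip count, a signed motion vector difference), which is what makes such bitstreams simple to parse but less efficient than context-dependent code sets.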

Here are the main differences from H.264:

  1. Different bitstream. Special codes for elements and its own substitute for Golomb codes.
  2. Intra prediction types are specified only for luma subblocks; chroma subblocks use some of them for prediction too.
  3. Intra prediction modes are slightly different and some of them require additional context (the down-left neighbour).
  4. Bidirectionally predicted blocks in B-frames do not use motion vectors from the previous/next frame in motion vector prediction, although they do use them in motion compensation.
  5. Intra prediction for intra blocks in interframes is performed even when neighbouring blocks are not intra blocks.

Main differences between RV30 and RV40:

  1. Bitstream syntax and different codes (RV30 is easier in this matter).
  2. RV30 does not have 8×16 and 16×8 motion compensation modes (while 8×8 mode exists).
  3. Different motion vector prediction algorithm.
  4. B-frames in RV30 lack a variant of bidirectionally predicted blocks that RV40 has.

And now here is the list of things I didn’t like in RV30/40:

  1. The slice header does not contain the number of macroblocks coded in the slice. While this saves a whopping 6-13 bits per slice (and frames have fewer than a dozen slices, usually two or three), it causes unnecessary pain to the implementor.
  2. The vertical-left intra prediction method uses down-left neighbour pixels in the calculation of one insignificant pixel (it’s insignificant because it does not affect further intra prediction).
  3. Motion vector prediction is a bit complicated too.
  4. And, of course, the lack of good documentation.

What I did during my summer

Wednesday, September 12th, 2007

It was a very hot summer, both in terms of temperature and of the development process. I was writing a decoder for a certain codec (look at the category of this post if you don’t know which codec family that was; it was the fourth incarnation of it).

The main difficulty was
With the help of this tool written by Mike, both he and I were able to capture some dynamic data during execution of the reference decoder and, based on that information, to reconstruct the execution flow, debug some variables and guess how some functions work. Still, there were many functions that required an assembly-to-human dictionary and took plenty of time.

What surprised me during this work:

  • This is the first time I have worked with PIC code. Now I understand why some people believe it’s an evil thing. Constantly calculating real addresses from offsets relative to the GOT (global offset table) is no fun, and a switch(){} construction becomes extremely complicated. But now I know the GOT address of that decoder by heart.
  • Passing parameters partly via the stack and partly via registers (for example, source and stride are passed in registers and destination as an argument) is also fun to watch.
  • It’s quite unexciting when one function does virtually nothing except call another function (with the same arguments).
  • There are several VLC reading techniques employed in the reference decoder:
    1. Peeking whether the next bits match some codeword in a table (i.e. iterating over the table of codewords, looking at the next k bits of the bitstream and comparing them with each codeword).
    2. Using the next 16 bits of the bitstream to find the code length (by finding the prefix corresponding to a given length); the difference between the prefix and the actual code (minus unused bits, of course) is then used to determine the actual code value.
    3. Using a 256-element lookup table on the next 8 bits of the bitstream to determine the code length and value.
  • And that’s not all: the codeset for the first case may be stored in two ways, either as (code, length) pairs or as a single value using the formula (1 << length) | code. A function using the latter has to find the position of the most significant one, remove it and then call show_bits().
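The first and third reading techniques above, together with the packed (1 << length) | code format, can be sketched roughly as follows. This is only an illustration of the ideas, not the reference decoder’s code: the `BitReader` class, the function names, the tiny four-symbol prefix code and the convention that a symbol is its index in the table are all assumptions made for the example.

```python
class BitReader:
    """Minimal MSB-first reader; bits past the end of the data read as zero."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def show_bits(self, n: int) -> int:
        """Return the next n bits without consuming them."""
        value = 0
        for i in range(self.pos, self.pos + n):
            byte = self.data[i >> 3] if (i >> 3) < len(self.data) else 0
            value = (value << 1) | ((byte >> (7 - (i & 7))) & 1)
        return value

    def skip_bits(self, n: int) -> None:
        self.pos += n


def unpack_codeword(packed: int):
    """Split (1 << length) | code back into (code, length)."""
    length = packed.bit_length() - 1  # position of the most significant one
    return packed ^ (1 << length), length


def read_vlc_by_search(br: BitReader, table) -> int:
    """Technique 1: compare the next bits against every codeword in turn."""
    for symbol, packed in enumerate(table):
        code, length = unpack_codeword(packed)
        if br.show_bits(length) == code:
            br.skip_bits(length)
            return symbol
    raise ValueError("no codeword matches the bitstream")


def build_lut8(table):
    """Technique 3: 256 entries mapping the next 8 bits to (symbol, length)."""
    lut = [None] * 256
    for symbol, packed in enumerate(table):
        code, length = unpack_codeword(packed)  # assumes length <= 8
        first = code << (8 - length)
        for index in range(first, first + (1 << (8 - length))):
            lut[index] = (symbol, length)
    return lut


def read_vlc_by_lut(br: BitReader, lut) -> int:
    entry = lut[br.show_bits(8)]
    if entry is None:
        raise ValueError("no codeword matches the bitstream")
    symbol, length = entry
    br.skip_bits(length)
    return symbol


# Hypothetical prefix code: symbols 0..3 are coded as 0, 10, 110, 111.
TABLE = [0b10, 0b110, 0b1110, 0b1111]  # each entry is (1 << length) | code
STREAM = bytes([0xCB, 0x80])           # bits 110 0 10 111 -> symbols 2, 0, 1, 3
```

The linear search costs one comparison per table entry, while the lookup table resolves any code of up to 8 bits with a single array access at the price of 256 entries per codeset.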

I plan to give a review of the codec architecture in my next post.