Variety of lossless audio codecs

September 23rd, 2006

There are currently 14 lossless audio codecs mentioned on the MultiMedia Wiki page (look here for further links):

  • Proprietary (Apple Lossless, Meridian Lossless Packing, Real Lossless, WMA Lossless)
  • Closed source (LA, LPAC, LTAC, OptimFROG, RK Audio)
  • Open source (Bonk, FLAC, MPEG-4 ALS, Monkey’s Audio, Shorten, TrueAudio, WavPack)

FFmpeg currently has decoders for Bonk, FLAC, Shorten, TrueAudio and Apple Lossless. So at least MPEG-4 ALS, Monkey’s Audio and WavPack decoders can still be added.

I will work on the WavPack decoder and on ALS (I hope the standard will appear soon). What about Monkey’s Audio? Yes, it’s popular, but it poses the following implementation difficulties:

  1. It has incredibly large frame sizes (a frame may hold more than one million samples) while competitors stick to around 64k or less (hence the compression gain for MA). The current FFmpeg design cannot handle such frames.
  2. The source code is a mess: for almost every action there are at least several if(ver >= …) or if(ver< ...) checks. The format is too unstable for me.

Well, I still hope it will be implemented some day.

VC-1 Simple/Main Profile: 93.8%

September 21st, 2006

OK, now B-frames for Simple and Main Profile work. Only some exotic features are left to implement, such as spatial downsampling and sync markers, plus a lot of RE’ing to implement decoding of old WMV3 P-frames and Complex Profile.

As for Advanced Profile, B-frames are still not implemented.

Codec Completeness: Report

September 7th, 2006

Today I’ve committed (I hope) the last patches to the VMware Video decoder (videos looked rather funny without the mouse pointer).

So, here are the stats for decoders:

  • VMware Video: 98% (some blocks are unknown but don’t interfere with the decoding process)
  • VC-1 Simple/Main Profile: 75% (downscaled frames won’t be supported and B/BI frames are in progress)
  • VC-1 Advanced Profile: 40% (interlaced mode is not supported and BDU coding is not parsed – samples, anyone?)

A Work for the Weekend

September 5th, 2006

I got a free Monday, so I decided to spend some time on a small codec and picked the VMware codec, which the VMware emulator uses to capture screen output. As Alex discovered, this is merely a recorded session of the RFB protocol (used in VNC applications), slightly mangled.
My further investigation shows that bytes 1-2 contain, in big-endian order, the number of updated rectangles. If that value equals 8, the first chunk is a ServerInitialization struct with the marker ‘WMVi’ inserted before the pixel format data. Otherwise the frame may contain one or several FramebufferUpdate messages with standard HexTile encoding.
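The check described above can be sketched in C roughly as follows. This is only an illustration, not the committed decoder: the names vmw_classify, vmw_is_marker and the enum are made up for this example, and it assumes the caller passes a pointer to the two bytes holding the rectangle count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
enum vmw_frame_kind {
    VMW_TOO_SHORT   = -1, /* not enough data to read the count */
    VMW_UPDATE      = 0,  /* one or several FramebufferUpdate messages */
    VMW_SERVER_INIT = 1   /* first chunk is a ServerInitialization struct */
};

/* Read a 16-bit value stored in big-endian byte order. */
static unsigned read_be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Classify a frame by its rectangle count.
 * Assumption: buf points at the two count bytes. */
static enum vmw_frame_kind vmw_classify(const uint8_t *buf, size_t size,
                                        unsigned *rects)
{
    if (size < 2)
        return VMW_TOO_SHORT;
    *rects = read_be16(buf);
    return *rects == 8 ? VMW_SERVER_INIT : VMW_UPDATE;
}

/* Check whether a four-character marker such as "WMVi" starts at p. */
static int vmw_is_marker(const uint8_t *p, size_t size, const char *tag)
{
    return size >= 4 && memcmp(p, tag, 4) == 0;
}
```

Where exactly the marker sits inside the ServerInitialization chunk is not covered here, so vmw_is_marker only tests a position the caller has already found.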
I’ve committed the decoder to FFmpeg, and I hope to find more samples and clarify some details about it (there are seven markers from ‘WMVd’ to ‘WMVj’, but what they mark is known only for ‘WMVi’; palette mode is not tested either).
OOPS, I’ve just discovered three more samples at MPHQ and they are not decoded correctly, so I have some more work to do.

Some more experiments with VC-1

August 31st, 2006

For those who are interested, here is a very experimental patch to enable decoding of the WVC1 codec in FFmpeg. I’m still searching for and downloading sample files, but the patch has already been tested a bit.

Instructions:

  1. Get FFmpeg sources (r614x should work)
  2. Download and apply patch (cd ffmpeg; patch -p0 < vc1-advanced.patch)
  3. Run configure and make

As always, non-working samples are welcome.

FAQ

August 29th, 2006

I have got some letters and here I will try to summarize them.

You wrote some decoder, where can I find it?

When I reverse-engineer a codec I add the decoder to FFmpeg (and soon it reaches other multimedia players) and the description to the Multimedia Wiki.

Do you know how WMV3 video is stored in files?

I have only a general idea of how it is stored. See the Multimedia Wiki ASF page for details.

Help! I have WMV3 sample that does not play {correctly, at all}.

Please follow the instructions here. That’s the easiest way to get help (and people other than me may be interested in your samples).

M$ Way: Incompatible with itself

July 25th, 2006

If you look in the VC-1 standard specification, it’s hard to find the header specification for Simple/Main Profile (at least I was unable to do so in the available committee draft; there are only some hints in the bitstream container format). Was that intentional or not?

FFmpeg’s old and non-working vc9.c contained header parsing code, and the reference decoder does this too. This header contains a lot of ‘reserved’ flags whose hidden meaning becomes clearer once you encounter them:

  • RES_RTM_FLAG – should be set to ‘1’, but all old WMV3 files have it set to zero. For now you can call it the ‘P-frames will be decoded correctly’ flag, as old WMV3 has I-frame decoding identical to the standard.
  • PROFILE=2. This is not an allowed profile, but in vc9.c it was called “Complex Profile” and it really is the Advanced Profile of old WMV3. I don’t know why, but one sample was decoded when I changed the profile value to “Main”; other Complex Profile samples produced garbage (but they also had the RES_X8 and RES_FASTTX flags set).

There is also a new M$ codec, WVC1, which is not decoded by the reference decoder.

So here is a small list of huge tasks:

  1. Find out how to decode P-frames when RES_RTM_FLAG=0 thus enabling decoding of many old WMV3 videos
  2. Find out how to parse WVC1 movies
  3. Support Complex Profile (not really important)

VC-1 Forward transform

July 24th, 2006

Some notes on the forward transform.

For the 4×4 transform the matrix is:

17  17  17  17
22  10 -10 -22
17 -17 -17  17
10 -22  22 -10

And the suggested norm is (8/289; 8/292; 8/289; 8/292).
The forward transform for (A B C D) will be:

A1 = (17 * A + 17 * B + 17 * C + 17 * D) * 8 / 289;
B1 = (22 * A + 10 * B - 10 * C - 22 * D) * 8 / 292;
C1 = (17 * A - 17 * B - 17 * C + 17 * D) * 8 / 289;
D1 = (10 * A - 22 * B + 22 * C - 10 * D) * 8 / 292;

Of course, don’t forget to round the result.
I tested it in Octave and it proved to be correct.

Optimization is also straightforward: compute (A+D), (A-D), (B+C), (B-C) and use their sums/differences.
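A minimal C sketch of that optimized 4-point forward transform, using the matrix and norms given above. The function names are made up for this example, and rounding half away from zero is my assumption, since the only requirement stated is that the result is rounded.

```c
/* Integer division rounded to nearest, halves away from zero.
 * Assumption: den > 0. */
static int div_round(int num, int den)
{
    return num >= 0 ? (num + den / 2) / den
                    : -((-num + den / 2) / den);
}

/* Forward 4-point transform of (A B C D) = in[0..3].
 * Even outputs use the sums, odd outputs use the differences. */
static void fwd_transform4(const int in[4], int out[4])
{
    int s0 = in[0] + in[3]; /* A + D */
    int s1 = in[1] + in[2]; /* B + C */
    int d0 = in[0] - in[3]; /* A - D */
    int d1 = in[1] - in[2]; /* B - C */

    out[0] = div_round(17 * (s0 + s1) * 8, 289);      /* A1 */
    out[1] = div_round((22 * d0 + 10 * d1) * 8, 292); /* B1 */
    out[2] = div_round(17 * (s0 - s1) * 8, 289);      /* C1 */
    out[3] = div_round((10 * d0 - 22 * d1) * 8, 292); /* D1 */
}
```

For the input (10 20 30 40) this gives (47 -21 0 -2). A 4×4 block would be processed by applying the same function to every row and then to every column.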

B-frames: I feel cheated

July 24th, 2006

Yesterday I finished B-frame parsing support and, to my surprise, all of the B-frames in the test streams were decoded correctly even without motion compensation. I don’t know why all of them are composed of intra blocks only. And they are not MS-invented BI-frames either.
While I understand that this is legal, I still cannot understand why it was done so.

VC-1 Test Bitstreams

July 21st, 2006

I’ve written a small and quick demuxer for VC-1 Simple/Main Profile test bitstreams (source available here).

Most of the samples are recordings of football training. You can see “Windows Media” logos on some players’ uniforms.

Occasionally a clip with some singer turns up.

Anybody can guess why?