Some Information about VoxWare MetaSound

June 5th, 2013

So I’ve looked at the beast again. It seems to be close enough to the original TwinVQ (as in .VQF, not something that got into MPEG-4 Part 3 Subpart 4 Variation 3), so I’ll just try to document spotted differences.

Coding modes. The original TwinVQ had 9 modes, VoxWare has twice as many (and thus twice as many codebooks!). One of them is very special (8kHz at 6kbps), with a cutoff of “high” frequencies. Also the mode explicitly signals the number of channels, so some modes are stereo-only and some are mono-only.

Bitstream format differences. The bitstream is packed LSB-first, and the first byte is not skipped by the decoder. There’s an additional 2-bit variable right after the window type in many modes (but not in 8kHz@6kbps or when short windows are used); my guess is that it has something to do with intensity stereo. The order of some parts seems to be shuffled as well (i.e. the original TwinVQ used p_coef, g_coef, shape order, MetaSound uses p_coef, shape, g_coef order).
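Since the bitstream is packed LSB-first (unlike the MSB-first original), reading a field means pulling bits from the low end of each byte. A minimal sketch of such a reader, purely to illustrate the packing (not the actual decoder code):

```python
def read_bits_lsb(data: bytes, pos: int, count: int):
    """Read `count` bits starting at bit position `pos`, LSB-first:
    bit 0 of byte 0 is read first and becomes the lowest bit of the
    result.  Returns (value, new_position)."""
    value = 0
    for i in range(count):
        byte = data[(pos + i) >> 3]
        bit = (byte >> ((pos + i) & 7)) & 1
        value |= bit << i
    return value, pos + count
```

So a byte 0xB4 yields the nibble 0x4 first and 0xB second, the opposite of what an MSB-first reader would give.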

Reconstruction. I’m not that familiar with TwinVQ myself, but it looks like there are some small differences there as well. For instance, the pgain base is 25000 for mono and 20000 for stereo, and in bark decoding the scales 0.5, 0.4, 0.35 are used instead of 0.4, 0.35, 0.28 (not really sure about that bit).

Any help with the decoder is welcome — the new decoder will reuse most of the current TwinVQ decoder after all, plus new tables (it should take the title of the decoder with the biggest tables away from the DCA decoder).

A New Month, Some New Goals

June 1st, 2013

As suggested by Anton, it’s the month of overengineered codecs.

The goals are the following (warning: they are subject to change without any notice):

  • work on REing VoxWare MetaSound (the thing the aforementioned Anton should have done a long time ago — it is only slightly different from the stock TwinVQ decoder after all);
  • make a proper ClearVideo decoder; currently it supports I-frames only in AVI and RM (samples in QuickTime are welcome BTW);
  • work on REing Discworld III video format;
  • On2 AVC decoder;
  • make Mike M. reverse engineer On2 VP4;
  • add raw mode for IMC/IAC;
  • work on Indeo 4 B-frames support (yeah, very likely);
  • push G2M4 (aka Go2WatchBoringSlideshows, do not confuse it with Go2BoringEnterpriseEvent codec) decoder.

Sheer Madness

May 22nd, 2013

(luckily there’s not much left of this month of intermediate codecs)

So I’ve looked at another intermediate codec; the post title hints at both its name and design. The coding scheme is rather simple: you code lines either in raw form or with prediction (from the left neighbour for the top line, or (3 * L + 3 * T - 2 * TL) >> 2 for other lines); the prediction error is coded with fixed Huffman codes.
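The gradient predictor above can be sketched directly (a plain transcription of the formula, nothing codec-specific):

```python
def predict(left: int, top: int, topleft: int, top_line: bool) -> int:
    # Top-line pixels are predicted from the left neighbour only;
    # the rest use the (3*L + 3*T - 2*TL) >> 2 gradient.
    if top_line:
        return left
    return (3 * left + 3 * top - 2 * topleft) >> 2
```

On flat areas it degenerates to the pixel value itself: with L = T = TL = v it gives (4 * v) >> 2 = v.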

Simple, right?

Here’s the catch: there is an insane number of formats it supports, both for storage and output, and there’s an insane number of decoding functions for decoding format X into format Y.

So quite probably no decoder — not interesting and too tedious.

ProRes alpha support is almost there

May 17th, 2013

I’ve finally brought myself to look at alpha plane decoding support for ProRes. It was a bit peculiar but rather easy to reverse engineer. Now I only need to update my ConsumerRes decoder to support it.

And that’s probably enough for the month of intermediate codecs.

A Well-designed Intermediate Codec

May 12th, 2013

The adjective refers to the hype that the company that made this codec is run by designers (unlike some other companies where even design is done by developers or — even worse — marketers). And let’s call it AWIC or iNtermediate codec for short. Let’s not mention its name at all.

It is a rather old codec and it codes 8-bit YUV420 in 16×16 macroblocks with DCT, quantisation and static codes. The frame is divided into slices in such a way that there are no more than 32 slices on one line (and slice height is one macroblock). The main peculiarity is its scalable mode — every macroblock is partitioned into an 8×8 sub-macroblock (i.e. an 8×8 luma block and two 4×4 chroma blocks) plus the following data for the rest of the block, and this is exploited for decoding frames in half-width, half-height or half-width half-height modes.

Maybe I’ll write a decoder for it after all.

In ten years every codec becomes Op^H^HJPEG

May 11th, 2013

So, RAD has announced Bink 2. While there are no known samples or an encoder, the decoder is present in RAD game tools already. For some random reason (what do I have to do with Bink anyway?) I decided to look at it.

The format is probably the same except that the preferred extension is .bk2 and it starts with 'KB2f' instead of 'BIKf' or 'BIKi'.

The main features they advertise are speed and dual-core decoding support. Most parts of the code are SIMDified indeed, and dual-core decoding seems to be achieved by breaking the frame into top and bottom halves (not that I’ve looked at it closely, but strings in the player suggest that).

Now about the format itself. Bink2 operates in YUV 4:2:0 format with optional alpha and employs 8×8 DCT with 16×16 macroblocks. There are not many interesting details in the coding itself: DCs are coded separately before ACs; there are three quantisation matrices — two for luma/alpha (for intra and inter blocks) and one for chroma; static codes are used for coding coefficients (compare that to the way it was done in Bink Classic); motion compensation is halfpel for luma and quarterpel for chroma, now with bicubic interpolation. There are four modes for coding a block: intra block, skip block, motion-only block and motion compensation with coded residue.

There seems to be some postprocessing they rightfully call “Blur”, but I’m not that sure about it.

What can I say about the codec overall? It’s boring. While Bink 1 is not that fast, it was much more fun to RE: coding values in bundles — I’ve rarely seen that (Duck TrueMotion 2 comes to mind and that’s all), various coding techniques — vector quantisation and DCT (as I’ve mentioned above, coding DCT coefficients was rather unique too) and some other tricks (unusual scans, specially coded block difference, double-scaling blocks, etc. etc.).

Overall, Bink2 will probably be what it’s promised to be (fast, portable codec for games) but it won’t have the real spirit of Smacker and Bink design. Or maybe it’s just me getting older.

P.S. I wonder if they’ll start providing a logo embedded in the Bink2 player like they do with the Smacker and Bink players.

P.P.S. This post title is inspired by a certain German saying about cars in case it wasn’t obvious.

Final Words on Canopus HQ, HQA and HQX

May 9th, 2013

Astrologers proclaim month of intermediate codecs. Number of blog posts about intermediate codecs doubles! (from zero)

Let’s look at Canopus codecs and their development.

Canopus Lossless

This is a very simple lossless video codec: you have a code tables description and the coded difference from the left neighbour (or the top one for the first pixel) for each component. For RGBA and YUV there are slight improvements but the overall coding remains the same.
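The scheme boils down to a running prediction per plane. Here’s a rough sketch of the reconstruction loop; the handling of the very first pixel is my assumption, and the real decoder of course also involves the transmitted code tables:

```python
def reconstruct_plane(deltas, width, height, bias=0):
    """Undo left/top prediction for one 8-bit component plane.
    `deltas` holds the decoded prediction errors in raster order;
    `bias` stands in for whatever the very first pixel is predicted
    from (an assumption, not documented in the post)."""
    plane = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x > 0:
                pred = plane[y][x - 1]   # left neighbour
            elif y > 0:
                pred = plane[y - 1][0]   # top neighbour for the first pixel of a row
            else:
                pred = bias              # assumed start value
            plane[y][x] = (pred + deltas[y * width + x]) & 0xFF
    return plane
```

For a flat grey 3×2 plane only the first delta is non-zero; everything else is predicted exactly.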

Canopus HQ

This is an ordinary intermediate codec employing IDCT with 16×16 macroblocks in 4:2:2 format and interlacing support.
It has predefined profiles with frame sizes (from 160×120 to 1920×1080), number of slices and macroblock shuffling order (yes, like DV it decodes macroblocks in shuffled order).

Block coding is nothing special but quantising is. For every macroblock one of sixteen quantiser sets can be selected, and for each block one of four quantising matrices can be selected from that set. Of course there are different quantiser matrices for luma and chroma (i.e. 128 quantising matrices in total, about 80 of them unique).
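To put the arithmetic in one place: 16 sets × 4 matrices × {luma, chroma} gives the 128 matrices mentioned above. A toy indexing helper — the actual table layout inside the codec is unknown to me, this is purely illustrative:

```python
NUM_SETS, MATS_PER_SET = 16, 4  # per-macroblock sets, per-block matrices

def matrix_index(set_idx: int, mat_idx: int, chroma: bool) -> int:
    """Flat index of a quantising matrix given the per-macroblock set
    and the per-block matrix choice (hypothetical layout)."""
    assert 0 <= set_idx < NUM_SETS and 0 <= mat_idx < MATS_PER_SET
    return (set_idx * MATS_PER_SET + mat_idx) * 2 + (1 if chroma else 0)
```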

Interlacing is signalled per block (in case it’s enabled for the frame).

Canopus HQA

This is Canopus HQ with alpha support. The main differences are a flexible frame size (no hardcoded profiles), an alpha component in macroblocks and a coded block pattern. The coding and the tables seem to be the same as in HQ.

Coded block pattern specifies which of 4 luma blocks are coded (along with corresponding alpha and chroma blocks). Uncoded blocks are filled with zeroes (i.e. totally transparent).

Canopus HQX

This codec combines both previous codecs and extends them with support for more formats. While HQ was 8-bit 4:2:2, this one can be 4:2:2 or 4:4:4 with 9-, 10- or 11-bit depth (with or without alpha).

There are changes in overall and block coding.

The frame is now partitioned into slices of 480 macroblocks and every 16 macroblocks are shuffled.

Blocks now have more adaptive coding. DCs are coded as differences from the previous ones (inside a macroblock component), and instead of being coded as 9-bit numbers they are now Huffman-coded, with the table selected depending on the component bit depth. Quantisation is split: now there’s a selectable quantiser and two quantiser matrices (for luma/alpha and chroma). AC codes are selected depending on the quantiser selected for the block. So there are fewer quantiser matrices (two instead of seventy-eight) but more VLC tables (CBP + 3 DC + 6 AC tables instead of CBP and a single AC table).
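The DC part is a plain running delta inside each macroblock component; a minimal sketch, with the Huffman decoding of the differences omitted:

```python
def restore_dcs(dc_diffs, start=0):
    """Turn already-decoded DC differences into absolute DC values:
    each DC is the previous one plus the coded difference.  `start`
    is the assumed initial predictor."""
    dcs = []
    prev = start
    for diff in dc_diffs:
        prev += diff
        dcs.append(prev)
    return dcs
```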

Conclusion

Reverse engineering all those formats was straightforward because they are not complex, obfuscated or written in C++ (which usually means both).

Shall I write decoders for them? Unlikely. The codecs are not too interesting (I’ve seen only one Canopus HQ sample and no HQA or HQX samples at all) and they are rather tedious to implement because of all those tables. And we have a Canopus Lossless decoder already.

Another “reverse engineer codec before breakfast” report

May 8th, 2013

So, as happens to me sometimes, I’ve looked at some random codec before fixing myself breakfast.

What do we have here?

  • 8×8 DCT
  • lots of static VLCs (not as many as Real codecs have though)
  • several coding modes — 4:2:2, 4:4:4, both with alpha or not
  • block-level interlacing

A macroblock consists of 4 luma blocks, 4 or 8 chroma blocks and optionally 4 alpha blocks. In the latter case not all blocks may be coded, so a CBP is present.

There are separate VLC tables for DCs depending on bit depth (10, 11 or 12 bits per component) and for quantised ACs (for quantisers 0-7, 8-15, 16-31, 32-63, 64-127 and the rest).
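The AC table choice from those quantiser ranges can be expressed as a simple lookup (ranges straight from the list above, the last table covering “the rest”):

```python
# Quantiser ranges for the first five AC VLC tables; anything above
# 127 falls into the sixth, "the rest" table.
AC_TABLE_RANGES = [(0, 7), (8, 15), (16, 31), (32, 63), (64, 127)]

def ac_table_index(quantiser: int) -> int:
    """Pick one of the six AC VLC tables by quantiser value."""
    for i, (lo, hi) in enumerate(AC_TABLE_RANGES):
        if lo <= quantiser <= hi:
            return i
    return len(AC_TABLE_RANGES)
```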

The only interesting scheme is quantiser selection: there is a table with predefined quantiser quadruplets. Every macroblock can select any quadruplet and every block can use any quantiser from it.

Oh, and I think the codec’s name was Canopus HQX.

Some notes about AVC

April 13th, 2013

Of course you’ve heard about AVC, the famous audio codec from On2 (and not the codec used as a base for VP7 and VP8). Out of curiosity I tried to look at it and found out that it is as creative as their video codecs.

So here’s how it looks:

  • It is a 1024-point MDCT based codec.
  • It codes samples either in one window or in eight 128-sample windows. There are four modes because of that — long window, short windows and two transitional modes.
  • Windows are organised into bands: 49 bands for the long window, 12 bands for each short window (band widths are the same for all sampling frequencies though).
  • Frame data consists of: windowing mode, grouping information for short windows, some band flags, run-coded information about used codebooks, scales, coefficients.
  • Scales are stored this way: first scale is explicitly coded 7-bit value, other scales are coded as differences and restored as prev_scale + scale_diff - 60.
  • Codebook 0 means no coefficients are coded, codebooks 1-8 code quartets of coefficients, codebooks 9-15 code pairs of coefficients; additionally, codebook 15 allows escape values to be coded.
  • Unscaling is performed as *dst++ = val * sqrt(val) * scale.
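The scale restoration and the unscaling step above can be sketched directly — the -60 bias and the val * sqrt(val) curve are as described in the list, everything else is illustrative:

```python
import math

def restore_scales(first_scale, scale_diffs):
    """First scale is an explicit 7-bit value; every following scale
    is restored as prev_scale + scale_diff - 60."""
    scales = [first_scale]
    for diff in scale_diffs:
        scales.append(scales[-1] + diff - 60)
    return scales

def unscale(val, scale):
    # Dequantisation curve for a non-negative quantised value:
    # val * sqrt(val) * scale, i.e. val ** 1.5 * scale.
    return val * math.sqrt(val) * scale
```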

Do not blame me if that reminds you of some other codec. Bitstream format is different after all. And the second letter of the codec name too.

OMG

March 23rd, 2013

IMG_1466

I think this should be in every first aid kit.