Archive for the ‘Audio’ Category

Some Information about VoxWare MetaSound

Wednesday, June 5th, 2013

So I’ve looked at the beast again. It seems to be close enough to the original TwinVQ (as in .VQF, not the thing that got into MPEG-4 Part 3 Subpart 4 Variation 3), so I’ll just try to document the differences I spotted.

Coding modes. The original TwinVQ had 9 modes, VoxWare has twice as many (and so twice as many codebooks!). One of them is very special (8kHz at 6kbps), with a cutoff of “high” frequencies. Also, the mode explicitly signals the number of channels, so some modes are stereo-only and some are mono-only.

Bitstream format differences. The bitstream is packed LSB and the decoder does not skip the first byte. There’s an additional 2-bit variable right after the window type in many modes (but not in 8kHz@6kbps or when short windows are used); my guess is that it has something to do with intensity stereo. The order of some parts seems to be shuffled (i.e. the original TwinVQ used the order p_coef, g_coef, shape, while MetaSound uses p_coef, shape, g_coef).
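To make the "packed LSB" part concrete, here is a minimal sketch of a bit reader, assuming it means that bits are consumed starting from the least significant bit of each byte. The `LsbReader` type and `lsb_read_bits()` are hypothetical names for illustration, not taken from any existing decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader (assumption: "packed LSB" means
 * bit 0 of each byte is consumed first). */
typedef struct {
    const uint8_t *buf;  /* input data */
    size_t size;         /* size of the input in bytes */
    size_t pos;          /* current position in bits */
} LsbReader;

/* Read n bits (n <= 32); bits past the end of the buffer read as zero. */
static unsigned lsb_read_bits(LsbReader *r, int n)
{
    unsigned val = 0;
    for (int i = 0; i < n; i++) {
        size_t byte = r->pos >> 3;
        unsigned bit = 0;
        if (byte < r->size)
            bit = (r->buf[byte] >> (r->pos & 7)) & 1u;
        val |= bit << i;  /* the first bit read becomes the lowest bit */
        r->pos++;
    }
    return val;
}
```

With such a reader the extra 2-bit variable would simply be `lsb_read_bits(&r, 2)` right after the window type, in the modes that have it.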

Reconstruction. I’m not familiar with TwinVQ much myself but it looks like there are some small differences there as well. For instance, pgain base is 25000 for mono and 20000 for stereo and in bark decoding scales 0.5, 0.4, 0.35 are used instead of 0.4, 0.35, 0.28 (not really sure about that bit).

Any help with the decoder is welcome. After all, the new decoder will reuse most of the current TwinVQ decoder and add new tables (it should take the title of the decoder with the biggest tables from the DCA decoder).

Some notes about AVC

Saturday, April 13th, 2013

Of course you’ve heard about AVC, the famous audio codec from On2 (and not the codec used as a base for VP7 and VP8). Out of curiosity I tried to look at it and found out that it is as creative as their video codecs.

So here’s how it looks:

  • It is a 1024-point MDCT-based codec.
  • It codes samples either in one window or in eight 128-sample windows. There are four modes because of that — long window, short windows and two transitional modes.
  • Windows are organised into bands, 49 bands for the long window, 12 bands for each short window (band widths are the same for all sampling frequencies though).
  • Frame data consists of: windowing mode, grouping information for short windows, some band flags, run-coded information about used codebooks, scales, coefficients.
  • Scales are stored this way: first scale is explicitly coded 7-bit value, other scales are coded as differences and restored as prev_scale + scale_diff - 60.
  • Codebook 0 means no coefficients coded, codebooks 1-8 code quartets of coefficients, codebooks 9-15 code pairs of coefficients, additionally codebook 15 allows escape values to be coded.
  • Unscaling is performed as *dst++ = val * sqrt(val) * scale.

Do not blame me if that reminds you of some other codec. Bitstream format is different after all. And the second letter of the codec name too.

More about Monkey’s Audio filter changes

Sunday, March 3rd, 2013

In the previous post I gave a general overview of the codec changes; now I’m going to look more closely at how the filters changed over time.

  • 3950 — current mode with up to three layers of IIR filters
  • 3930 — simpler filters: no third layer (there was no insane compression level back then) and the difference between predicted and actual value was not used.

For the older versions there are differences in the implementations of the filters for the different compression modes.

Fast compression:

  • 3200 — order 2 adaptive prediction (i.e. previously decoded and adjustable prediction value are used in prediction)
  • 0000 — almost the same but with different rules for adjustment factor updating

Normal compression:

  • 3800 — two layers of filters: order 4 adaptive prediction and order 2 afterwards
  • 3200 — the same structure, different rules for updating
  • 0000 — three layers with orders 3, 2 and 1 and different updating rules

High compression:

  • 3700 — first it tries first order adaptive prediction with the delay of 2-16 (i.e. the next to previous element is used for prediction) and normal mode decompression afterwards (different decoding for 3800 of course)
  • 3600 — the same but delays are 2-13
  • 3200 — the same but delays are 2-7
  • 0000 — orders 5 and 4 and different updating rules

Extra high compression:

  • 3830 — an IIR filter resembling the one used in the newer Monkey’s Audio versions
  • 3800 — some filter parameters were half as much as in 3830 and there was no delay 2-8 filtering
  • 3600 — delay filtering plus high filtering (which is delay filtering plus normal filtering, which can be expressed as a layer of filtering over fast filtering)
  • 0000 — essentially the same but with different prefiltering
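To show what "first order adaptive prediction with a delay" means in general, here is a generic sketch. It is not the actual Monkey’s Audio code: the sign-based coefficient update, the `shift` parameter and all the function names are assumptions made for illustration, since the real update rules differ between versions as listed above.

```c
/* Sign of a value: -1, 0 or 1. */
static int sgn(int v)
{
    return (v > 0) - (v < 0);
}

/* Hypothetical encoder side: predict each sample from the one `delay`
 * positions back and store the residue. */
static void delayed_pred_encode(const int *src, int *res, int n,
                                int delay, int shift)
{
    int coef = 0;
    for (int i = 0; i < n && i < delay; i++)
        res[i] = src[i];  /* nothing to predict from yet */
    for (int i = delay; i < n; i++) {
        int ref = src[i - delay];
        /* assumes arithmetic right shift for negative products */
        int r = src[i] - ((ref * coef) >> shift);
        res[i] = r;
        coef += sgn(r) * sgn(ref);  /* adapt towards a smaller residue */
    }
}

/* Matching decoder: works in place, so buf[i - delay] is already decoded. */
static void delayed_pred_decode(int *buf, int n, int delay, int shift)
{
    int coef = 0;
    for (int i = delay; i < n; i++) {
        int ref = buf[i - delay];
        int r = buf[i];
        buf[i] = r + ((ref * coef) >> shift);
        coef += sgn(r) * sgn(ref);
    }
}
```

The decoder repeats exactly the same coefficient updates as the encoder, which is why no coefficients have to be transmitted.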

Monkey’s Audio: noted differences between versions

Thursday, February 28th, 2013

While preparing to work on support for old APE versions I finally got the courage to try and trace all the changes between versions. So here’s the list of internal versions and the changes they introduced:

  • 0000 — the reference version for all prehistoric versions. Before version 0000 it was fine, then it all got worse IMO.
  • 3320 — changes in the filters
  • 3600 — changes in the filters
  • 3700 — changes in the filters
  • 3800 — blocks per frame changed for extra high compression level; changes in the filters (yawn)
  • 3810 — frames start at byte boundaries now
  • 3820 — special codes extension (signalled by top bit of CRC set to one)
  • 3830 — filter lengths and some implementation details changed
  • 3840 — CRC calculation algorithm changed a bit
  • 3870 — significant changes in residue coding
  • 3890 — small changes in residue coding
  • 3900 — residue coding format has changed seriously.
  • 3910 — small change in the residue coding (more than 16 bit values can be coded now)
  • 3930 — significantly changed format introduced (both filtering and coding scheme were changed)
  • 3950 — filter format changed a bit (and insane compression mode is added somewhere after that), blocks per frame is changed too
  • 3960 — some small and compatible change in the bitstream (consuming two last bytes or not)
  • 3980 — file format is changed a bit; filtering process has changed a bit too.
  • 3990 — the latest (known) format. Residue coding has changed.
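The 3820 special codes extension is a simple trick: one bit of the CRC is sacrificed as a flag. A minimal sketch, assuming a 32-bit stored CRC and that the flag applies from version 3820 on (the struct, the function name and the exact version threshold are assumptions):

```c
#include <stdint.h>

typedef struct {
    uint32_t crc;           /* CRC with the flag bit removed */
    int has_special_codes;  /* non-zero if the top bit was set */
} FrameCrc;

/* Split the stored CRC word into the real CRC and the special codes flag. */
static FrameCrc split_crc(uint32_t stored, int version)
{
    FrameCrc out = { stored, 0 };
    if (version >= 3820 && (stored & 0x80000000u)) {
        out.crc = stored & 0x7FFFFFFFu;
        out.has_special_codes = 1;
    }
    return out;
}
```

Files older than 3820 keep the full 32 bits as the CRC, so the version check has to come first.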

Do you still wonder why I strongly dislike this format?

Preserving extinct formats

Wednesday, February 27th, 2013

At the request of one guy (he has provided samples as well) I shall work on supporting old Monkey’s Audio versions (before 3.95).

Why? Because the latest official version of Monkey’s Audio has dropped support for those files, because I have wanted to support such files for a really long time (I just didn’t have a good opportunity to do that) and because I definitely need a distraction from the Go2Insanity codec (I shan’t blog about it anymore).

Well, let’s see what the old versions of the worst-designed (known) lossless audio codec have to offer me.

Teasing

Saturday, October 6th, 2012

Over the last month I was not very productive, so I’d like to talk about codecs that I’m not likely to finish soon (not that I’m going to finish any codec soon anyway).

GoToMeeting 2/3

[Image: G2M2 decoder output (the best I could get)]

Here’s the best output I could get from G2M2 or G2M3 data by decoding JPEG part of the tiles. ELS part still needs some work since it’s boring — 10-neighbour prediction, differential pixel decoding and other wonders of binary coder.

Certain Intermediate Codec

I managed to reverse engineer some parts of it. First you have the so-called fixed header, then you have strip sizes and then strip data with some header as well. The way it’s coded is also more or less clear. But some connecting details are still unclear, like how those strips are divided (for now it looks like 96×1 macroblocks or something equally ridiculous).

Since it’s QuickTime it’s hard to say where the entry points to the codec are and what functions are invoked.
Also the only usable binary (with debug symbols) is PowerPC only. It’s a nice platform but I still need to learn some of its peculiarities.

VoxWare MetaSound

It turns out that it is a slightly simplified variant of TwinVQ. It does not have variable-length codes, all values are read with a fixed number of bits (depending on the sampling rate and bitrate of course). The only catch is that it’s hard to find where such a description is retrieved or generated. And the existing codebooks are somewhat different.

Some Notes on Un-RE’d Codecs

Saturday, June 23rd, 2012

If I haven’t REd a codec that doesn’t mean I haven’t looked at it at all.
So today I want to talk a bit about some un-REd codecs and what peculiarities they have.

It looks like all the interesting codecs can be divided into three groups: screen codecs, intermediate codecs and speech codecs.
Since I don’t understand the latter group I shan’t give details on it.

Screen codecs

We have lots of them and they can be divided into two categories: simple and monsters.
Simple codecs usually employ some standard data compression library (zlib, FastLZ, LZO or LZF) or Huffman coding with standard median prediction and interframe difference.
I.e. boring, let’s talk about monsters.

  • Windows Media Video 9 Screen (aka MSS2) — combines palettised regions coded like in its predecessor with WMV9-coded regions.
  • M$ Expressions Encoder Screen (aka Titanium Screen codec) — it uses variable-length codes and codes frames with one of two methods. One of them is DCT exactly the same as in M$ ATC Screen codec.
  • MSU Screen Lossless Codec — this one seems to simply code R,G,B values with some arithmetic coder and lots of context modeling and prediction.
  • Go2Meeting codecs — a good demonstration of the fact that the best strategy against REing is employing a shitty coding monkey.
    Version 4 of decoder was monolithic 8 MB .dll file, version 4 is 15 MB already, all in “fine” C++.
    There are two compression methods known.
    Version 2 employs some weird arithmetic coder substitution (suspiciously like ELS-coder by Wm.D. Withers).
    Version 3 employs libjpeg and zlib for coding image blocks somehow, frame data doesn’t look like it at all.

Intermediate codecs

Cineform — looks like they use Huffman coding and wavelets and it codes 10-bit video.

Fruit Intermediate Codec — looks a lot like its successor (ProRes) but with different bitstream format and fixed coding scheme instead of adaptive ones.

BitJazz SheerVideo — the main problem with it is that most of the codec code performs conversion between any of a couple dozen formats (8- and 10-bit YUV and RGB packed in any possible way). The actual decompression code gets lost somewhere.

Some Notes on Indeo Audio (samples needed BTW)

Friday, June 1st, 2012

I’ve been working on this codec for a while and somewhat got it working.

Good news — it employs the same algorithms as its predecessor, except that it has stereo mode.

Bad news — it feeds slightly different values to those algorithms. So some tables used in calculations and the number of free bits in the block (for allocation) differ. I’ve almost got it, and a hacked version of our IMC decoder outputs almost perfect sound. My suspicion is that it modifies the original IMC tables for the stereo mode case (since it codes audio in mid-side stereo mode it makes sense).

The problem is that there’s only one sample with this codec and it’s extremely short. So if someone has more files with Indeo Audio please provide them to us.

Codebook Hell

Tuesday, March 27th, 2012

There’s one codec I’d like to have reverse-engineered and implemented as an opensource decoder (well, lots of other codecs as well but this one particularly). It is VoxWare MetaSound, an old codec which was used as an alternative to MP3 in the old days of DiVX 3 😉 and its clones.

It’s definitely based on TwinVQ and is probably closer to the variant that got into the MPEG-4 Audio standard (I suspect that was mostly to make that standard even more bloated than before). That follows from it having modes like 8kHz/6kbps, which is not present in VQF but is present in the ISO 14496-3 draft.

This codec probably has more data tables than TwinVQ (in the binary decoder the section with codebooks is more than 256kB large, in TwinVQ it’s about 200kB) and should set a new record if we ever get a decoder for it.

Decoding looks very simple in theory: decoder initialises codebooks for given samplerate and bitrate (it’s actually signaled in extradata: VOXq for 44.1kHz/32kbps, VOXk for 16kHz/16kbps, VOXz for 44.1kHz/48kbps), for every frame it reads window type and an array of some values and performs reconstruction.
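The mode selection could be sketched as a simple table lookup. Only the three tag/mode pairs come from the notes above; the struct, the function name and the assumption that the tag sits in the first four bytes of the extradata are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *tag;   /* four-character tag found in extradata */
    int sample_rate;   /* in Hz */
    int bitrate_kbps;  /* in kbps */
} MetaSoundMode;

/* Only the modes identified so far. */
static const MetaSoundMode metasound_modes[] = {
    { "VOXq", 44100, 32 },
    { "VOXk", 16000, 16 },
    { "VOXz", 44100, 48 },
};

/* Return the matching mode or NULL for an unknown or too short tag.
 * Assumption: the tag is stored at the very start of the extradata. */
static const MetaSoundMode *find_mode(const unsigned char *extradata,
                                      size_t size)
{
    size_t count = sizeof(metasound_modes) / sizeof(metasound_modes[0]);
    if (size < 4)
        return NULL;
    for (size_t i = 0; i < count; i++)
        if (!memcmp(extradata, metasound_modes[i].tag, 4))
            return &metasound_modes[i];
    return NULL;
}
```

A real decoder would then pick the codebook set for the returned sample rate and bitrate before reading any frames.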

So far I was able to identify only some codebook information. Bark tables seem to be identical, but the shape codebooks and whatever else is there seem to be different.

I’ve spent a couple of evenings finding out that information and I dare someone (especially you, Vitor!) to write a decoder for it. I don’t know a thing about TwinVQ except one fact and it’s stated in the title.

Call for Intel Codecs

Monday, March 19th, 2012

I’ve spent two weekends and finally REd and wrote a decoder for Re* Audio Lossless Format. With news like these I can deliberately call it Intel Audio Lossless Format.

So, what codecs are we still lacking?

  • Intel Audio Coder — it’s quite similar to IMC (Music Coder) but not identical.
  • Intel Layered Video Codec — probably it’s just h.263 variant, the only thing I know is that RealVideo 2 decoder was based on it (it’s mentioned in doxygen for Helix SDK I saw once in Internet somewhere and this supports that theory indirectly).
  • ClearVideo — a licensed fractal-based codec. It’d be rather simple DCT-based codec if not for one catch: it uses domain search to generate codes that then are used for block unpacking (and in decoder too, it seems). Maybe these patents will help?
  • Intel NGV — we’ll deal with it when it’s ready 🙂

Feel free to send any useful information about them, preferably working decoders of course.

After that we can claim full support for the Real and Intel codec families.