Archive for the ‘Audio’ Category

Some Information on Micronas SC4 and VoxWare MetaVoice

Sunday, April 24th, 2016

So I’ve looked at them.

Micronas SC4 seems to be rather unusual as it brings elements of LPC to ADPCM. So it’s not just the old conventional “get nibble, multiply by step, output prediction, update index and step values”: it keeps a history of the last 6 decoded samples and predictions and uses them to calculate a new prediction value. Details might appear in the Wiki one day.

VoxWare MetaVoice is three families of 2-3 codecs bundled under the same brand. I’ve not looked at the technical details but they seem to have lots and lots of tables with floating point numbers (or just a few tables if you’ve looked at MetaSound first).
Here are the codecs:

  • RT24 2400bps “Real-Time” codec (ID is VOXa)
  • RT28 2844bps “Real-Time” codec (ID is VOXh)
  • RT29 2978bps “High Quality” codec (ID is VOXg)
  • VR12 1260bps Variable Rate codec (ID is VOXb)
  • VR15 1537bps Variable Rate codec (ID is VOXc)
  • SC3 3200bps “Embedded” codec (no ID)
  • SC6 6400bps “Embedded” codec (no ID)

Ask for support by grabbing j-b and demanding it to be supported. I know there are other players besides VLC but that’s the only project advertising that it “plays it all” even on T-shirts. It’s time to be responsible for your own words. And ask for Bink2 too while at it.


Saturday, March 26th, 2016

You know, the greatest reverse engineer I know is Derek B. He’s managed to RE such codecs as Canopus HQX and Cineform HD in the most efficient manner ever—saying he’ll do it and patiently waiting until somebody else does it.

So here are some words about his favourite lossless audio codec. The most interesting thing about it is that it was actively developed in 2001-2006 and then it was suddenly resurrected in 2015. Also it’s one of the few non-standard codecs (i.e. not made into a standard) that has several articles written about it.

The codec actually consists of two different formats, seemingly an old one and a newer one (which looks like it supports the full range of sample types). The former is notable for having a signal reconstruction stage that uses floating point math (a thing you don’t see in codecs every day), the latter seems to employ various parameter reading and reconstruction methods. Coding is done using a low-precision range coder (large values are decoded in chunks of 8 or 12 bits). So nothing really interesting there.

P.S. I’m definitely not going to write a decoder for it. There are too many lossless audio codecs already, let all proprietary ones (in custom containers too) die in peace.

A Call for Modern Audio Codec

Wednesday, February 11th, 2015

We need a proper audio codec to accompany state of the art video codecs, so here’s an outline of codec features that should be present:

  • audio codec should make more of its context, it should have a system of forward and backward reference frames like B-pyramid in H.264 or H.265;
  • it should employ tonal compensation with that — track the frequency changes from the references (e.g. it may be the same note continued or changing pitch);
  • time domain prediction via FIR or IIR filters;
  • flexible subdivision into subframes like binary tree;
  • raw (or non-transformed at least) coding mode for transients or noise;
  • integer only bitexact transform that passes for MDCT under bad light;
  • high-bitdepth sound support (up to 64 bits per sample).

The project name is transGhost (hopefully no Monty will be hurt by this).

And if you point out this is stupid — well, audio codecs should have the same rights as video codecs including PTS/DTS differences and employing similar coding methods.


Sunday, November 23rd, 2014

So, finally there’s a post about some codec.

It is a specialised codec from Oxford Germanium Television (all names are changed just in case) that has a 4:1 compression ratio and a very niche use. It’s hard to find even a decoder for it so this analysis was done on the ARM version of the encoder (maybe I’ll be able to RE something more useful next time like VX).

The codec itself is rather simple: you take 4 samples from one channel, compress them, output the 16-bit result and repeat the same for the second channel. Encoding is rather simple too:

  1. feed input to 4-band QMF (with filter looking a lot like D4 wavelet to me);
  2. perform ADPCM on each band (this varies a bit for each band but it’s the same approach);
  3. generate output word (7 bits for band 0, 4 bits for band 1, 2 bits each for bands 2 and 3 plus a parity bit for them all).
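
The bit budget of the last step adds up to exactly one 16-bit word (7 + 4 + 2 + 2 + 1), which a small sketch can make concrete. This is only an illustration: the field order, the position of the parity bit and the choice of even parity are all assumptions of mine, and `pack_word` / `unpack_word` are hypothetical names, not anything taken from the encoder.

```python
# Bits per band as listed above: 7 + 4 + 2 + 2 = 15 payload bits.
BAND_BITS = (7, 4, 2, 2)


def parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def pack_word(codes):
    """Pack four band codes plus a parity bit into one 16-bit word.

    ASSUMPTION: band 0 sits in the top bits and the parity bit is the
    lowest one, chosen so the whole word has even parity. The real
    layout is not described in the post.
    """
    payload = 0
    for code, bits in zip(codes, BAND_BITS):
        if not 0 <= code < (1 << bits):
            raise ValueError("band code does not fit into its field")
        payload = (payload << bits) | code
    return (payload << 1) | parity(payload)


def unpack_word(word):
    """Inverse of pack_word (the parity bit is dropped, not checked)."""
    payload = word >> 1
    codes = []
    for bits in reversed(BAND_BITS):
        codes.append(payload & ((1 << bits) - 1))
        payload >>= bits
    return tuple(reversed(codes))
```

With 16-bit input samples, four of them (64 bits) turning into one such word is where the 4:1 ratio would come from.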

Since I have no samples of it don’t expect a decoder from me any time soon (and I don’t have enough motivation to hook Android encoder directly to make it produce data). Not that anyone cares about it either.

On Some Annoying Audio Codecs Family

Tuesday, July 29th, 2014

For reasons I can’t disclose I really hate DTS codecs. For those who don’t know, there are about three and a half codecs in this family:

  • DTS Core
  • DTS Core extensions (bitrate extension, two extensions for more channels and an extension for upsampling e.g. 48 kHz -> 96 kHz)
  • DTS Lossless (which might depend on core and extend/replace its channels)
  • DTS LBR (aka Express profile)

You need to be Jean-Baptiste Kempf to love these formats: DTS Core uses annoyingly large tables, DTS Lossless relies on DTS Core part being decoded properly for it, DTS LBR is a special beast that I’ll describe below. And the best part — all those formats are poorly documented (tables are missing for DTS Core, something was missing for DTS Core X96k extension too, bitexact core reconstruction and some other things needed for real lossless decoder implementation are not documented, LBR is not much better either).

So, what makes DTS LBR special? Its coding mode of course. This is a weird codec that employs MDCT (nothing special so far), codes tones separately (that’s not so common) and spreads it all among many chunks for different resolutions that make it “scalable” or whatever.

Nevertheless this post is not about how horrible all those codecs are (if you have ever worked with them it’s obvious and Jean-Baptiste Kempf won’t believe it anyway), it’s about obscure relations with other codecs.

When I looked at the QDesign Music codec (unsupported by Libav currently) I found that it has a suspiciously familiar coding scheme for tones (QDesign Music 1/2 also use tone detection in MDCT frames): I’ve seen it in DTS LBR. And indeed, it seems the same guy created some codec called LBpack that was the first to use that approach, then he was employed by QDesign and then by DTS. No wonder it looked similar.

Another piece of trivia: there was one guy working on a so-called adaptive prediction and transform scheme. Later the prototype known as APT100 was turned into DTS Core. But it looks like the same work gave birth to the lesser-known codec APT-X (that I’m currently REing but that’s beside the point). And it’s not just the name: one codec employs QMF and ADPCM on subbands, the other employs QMF and optional ADPCM on subbands.

All that makes one wonder whether DTS Lossless is related to some lossless codec outside DTS (not necessarily APT Lossless but might be, no details are known about that one). Currently I cannot name any other lossless codec that employs the same coding approach (block coding with different coding for large and small coefficients plus non-adaptive filter). Of course such knowledge won’t change anything but it would be still interesting to know.

P.S. There are rumours that DTS LBR will be made scalable for adaptive streaming, what fun that will be!
P.P.S. This post was written mainly to test how well Mike’s new setup works.

Voxware Codecs and Tags

Saturday, August 10th, 2013

If you look at the registry of WAV formats you can see this:

0x0070 WAVE_FORMAT_VOXWARE_AC8 Voxware, Inc.
0x0071 WAVE_FORMAT_VOXWARE_AC10 Voxware, Inc.
0x0072 WAVE_FORMAT_VOXWARE_AC16 Voxware, Inc.
0x0073 WAVE_FORMAT_VOXWARE_AC20 Voxware, Inc.
0x0074 WAVE_FORMAT_VOXWARE_RT24 Voxware, Inc.
0x0075 WAVE_FORMAT_VOXWARE_RT29 Voxware, Inc.
0x0076 WAVE_FORMAT_VOXWARE_RT29HW Voxware, Inc.
0x0077 WAVE_FORMAT_VOXWARE_VR12 Voxware, Inc.
0x0078 WAVE_FORMAT_VOXWARE_VR18 Voxware, Inc.
0x0079 WAVE_FORMAT_VOXWARE_TQ40 Voxware, Inc.
0x007A WAVE_FORMAT_VOXWARE_SC3 Voxware, Inc.
0x007B WAVE_FORMAT_VOXWARE_SC3 Voxware, Inc.
0x0081 WAVE_FORMAT_VOXWARE_TQ60 Voxware, Inc.

In reality there’s one codec with several variations (MetaSound) and a family of low-bitrate MetaVoice codecs. And it doesn’t really matter which ID you use: the codec extradata contains the real tag used to distinguish one codec from another. That’s why we can have the 0x0075 format reserved for the Voxware RT29 speech codec but used by MetaSound instead.

Here’s the list of internal tags:

  • VOXa — MetaVoice RT24, 8 kHz, mono, 2.4kbps
  • VOXb — MetaVoice VR12, 8 kHz, mono, 1.2kbps (variable bitrate)
  • VOXc — MetaVoice VR15, 8 kHz, mono, 1.5kbps (variable bitrate)
  • VOXg — MetaVoice RT29HQ, 8 kHz, mono, 2.98kbps (called high-quality for some reason)
  • VOXh — MetaVoice RT28, 8 kHz, mono, 2.8kbps
  • VOXi — MetaSound AC08, 8 kHz, mono, 8kbps
  • VOXj — MetaSound AC10, 11 kHz, mono, 10kbps
  • VOXk — MetaSound AC16, 16 kHz, mono, 16kbps
  • VOXL — MetaSound AC24, 22 kHz, mono, 24kbps
  • VOXq-VOXz — MetaSound mono and stereo, various formats
  • VX01 — MetaVoice SC3, 8 kHz, mono, 3.2kbps (embedded)
  • VX02 — MetaVoice SC6, 8 kHz, mono, 6.4kbps (embedded)
  • VX03 — MetaSound, 8 kHz, mono, 6kbps
  • VX04 — MetaSound, 8 kHz, stereo, 12kbps
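
Since the internal tag and not the WAVE format ID decides which codec you get, identification boils down to a lookup on four bytes. Here is a minimal sketch of that idea; `KNOWN_TAGS` and `identify` are names I made up, and the code deliberately does not try to locate the tag inside the extradata because its offset is not given here.

```python
# Internal tags from the list above, mapped to the codec they select.
KNOWN_TAGS = {
    b"VOXa": "MetaVoice RT24",
    b"VOXb": "MetaVoice VR12",
    b"VOXc": "MetaVoice VR15",
    b"VOXg": "MetaVoice RT29HQ",
    b"VOXh": "MetaVoice RT28",
    b"VOXi": "MetaSound AC08",
    b"VOXj": "MetaSound AC10",
    b"VOXk": "MetaSound AC16",
    b"VOXL": "MetaSound AC24",
    b"VX01": "MetaVoice SC3",
    b"VX02": "MetaVoice SC6",
    b"VX03": "MetaSound 8 kHz mono",
    b"VX04": "MetaSound 8 kHz stereo",
}


def identify(tag: bytes, format_id: int = 0) -> str:
    """Name the codec from its internal tag.

    `format_id` (the WAVE format ID) is accepted but ignored on purpose:
    only the tag is trusted.
    """
    if tag in KNOWN_TAGS:
        return KNOWN_TAGS[tag]
    # VOXq..VOXz are listed only as a range of MetaSound formats.
    if len(tag) == 4 and tag[:3] == b"VOX" and b"q" <= tag[3:] <= b"z":
        return "MetaSound (mono or stereo, various formats)"
    return "unknown"
```

For example, `identify(b"VOXk", 0x0075)` reports MetaSound AC16 even though 0x0075 is registered for RT29.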

So, maybe RT29 does not exist and it should be RT28 instead; obviously RT29HW is a typo for RT29HQ and the second SC3 should be SC6 in the registry (and unfortunately there’s no information about TQ40/TQ60). But who is going to correct WAVE formats list because of facts?

P.S. It would be nice to receive samples for all MetaSound modes (encoder is still available and should work on older Windows systems).

A Quest Continues

Friday, June 28th, 2013

Well, after some distractions, such as writing a semi-working On2 AVC decoder (it turned out that On2 has introduced some special modes there that differ only in the signal reconstruction stage, and I’m too lazy to RE them), and recovering after a heat wave, I’ve returned to the VoxWare ElenrilSound decoder.

I hate parametric codecs: no matter how you screw up the calculations you’ll still get some output but it won’t be useful for debugging. At least I can use the MPlayer2 + binary codec loader + gdb combination to extract runtime information from the reference decoder.

Now I’m trying to make at least one mode work properly, 16kHz@16kbps mono (aka VOXk) for now. Stereo reconstruction might be trickier so I’ll leave it for later but at least most modes differ only by the tables they use. So (in theory) I’ll need to make at least this mode work, add tables for other modes, fix stereo decoding, look at 8kHz@6kbps mode, curse and forget about it.

Good news — bit allocation works properly and bits are read exactly as in the reference decoder. Bad news — reconstructed output is not even close to the expected one, so the work continues…

Some Information about VoxWare MetaSound

Wednesday, June 5th, 2013

So I’ve looked at the beast again. It seems to be close enough to the original TwinVQ (as in .VQF, not something that got into MPEG-4 Part 3 Subpart 4 Variation 3), so I’ll just try to document spotted differences.

Coding modes. Original TwinVQ had 9 modes, VoxWare has twice as many (and so twice as many codebooks!). One of them is very special (8kHz at 6kbps), with a cutoff of “high” frequencies. Also the mode explicitly signals the number of channels, so some modes are stereo-only and some are mono-only.

Bitstream format differences. The bitstream is packed LSB first, and the first byte is not skipped by the decoder. There’s an additional 2-bit variable right after the window type present in many modes (but not 8kHz@6kbps or when short windows are used); my guess is that it has something to do with intensity stereo. The order of some parts seems to be shuffled (i.e. original TwinVQ used p_coef, g_coef, shape order, MetaSound uses p_coef, shape, g_coef order).
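
To show what LSB packing means in practice, here is a minimal bit reader sketch. `LsbBitReader` is a hypothetical helper, and it assumes that bits are consumed starting from the least significant bit of each byte and that the first bit read becomes the lowest bit of the field; that is my reading of “packed LSB”, not something verified against the reference decoder.

```python
class LsbBitReader:
    """Read bit fields from a byte string, least significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute position in bits from the start of data

    def read(self, nbits: int) -> int:
        """Return the next `nbits` bits as an unsigned integer."""
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]      # IndexError past the end
            bit = (byte >> (self.pos & 7)) & 1   # bit 0 of a byte comes first
            value |= bit << i                    # first bit read is the lowest
            self.pos += 1
        return value
```

A field may straddle a byte boundary: with the input bytes 0xB4 0x01, reading 2, 3 and then 4 bits gives 0, 5 and 13, the last value taking three bits from the first byte and one from the second.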

Reconstruction. I’m not familiar with TwinVQ much myself but it looks like there are some small differences there as well. For instance, pgain base is 25000 for mono and 20000 for stereo and in bark decoding scales 0.5, 0.4, 0.35 are used instead of 0.4, 0.35, 0.28 (not really sure about that bit).

Any help with the decoder is welcome. After all, the new decoder will reuse most of the current TwinVQ decoder and just add new tables (it should take the title of the decoder with the biggest tables from the DCA decoder).

Some notes about AVC

Saturday, April 13th, 2013

Of course you’ve heard about AVC, the famous audio codec from On2 (and not the codec used as a base for VP7 and VP8). Out of curiosity I tried to look at it and found out that it was as creative as video codecs.

So here’s how it looks:

  • It is 1024-point MDCT based codec.
  • It codes samples either in one window or in eight 128-sample windows. There are four modes because of that — long window, short windows and two transitional modes.
  • Windows are organised into bands, 49 bands for the long window, 12 bands for each short window (band widths are the same for all sampling frequencies though).
  • Frame data consists of: windowing mode, grouping information for short windows, some band flags, run-coded information about used codebooks, scales, coefficients.
  • Scales are stored this way: first scale is explicitly coded 7-bit value, other scales are coded as differences and restored as prev_scale + scale_diff - 60.
  • Codebook 0 means no coefficients coded, codebooks 1-8 code quartets of coefficients, codebooks 9-15 code pairs of coefficients, additionally codebook 15 allows escape values to be coded.
  • Unscaling is performed as *dst++ = val * sqrt(val) * scale.
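
The scale and unscaling rules from the list can be sketched in a few lines. The function names are hypothetical, the bitstream parsing is left out, and taking the absolute value under the square root is my assumption so that negative coefficients work; how the decoded scale is turned into the multiplier `scale` is not described above, so `unscale` simply takes it as a float.

```python
import math


def decode_scales(first_scale, diffs):
    """Rebuild band scales from an explicit first value and coded differences.

    Each later scale is prev_scale + scale_diff - 60, so a coded
    difference of 60 means "same as the previous band".
    """
    if not 0 <= first_scale < 128:
        raise ValueError("the first scale is a 7-bit value")
    scales = [first_scale]
    for diff in diffs:
        scales.append(scales[-1] + diff - 60)
    return scales


def unscale(val, scale):
    """Return val * sqrt(|val|) * scale, i.e. sign(val) * |val|**1.5 * scale.

    ASSUMPTION: abs() under the root, which the one-line formula
    above does not spell out.
    """
    return val * math.sqrt(abs(val)) * scale
```

For instance, `decode_scales(40, [60, 62, 57])` yields 40, 40, 42, 39, and `unscale(-9, 1.0)` yields -27.0.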

Do not blame me if that reminds you of some other codec. Bitstream format is different after all. And the second letter of the codec name too.

More about Monkey’s Audio filter changes

Sunday, March 3rd, 2013

In the previous post I gave a general overview of codec changes, now I’m going to look more deeply at how the filters changed over time.

  • 3950 — current mode with up to three layers of IIR filters
  • 3930 — simpler filters: no third layer (there was no insane compression level back then) and the difference between predicted and actual value was not used.

For the older versions there are differences in the implementations of the filters for the different compression modes.

Fast compression:

  • 3200 — order 2 adaptive prediction (i.e. the previously decoded value and an adjustable prediction value are used in prediction)
  • 0000 — almost the same but with different rules for adjustment factor updating

Normal compression:

  • 3800 — two layers of filters: order 4 adaptive prediction and order 2 afterwards
  • 3200 — the same structure, different rules for updating
  • 0000 — three layers with orders 3, 2 and 1 and different updating rules

High compression:

  • 3700 — first it tries first order adaptive prediction with the delay of 2-16 (i.e. the next to previous element is used for prediction) and normal mode decompression afterwards (different decoding for 3800 of course)
  • 3600 — the same but delays are 2-13
  • 3200 — the same but delays are 2-7
  • 0000 — orders 5 and 4 and different updating rules

Extra high compression:

  • 3830 — an IIR filter resembling the one used in the newer Monkey’s Audio versions
  • 3800 — some filter parameters were half as much as in 3830 and there was no delay 2-8 filtering
  • 3600 — delay filtering plus high filtering (which is delay filtering plus normal filtering, which can be expressed as a layer of filtering over fast filtering)
  • 0000 — essentially the same but with different prefiltering