Archive for the ‘Audio’ Category

Some notes about AVC

Saturday, April 13th, 2013

Of course you’ve heard about AVC, the famous audio codec from On2 (and not the codec used as a base for VP7 and VP8). Out of curiosity I tried to look at it and found out that it was as creative as video codecs.

So here’s how it looks:

  • It is 1024-point MDCT based codec.
  • It codes samples either in one window or in eight 128-sample windows. There are four modes because of that — long window, short windows and two transitional modes.
  • Windows are organised into bands, 49 bands for the long window, 12 bands for each short window (band widths are the same for all sampling frequencies though).
  • Frame data consists of: windowing mode, grouping information for short windows, some band flags, run-coded information about used codebooks, scales, coefficients.
  • Scales are stored this way: first scale is explicitly coded 7-bit value, other scales are coded as differences and restored as prev_scale + scale_diff - 60.
  • Codebook 0 means no coefficients codec, codebooks 1-8 code quartets of coefficients, codebooks 9-15 code pairs of coefficients, additionally codebook 15 allows escape values to be coded.
  • Unscaling is performed as *dst++ = val * sqrt(val) * scale.

Do not blame me if that reminds you of some other codec. Bitstream format is different after all. And the second letter of the codec name too.

More about Monkey’s Audio filter changes

Sunday, March 3rd, 2013

In the previous post I gave general overview of codec changes, now I’m going to look more deeply at the filter changes with time.

  • 3950 — current mode with up to three layers of IIR filters
  • 3930 — simpler filters: no third layer (there was no insane compression level back then) and the difference between predicted and actual value was not used.

For the older versions there are differences in the implementations of the filters for the different compression modes.

Fast compression:

  • 3200 — order 2 adaptive prediction (i.e. previously decoded and adjustable prediction value are used in prediction)
  • 0000 — almost the same but with different rules for adjustment factor updating

Normal compression:

  • 3800 — two layers of filters: order 4 adaptive prediction and order 2 afterwards
  • 3200 — the same structure, different rules for updating
  • 0000 — three layers with orders 3, 2 and 1 and different updating rules

High compression:

  • 3700 — first it tries first order adaptive prediction with the delay of 2-16 (i.e. the next to previous element is used for prediction) and normal mode decompression afterwards (different decoding for 3800 of course)
  • 3600 — the same but delays are 2-13
  • 3200 — the same but delays are 2-7
  • 0000 — orders 5 and 4 and different updating rules

Extra high compression:

  • 3830 — an IIR filter resembling the one used in the newer Monkey’s Audio versions
  • 3800 — some filter parameters were half as much as in 3830 and there was no delay 2-8 filtering
  • 3600 — delay filtering plus high filtering (which is delay filtering plus normal filtering, which can be expressed as a layer of filtering over fast filtering)
  • 0000 — essentially the same but with different prefiltering

Monkey’s Audio: noted differenced between versions

Thursday, February 28th, 2013

While preparing for working on old APE versions support I finally got courage to try and trace all changes for different versions. So here’s the list of internal versions and the changes they introduced:

  • 0000 — the reference version for all prehistoric version. Before version 0000 it was fine, then it all got worse IMO.
  • 3320 — changes in the filters
  • 3600 — changes in the filters
  • 3700 — changes in the filters
  • 3800 — blocks per frame changed for extra high compression level; changes in the filters (yawn)
  • 3810 — frame start at byte boundaries now
  • 3820 — special codes extension (signalled by top bit of CRC set to one)
  • 3830 — filter lengths and some implementation details changed
  • 3840 — CRC calculation algorithm changed a bit
  • 3870 — significant changes in residue coding
  • 3890 — small changes in residue coding
  • 3900 — residue coding format has changed seriously.
  • 3910 — small change in the residue coding (more than 16 bit values can be coded now)
  • 3930 — significantly changed format introduced (both filtering and coding scheme were changed)
  • 3950 — filter format changed a bit (and insane compression mode is added somewhere after that), blocks per frame is changed too
  • 3960 — some small and compatible change in the bitstream (consuming two last bytes or not)
  • 3980 — file format is changed a bit; filtering process has changed a bit too.
  • 3990 — the latest (known) format. Residue coding has changed.

Do you still wonder why I strongly dislike this format?

Preserving extinct formats

Wednesday, February 27th, 2013

By the request of one guy (he has provided samples as well) I shall work on supporting old Monkey’s Audio versions (before 3.95).

Why? Because the latest official version of Monkey’s Audio has dropped support for those files, because I wanted to support such files since really long time (just didn’t have a good opportunity to do that) and because I definitely need a distraction from Go2Insanity codec (I shan’t blog about it anymore).

Well, let’s see what the old versions of the worst (known) designed lossless audio codec have to offer me.

Teasing

Saturday, October 6th, 2012

In the recent month I was not very productive, so I’d like to talk about codecs that I’m not likely to finish soon (not that I’m going to finish any codec soon anyway).

GoToMeeting 2/3

G2M2 decoder output (the best I could get)

Here’s the best output I could get from G2M2 or G2M3 data by decoding JPEG part of the tiles. ELS part still needs some work since it’s boring — 10-neighbour prediction, differential pixel decoding and other wonders of binary coder.

Certain Intermediate Codec

I managed to reverse engineer some parts of it. First you have so-called fixed header, then you have strip sizes and then strip data with some header as well. The way it’s coded is also more or less clear. But some connecting details — like how those strips are divided (now it looks like 96×1 macroblocks or equally ridiculous).

Since it’s QuickTime it’s hard to say where are the entry points to the codec and what functions are invoked.
Also the only usable binary (with debug symbols) is PowerPC only. It’s nice platform but I still need to learn some of its peculiarities.

VoxWare MetaSound

It turns out that it is slightly simplified variant of TwinVQ. It does not have variable-length codes, all values are read as fixed bits (depending on sampling rate and bitrate of course). The only catch is that it’s hard to find where such description is retrieved or generated. And existing codebooks are somewhat different.

Some Notes on Un-RE’d Codecs

Saturday, June 23rd, 2012

If I haven’t REd a codec that doesn’t mean I haven’t looked at them at all.
So today I want to talk a bit about some un-REd codecs and what peculiarities they have.

Looks like that all interesting codecs can be divided into three groups: screen codecs, intermediate codecs and speech codecs.
Since I don’t understand the latter group I shan’t give details on it.

Screen codecs

We have lots of them and they can be divided into two categories: simple and monsters.
Simple codecs usually employ some standard data compression library (zlib, FastLZ, LZO or LZF) or Huffman coding with standard median prediction and interframe difference.
I.e. boring, let’s talk about monsters.

  • Windows Media Video 9 Screen (aka MSS2) — combines palettised regions coded like in its predecessor with WMV9-coded regions.
  • M$ Expressions Encoder Screen (aka Titanium Screen codec) — it uses variable-length codes and codes frames with one of two methods. One of them is DCT exactly the same as in M$ ATC Screen codec.
  • MSU Screen Lossless Codec — this one seems simply code R,G,B values with some arithmetic coder and lots of context modeling and prediction.
  • Go2Meeting codecs — a good demonstration of the fact that the best strategy against REing is employing shitty coding monkey.
    Version 4 of decoder was monolithic 8 MB .dll file, version 4 is 15 MB already, all in “fine” C++.
    There are two compression methods known.
    Version 2 employs some weird arithmetic coder substitution (suspiciously like ELS-coder by Wm.D. Withers).
    Version 3 employs libjpeg and zlib for coding image blocks somehow, frame data doesn’t look like it at all.

Intermediate codecs

Cineform — looks like they use Huffman coding and wavelets and it codes 10-bit video.

Fruit Intermediate Codec — looks a lot like its successor (ProRes) but with different bitstream format and fixed coding scheme instead of adaptive ones.

BitJazz SheerVideo — the main problem with it is that most of the codec code performs conversion between any of couple dozens of formats (8- and 10-bit YUV and RGB packed in any possible way). Actual decompression code gets lost somewhere.

Some Notes on Indeo Audio (samples needed BTW)

Friday, June 1st, 2012

I’ve been working on this codec for a while and somewhat got it working.

Good news — it employs the same algorithms as its predecessor, except that it has stereo mode.

Bad news — it feeds slightly different values to those algorithms. So some tables used in calculations and number of free bits in the block (for allocation) differ. I’ve almost got it and hacked version of our IMC decoder outputs almost perfect sound. My suspicions are that it modifies original IMC tables for stereo mode case (since it codes audio in mid-side stereo mode it makes sense).

The problem is that there’s only one sample with this codec and it’s extremely short. So if someone has more files with Indeo Audio please provide them to us.

Codebook Hell

Tuesday, March 27th, 2012

There’s one codec I’d like to have reverse-engineered and implemented as an opensource decoder (well, lots of other codecs as well but this one particularly). Its name is VoxWare MetaSound, that’s an old codec which was used as an alternative to MP3 in old days of DiVX 3 😉 and its clones.

It’s definitely based on TwinVQ and is probably closer to the variant that got into MPEG-4 Audio standard (I suspect that mostly to make that standard even more bloated than before). That figures from having such modes like 8kHz/6kbps which is not present in VQF but present in ISO 14496-3 draft.

This codec probably has more data tables than TwinVQ (in binary decoder the section with codebooks is more than 256kB large, in TwinVQ it’s about 200kB) and should set a new record if we ever get a decoder for it.

Decoding looks very simple in theory: decoder initialises codebooks for given samplerate and bitrate (it’s actually signaled in extradata: VOXq for 44.1kHz/32kbps, VOXk for 16kHz/16kbps, VOXz for 44.1kHz/48kbps), for every frame it reads window type and an array of some values and performs reconstruction.

So far I was able to identify only some codebook information. Bark tables seems to be identical, but shape and whatever codebooks seem to be different.

I’ve spent a couple of evenings finding out that information and I dare someone (especially you, Vitor!) to write a decoder for it. I don’t know a thing about TwinVQ except one fact and it’s stated in the title.

Call for Intel Codecs

Monday, March 19th, 2012

I’ve spent two weekends and finally REd and wrote decoder for Re* Audio Lossless Format. With news like these I can deliberately call it Intel Audio Lossless Format.

So, what codecs we’re lacking so far?

  • Intel Audio Coder — it’s quite similar to IMC (Music Coder) but not identical.
  • Intel Layered Video Codec — probably it’s just h.263 variant, the only thing I know is that RealVideo 2 decoder was based on it (it’s mentioned in doxygen for Helix SDK I saw once in Internet somewhere and this supports that theory indirectly).
  • ClearVideo — a licensed fractal-based codec. It’d be rather simple DCT-based codec if not for one catch: it uses domain search to generate codes that then are used for block unpacking (and in decoder too, it seems). Maybe these patents will help?
  • Intel NGV — we’ll deal with it when it’s ready 🙂

Feel free to send any useful information about them, preferably working decoders of course.

After that we can claim full support of Real and Intel codec family.

Why Lossless Audio Codecs generally suck

Saturday, November 27th, 2010

Why there are so many lossless audio codecs? Mike, obviously, had his thoughts on that subject and I agree with my another friend who said: “it’s just too easy to create lossless audio codec, that’s why everybody creates his own”.

Well, theory is simple: you remove redundancy from samples by predicting their values and code the residue. Coding is usually done with Rice codes or some combination of Rice codes and an additional coder — for zero runs or for finer coding of Rice codes. Prediction may be done in two major ways: FIR filters (some fixed prediction filters or LPC) or IIR filters (personally I call those “CPU eaters” for certain property of codecs using it). And of course they always invent their own container (I think in most cases that’s because they are too stupid to implement even minimal support for some existing container or even to think how to fit it into one).

Let’s iterate through the list of better-known lossless audio codecs.

  1. ALAC (by Apple) — nothing remarkable, they just needed to fit something like FLAC into MOV so their players can handle it
  2. Bonk— one of the first lossless/lossy codecs, nobody cares about it anymore. Some FFmpeg developers had intent to enhance it but nothing substantial has been done. You can still find that “effort” as Sonic codec in libavcodec.
  3. DTS-HD MA — it may employ both FIR and IIR prediction and uses Rice codes but they totally screwed bitstream format. Not to mention there’s no openly available documentation for it.
  4. FLAC — the codec itself is good: it’s extremely fast and features good compression ratios. The only bad thing about it is that it’s too hard to seek properly in it since there’s no proper frame header and you can just hope that that combination of bits and CRC are not false positive.
  5. G.711.0 — have you ever heard about it? That’s its problem: nobody cares and nobody even tries to use it.
  6. MLP/Dolby True-HD — it seems to be rather simple and it exists solely because there was no standardised lossless audio codec for DVD.
  7. Monkey’s Audio — well, the only good thing about is that it does not seem to be actively developed anymore.
  8. MPEG-4 ALS — the same problem: it may be standardised but nobody cares about it.
  9. MPEG-4 SLS — even worse since you need bitexact AAC decoder to make it work.
  10. OggSquish — luckily, it’s buried for good but it also spawned one of the worst container formats possible which still lives. And looking at original source of it one should not wonder why.
  11. RealAudio Lossless Format — I always say it was named after its main developer Ralph Wiggum. This codec is very special — they had to modify RM container format specially for it. A quick look inside showed that they use more than 800 (yes, more than eighty hundred) Huffman tables, most of them with several hundreds of codes (about 400 in average). That reminds me of RealVideo 4 with its above-the-average number of tables for context-dependant coding.
  12. Shorten — one of the first lossless audio codecs. Hardly anyone remembers it nowadays.
  13. TAK — it was originally called YALAC (yet another lossless audio codec) for a reason. Since it’s closed-source and fortunately not widespread (though some idiots use it for CD rip releases), it just annoys me time from time but I don’t think someone will work on adding support for it in FFmpeg.
  14. TrueAudio (TTA) — I can say anything about it except it seems to be quite widespread and it works. Looks like they’re still alive and work on TTA2 but who cares?
  15. WavPack — that’s rather good codec with sane bitstream format too. Looks like its author invested some time in its design. Also he sent patches to implement some missing features in our decoder (thank you for that!).
  16. WMA Lossless — from what I know, it uses IIR filter based on least minimum squares method for finding its coefficients. It has two peculiarities: that filter is also used for inter-channel decorrelation and bitstream format follows WMA9 format, i.e. it has something like interframes and frame data starting at arbitrary point (hello, MP3!).

P.S. I still hope this post won’t encourage anybody to write yet another useless lossless audio decoder.