Archive for the ‘Speech Codecs’ Category

On Bluetooth codecs

Wednesday, December 15th, 2021

I got a strange request for LDAC decoder as it may help to “…verify (or debunk) Sony’s quality claims.” This made me want to write a post about the known BT codecs and what’s my opinion on them.

Bluetooth codecs can be divided into three categories: the standard ones defined in A2DP that nobody uses (MP3 and ATRAC), the standard ones that are widely used (AAC and SBC) and custom codecs supported by specific vendors.

So, let’s start with mandatory A2DP codecs:

  • SBC—a codec designed specifically for Bluetooth. It works like MPEG Audio Layer II but with 4 or 8 sub-bands and parametric bit allocation instead of fixed tables. This allows it to change bitrate at any frame (which allows it to adapt to changing transmission quality). I heard an opinion that it beats newer codecs at their bitrates in quality but the standard intentionally capped it to prevent that. I find that not that hard to believe;
  • MPEG-1,2 Audio—I’ve not heard that anybody actually uses them and it’s fro the best;
  • MPEG-2,4 AAC—it should give better quality than SBC but for a much larger delay and decoding complexity;
  • ATRAC family—this feels like a proprietary replacement of AAC to me. I’ve not heard that anybody actually supports any of the codecs in their products (it’s not that I’ve heard much about BT in general though).

Here I should also mention a candidate codec named LC3 (and LC3plus). Whatever audio codec FhG IIS develops, it’ll be AAC. LC3 is no exception as by the first glance it looks like like AAC LC with an arithmetic coding and some additional coding tools glued to it.

There’s CVSD codec for speech transmission over BT. It’s a speech codec and that’s enough about it.

Now let’s move to the proprietary codecs:

  • aptX—a rather simple codec with 4:1 compression ration (four 16-bit samples into single 16-bit word). The codec works by splitting audio into four sub-bands, applying ADPCM and quantising to the fixed amount. Beside inability to adapt to bad channels it should produce about the same quality as SBC (at least from a feature comparison point of view);
  • aptX HD—the same as non-HD version but works on 24-bit samples (and probably the only honest high-res/high-definition codec here);
  • aptX other variants—they exist but there’s no solid information about them;
  • LDAC—will be discussed below in more detail. For now suffice to say it’s on MP2 level and hi-res claims are just marketing;
  • LHDC and LLAC—not much is known about the codecs but after seeing quality comparison picture (with a note) on the official website I don’t expect anything good;
  • Ultra Audio Transmission—there’s no information about it except for a name mentioned in Wikipedia list of BT codecs and some marketing materials on the page with smartphone description by the same vendor;
  • Samsung BT codecs—see above.

Now let’s review LDAC specifically. I’m somewhat surprised nobody has written a decoder for it yet. It’s so easy to reconstruct the format from the open-source encoder that Paul B. Mahol could do it in a couple of days (before returning to Bink2 decoder hopefully). aptX has only binary encoder and yet people have managed to RE it. I’m not going to do it because I don’t care much about Bluetooth codecs in general and it’s not a good fit for NihAV either.

To the technical details. The codec frame is either one long MDCT or two interlaced half-size MDCTs (just like ATSC A/52B), coefficients are coded as pairs, quads or larger single values (which reminds me of MP3 and MP2, quantisation is very similar as well). Coefficients (in pairs and quads as well) are stored in bit fields, the only variable-length codebooks are used to code quantiser differences. There’s bit allocation information transmitted for each frame so different coefficients can have different bit sizes (and thus precision). Nevertheless the maximum it can have is just 15 bits per coefficient (plus sign), which makes it hardly any hi-resier than AAC LC or SBC. And the only excuse that can be said here is the one I heard about MP3 being hi-res: with the large scales and coefficients you can have almost infinite precision. Disproving it is left as an exercise to the reader.

I hope now it’s clear why I don’t look at the development of Bluetooth codecs much. Back to slacking.

Looking at Voxware MetaVoice

Monday, December 13th, 2021

Since there’s not much I’d like to do with NihAV, I decided to revisit one. old family of codecs.

It seems that they had several families of codecs and most (all?) of them are licensed from some other company, sometimes with some changes (there are four codecs licensed from Lernout & Hauspie, MetaSound is essentially TwinVQ with a different set of codebooks, RT2x and VR1x are essentially different flavours of the same codec, SC3 and SC6 might be related to Micronas codec though Micronas SC4 decoder does not look similar at all).

So here’s a short review of those various codecs that I have some information about:

  • L&H CELP 4.8kpbs—this is rather standard CELP codec with no remarkable features (and I’ve even managed to write a working decoder for it);
  • L&H SBC 8/12/16kbps—that one is a sub-band coder with variable frame size (and amount of bits allocated per band);
  • RT24/RT28/RT29HQ and VR12/VR18—all these codecs share the common core and essentially it’s a variable-bitrate LPC-based speech codec with four different frame modes with no information transmitted beside frame mode, pitch information and the filter coefficients (for CELP you’d also have pulse information).
  • SC3/SC6—this one seems to be more advanced and, by the look of it, it uses order 12 LPC filter (usually speech codecs use either LPC of order 10 or 16).

I’ll try to document it for The Wiki but don’t expect much. And I’m not going to implement decoders for these formats either (beside already implemented 4.8k CELP one): the codecs have variable bitrate so you need to decode a frame (at least partially) in order to tell how many bytes it will take—and I don’t want to introduce a hack in NihAV to support such mode (either the demuxer should serve variable-length frames or the decoder should expect fixed-size frames); and even worse thing is that they are speech codecs that I don’t understand well (and there’s a lot of obscure code there). It took me more than a week to implement and debug CELP decoder. Fun story: I could not use MPlayer2 binary loader because the codec was misdetected as MPEG Audio Layer II. The cause of that was libavformat and its “helpful” tag search: when twocc 0x0070 was not found, it tried upper-case 0x0050 which belongs to MP2. And after I’ve finally made it work I discovered a fun bug in the reference decoder: while calculating cosine, the difference can overflow and thus the resulting value is somewhat wrong (and it could be easily fixed by changing “less or equal” condition to “less” in table search refinement step).

Anyway, it’s done and now I can forget about it.

Some Information on Micronas SC4 and VoxWare MetaSound

Sunday, April 24th, 2016

So I’ve looked at them.

Micronas SC4 seems to be rather unusual as it seems to bring elements of LPC to ADPCM. So it’s not just the old conventional “get nibble, multiply by step, output prediction, update index and step values”—it keeps a history of last 6 decoded samples and predictions and use them to calculate a new prediction value. Details might appear in the Wiki one day.

VoxWare MetaSound is three families of 2-3 codecs bundled under the same brand. I’ve not looked at technical details but they seem to have lots and lots of tables with floating point numbers (or just a bit of tables if you’ve looked at MetaSound first).
Here are the codecs:

  • RT24 2400bps “Real-Time” codec (ID is VOXa)
  • RT28 2844bps “Real-Time” codec (ID is VOXh)
  • RT29 2978bps “High Quality” codec (ID is VOXg)
  • VR12 1260bps Variable Rate codec (ID is VOXb)
  • VR15 1537bps Variable Rate codec(ID is VOXc)
  • SC3 3200bps “Embedded” codec (no ID)
  • SC6 6400bps “Embedded” codec (no ID)

Ask for support by grabbing j-b and demanding it to be supported. I know there are other players beside VLC but that’s the only project advertising that it “plays it all” even on T-shirts. It’s time to be responsible for your own words. And ask for Bink2 too while at it.