Looking at Voxware MetaVoice

Since there’s not much I’d like to do with NihAV, I decided to revisit one old family of codecs.

It seems that Voxware had several families of codecs and most (all?) of them were licensed from some other company, sometimes with some changes: there are four codecs licensed from Lernout & Hauspie; MetaSound is essentially TwinVQ with a different set of codebooks; RT2x and VR1x are essentially different flavours of the same codec; SC3 and SC6 might be related to the Micronas codec, though the Micronas SC4 decoder does not look similar at all.

So here’s a short review of those various codecs that I have some information about:

  • L&H CELP 4.8kbps: this is a rather standard CELP codec with no remarkable features (and I’ve even managed to write a working decoder for it);
  • L&H SBC 8/12/16kbps: that one is a sub-band coder with variable frame size (and a variable number of bits allocated per band);
  • RT24/RT28/RT29HQ and VR12/VR18: all these codecs share a common core, essentially a variable-bitrate LPC-based speech codec with four different frame modes and no information transmitted besides the frame mode, pitch information and the filter coefficients (for CELP you’d also have pulse information);
  • SC3/SC6: this one seems to be more advanced and, by the look of it, it uses an order-12 LPC filter (speech codecs usually use LPC of order 10 or 16).
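
To illustrate what per-frame modes mean for anyone parsing such a stream: if the frame size depends on the mode, frame boundaries can only be found by reading each frame's mode in turn. Here is a minimal sketch of that idea. The position of the mode bits and the frame sizes are invented for illustration and are not taken from any Voxware codec.

```rust
// Hypothetical frame sizes in bytes for four frame modes (made-up numbers,
// not the real RT2x/VR1x values).
const MODE_SIZES: [usize; 4] = [1, 3, 5, 7];

// Walk a packed stream and return (offset, size) for each complete frame.
fn split_frames(data: &[u8]) -> Vec<(usize, usize)> {
    let mut frames = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        // Assumption: the mode sits in the top two bits of the first byte.
        let mode = (data[pos] >> 6) as usize;
        let size = MODE_SIZES[mode];
        if pos + size > data.len() {
            break; // truncated final frame, stop here
        }
        frames.push((pos, size));
        pos += size;
    }
    frames
}

fn main() {
    let stream: &[u8] = &[0x00, 0x40, 0, 0, 0xC0, 0, 0, 0, 0, 0, 0, 0x80, 0];
    println!("{:?}", split_frames(stream));
}
```

The point is that nothing outside the frame tells you where the next one starts, so either the demuxer or the decoder has to do this walk.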

I’ll try to document them for The Wiki but don’t expect much. And I’m not going to implement decoders for these formats either (besides the already implemented 4.8k CELP one). The codecs have variable bitrate, so you need to decode a frame (at least partially) in order to tell how many bytes it will take, and I don’t want to introduce a hack in NihAV to support such a mode (either the demuxer should serve variable-length frames or the decoder should expect fixed-size frames). An even worse thing is that they are speech codecs that I don’t understand well (and there’s a lot of obscure code there).

It took me more than a week to implement and debug the CELP decoder. Fun story: I could not use the MPlayer2 binary loader because the codec was misdetected as MPEG Audio Layer II. The cause of that was libavformat and its “helpful” tag search: when twocc 0x0070 was not found, it tried the upper-case variant 0x0050, which belongs to MP2. And after I’d finally made it work I discovered a fun bug in the reference decoder: while calculating the cosine, the difference can overflow and thus the resulting value is somewhat wrong (and it could be easily fixed by changing the “less or equal” condition to “less” in the table search refinement step).
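
The tag mix-up is easy to reproduce in miniature: 0x70 is ASCII 'p', and upper-casing it gives 0x50, 'P'. What follows is a minimal sketch of a lookup with a case-insensitive fallback. The `Codec` enum, the tiny tag table and the function names are all hypothetical and only imitate the behaviour described above; this is not libavformat's actual code.

```rust
// Hypothetical codec identifiers, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    Pcm,
    Mp2,
}

// A toy tag table that knows 0x0050 but has no entry for 0x0070.
const TAGS: &[(u16, Codec)] = &[(0x0001, Codec::Pcm), (0x0050, Codec::Mp2)];

// Upper-case each byte of the tag as if it were an ASCII character.
fn to_upper_tag(tag: u16) -> u16 {
    let [hi, lo] = tag.to_be_bytes();
    u16::from_be_bytes([hi.to_ascii_uppercase(), lo.to_ascii_uppercase()])
}

// Strict lookup: only an exact tag match counts.
fn find_exact(tag: u16) -> Option<Codec> {
    TAGS.iter().find(|&&(t, _)| t == tag).map(|&(_, c)| c)
}

// "Helpful" lookup: on a miss, retry while ignoring ASCII case.
fn find_with_fallback(tag: u16) -> Option<Codec> {
    find_exact(tag).or_else(|| {
        let up = to_upper_tag(tag);
        TAGS.iter()
            .find(|&&(t, _)| to_upper_tag(t) == up)
            .map(|&(_, c)| c)
    })
}

fn main() {
    println!("exact:    {:?}", find_exact(0x0070));
    println!("fallback: {:?}", find_with_fallback(0x0070));
}
```

A case-insensitive retry makes some sense for four-character codes that really are text, but for numeric two-byte tags it just maps one unrelated codec onto another.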

Anyway, it’s done and now I can forget about it.

4 Responses to “Looking at Voxware MetaVoice”

  1. Paul says:

    I cannot forget about codecs; recently I found a subtle bug in the CDG decoder but have no time to fix it, and the link to the sample is buried in one of the GStreamer Rust projects. Could you reimplement the CDG video decoder in NihAV?

  2. Kostya says:

    If you mean CD+G format then no, it’s entirely pointless (as you can expect from a karaoke subtitle format that is a special hack for audio CDs).

    And you’ve forgotten about Bink2 decoder, I’m still waiting for you to complete it (while other people are waiting for you to commit AC4 support).

  3. Paul says:

    I cannot complete it, I lack your super powers, so I asked another dev and am hoping she can finish/fix AC4 spectral replication aka ASPX soon.
    Regarding Bink2, I’m waiting for an official open source decoder.

  4. Kostya says:

    I also lack super powers, it’s just that I picked up some experience over a decade of messing with codecs.

    And I understand why you gave up on ASPX: from a quick glance it looks a lot like AAC SBR, which is not such a fun thing to implement.

    As for Bink2, if you implement a decoder for it then it’ll be the official opensource decoder. I doubt that Epic will release one itself.