So I’ve looked at the beast again. It seems to be close enough to the original TwinVQ (as in .VQF, not something that got into MPEG-4 Part 3 Subpart 4 Variation 3), so I’ll just try to document spotted differences.
Coding modes. Original TwinVQ had 9 modes; VoxWare has twice as many (and so twice as many codebooks!). One of them is very special (8kHz at 6kbps), with a cutoff of “high” frequencies. Also, the mode explicitly signals the number of channels, so some modes are stereo-only and some are mono-only.
Bitstream format differences. The bitstream is packed LSB first, and the first byte is not skipped by the decoder. There’s an additional 2-bit variable right after the window type in many modes (but not in 8kHz@6kbps or when short windows are used); my guess is that it has something to do with intensity stereo. The order of some parts seems to be shuffled (i.e. original TwinVQ used p_coef, g_coef, shape order, while MetaSound uses p_coef, shape, g_coef order).
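To make the "packed LSB" part concrete, here is a minimal sketch of an LSB-first bit reader in Python. The class name LsbBitReader and its interface are made up for illustration and are not taken from any actual decoder; the assumption is that "packed LSB" means the first bit read is the least significant bit of the first byte, and that the first bit read of a multi-bit field is the lowest bit of the value.

```python
class LsbBitReader:
    """Hypothetical helper: reads bit fields least significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position from the start of the data

    def read(self, nbits: int) -> int:
        """Read nbits bits; the first bit read becomes bit 0 of the result."""
        value = 0
        for i in range(nbits):
            byte = self.data[self.pos >> 3]      # byte holding the current bit
            bit = (byte >> (self.pos & 7)) & 1   # bit 0 of each byte comes first
            value |= bit << i
            self.pos += 1
        return value


# Fields may straddle byte boundaries; nothing is skipped at the start.
reader = LsbBitReader(bytes([0xB6, 0x01]))
two_bit_field = reader.read(2)   # e.g. a 2-bit variable
three_bit_field = reader.read(3)
```

An MSB-first reader would differ only in the shift used to pick the bit out of each byte and in how the bits are accumulated into the value.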
Reconstruction. I’m not very familiar with TwinVQ myself, but it looks like there are some small differences there as well. For instance, the pgain base is 25000 for mono and 20000 for stereo, and bark decoding uses the scales 0.5, 0.4, 0.35 instead of 0.4, 0.35, 0.28 (not really sure about that bit).
Any help with the decoder is welcome. After all, the new decoder will reuse most of the current TwinVQ decoder plus new tables (it should take the title of the decoder with the biggest tables from the DCA decoder).
Hello, and nice to know you are looking at MetaSound 🙂
Just one thing I’d like to add: VQF-flavored TwinVQ does not ignore the first byte in the frame. Actually, this first byte is added by the demuxer because the frame boundaries are not necessarily aligned to a byte boundary, i.e. a frame might end in the second bit of a byte with the next frame starting on the third bit of the same byte. Thus, the demuxer adds some data at the beginning of the frame to allow the decoder to handle that.
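The idea of frames that are not byte-aligned can be sketched as follows. This is only an illustration under stated assumptions, not the actual demuxer code: the function names, the fixed frame size in bits, the MSB-first bit order, and the choice to store the number of bits to skip in the prepended byte are all hypothetical.

```python
def extract_frame(stream: bytes, index: int, frame_bits: int) -> bytes:
    """Demuxer side: cut out frame `index` and prepend one helper byte.

    The helper byte holds how many leading bits of the payload belong
    to the previous frame (an assumption made for this sketch).
    """
    start = index * frame_bits       # first bit of this frame
    end = start + frame_bits         # one past the last bit of this frame
    first_byte = start // 8
    last_byte = (end + 7) // 8       # exclusive, rounded up to a whole byte
    skip = start % 8                 # bits to skip in the first payload byte
    return bytes([skip]) + stream[first_byte:last_byte]


def frame_payload_bits(packet: bytes, frame_bits: int) -> list:
    """Decoder side: read the helper byte, skip that many bits, read the frame."""
    skip = packet[0]
    payload = packet[1:]
    bits = []
    for pos in range(skip, skip + frame_bits):
        byte = payload[pos // 8]
        bits.append((byte >> (7 - pos % 8)) & 1)  # MSB-first, assumed here
    return bits


# Two 10-bit frames packed back to back: the second starts at bit 2 of byte 1.
stream = bytes([0b11001010, 0b01110001, 0b10101100])
second = extract_frame(stream, 1, 10)
second_bits = frame_payload_bits(second, 10)
```

With this layout the decoder never has to know where the frame sat in the file; it only consumes the helper byte and skips the indicated number of bits.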