Weird Audio in VMD

Spoilers: as it turns out, a French company needs some Belgian technology in their life.

In the last post I mentioned that NihAV can decode all VMDs except maybe those from the latest generation. Since I was provided with samples, I looked at what changed there.

First of all, the header got shorter. They’ve finally decided to throw out the initial palette, which is not used by the 16- and 24-bit videos (that area is filled with what looks like a random excerpt from a DOS executable, and it looks the same in all files I’ve seen), and to add two 16-bit fields at the end. What are those fields for? I’m getting to that.

Looking at the code for reading this new VMD header I could see that while the old header was 816 bytes long, the code handles header sizes of 820 or 52 (820-768), which means that four additional bytes were first added to the old header and then the palette was thrown out. The audio size is variable now as well (previously it was a fixed 1:1, 1:2 or 1:4 compression scheme but that seems not to be the case any more). In the files I have, those additional fields at the end of the header were always 4 and 1152. The second number looks suspiciously like the number of samples per frame for some audio codec, and that turned out to be true.
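To make the size arithmetic concrete, here is a minimal sketch of how a reader could tell the header variants apart. The function names are made up for illustration, and reading the two trailing fields as little-endian is my assumption; only the sizes and the "two 16-bit fields at the end" come from the description above.

```python
import struct

PALETTE_SIZE = 768                                   # 256 RGB entries
OLD_HEADER_SIZE = 816                                # classic header with palette
EXT_HEADER_SIZE = OLD_HEADER_SIZE + 4                # 820: two extra 16-bit fields
SHORT_HEADER_SIZE = EXT_HEADER_SIZE - PALETTE_SIZE   # 52: palette dropped


def describe_header(size):
    """Return which parts a header of the given size contains (hypothetical helper)."""
    if size == OLD_HEADER_SIZE:
        return {"palette": True, "extra_fields": False}
    if size == EXT_HEADER_SIZE:
        return {"palette": True, "extra_fields": True}
    if size == SHORT_HEADER_SIZE:
        return {"palette": False, "extra_fields": True}
    raise ValueError(f"unexpected VMD header size {size}")


def read_extra_fields(header):
    """Return the two trailing 16-bit fields, or None for the old header.

    Little-endian byte order is an assumption of this sketch.
    """
    if not describe_header(len(header))["extra_fields"]:
        return None
    return struct.unpack_from("<HH", header, len(header) - 4)
```

With a 52-byte header ending in the values seen in the samples, `read_extra_fields` would give back the pair 4 and 1152.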

It turns out that while Urban Runner encoded FMVs with Indeo 3 (and the rest of the stuff with conventional 16- or 24-bit VMD video compression), those educational games decided to use external audio codecs. Studying the executable revealed that it loads an external library depending on the audio type in the VMD header and uses it for decoding. The known types are:

  1. type 3 — L&H StreamTalk 15kbps @ 8kHz;
  2. type 4 — L&H StreamTalk 50kbps @ 22kHz;
  3. type 5 — L&H StreamTalk 25kbps @ 11kHz;
  4. type 6 — L&H StreamTalk CELP 4.8kbps @ 8kHz.
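The list above is essentially a lookup table, which could be sketched like this. The table layout and the `codec_for_type` helper are hypothetical; the rates are the nominal ones from the list.

```python
# audio type -> (codec family, nominal bitrate in kbps, nominal sample rate in kHz)
VMD_EXTERNAL_AUDIO = {
    3: ("L&H StreamTalk", 15.0, 8),
    4: ("L&H StreamTalk", 50.0, 22),
    5: ("L&H StreamTalk", 25.0, 11),
    6: ("L&H StreamTalk CELP", 4.8, 8),
}


def codec_for_type(audio_type):
    """Return a printable codec description, or None for types not listed here."""
    entry = VMD_EXTERNAL_AUDIO.get(audio_type)
    if entry is None:
        return None
    family, kbps, khz = entry
    return f"{family} {kbps:g}kbps @ {khz}kHz"
```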

Since only type 4 is present in the audio data and st500f22.dll is present in the binaries, the names for types 3 and 5 are extrapolated from it (I could find st48w.dll for type 6 on some random DLL download site though).

As you can see from the name, Coktel Vision (a French company) decided to use codecs from Lernout & Hauspie (a Belgian company), but you knew that from the spoiler at the beginning.

For completeness’ sake I had to look at the codec so that my VMD decoding would be complete (yes, I’ve mentioned three other codecs, but I don’t think they were actually used in production; if they were, it should not be that hard to add support for them as well).

This StreamTalk 50kbps @ 22kHz codec turned out to be a simplified version of MPEG Audio Layer II. While it builds on all the same principles, the frames are variable in size and don’t have headers. Actually, each frame starts at the next bit after the previous frame, so if frame data is stored in separate chunks (like in VMD) the decoder will skip some bits of the first input byte when the previous frame had some free bits left (yes, that means the last byte of the previous packet is repeated as the first byte of the current packet).
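The bit-skipping part can be illustrated with a small sketch. This is not the actual decoder: the `BitReader` class and the `read_frames` driver are hypothetical, the "frames" are just lists of field widths, and MSB-first bit order is assumed (as in MPEG audio). The point is only how the bit position inside the last byte of one packet carries over as the number of bits to skip in the next one.

```python
class BitReader:
    """MSB-first bit reader over a single packet (bit order is an assumption)."""

    def __init__(self, data, skip_bits=0):
        self.data = data
        self.pos = skip_bits  # position in bits from the packet start

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def bits_used_in_last_byte(self):
        # 0 means the frame ended exactly on a byte boundary
        return self.pos & 7


def read_frames(packets, fields_per_frame):
    """Read one fake 'frame' (a list of field widths) from each packet.

    The bits already consumed in the last byte of a packet are skipped
    at the start of the next packet, which repeats that byte.
    """
    skip = 0
    frames = []
    for packet, fields in zip(packets, fields_per_frame):
        reader = BitReader(packet, skip)
        frames.append([reader.read(n) for n in fields])
        skip = reader.bits_used_in_last_byte()
    return frames
```

For example, if the first frame uses 12 bits of the packet `AB CD`, the next packet starts with `CD` again and the reader skips the four bits that were already consumed.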

The content is the usual 36 sets of 32 sub-band samples (giving 1152 samples per frame) with bit allocation parameters and scales coded for each third of the frame, and with each three sub-band samples coded together. Synthesis is performed with the same QMF as in MP2 too. The main differences are that there’s only one bit allocation class and that frames do not have a fixed length: they can be anything between 16.5 and 517 bytes long (yes, the reference decoder checks the input length against those two values).
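The numbers in this paragraph fit together as in the following sketch. The constant and function names are mine, and expressing the length limits in bits with inclusive bounds is an assumption (16.5 bytes is 132 bits, so the real check is presumably not done in whole bytes).

```python
SUBBANDS = 32          # sub-bands per set
SETS_PER_FRAME = 36    # sets of sub-band samples per frame
PARTS = 3              # scales are coded per third of the frame
TRIPLET = 3            # sub-band samples coded together

SAMPLES_PER_FRAME = SUBBANDS * SETS_PER_FRAME   # 1152
SETS_PER_PART = SETS_PER_FRAME // PARTS         # 12
TRIPLETS_PER_PART = SETS_PER_PART // TRIPLET    # 4 per sub-band

MIN_FRAME_BITS = 132        # 16.5 bytes
MAX_FRAME_BITS = 517 * 8    # 4136 bits


def frame_size_ok(nbits):
    """Length sanity check; whether the real bounds are inclusive is assumed."""
    return MIN_FRAME_BITS <= nbits <= MAX_FRAME_BITS
```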

Overall this turned out to be a simple codec with familiar concepts (the only thing I had trouble with was the continuous decoding with skipping bits in the first byte of input), and I guess the two other codecs are the same but use a different bit allocation class. The CELP codec is something different, but it does not look that complex either, so it can be done in reasonable time if the need arises (hopefully never though).
