One of the issues with On2 VPx family is that they started it from VP3 while having four different TrueMotion codecs before that (it’s like the company was called Valve and not Duck at that time). But I wanted to look at some lossless audio codecs and there’s VocPack or VP for short which has versions 1 and 2. Bingo!
This is a very old lossless audio codec that appeared in 1993 along with Shorten and, as it turns out, originated the second approach to lossless audio compression. While Shorten was a simple format oriented on fast decoding and thus used fixed prediction (either LPC filter or even fixed prediction scheme) and Rice codes for residues (the same scheme used in FLAC and TAK), VocPack employed adaptive filter and arithmetic coding (the approach carried by LA, Monkey’s Audio, OptimFROG and such). And it was made for DOS and 8-bit audio! Well, version 2 added support for 16-bit but it seems to compress only high 8 bits of the sample anyway while transmitting low bits verbatim.
And it turned out to be my first real experience of using Ghidra with DOS executables. The main troubles were identifying library functions and dealing with pointers. Since it was compiled with Borland C++ 3.0 (who doesn’t remember it?) it was rather easy to decompile but library functions were not recognized (DOS executables don’t get much love these days…) but searching the disassembly for int 21h
with Ralf Brown’s interrupts list at hand it was easy to identify calls for file operations (open/read/write/seek) and from those infer the stdio
library functions using them and finally the code using all those getc()
s. And of course segmented model makes decompiling fun, especially when decompiler can’t understand segment/offset variables being used separately. In result sometimes you recognize offset but you have to look at the data segment yourself to see what it refers to; even worse, for some local variables Ghidra seemed to assume wrong segment which resulted in variables in disassembly and decompiled output pointing to non-existent locations. Despite all of that it was rather easy to understand what unpacker for VP1 does. VP2 has only packer and no unpacker publicly available (feel free to trace the author and buy a copy from him that supports unpacking) plus it depends on those wrongly understood global variables more which prevented me from understanding how encoding a residue works there. In theory you should be able to set data segment manually but I don’t see a point on spending more than a couple of hours on REing the format.
It was a nice distraction though.
I remember old times when dealing with this codec, long ago I wanted to RE it and asked you for help. But I got stuck.
From what I can see we discussed it eight years ago. Back then REC probably was the best decompiler so if it’s not working for you then disassembly it is.
As you can see the tooling has improved since then.
I haven’t looked at Ghidra that much, but do they also have a collection of library function signatures like IDA does? And if so, can you augment it for a specific compiler if you have access to that compiler?
It seems to have it (the feature is called “function ID”) and you can build your own IDs as well but it ships only VS1999-2015 function IDs. No love for DOS compilers 🙁