Looking at XVD

A week ago a certain XviD developer made a request to look at something more compressed called XVD and so I did.

For those who do not know (which included me), there’s a whole bunch of various formats originally developed by a certain lab formed in 1993 inside Saint Petersburg State University of Aerospace Instrumentation to work on multimedia compression and related things.

So they developed various formats and initially sold them to Alaris, which formed a spin-off company called Right Bits. Later the technology went to the DigitalStream-USA company (no idea if it was another company reorganisation or it just formed anew using the same people) and finally to XVD Corporation (no idea if there’s an open-source DVX codec around based on Project Ketchup).

Anyway, here’s a list of the formats (in all cases VG in the names stands for “videogram” as the technologies were intended to send short video messages from one user to another):

  • the original VGM container;
  • VGM2 container;
  • TELP audio codec (yet another of those LPC-based speech codecs);
  • Muzip audio codec (FFT-based one with several flavours);
  • V2K-II video codec;
  • and finally XVD video codec.

I don’t know whether Alaris VGPixel is related to this but I’ll dig into it later to see if it has the same features or design.

For some of those codecs there is a demonstration version in the form of Java applets preserved by one of the original developers on his site. Those versions lack many features of the actual codecs but still might be a good source of some information.

Let’s see what information I got about the codecs.

TELP codec is not interesting to me, so skip.

Muzip needs a closer look but so far it looks like a typical lossy codec employing FFT, overlapping frames and parametric bit allocation (and an arithmetic coder).

V2K-II seems to be a predecessor of XVD since it has many of the coding blocks already there plus some rudiments of it can be seen in XVD while they are not in use any longer. Here is the short list of its features:

  • intra-frames can be coded using wavelet compression. The code gets somewhat confusing there but it looks like it codes wavelet coefficients using bit-slicing (i.e. coding the highest bit of all coefficients first, then the next-to-highest bit of all coefficients, and so on) and binary run coding. Then bands are combined using LGT 5/3 with only two levels used (or maybe just one);
  • normally, though, the frames are coded using the traditional H.263 design (8×8 DCT, intra/inter macroblocks, halfpel motion compensation) but with different coding;
  • actual data is coded in a segmented way: first you have macroblock types, then motion vectors, and finally block data per plane. The block data is segmented too: the block coded flags are transmitted first for the whole plane and only then the actual block coefficients;
  • motion vectors seem to be coded hierarchically instead of predicted from neighbours: you have a couple of global motion vectors to choose from (they’re coded in the beginning of the frame and each non-skip inter macroblock has an index coded) plus macroblock motion vector plus single block motion vector;
  • while block coefficients are coded using two codebooks (for inter and intra blocks), most of the block/macroblock metadata is coded using an arithmetic coder with context-adaptive model selection. Essentially it takes the left, top-left, top and in some cases (if I’m not mistaken) previous-frame values, forms a context, selects a model, decodes a value, and then uses the decoded value along with the other context values to see what the final output value should be;
  • or it can skip coding frame entirely and just fade last decoded frame for a given number of output frames;
  • and finally a fun postprocessing mode. In addition to the usual deblocking it can add pseudorandom noise by cyclically adding a 1-kilobyte table of -1/0/1 values generated at the start. And in order to make it more random it starts at a position in that table equal to the first pixel value of the plane.
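
To make the noise trick from the last item more concrete, here is a rough Python sketch of how I understand it. This is not the codec's actual code: the table generator (Python's `random.Random`), the one-byte-per-entry reading of "1-kilobyte", the clamping to 0..255 and the names `make_noise_table` and `add_noise` are all my assumptions for illustration.

```python
import random

TABLE_SIZE = 1024  # "1-kilobyte table", assuming one byte per entry


def make_noise_table(seed=0):
    """Build the -1/0/1 table once at start-up.

    The real generator is unknown; random.Random is only a stand-in.
    """
    rng = random.Random(seed)
    return [rng.choice((-1, 0, 1)) for _ in range(TABLE_SIZE)]


def add_noise(plane, table):
    """Return a copy of a flat 8-bit plane with the table added cyclically."""
    if not plane:
        return []
    # The starting offset is taken from the first pixel of the plane.
    pos = plane[0] % len(table)
    out = []
    for pix in plane:
        # Clamping to the 8-bit range is an assumption.
        out.append(min(255, max(0, pix + table[pos])))
        pos = (pos + 1) % len(table)
    return out
```

So a plane starting with pixel value 100 begins reading the table at entry 100 and wraps around after entry 1023. The output stays deterministic for a given picture while the pattern shifts from frame to frame.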

And now XVD. It keeps many ideas from the predecessors and adds a couple of new ones. It dropped wavelet coding completely but picked up a new way to code existing data. And it looks like it finally got interlaced support in the form of coding two independent frames as fields.

The new coding method is a context-adaptive binary arithmetic coder. While the main principles are the same, it’s not an exact copy of H.26x CABAC. Instead each model is represented by an MPS (most probable symbol) probability which gets updated depending on the decoded bit and an update factor (set for the model in the beginning). Fun fact: initial model parameters (frequency, MPS bit and update shift) are stored as doubles and converted to integers during coder (re)initialisation.
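
A minimal Python sketch of such a model could look like the following. Only the adaptation part is shown, not the arithmetic coder around it, and the details are guesses: the 12-bit fixed-point precision, the exact update formula and the names `BinModel`, `PROB_BITS` and `PROB_ONE` are my assumptions, not something taken from the binary.

```python
PROB_BITS = 12            # assumed fixed-point precision
PROB_ONE = 1 << PROB_BITS


class BinModel:
    """Adaptive binary model: MPS bit, MPS probability and update shift."""

    def __init__(self, freq, mps, shift):
        # Parameters arrive as floating-point values and are converted
        # to integers at (re)initialisation time.
        prob = int(freq * PROB_ONE)
        self.prob = min(PROB_ONE - 1, max(PROB_ONE // 2, prob))
        self.mps = int(mps)
        self.shift = int(shift)

    def update(self, bit):
        """Adapt the MPS probability after coding one bit."""
        if bit == self.mps:
            # MPS seen: move the probability towards 1.
            self.prob += (PROB_ONE - self.prob) >> self.shift
        else:
            # LPS seen: move the probability towards 0.
            self.prob -= self.prob >> self.shift
            if self.prob < PROB_ONE // 2:
                # The other symbol is now more probable: swap roles.
                self.mps ^= 1
                self.prob = PROB_ONE - self.prob
        self.prob = min(self.prob, PROB_ONE - 1)
```

A larger shift means slower adaptation. Keeping the shift per model lets a format tune how quickly each kind of flag reacts to changes in statistics.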

Motion vectors now are coded in three ways: codebooks, with arithmetic coder (the only option in the previous codec) or with binary coder (and that method also uses MV prediction now).

Another new thing is an alternative quantiser. If it’s present then there’s another array of flags telling which quantiser each macroblock should use.

Intra block DCs finally can have prediction enabled for them. As for the coefficients, they can be coded in the old way or with the binary coder using unary codes with limited models (i.e. the first three bits use three different models, all subsequent bits are coded using the same model).
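
The "limited models" idea for unary codes can be sketched in Python like this. The `decode_bit` callback is a hypothetical stand-in for the binary arithmetic decoder, and the bit polarity (1 continues, 0 terminates) and the `max_len` safety cap are my assumptions.

```python
def decode_unary(decode_bit, models, max_len=64):
    """Decode a unary-coded value with a limited set of models.

    decode_bit(model) -> 0 or 1 is a hypothetical callback into the
    binary coder. With four models, bits 0..2 each get their own model
    and every later bit shares the last one.
    """
    value = 0
    while value < max_len:
        model = models[min(value, len(models) - 1)]
        if decode_bit(model) == 0:
            break  # assumed terminator bit
        value += 1
    return value
```

The point of limiting the number of models is that the first few bits carry most of the statistics (small values dominate), so the long tail can share a single model without much loss.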

Overall, XVD is something like a transition codec from H.263 to H.264 in design and features (no B-frames though) plus some other unorthodox decisions (like mixing block flags coded with arithmetic coder and ordinary variable-length codes for coefficients in the same stream; or adding wavelets and third data coding method). I’m grateful to Peter R. for pointing me to something more original than the usual stuff.

P.S. This is just a review of the technologies, I might write actual demuxers and decoders for NihAV later. And document it all of course.

13 Responses to “Looking at XVD”

  1. Peter says:

    Great write up. For some reason I assumed it was of Japanese origin.

    Alaris Videogram Creator and Streaming Player are bundled with this Windows 9x camera driver: https://www.philips.ie/c-p/PCA635VC/-

  2. Paul says:

    Fix bink2 for me!
    Also RE their audio codec!

  3. Kostya says:

    As I said to Derek before, I don’t want to do my job instead of you. Fix Bink2 decoder yourself.

    As for the audio codec—might happen, which one do you mean?

  4. […] Kostya's Rants around Multimedia « Looking at XVD […]

  5. Paul says:

    Audio codec is one from their binary stuff that is on same download page as bink decoders/players.

  6. Kostya says:

    I’m pretty sure they don’t have any other codec. All that is mentioned on their site are Bink/Smacker (no new codecs there), Oodle (which is collection of general purpose data compression algorithms, I’ve seen REd version in Internet) and tools for building game engines (one of them, Miles Sound System, supports various audio formats but nothing unknown).

  7. Paul says:

    Yes, that miles thing also have that audio codec, among dct and rdft.

  8. Paul says:

    It is called by something MSS. In .binka container with 1FCB as first hex data. From some versions of minecraft. Comes with binka2wav program.

  9. Kostya says:

    Interesting. Unfortunately it requires binkawin.asi component you can find in those games which I don’t have and can’t find either.

    If somebody has enough interest to share msi.dll along with that component and maybe some samples then I can look at it.

  10. Raishi says:

    Titanfall 2 and Apex Legends (especially the latter since it’s a live-service game and is therefore constantly updating) are using an updated version of MSS.
    for now though i can only provide Titanfall 2 binaries. will get back to you when i can.

  11. Kostya says:

    Thanks, waiting for it.

  12. Raishi says:

    binaries which handle this “1FCB” format are in binkawin64.dll and milesinw64.dll
    please don’t screw this up.

  13. Kostya says:

    Got it, thank you very much. I’ll try to look at it tonight.

    And don’t worry, unless there’s a targeted disinformation campaign from certain company we’ll not know less about it.