Frames consist of ordinary macroblocks in 420 format with optional alpha. Bitstream is packed into 32-bit little-endian words.
Every macroblock is prefixed by 2-bit code determining its type.
Type 0 (intra block). Decode intra block.
Type 1 (skip block). Simply skip block.
Type 2 (motion block). Decode motion vectors and copy block with ½-pel precision for luma and alpha and ¼-pel precision for chroma. There is no residue coding.
Type 3 (inter block). Decode motion vectors and copy block with ½-pel precision for luma and alpha and ¼-pel precision for chroma. Now decode and add residue.
Motion vector differences are coded this way: motion vector element size in bits (3 bits, if read value is 7 then read additional 2 bits, so total is up to 7+3=10 bits per MV element); four motion vector elements; sign bits for non-zero motion vector elements.
Side note: looks like this codec employs this scheme (bit size in 3+2 scheme, fixed-size values, signs for nonzero values) for other elements too, e.g. DCs.
Luma hpel filter looks something like (A - 4*B + 19*C - 4*D + E + 1) >> 5
Chroma qpel: ¼ — (6*A + 2*B + 1) >> 3
, ½ — (A + B + 1) >> 1
Next: intra macroblock decoding — luma.