Looking at SMUSH/INSANE formats

As some of you might know, I have been interested in various game formats for decades (and that’s one of the reasons that brought me into open-source multimedia). Those formats include videos from LucasArts games as well. Strictly speaking, SMUSH is not an ordinary video format but rather a sub-engine where both audio and video are objects (background, sprites, main audio, sound effects) that have to be composed into the final audiovisual experience. INSANE is the next iteration of the engine; it became simpler (it codes full frames, with only one object per frame, just one codec and 16-bit video instead of paletted video) but it still has a lot in common with its predecessor.

As expected, the main source of information about them is ScummVM (and one of its developers made smushplay to play the files in a stand-alone manner). There’s a personal story related to that. One Cyril Zorin meddled with some formats from LucasArts games and wanted to add INSANE support to FFmpeg (for Grim Fandango, but it’s the same for all other games using the SNM format); sadly, he could not stomach the review process there (which is hard to blame him for) and abandoned it. Some time later I picked it up, added support for SMUSH codecs 37 and 47 (the ones used in adventure games) and got it committed; years later Paul B. Mahol (of future Bink2 decoder fame) added VIMA audio support to it.

Yet there are more games out there and some of them use different codecs whose details were not previously known. So I finally decided to reverse engineer them to see how the development went. My implementation in NihAV is far from perfect (there are many issues with transparency and coordinates) but it can decode all the files I have encountered, with very few exceptions.

So, let’s look at the codecs used for image coding. Audio is rather boring: there’s a very old PCM format in SAUD chunks, scaled PCM audio in IACT chunks, and VIMA, which is IMA ADPCM with 2-7 bits per step.
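Since VIMA is based on IMA ADPCM, here is a minimal sketch of the standard 4-bit IMA ADPCM decoding step for reference. This is plain IMA ADPCM with its well-known step and index tables, not VIMA itself: whatever tables VIMA uses for its variable 2-7-bit codes are not covered, and `ima_decode` is a made-up name.

```python
# Standard IMA ADPCM step-size table (89 entries).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41,
    45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209,
    230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876,
    963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024,
    3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493,
    10442, 11487, 12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623,
    27086, 29794, 32767,
]
# Step index adjustment for the three magnitude bits of a 4-bit code.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def ima_decode(codes, predictor=0, index=0):
    """Decode 4-bit IMA ADPCM codes into 16-bit samples (standard IMA, not VIMA)."""
    samples = []
    for code in codes:
        step = STEP_TABLE[index]
        # Reconstruct the difference from the three magnitude bits.
        diff = step >> 3
        if code & 4:
            diff += step
        if code & 2:
            diff += step >> 1
        if code & 1:
            diff += step >> 2
        # Bit 3 is the sign; clamp the predictor to the 16-bit range.
        predictor += -diff if code & 8 else diff
        predictor = max(-32768, min(32767, predictor))
        # Adapt the step size for the next code.
        index = max(0, min(88, index + INDEX_TABLE[code & 7]))
        samples.append(predictor)
    return samples
```

For example, `ima_decode([7, 15])` starts from a zero predictor and the smallest step, so the first code moves up and the second, with its sign bit set and a larger step, swings the prediction back below zero.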

Let’s start with SMUSH v1, which is encountered only in Rebel Assault. It is also the most complex version since it has every possible complication: backgrounds larger than the video size, sprites at negative coordinates, and some codecs modifying existing pixel values (instead of overwriting them or leaving them intact). Truly, the wrong engine got named INSANE. It includes codecs 1-5, 21 and 23 (plus 33 and 34 as aliases for codecs 4 and 5).
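Sprites at negative coordinates (or hanging off any other edge) mean that every blit has to be clipped against the destination. Here is a minimal sketch of such a clipped copy for 8-bit paletted buffers; the function name and the flat row-major buffer layout are assumptions for illustration, and transparency is ignored.

```python
def blit_clipped(dst: bytearray, dst_w: int, dst_h: int,
                 src: bytes, src_w: int, src_h: int, x: int, y: int) -> None:
    """Copy a src_w x src_h sprite to (x, y), clipping it to the destination."""
    # Visible part of the sprite, expressed in the sprite's own coordinates.
    sx0 = max(0, -x)
    sy0 = max(0, -y)
    sx1 = min(src_w, dst_w - x)
    sy1 = min(src_h, dst_h - y)
    if sx0 >= sx1 or sy0 >= sy1:
        return  # the sprite is entirely off-screen
    width = sx1 - sx0
    for sy in range(sy0, sy1):
        d = (y + sy) * dst_w + x + sx0  # destination row start
        s = sy * src_w + sx0            # source row start
        dst[d:d + width] = src[s:s + width]
```

Placing a 2×2 sprite at (-1, -1) on a 4×4 screen, for instance, leaves only its bottom-right pixel visible, in the top-left corner of the screen.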

Then came SMUSH v2, usually with just codec 37 for the older VGA games and codec 47 for the ones with larger videos (though Mortimer and the Riddles of the Medallion features an updated codec 23). Jedi Knight: Mysteries of the Sith also features codec 48.

INSANE (as mentioned at the very beginning) has only one codec, called Blocky16, and judging by its description in the Indiana Jones and the Infernal Machine executable, they obviously based it on codec 47 (with a bit of codec 48). So let’s see what codecs are known and how they work:

  • codec 1 (and 3)—this one is a simple RLE codec: each line is preceded by its data size, and inside the line each opcode byte packs the operation into the bottom bit (0 – copy, 1 – run) and the operation length minus one into the top bits; a copy is followed by 1-128 pixel values, a run by the single pixel value to repeat; rinse, repeat;
  • codec 2—this one is curious since it codes sparse pixels: the codec data consists of four-byte records, each with two bytes for the X offset from the last point, one byte for the Y offset from the last point and one byte for the new pixel value;
  • codec 4 (or 33)—this one is glyph-based. The image is divided into 4×4 blocks dubbed glyphs (I don’t know if that’s the name the original developers used; maybe it comes from the old way of coding images on the C64 and such). There are 256 predefined glyphs (which can use up to 16 colours) and there may be up to 256 additional glyphs in the frame. In the latter case the data for each column (yes, in this mode the image is coded in columns) is interspersed with bit masks (packed into bytes) telling which glyph set each following glyph index refers to. Additionally, in this case glyph 128 from the default set means skip block;
  • codec 5 (or 34)—this one is very similar to the previous codec and differs in only two aspects: glyphs are generated differently and there’s no skip glyph this time;
  • codec 20—looks like this is plain raw data. And despite the number, the codec is used in the remastered version of Full Throttle and not in any of the original games;
  • codec 21 (also 44 in NUT)—this is a simple codec for updating lines: first there is a 16-bit line data size, then the line data in the form of a 16-bit skip length, a 16-bit copy length minus one followed by the pixels to copy, another 16-bit skip length, another 16-bit copy length minus one and so on;
  • codec 23—this one modifies pixels instead of replacing them. The format is similar to codec 21 but it’s just interleaved skip/modify lengths stored in single bytes. In Rebel Assault the modification was done by adding a constant value provided in the frame object header; in Mortimer the frame object data may start with a 256-byte translation table;
  • codec 37—this is the first of the codecs to have its own sub-codecs:
    • subcodec 0—raw data;
    • subcodec 1—RLE-compressed data (like in the original codec 1) for the frame split into 4×4 blocks, with the following meaning: 0xFF – copy 16 bytes from the stream for the block, otherwise use a motion vector from the table to copy data from the previous frame;
    • subcodec 2—RLE-compressed frame data;
    • subcodecs 3 and 4—similar to subcodec 1 but without RLE, and if a flag is set there are additional opcodes 0xFD and 0xFE for filling the block with 1 or 4 colours respectively. Subcodec 4 additionally treats opcode 0 as the beginning of a skip run.
  • codec 47—this one builds on the ideas from codec 37 and introduces a second reference buffer (plus a special code to swap all three buffers in different ways). It can also have a special interpolation table giving the value that should be used as the average of two byte values (this table is used to upscale certain modes). The subcodecs are:
    • subcodec 0—raw data;
    • subcodec 1—raw data downscaled by a factor of two;
    • subcodec 2—a more complex version of codec 37 subcodec 3, but operating on 8×8 blocks and with more special opcodes. There are opcodes for filling the block with a predefined colour, you can now also paint glyphs (like in codecs 4 and 5, but glyphs here are just two-colour patterns), and opcode 0xFF signals that you should split the block and process each quarter recursively (down to 2×2 size);
    • subcodecs 3 and 4—copy one of two reference buffers to output;
    • subcodec 5—RLE-compressed frame data (like codec 37 subcodec 2).
  • codec 48—this seems like an experimental development of codec 47 that was mostly abandoned. It has the following subcodecs:
    • subcodec 0—raw data;
    • subcodec 2—RLE-compressed frame data (the same as in previous codecs);
    • subcodec 3—a new 8×8 block-based compression. While the block can no longer be processed recursively, it can have up to 16 motion vectors (one for the whole 8×8 block, or one for each of its four 4×4 or sixteen 2×2 sub-blocks, each coded either as a fixed table index or as an explicit 16-bit offset), or it can code the block as a 4×4 block with 1, 4 or 16 pixels (which should be interpolated using the table from the header) that is upscaled by a factor of two on output;
    • subcodec 5—RLE-compressed downscaled data (like codec 47 subcodec 1).
  • blocky 16—the last incarnation of the codecs, mostly based on codec 47 (but it took the explicit motion offsets from codec 48). The frame header has a 256-pixel codebook of the most common pixels, so some subcodecs/opcodes can use it to transmit just a one-byte index instead of a full 16-bit pixel. There are the following subcodecs:
    • subcodec 0—raw pixels;
    • subcodec 1—raw pixels downscaled by a factor of two (interpolation is now done with normal averaging instead of a table);
    • subcodec 2—8×8 block-based compression a lot like codec 47 subcodec 2, but with some opcodes working with raw pixels and others working with codebook indices (e.g. raw block data, fill or glyph drawing);
    • subcodecs 3 and 4—copy one of the reference frames;
    • subcodec 5—RLE-compressed frame data (the RLE is the same as in previous codecs but you need to reinterpret the data as 16-bit pixels);
    • subcodec 6—frame data in the form of codebook indices;
    • subcodec 7—frame data in the form of codebook indices downscaled by a factor of two;
    • subcodec 8—RLE-compressed codebook indices for the whole frame.
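To make the codec 1 RLE description more concrete, here is a minimal Python sketch of decoding one line. It assumes the per-line size is a 16-bit little-endian value, that a run is followed by a single pixel value and a copy by its literal pixels; transparency handling (which SMUSH v1 needs) is left out, and the function name is made up for illustration.

```python
import struct
from typing import Tuple


def decode_codec1_line(data: bytes, pos: int, width: int) -> Tuple[bytearray, int]:
    """Decode one RLE-coded line starting at `pos`; return (pixels, next position)."""
    # Assumption: the line data size is a 16-bit little-endian value.
    (line_size,) = struct.unpack_from("<H", data, pos)
    pos += 2
    end = pos + line_size
    line = bytearray(width)
    x = 0
    while pos < end and x < width:
        op = data[pos]
        pos += 1
        length = (op >> 1) + 1  # top seven bits: operation length minus one (1-128)
        if op & 1:
            # run: one pixel value repeated `length` times
            chunk = bytes([data[pos]]) * length
            pos += 1
        else:
            # copy: `length` literal pixel values follow
            chunk = data[pos:pos + length]
            pos += length
        n = min(len(chunk), width - x)  # clip to the line width
        line[x:x + n] = chunk[:n]
        x += len(chunk)
    return line, end
```

Returning the position computed from the line size (rather than where the opcode loop stopped) lets the caller move to the next line even if this one was clipped.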
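The codec 2 sparse-pixel records can be sketched as a simple walk over four-byte records. The byte order, the signedness of the offsets (signed X, unsigned Y) and the starting point at the object origin are assumptions, not confirmed details of the format; the function name is hypothetical.

```python
import struct


def decode_codec2(data: bytes, frame: bytearray, width: int, height: int) -> None:
    """Plot sparse pixels given as (dx, dy, value) records into `frame`."""
    x = y = 0  # assumption: the "last point" starts at the object origin
    for pos in range(0, len(data) - 3, 4):
        # assumption: signed 16-bit LE X offset, unsigned Y offset, pixel value
        dx, dy, value = struct.unpack_from("<hBB", data, pos)
        x += dx
        y += dy
        if 0 <= x < width and 0 <= y < height:
            frame[y * width + x] = value
```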
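For codec 4 with extra glyphs, here is a hedged sketch of column-order decoding with interleaved mask bytes. Everything beyond the list description is an assumption: one mask byte before every eight glyph indices in a column, the least significant bit applying to the first index, and glyphs stored as 16 ready-to-use pixel values (the real predefined glyphs are generated and use up to 16 colours). The function name is hypothetical and the frame dimensions are assumed to be multiples of four.

```python
from typing import Sequence


def decode_glyph_frame(stream: bytes, width: int, height: int,
                       default_glyphs: Sequence[bytes],
                       extra_glyphs: Sequence[bytes], frame: bytearray) -> None:
    """Paint 4x4 glyphs column by column, optionally choosing the set per block."""
    use_masks = len(extra_glyphs) > 0
    pos = 0
    for bx in range(0, width, 4):                     # columns, left to right
        mask = 0
        for n, by in enumerate(range(0, height, 4)):  # blocks, top to bottom
            if use_masks and n % 8 == 0:
                mask = stream[pos]  # assumption: one mask byte per 8 indices
                pos += 1
            idx = stream[pos]
            pos += 1
            if use_masks and ((mask >> (n % 8)) & 1):
                glyph = extra_glyphs[idx]             # per-frame glyph set
            elif use_masks and idx == 128:
                continue                              # skip block
            else:
                glyph = default_glyphs[idx]           # predefined glyph set
            for row in range(4):
                start = (by + row) * width + bx
                frame[start:start + 4] = glyph[row * 4:row * 4 + 4]
```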
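The codec 21 line update maps almost directly to code: a line size, then alternating skip lengths and copy lengths with their pixels. A minimal sketch, assuming little-endian 16-bit fields and that a line may end right after a skip; the function name is hypothetical.

```python
import struct


def decode_codec21_line(data: bytes, pos: int, line: bytearray) -> int:
    """Update `line` in place; return the position of the next line's data."""
    (size,) = struct.unpack_from("<H", data, pos)  # assumption: little-endian
    pos += 2
    end = pos + size
    x = 0
    while pos + 2 <= end:
        (skip,) = struct.unpack_from("<H", data, pos)
        pos += 2
        x += skip  # skipped pixels keep their previous values
        if pos + 2 > end:
            break  # the line data ended right after a skip
        (count,) = struct.unpack_from("<H", data, pos)
        count += 1  # stored as copy length minus one
        pos += 2
        pixels = data[pos:pos + count]
        pos += count
        n = max(0, min(len(pixels), len(line) - x))  # clip to the line width
        line[x:x + n] = pixels[:n]
        x += count
    return end
```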
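Codec 23 only alters existing pixels, either with a constant offset (the Rebel Assault behaviour) or through a 256-byte translation table (the Mortimer behaviour). A sketch covering both, assuming each line starts with a 16-bit little-endian size like codec 21, that the lengths start with a skip, and that the addition wraps around modulo 256; the function name is hypothetical.

```python
import struct
from typing import Optional, Sequence


def apply_codec23_line(data: bytes, pos: int, line: bytearray, delta: int = 0,
                       table: Optional[Sequence[int]] = None) -> int:
    """Modify pixels of `line` in place; return the position of the next line."""
    (size,) = struct.unpack_from("<H", data, pos)  # assumption: 16-bit LE size
    pos += 2
    end = pos + size
    x = 0
    modify = False  # byte lengths alternate: skip, modify, skip, modify, ...
    for length in data[pos:end]:
        if modify:
            for i in range(x, min(x + length, len(line))):
                if table is not None:
                    line[i] = table[line[i]]  # translation table
                else:
                    line[i] = (line[i] + delta) & 0xFF  # constant offset, wrapping assumed
        x += length
        modify = not modify
    return end
```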
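The codec 37 subcodec 3/4 block opcodes could look roughly like this. The motion-vector table is a placeholder parameter (the real table lives in the decoders mentioned earlier and is not reproduced here), the one-colour-per-row layout for 0xFE is an assumption, the subcodec 4 skip runs are left out, and input is assumed to be well-formed. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def decode_c37_block(stream: bytes, pos: int, cur: bytearray, prev: bytes,
                     width: int, bx: int, by: int,
                     mv_table: Sequence[Tuple[int, int]],
                     fill_opcodes: bool) -> int:
    """Decode one 4x4 block at (bx, by) and return the new stream position."""
    op = stream[pos]
    pos += 1
    base = by * width + bx
    if op == 0xFF:
        # raw block: 16 pixel values follow, row by row
        for row in range(4):
            d = base + row * width
            cur[d:d + 4] = stream[pos:pos + 4]
            pos += 4
    elif fill_opcodes and op == 0xFE:
        # four colours; this sketch assumes one colour per row
        for row in range(4):
            d = base + row * width
            cur[d:d + 4] = bytes([stream[pos]]) * 4
            pos += 1
    elif fill_opcodes and op == 0xFD:
        # one colour for the whole block
        colour = stream[pos]
        pos += 1
        for row in range(4):
            d = base + row * width
            cur[d:d + 4] = bytes([colour]) * 4
    else:
        # motion vector: copy the block from the previous frame at an offset
        dx, dy = mv_table[op]
        src = base + dy * width + dx
        for row in range(4):
            d = base + row * width
            s = src + row * width
            cur[d:d + 4] = prev[s:s + 4]
    return pos
```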
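The recursive splitting in codec 47 subcodec 2 is essentially a quadtree walk. A minimal sketch follows, handling only the split (0xFF), fill (0xFE) and motion-vector cases. Treating 0xFF at 2×2 size as four raw pixels, treating codes below 0xF8 as motion-vector indices, the quarter order and the `mv_table` parameter are all assumptions; the glyph and other special opcodes are not implemented, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def decode_c47_block(stream: bytes, pos: int, cur: bytearray, ref: bytes,
                     width: int, x: int, y: int, size: int,
                     mv_table: Sequence[Tuple[int, int]]) -> int:
    """Decode a size x size block at (x, y); return the new stream position."""
    op = stream[pos]
    pos += 1
    if op == 0xFF:
        if size == 2:
            # assumption: at the smallest size 0xFF carries four raw pixels
            for row in range(2):
                d = (y + row) * width + x
                cur[d:d + 2] = stream[pos:pos + 2]
                pos += 2
        else:
            # split into quarters: top-left, top-right, bottom-left, bottom-right
            half = size // 2
            for ox, oy in ((0, 0), (half, 0), (0, half), (half, half)):
                pos = decode_c47_block(stream, pos, cur, ref, width,
                                       x + ox, y + oy, half, mv_table)
    elif op == 0xFE:
        # fill the whole block with one colour
        colour = stream[pos]
        pos += 1
        for row in range(size):
            d = (y + row) * width + x
            cur[d:d + size] = bytes([colour]) * size
    elif op < 0xF8:
        # assumption: lower codes index a motion-vector table into a reference frame
        dx, dy = mv_table[op]
        for row in range(size):
            d = (y + row) * width + x
            s = (y + row + dy) * width + x + dx
            cur[d:d + size] = ref[s:s + size]
    else:
        raise NotImplementedError(f"special opcode {op:#x} is not covered by this sketch")
    return pos
```

A frame decoder would call this once per 8×8 block with `size=8`; the stream position threads through the recursion, so the bitstream order is simply the depth-first order of the quadtree.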
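For Blocky16 subcodecs 5 and 6, the two conversion steps from the list are tiny: reinterpreting RLE-decoded bytes as 16-bit pixels, and expanding one-byte indices through the 256-entry codebook from the frame header. A sketch, assuming little-endian pixels and a codebook already parsed into a list of integers; the function names are hypothetical and the RLE stage itself is not shown.

```python
import struct
from typing import List, Sequence


def bytes_to_pixels16(raw: bytes) -> List[int]:
    """Reinterpret decoded bytes as 16-bit pixels (little-endian assumed)."""
    count = len(raw) // 2
    return list(struct.unpack("<%dH" % count, raw[:count * 2]))


def expand_codebook(indices: bytes, codebook: Sequence[int]) -> List[int]:
    """Turn one-byte codebook indices into full 16-bit pixels."""
    return [codebook[i] for i in indices]
```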

As you can see, SMUSH (and INSANE) employed a wide variety of compression methods, from RLE to vector quantisation. Later LucasArts games compressed their videos with Bink, an equally interesting codec that also added DCT to the mix, but it was not developed by them.

Overall, even if it was not that useful for me, I learned a bit more about a rather interesting multimedia format, so I consider this time well wasted. Now I shall finish audio decoding and probably move on to writing encoders that nobody will use.

2 Responses to “Looking at SMUSH/INSANE formats”

  1. Paul says:

    There is some interest in a full-featured 7.1 EAC3 encoder and a full-featured TrueHD encoder, so consider adding them.

  2. Kostya says:

    Both formats belong to D*lby so I’d rather not touch them ever.

    And IIRC extending the current EAC3 decoder to support 7.1 is easy if you figure out how to deal with the dependent frames that carry those additional channels (i.e. how to make demuxers/parsers clump them together; decoding them is no different from decoding the base ones).