Here I’d like to review two formats related to the messages in the game: font files and message files. There’s still a question of how lipsync files work (so far it looks like a series of 16-bit variables, probably telling which sprite from the corresponding GRA file to show; I may get to that eventually). Update: as one should expect, the lipsync format is a lot like it was in SCI: a series of 16-bit time positions (in tertias, aka 1/60th of a second) and sprite IDs. Additionally, there’s another table of equal size right after the end of that data; its meaning is still unknown.
QGF
This is a rather simple bitmap font format. It starts with a header that has the following 32-bit values: maximum character width, character height, space between characters, unknown value, flag telling whether it is a complex pseudo-3D font or a simple line font, another flag (probably for a shadow).
Then there is an array of 512 bytes containing the character widths (only the 16-256 range is populated, though), followed by an array of 512 32-bit words giving the offsets to the character data.
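As a rough illustration of the layout described so far, here is a minimal Python sketch that reads the six header values and the two tables. The field names, the `QgfHeader` class and the little-endian byte order are my assumptions for the sake of the example; only the field order and sizes come from the description above.

```python
import struct
from dataclasses import dataclass

HEADER_FMT = "<6I"   # six 32-bit values; little-endian is an assumption
NUM_CHARS = 512


@dataclass
class QgfHeader:
    max_width: int      # maximum character width
    height: int         # character height
    spacing: int        # space between characters
    unknown: int        # meaning not known
    complex_flag: int   # pseudo-3D font vs. simple line font
    shadow_flag: int    # probably a shadow flag
    widths: bytes       # 512 per-character widths
    offsets: tuple      # 512 offsets to the character data


def parse_qgf_header(data: bytes) -> QgfHeader:
    """Read the fixed-size part of a QGF file (hypothetical helper)."""
    header_size = struct.calcsize(HEADER_FMT)
    needed = header_size + NUM_CHARS + NUM_CHARS * 4
    if len(data) < needed:
        raise ValueError("file too short for a QGF header")
    fields = struct.unpack_from(HEADER_FMT, data, 0)
    pos = header_size
    widths = data[pos:pos + NUM_CHARS]
    pos += NUM_CHARS
    offsets = struct.unpack_from("<%dI" % NUM_CHARS, data, pos)
    return QgfHeader(*fields, widths, offsets)
```

Whether the offsets are relative to the start of the file or to the end of the tables is not stated above, so the sketch leaves them untouched.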
Character data consists of pairs of bytes. If the first byte has a negative value, it tells you how many pixels to skip; otherwise it is the current pixel opacity (for fonts with the corresponding flag set, 31 means a fully opaque pixel; for simple fonts any non-negative value is an opaque pixel). The second byte seems to be completely ignored.
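A possible reading of that scheme in Python, as a sketch only. Several things here are guesses on my part and not confirmed by the format description: that the skip count is the negated signed byte, that pixels fill the glyph row by row, and that decoding stops after width × height pixels. The function name `decode_glyph` is made up for the example.

```python
def decode_glyph(data: bytes, offset: int, width: int, height: int,
                 has_opacity: bool) -> list[list[int]]:
    """Decode one glyph into rows of opacity values in the range 0..31."""
    total = width * height
    pixels: list[int] = []
    pos = offset
    while len(pixels) < total and pos + 1 < len(data):
        first = data[pos]
        pos += 2                      # the second byte of the pair is ignored
        if first >= 0x80:             # negative when read as a signed byte
            skip = 256 - first        # assumed: skip count is the negated value
            pixels.extend([0] * skip)
        elif has_opacity:
            pixels.append(min(first, 31))
        else:
            pixels.append(31)         # simple fonts: any value is fully opaque
    del pixels[total:]                # drop any overshoot from a long skip
    pixels.extend([0] * (total - len(pixels)))   # pad if the data ran out
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

For instance, the byte pairs `FE 00`, `1F 00`, `0A 00` decoded as a 2×2 glyph would give one transparent row followed by a row with opacities 31 and 10.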
QGM
These files contain text messages of various kinds: text spoken by characters, narrator’s text, dialogue options and even all the text in user interface.
Message files start with the " MGQ" magic, followed by a 32-bit file version (only versions 3 and 4 are supported), a 32-bit number of message blocks in the file, some unknown 16-bit value and a 16-bit file ID (e.g. 160 for 160.QGM).
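Reading that file header could look roughly like the following Python sketch. The function name, the dictionary keys and the little-endian byte order are assumptions made for illustration; the field order and sizes follow the description above.

```python
import struct

QGM_MAGIC = b" MGQ"
QGM_HEADER_FMT = "<4sIIHH"   # magic, version, block count, unknown, file ID


def parse_qgm_header(data: bytes) -> dict:
    """Parse the 16-byte QGM file header (hypothetical helper)."""
    if len(data) < struct.calcsize(QGM_HEADER_FMT):
        raise ValueError("file too short for a QGM header")
    magic, version, block_count, unknown, file_id = struct.unpack_from(
        QGM_HEADER_FMT, data, 0)
    if magic != QGM_MAGIC:
        raise ValueError("not a QGM file")
    if version not in (3, 4):
        raise ValueError("unsupported QGM version %d" % version)
    return {
        "version": version,
        "block_count": block_count,
        "unknown": unknown,
        "file_id": file_id,
    }
```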
Then message blocks with a 32-bit header and variable-size payload follow. The message block header starts with four 16-bit variables that are used as the message identifier (and also to generate the name of the speech/lipsync files, more on that below). They are followed by four unknown 16-bit fields (the first of them is probably the ID of the character saying the lines), a 16-bit number of dialogue options, 16-bit flags (flag 4 means the message contents are obfuscated), another unknown 16-bit field, a 16-bit internal message number (unordered), a 16-bit message length and a 16-bit flag for the presence of a string message label.
The header is followed by an optional 13-byte message label (which looks like a filename), dialogue options in the same format (if present) and optional message text. There’s an additional 32-bit number at the end of each message block with unclear meaning (it may be related to the message ID).
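To make the block layout more concrete, here is a hedged Python sketch of a block parser. It assumes that the header is exactly the fourteen 16-bit fields listed above (28 bytes), that all values are little-endian, that each dialogue option is a 13-byte NUL-padded string like the label, and that the text occupies exactly "message length" bytes. All names in it are invented for the example.

```python
import struct

LABEL_SIZE = 13
BLOCK_HEADER_FMT = "<14H"   # fourteen 16-bit fields, little-endian assumed


def parse_message_block(data: bytes, pos: int = 0):
    """Parse one message block; returns (block, position after the block)."""
    f = struct.unpack_from(BLOCK_HEADER_FMT, data, pos)
    pos += struct.calcsize(BLOCK_HEADER_FMT)
    block = {
        "ids": f[0:4],           # message identifier (ID1..ID4)
        "unknown_a": f[4:8],     # first one is probably the speaker ID
        "option_count": f[8],
        "flags": f[9],           # flag 4 = contents are obfuscated
        "unknown_b": f[10],
        "number": f[11],         # internal message number
        "text_length": f[12],
        "has_label": f[13],
    }

    def read_name() -> str:
        nonlocal pos
        raw = data[pos:pos + LABEL_SIZE]
        pos += LABEL_SIZE
        return raw.split(b"\0", 1)[0].decode("ascii", "replace")

    block["label"] = read_name() if block["has_label"] else None
    block["options"] = [read_name() for _ in range(block["option_count"])]
    block["text"] = data[pos:pos + block["text_length"]]   # still obfuscated
    pos += block["text_length"]
    (block["trailer"],) = struct.unpack_from("<I", data, pos)
    return block, pos + 4
```

Returning the end position makes it easy to walk all blocks in a loop, feeding each returned position back in as the start of the next block.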
Before going on to the message obfuscation scheme and name generation, I’d like to talk about the structure. There are essentially two kinds of messages: normal text and dialogue trees that mostly contain links to other message texts.
For example, in the same Arcane Island location message block 2 looks like this:
- option 1 = A0BJ020S.021
- option 2 = A0BJ0208.0E1
- option 3 = A0BJ0208.0F1
- option 4 = A0BJ0208.0G1
- option 5 = A0BJ0208.0H1
- message = “He that wishes to pass through me,
First must answer questions three.”
And those IDs point at the other message blocks:
- message block 0 with text ‘Say “%s.”’ (i.e. hero’s name)
- message block 13 with text ‘Say “King Arthur of Pendragon.”’
- message block 15 with text ‘Say “Putentane.”’
- message block 16 with text ‘Say “Sir Robin-the-Not-So-Brave.”’
- message block 17 with text ‘Say “Oh, no, not again.”’
(Also those message blocks have optional message label set to the ID of the message block 2, probably for the easier return. In other files a character’s reply may have the same label set as well.)
Those familiar with the game may remember it as the first question the cloud gargoyle asks the hero and possible replies to it.
Now about the message IDs. Those are generated from the QGM ID and four integers using the following format: A(3-character QGM ID)(2-character ID1)(2-character ID2).(2-character ID3)(1-character ID4). Integers are converted using base 36, i.e. digits and uppercase letters; e.g. 415 gets converted to 0 11 19 and coded as 0BJ. If an audio part is present, it has the same name. Lipsync data has the same name as well but uses 'S' as the first letter of the file name instead.
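The name generation can be sketched in a few lines of Python. The helper names `to_base36` and `message_name` are mine; the ID values used in the usage note are simply what the name A0BJ0208.0E1 from the example above decodes to.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(value: int, width: int) -> str:
    """Convert a non-negative integer to a fixed-width base-36 string."""
    if value < 0 or value >= 36 ** width:
        raise ValueError("value does not fit into %d base-36 digits" % width)
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 36)
        chars.append(DIGITS[digit])
    return "".join(reversed(chars))


def message_name(qgm_id: int, id1: int, id2: int, id3: int, id4: int,
                 prefix: str = "A") -> str:
    """Build the message name; pass prefix='S' for the lipsync file."""
    return "%s%s%s%s.%s%s" % (
        prefix,
        to_base36(qgm_id, 3),
        to_base36(id1, 2),
        to_base36(id2, 2),
        to_base36(id3, 2),
        to_base36(id4, 1),
    )
```

With these helpers, `to_base36(415, 3)` gives `0BJ`, and `message_name(415, 2, 8, 14, 1)` gives `A0BJ0208.0E1`.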
And as for the message text obfuscation: if the flags field has bit 2 set (as in the majority of the message files), the message contents should be de-obfuscated using the following algorithm:
- split the data into 4-byte chunks and a tail 0-3 bytes long;
- for each 4-byte chunk, repeat the following four steps:
- read those bytes as a 32-bit little-endian number;
- exclusive-or the value with the constant 0xf1acc1d;
- rotate the value cyclically left by 15 bits;
- store the 32-bit number back as four bytes;
- finally, invert the bits in all bytes of the tail.
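The steps above translate almost directly into Python. This is only a sketch following the description literally (XOR first, then rotate); the function name `deobfuscate` is my own.

```python
import struct

XOR_KEY = 0x0F1ACC1D


def deobfuscate(data: bytes) -> bytes:
    """De-obfuscate message contents as described in the steps above."""
    out = bytearray()
    full = len(data) - len(data) % 4          # length covered by 4-byte chunks
    for pos in range(0, full, 4):
        (value,) = struct.unpack_from("<I", data, pos)   # 32-bit little-endian
        value ^= XOR_KEY
        # rotate left by 15 bits within 32 bits
        value = ((value << 15) | (value >> 17)) & 0xFFFFFFFF
        out += struct.pack("<I", value)
    out += bytes(b ^ 0xFF for b in data[full:])          # invert the tail bytes
    return bytes(out)
```

The output always has the same length as the input, so the result can be sliced or decoded using the message length from the block header.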
I’d call this scheme lame but the constant speaks for itself.
And as a bonus for those who care, here are the extracted font files (under the cut).