Ghidra and DPMI

When you try to look inside an old game, you might run into a problem if the executable is a 32-bit one running under some DOS extender.

Most people do not know (and those who do probably want to forget) that besides the usual 16-bit COMs and EXEs (which can have several segments and sometimes an overlay too, i.e. an external file for loading code or data from disk) you can have 32-bit DOS applications that use a very special interface (DPMI) to access so-called protected mode. Essentially you have two executables bundled together: the actual 32-bit executable in its own format and the loader for it (or a stub that calls an external loader, usually called dos4gw.exe). Kinda like the PE format but more flexible.
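
To make the "two executables bundled together" idea a bit more concrete, here is a minimal sketch that works out where the 16-bit MZ part ends and peeks at the two bytes right after it. The MZ size fields (bytes in the last page at offset 2, page count at offset 4) are standard; the assumption that the payload starts immediately after the stub is mine and will not hold for every extender (some chain several headers, some compress the payload). The function names are made up for illustration.

```python
import struct


def mz_image_size(data: bytes) -> int:
    """Return the size of the MZ image as declared in its own header."""
    if data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    # Offset 2: bytes used in the last 512-byte page, offset 4: page count.
    last_page_bytes, pages = struct.unpack_from("<HH", data, 2)
    if last_page_bytes == 0:
        return pages * 512
    return (pages - 1) * 512 + last_page_bytes


def signature_after_stub(data: bytes):
    """Return (offset, two-byte signature) of whatever follows the MZ stub.

    Assumption: the payload is appended directly after the stub. Seeing
    b"LE" or b"LX" here is a good sign; anything else means more digging.
    """
    end = mz_image_size(data)
    return end, data[end:end + 2]
```

If the signature is neither LE nor LX, the payload is probably compressed or wrapped in another header, and unbinding it with an extender utility is the easier route.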

So what’s the problem with loading them? For starters, Ghidra does not recognize the extended format and loads just the loader (an external LX loader plugin did not help either, although when I tried to load some ActiveX components for XVD decoding, their format was detected as LX). Another problem is that some extenders (like CauseWay) also compress the payload. Luckily, there’s the DOS/32A extender, which ships with a utility that extracts the so-called linear executable; you can’t run it but you can study it.

This is not hard but somewhat tedious since (at least in my cases) the file can be loaded only as a raw image. From the LX header you can extract the position of the actual segment data, the segment sizes and their base addresses; then you can use that information to create custom memory segments for them. Now you have some (usually two) segments and one problem: the addresses stored in the segments are not full virtual addresses but rather offsets. E.g. if you have a code segment starting at 0x10000 and a data segment starting at 0x90000, you might see mov eax, [0x4bc], which really should be mov eax, [0x904bc]. That is why the first hundred kilobytes or more of an LX file are dedicated to so-called “fixups” that tell you how to correct the addresses at specific places in the loaded data (or code). Luckily, Ghidra allows patching the addresses in instructions and re-assigning segment base addresses so there are fewer places to patch (and calls/jumps almost always use relative offsets, so those are not an issue).
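
Pulling the segment information out of the header can be sketched roughly like this. The field offsets (page size at 0x28, object table offset and count at 0x40 and 0x44, data pages offset at 0x80, 24-byte object table entries) are how I read the LE/LX format description; the names are mine, and the file offset calculation assumes that pages are stored uncompressed and in order, which should be double-checked against the page table for a real file.

```python
import struct
from typing import List, NamedTuple, Tuple


class LxObject(NamedTuple):
    virtual_size: int   # size of the segment in memory
    base_address: int   # address the segment expects to be loaded at
    flags: int
    first_page: int     # 1-based index of the first page of this object
    page_count: int


def parse_lx_objects(image: bytes, header_off: int = 0) -> Tuple[int, int, List[LxObject]]:
    """Return (page_size, data_pages_offset, objects) from an LE/LX header."""
    if image[header_off:header_off + 2] not in (b"LE", b"LX"):
        raise ValueError("no LE/LX signature at the given offset")
    (page_size,) = struct.unpack_from("<I", image, header_off + 0x28)
    # The object table offset is relative to the start of the LE/LX header.
    obj_table_off, obj_count = struct.unpack_from("<II", image, header_off + 0x40)
    # The data pages offset is taken as relative to the start of the file.
    (data_pages_off,) = struct.unpack_from("<I", image, header_off + 0x80)
    objects = []
    for i in range(obj_count):
        entry = header_off + obj_table_off + i * 24
        vsize, base, flags, first_page, page_count, _reserved = struct.unpack_from(
            "<6I", image, entry
        )
        objects.append(LxObject(vsize, base, flags, first_page, page_count))
    return page_size, data_pages_off, objects


def object_file_offset(obj: LxObject, page_size: int, data_pages_off: int) -> int:
    """File position of the object's data (assumes sequential, unpacked pages)."""
    return data_pages_off + (obj.first_page - 1) * page_size
```

For each object, the file offset, the virtual size and the base address are the three numbers needed to create a matching memory block by hand.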

And how do you find those wrong addresses for a specific data item to patch? A memory search for the low word of the address works just fine.
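
Outside of Ghidra the same search can be sketched in a few lines. This is only an illustration with hypothetical helper names: it looks for the low word of the offset (which equals the low word of the full address when the segment base is 64K-aligned, as in the example above), keeps the hits where the whole 32-bit value matches the unfixed offset, and then adds the segment base. A hit may still be a plain constant rather than an address, so each one needs a look before patching.

```python
import struct
from typing import List


def find_unfixed_refs(code: bytes, target: int, data_base: int) -> List[int]:
    """Return positions in `code` holding the unfixed 32-bit offset of `target`."""
    offset_in_segment = target - data_base
    low_word = struct.pack("<H", offset_in_segment & 0xFFFF)
    hits = []
    pos = code.find(low_word)
    while pos != -1:
        if pos + 4 <= len(code):
            (value,) = struct.unpack_from("<I", code, pos)
            if value == offset_in_segment:
                hits.append(pos)
        pos = code.find(low_word, pos + 1)
    return hits


def apply_base(code: bytearray, pos: int, data_base: int) -> None:
    """Add the segment base to the little-endian 32-bit value at `pos`."""
    (value,) = struct.unpack_from("<I", code, pos)
    struct.pack_into("<I", code, pos, (value + data_base) & 0xFFFFFFFF)


# nop; mov eax, [0x4bc]; ret
code = bytearray(b"\x90\xa1\xbc\x04\x00\x00\xc3")
for hit in find_unfixed_refs(bytes(code), 0x904BC, 0x90000):
    apply_base(code, hit, 0x90000)
```

After the loop the operand bytes read bc 04 09 00, i.e. mov eax, [0x904bc].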

Overall, maybe it is not the most convenient way to deal with this format, but it works well enough for me. Who knows, maybe in the future I’ll look at more games for not-yet-covered formats. At least now I know how to do that.
