Decompilation Horror

In the old days I found the PackBits (also DxTory) decoding routine monstrous. That Japanese codec had a single decoding function 349549 bytes long (0x1003DFC0–0x1009352D), and that was bad style in my opinion.

Well, what do you know? Recently I’ve looked at the AMV3 codec. Its encode function is 445048 bytes long (0x10160C20–0x101CD698). And the decode function? 1439210 bytes (0x10001150–0x1016073A)! I’ve seen many entire decoders smaller than that one function.
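For anyone who wants to double-check the numbers: the sizes quoted here are simply the end address minus the start address of each function. A minimal sketch, assuming that plain subtraction is what is meant; the names RANGES, SIZES and function_size are made up for this illustration and do not come from the binaries or from IDA:

```python
# Address ranges as quoted in the post: (start address, end address).
RANGES = {
    "PackBits/DxTory decode": (0x1003DFC0, 0x1009352D),
    "AMV3 encode": (0x10160C20, 0x101CD698),
    "AMV3 decode": (0x10001150, 0x1016073A),
}


def function_size(start: int, end: int) -> int:
    """Size in bytes, assuming it is just end - start (hypothetical helper)."""
    return end - start


# Size of every function listed above, keyed by the same labels.
SIZES = {name: function_size(start, end) for name, (start, end) in RANGES.items()}

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name}: {size} bytes ({size / 1024:.1f} KiB)")
```

Run as a script, it prints one line per function with the size in bytes and in KiB.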

There’s one thing common to those two codecs that might explain this: both DxTory/PackBits and AMV3 are Japanese codecs. It might be their programming practice (no, it’s not that bad), but remember that other codecs have crappy code too, for other reasons. And some of them actually look better in compiled form than in source form (hello there, Ad*be and Micro$oft!). Yet I find it somewhat easier to deal with code that doesn’t frighten IDA. It refuses to show those functions in graph form because of too many nodes, and maybe I’ll run the decompiler on the decode function in autumn, because it will keep my apartment warm till spring.

3 Responses to “Decompilation Horror”

  1. I’m thinking that the original code is overly #macro’d. A good example of this would be the early Duck TrueMotion codecs. Even when we had the original C source code, RE was extremely daunting because it was so hard to penetrate the macros.

  2. Kostya says:

    That is very likely (IIRC zlib or something similar also had an insanely large GETBITS macro) but I don’t think it was the compiler that decided to make it all one function but rather a programmer.

    And TrueMotion was a fine example of enterprise code indeed — IIRC someone blogged about REing TrueMotion 2 from the same source package (it also included VP3, bleh).

  3. Marcus says:

    YES! LibPNG is a monstrosity.

    Maybe I’m weird because apparently everyone uses macros for damn near anything you could think of, but I REALLY try to limit that to constants that someone may want to tweak, and to working around platform issues like how Windows doesn’t include strcasecmp but provides _stricmp instead.