One cannot be called a true reverse engineer unless he has tried (and failed) to RE an Italian literature collection. I’ve finally tried it (and, obviously, failed).
What’s so special about it? Here is Mike’s description. From what I’ve seen on the first CD, videos occupy 280MB out of maybe 300MB total (and over 200MB of that is a single tutorial video), while the actual library data takes up about five megabytes.
The main library application is written in Visual Basic 4 (the 16-bit version). It is not compiled to native code but to P-code, and I’ve failed to find a decompiler for that exact version (32-bit? seems no problem; 16-bit or even 32-bit Visual Basic 3? also no problem; 16-bit VB4? keep searching). There are also some utility apps of unknown purpose written in Borland Delphi (also 16-bit, and I’m pretty sure back then it was simply Borland Delphi, no version number needed). And while those are in sane machine code (well, 16-bit x86 machine code is hardly sane but it’s manageable), there’s a lot of Delphi cruft compiled in, with TThis and TThat and TOtherThing and such (plus additions in Italian).
Despite the files having the extension .LZ[1-3], I doubt they employ any kind of Lempel-Ziv compression. I’d expect some different dictionary-based scheme (you have an index with all possible words, after all). It also looks like they’ve licensed some DBT thing (obviously it stands for Text DataBase in Italian) from some Italian Institute of Computational Linguistics, and this DBT is responsible for the file formats. But I’m too lazy to RE those half-megabyte DLLs (also written in Delphi) without a decompiler.
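To illustrate what I mean by a word-index scheme, here is a toy sketch in Python. It is purely hypothetical and does not describe the actual .LZ layout, which I haven't figured out. Every distinct token (words plus the whitespace and punctuation runs between them) goes into one shared dictionary, and each text becomes a list of indices into it. The function names are made up for this example. A real format would presumably pack those indices into compact variable-length codes, and this sketch doesn't attempt that.

```python
import re

# Split text into alternating runs of word and non-word characters.
# Every character is either \w or \W, so joining the tokens gives back the original text.
TOKEN_RE = re.compile(r"\w+|\W+")


def build_dictionary(texts):
    """Collect every distinct token from all texts into one sorted word list."""
    tokens = set()
    for text in texts:
        tokens.update(TOKEN_RE.findall(text))
    return sorted(tokens)


def encode(text, dictionary):
    """Replace each token with its position in the shared dictionary."""
    index = {token: i for i, token in enumerate(dictionary)}
    return [index[token] for token in TOKEN_RE.findall(text)]


def decode(codes, dictionary):
    """Look every index up in the dictionary and concatenate the tokens."""
    return "".join(dictionary[code] for code in codes)


verse = "Nel mezzo del cammin di nostra vita"
words = build_dictionary([verse])
codes = encode(verse, words)
assert decode(codes, words) == verse
```

The point is that, given a complete word index, the text body only needs to store small integers. No LZ-style back-references into previously decoded data are required.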