FFhistory: early reverse engineers

What attracted me to FFmpeg as well as e.g. video hosting providers was its ability to decode various formats that were often tricky to decode even on their native platforms, let alone in other circumstances (for instance, I remember the official Indeo 5 decoder freezing a Windows system when trying to play perfectly valid Indeo 5 videos encoded with a beta Indeo 5 encoder). So let’s remember the names of those who made the project truly versatile.

First of all there should be an honorary mention of a man not related to the project directly: Mark Podlipec, the author of XAnim. He managed to reverse engineer a good deal of simple yet popular formats back in the day and he also somehow managed to convince some companies to provide him with the source code for some of the popular codecs and make binary plugins for various platforms. His project was a real treasure trove of decoders (both in source and binary form).

Most of the people in this list come from MPlayer and some other player projects like Xine. This list mentions only people who contributed before 2010 (it’s a rather arbitrary cut-off date though). This list also does not contain the name of Michael Niedermayer (responsible for REing some of the H.263-based formats) since I’ve mentioned this fact in the post dedicated solely to him. And some of the people responsible for crucial decoders (like the initial Indeo 3 decoder or the Sorenson Video 3 one) prefer to remain anonymous (because you never know who is going to sue you).

As with the previous list, if I don’t say anything about the person, consider him to be writing a decoder or two and maybe participating in some collective REing effort (like QDM2).

  • Alex Beregszaszi—one of the most prominent MPlayer developers responsible for a wide variety of decoders (let alone other work);
  • Ian Caulfield—IIRC he reverse engineered MLP (aka D*lby TrueHD), a very important lossless audio format for DVDs;
  • Reimar Döffinger;
  • Eli Friedman—I believe he reverse engineered the Eidos Escape 130 codec;
  • Aurelien Jacobs—if I’m not mistaken, he took the VP6 decoder (released by some anonymous guy) and ported it to FFmpeg while simultaneously adding VP5 decoding support to it as well;
  • Benjamin Larsson—this is a man who was not merely involved in reverse engineering codecs but also in organising it, mentoring other developers and finding people to help with the specific bits (like transforms and other DSP stuff). He also changed my life (in a good way) so when I visited Sweden I tried to meet him sometimes. He left the project a very long time ago, moving on to care for his family and radio-related hobbies. I wish him the best;
  • Mike Melanson aka The Multimedia Mike—now here’s a guy who deserves a separate post. Besides being involved in various reverse engineering projects, he also took care to document them. I still remember http://www.pcisys.net/~melanson/codecs/ that later grew into The Wiki (and I even managed to co-author the TSCC description there). But since this was not enough, he also created the test system for FFmpeg for automated regression testing on various platforms (it was later reworked from scratch and extended by Måns but the idea remains the same). And if you haven’t noticed yet, he’s responsible for hosting this very blog (and I’m responsible for bringing down his server many times with the Slashdot effect);
  • Gregory Montoir—he’s mostly famous for reverse engineering game engines but while doing that he contributed some video decoders used in them;
  • Maxim Poliakovski—this guy did just reverse engineering work and did it perfectly. He’s responsible for reverse engineering many formats from the Indeo family, the ATRAC formats and some others (like ProRes). Unlike some other people he tried to re-create the closest possible version of the original decoder while still making it understandable and not just translated assembly. The only downside of this approach is that it takes a loooong time;
  • Peter Ross—while known primarily for his xvid work, he’s also reverse engineered a good deal of other formats, including Bink Audio, various formats from Electr*nic Arts and even VP7. I’m always eager to see what he has to deliver but sadly it does not happen every year;
  • Vitor Sessak—I mostly remember him for reverse engineering TwinVQ; before his work the only way to play VQFs on Linux was using some binary plugin for XMMS;
  • Kostya Shishkov—I was (and still am) mostly interested in finding out how the codecs work and wrote decoders mostly as a by-product (which affected their quality). I also touched all parts of FFmpeg and libav except for the filter system and audio resampling and I’m also responsible for introducing the cursed concept of packet side data;
  • Ewald Snel;
  • Sascha Sommer;
  • Roberto Togni;
  • Laszlo Torok—he’s responsible for the Apple MACE decoder, I suppose. There’s no other information about him;
  • Daniel Verkamp;
  • Cyril Zorin—there’s a sad story related to him. He was referred to us by ScummVM folks (or was it Mike?) in order to contribute a SMUSH/Insane decoder but he got disgusted by the atmosphere and left. So his work was later picked up and extended by me and even later by Paul B. Mahol (who will be discussed in some later post).

In later years reverse engineering has not become easier: the improvement of reverse-engineering tools is offset by the rising complexity of the software. Plus there’s a shift to formats with open specifications (even if in some cases it takes hundreds of Swiss francs to open those specifications). So the interest in reverse engineering codecs is waning and people leaving for other things are not replaced by new ones. Still, there were people doing it later and they will be mentioned when the time comes.

5 Responses to “FFhistory: early reverse engineers”

  1. Whoo! Now I know I’ve made it when I’ve been name-dropped in a Kostya blog post! 🙂

    BTW, on the matter of reverse engineering, have you seen the recent item about G-3PO, the protocol droid for Ghidra (A Script that Solicits GPT-3 for Comments on Decompiled Code)? Might it make human reverse engineers obsolete?

  2. Kostya says:

    You fully deserve to be mentioned in opensource multimedia history.

    As for those neural networks, so far they can only replace politicians and PR officers. Unlike many others I expect some version of Goodhart’s law to be in action: all those neural networks will produce output that conforms to some metric but not to the desired outcome (our prof once said that the problem with DB systems that take queries in human language is that humans often fail to formulate those queries in natural language). But of course neural networks will be used to decompile equally “wonderful” NN-generated code with varying results.

  3. […] Michael Niedermayer, the same talents to reverse engineer codecs as in many people mentioned in the post about them, the same diva behaviour as in Baptiste Coudurier, the same versatility as elenril, the same […]

  4. Lionel-Luna says:

    Hi, I did some research on ATRAC for a hobby project and am quite stuck now after days of online research. I found some samples and bits of information here and there but nothing concrete or complete. If Maxim Poliakovski reverse engineered ATRAC and all Sony specifications referenced in various media standards just don’t exist on the internet and barely go beyond revisions of 1.X anyway – that means any full description of ATRAC or MD file formats is lost? Or did I miss something about that?
    Anyway, thanks to everyone trying to save some codecs – have fun!

  5. Kostya says:

    To the best of my knowledge, the Walkman/MiniDisc company is known for being very proprietary so I doubt they’ve released much information (maybe except in patents but good luck finding the relevant ones and deciphering them).

    So all the information we have is essentially thanks to a group of enthusiasts who hacked MiniDisc players in order to be able to transfer data from the media, and to those who reverse engineered the formats and codecs (IIRC there was some PC software that allowed working with them, so it helped the REing efforts).

    In theory we may still see the original documentation one day (maybe after yet another hackers’ breach and leak) but I’d not bet on it.