decompilation resources and theory
S

3

6

There must be a million books and papers on the theory and techniques of building compilers. Are there any resources on doing the reverse? I'm not interested in any particular hardware platform. I'm looking for good books or research papers that examine the subject and its difficulties in depth.

Seeley answered 27/4, 2011 at 17:42 Comment(1)
program-transformation.org/Transform/DeCompilation - Bolshevik
S
2

I've worked on an AS3 and Java decompiler, and I can assure you that everything I've learned about decompilation comes straight from compiler theory. Intermediate representations, data flow analysis, term rewriting, and other related concepts can all be found in the dragon book.

Sexist answered 2/7, 2011 at 18:17 Comment(1)
Although some of it is the same, some of it is different. For example, the dragon book doesn't say much about the Earley algorithm and in fact seems to suggest not using it. But decompilation is inherently ambiguous, so you really do want a full context-free parser. Again, my experiences on this are at rocky.github.io/Deparsing-Paper.pdf - Kickoff
K
2

I've written about decompilers for dynamic languages here and for Python specifically.

Note though this is for dynamic languages with custom (high-level) VMs.

Kickoff answered 15/2, 2018 at 17:56 Comment(0)
P
1

Decompilation is really a misnomer. Decompilers compile object code into a source representation. In many ways they are easier to write than traditional compilers: the 'source' code is already syntax-checked and usually very precisely formatted.

They build up a symbol table (of addresses) and construct a target language representation of the application. The usual difficulty is that the original compiler has, to a greater or lesser degree, optimised the original application by removing common sub-expressions, hoisting constant (loop-invariant) code out of loops and applying many other similar techniques. The results of these optimisations often cannot be represented directly in the target language.
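To make that concrete, here is a minimal sketch in Python. The stack-machine opcodes, the addresses and the `var_<address>` naming scheme are all made up for illustration and do not correspond to any real instruction set or tool. It shows a symbol table keyed by address, and how a temporary introduced by common sub-expression elimination surfaces as a variable that never existed in the original source.

```python
# Hypothetical symbol table recovered from the object file: address -> name.
SYMBOLS = {0x10: "a", 0x14: "b", 0x18: "x", 0x1C: "y"}

# Hypothetical object code for "x = (a + b) * (a + b); y = a + b" after the
# compiler has stored the common sub-expression a + b at address 0x20.
CODE = [
    ("LOAD", 0x10), ("LOAD", 0x14), ("ADD", None), ("STORE", 0x20),
    ("LOAD", 0x20), ("LOAD", 0x20), ("MUL", None), ("STORE", 0x18),
    ("LOAD", 0x20), ("STORE", 0x1C),
]

OPERATORS = {"ADD": "+", "SUB": "-", "MUL": "*"}


def name_for(address, symbols):
    """Return the known symbol, or invent a name for an unknown address."""
    return symbols.get(address, f"var_{address:x}")


def decompile(code, symbols):
    """Rebuild assignment statements by simulating the stack symbolically."""
    stack, statements = [], []
    for opcode, operand in code:
        if opcode == "LOAD":
            stack.append(name_for(operand, symbols))
        elif opcode in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {OPERATORS[opcode]} {right})")
        elif opcode == "STORE":
            statements.append(f"{name_for(operand, symbols)} = {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return statements


for statement in decompile(CODE, SYMBOLS):
    print(statement)
# var_20 = (a + b)
# x = (var_20 * var_20)
# y = var_20
```

The output is valid and equivalent, but `var_20` is the compiler's temporary, not something the programmer wrote. A real decompiler would need data flow analysis to decide whether to inline it back.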

Where the object code targets a well-defined VM, this optimisation is often left to the JIT compiler, and the resulting decompiled code is very readable, in many cases almost identical to the original. Compilers of this type often leave some or all of the symbols in the object code, allowing these to be recovered. Others include line numbers to help with debugging and troubleshooting. These all help to recover the original code.
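CPython is one example of such a VM, and its standard library is enough to see how much survives compilation. The sketch below uses only the `dis` module and code object attributes; the `scale` function is just a made-up sample to inspect.

```python
import dis


def scale(values, factor):
    total = 0
    for value in values:
        total += value * factor
    return total


code = scale.__code__

# Argument and local variable names are stored in the compiled code object.
print(code.co_name)      # scale
print(code.co_varnames)  # ('values', 'factor', 'total', 'value')

# The line-number table maps bytecode offsets back to source lines.
source_lines = sorted(
    {line for _, line in dis.findlinestarts(code) if line is not None}
)
print(source_lines)

# Each instruction carries its opcode name and a readable argument.
for instruction in dis.get_instructions(scale):
    print(instruction.offset, instruction.opname, instruction.argrepr)
```

The exact opcodes printed vary between CPython versions, which is one reason Python decompilers tend to be version-specific.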

As a counter, there are code obfuscators that deliberately transform the code to prevent simple restoration of the original source: scrambling names, changing the order in which code is generated (without changing its meaning) and introducing constructs for which there is no source language equivalent.
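Name scrambling is the easiest of these to illustrate. The following is a toy sketch using Python's `ast` module, not how any particular obfuscator works: the `RenameLocals` class and the `v0`, `v1`, ... naming are invented for the example, and it only handles parameters and simple local assignments (calls using keyword arguments would break).

```python
import ast

SOURCE = """
def area(width, height):
    total = width * height
    return total
"""


class RenameLocals(ast.NodeTransformer):
    """Toy obfuscator: replace parameter and local names with v0, v1, ..."""

    def __init__(self):
        self.mapping = {}

    def _scrambled(self, name):
        # Reuse the existing alias, or allocate the next one.
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._scrambled(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assigned names and any later use of an already renamed name;
        # globals and builtins are never in the mapping, so they are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._scrambled(node.id)
        return node


tree = RenameLocals().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<obfuscated>", "exec"), namespace)
obfuscated_area = namespace["area"]

print(obfuscated_area(3, 4))                 # 12
print(obfuscated_area.__code__.co_varnames)  # ('v0', 'v1', 'v2')
```

The function behaves exactly as before, but a decompiler can now only recover `v0`, `v1` and `v2`; the original names are gone for good.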

Pressor answered 27/4, 2011 at 19:12 Comment(0)
