What is a de-compiler how does it work? [closed]

Asked 25/5, 2010 at 4:48 Answered 30/7, 2012 at 13:19

So is a decompiler really a thing that gives gives the source of a compiled/interpreted piece of code? Because to me that sounds impossible. How would you get the names of the functions, variables, classes, etc if it is compiled. Or am I misinterpreting the definition? How does it work? And what is the general principal behind making one?

Overture answered 25/5, 2010 at 4:48 Comment(0)

You're right about your definition of a decompiler: it takes a compiled application and produces source code to match. However, it does not in most cases know the name and structure of variables/functions/classes--it just guesses. It analyzes the flow of the program and tries to find a way to represent that flow through a certain programming language, typically C. However, because the programming language of choice (C, in this example) is often at a higher level than the state of the underlying program (a binary executable), some parts of the program might be impossible to represent accurately; in this case, the decompiler would fail and you would need to use a disassembler. This is why many people like to obfuscate their code: it makes it much harder for decompilers to open it.

Building a decompiler is not a simple task. Basically, you have to take the application that you are decompiling (be it an executable or some other form of compiled application) and parse it into some kind of tree you can work with in memory. You would then analyze the flow of the program and try to find patters that might suggest that an if statement/variable/function/etc was used in a certain location in the code. It's all really just a guessing game: you'd have to know the patterns that the compiler makes in compiled code, then search for those patterns and replace them with equivalent human-readable source code.

This is all much simpler for higher-level programs like Java or .NET, where you don't have to deal with assembly instructions, and things like variables are mostly taken care of for you. There, you don't have to guess as much as just directly translate. You might not have exact variable/method names, but you can at least deduce the program structure fairly easily.

Disclaimer: I have never written a decompiler and thus don't know every detail of what I'm talking about. If you are really interested in writing a decompiler, you should get a book on the topic.

Aryanize answered 25/5, 2010 at 4:50 Comment(4)

What is the general principal behind making one though? – Overture 25/5, 2010 at 4:51

Actually, you can often get the variable names for Java and unstripped gcc debug-ready executables. – Drunken 25/5, 2010 at 4:55

I'm wondering "how do you make one". Sorry, I'm kinda slow. – Overture 25/5, 2010 at 4:59

@thyrgle: I edited my answer and gave you a more detailed explanation. Please note, though, that I've never written a decompiler and I don't know enough to actually get you started in writing one, but I know enough to give you an overall idea of how they work. – Aryanize 25/5, 2010 at 5:10

A decompiler basically takes the machine code and reverts it back to the language it was formatted in. If I'm not mistaken, I think the decompiler needs to know what language it was compiled in, otherwise it won't work.

The basic purpose of the decompiler is to get back to your source code; for example, one time my Java file got corrupted and the only thing I could so to bring it back was by using a decompiler (since the class file wasn't corrupted).

Boson answered 25/5, 2010 at 4:52 Comment(0)

It works by deducing a "reasonable" (based on some heuristics) representation of what's in the object code. The degree of resemblance between what it produces and what was originally there tends to depend heavily upon how much information is contained in binary it starts from. If you start with basically a "pure" binary, it's generally stuck with just making up "reasonable" names for the variables, such as using things like i, j and k for loop indexes, and longer names for most others.

On the other hand, a language that supports introspection needs to embed a great deal more information about variable names, types, etc., into the executable. In a case like this, decompiling can produce something much closer to the original, such as typically retaining the original names for functions, variables, etc. In such a case, the decompiler can often produce something quite similar to the original -- possibly losing little more than formatting and comments.

Frederiksen answered 25/5, 2010 at 4:56 Comment(0)

That depends on what language you are decompiling. If you are decompiling something like C or C++, then the only information provided to you is function names and arguments (In DLLs). If you are dealing with java, then the compiler usually inserts line numbers, variable names, field and method names, and so on. If there are no variable names, then you would get names like localInt1, localInt2, localException1. Or whatever the compiler is. And it can tell the spacing between lines, because of the line numbers.

Malissa answered 30/7, 2012 at 13:19 Comment(0)

Recommended topics

Hot tags