Compilation is a lossy process, so it is not, in general, possible to decompile an executable (or other compiled program module, such as a .so or .dll) and recover source code in the original language, or even to determine unambiguously what the original language was. There may not even be a single original source language, since the modules that were linked together may have been written in different languages. Ordinarily you can disassemble a binary and recover assembly language, although that may be of very limited value.
In many cases, you can tell something about the original language, provided that the binary has not been stripped of its symbols. For example, you can usually tell whether a binary was originally written in C++ by looking at its symbols (on Linux, using objdump; no idea what the equivalent might be on Windows): C++ symbols are mangled in a particular way. It's not a 100% guarantee, but a high likelihood.
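As a rough illustration of the symbol check, here is a minimal Python sketch. It assumes a Linux ELF binary built by a compiler that follows the Itanium C++ ABI (GCC, Clang), where mangled names start with `_Z`. The helper names `mangled_fraction` and `list_symbols` are made up for this example, and `nm -P` is used instead of objdump only because its output is easier to parse. Treat the result as a hint rather than proof: as far as I know, older Rust toolchains emit `_ZN...` symbols too.

```python
import shutil
import subprocess


def mangled_fraction(symbols):
    """Return the fraction of symbol names starting with `_Z`.

    `_Z` is the prefix used by the Itanium C++ ABI name mangling scheme.
    This is a heuristic: a high fraction suggests C++, nothing more.
    """
    names = [s for s in symbols if s]
    if not names:
        return 0.0
    mangled = sum(1 for s in names if s.startswith("_Z"))
    return mangled / len(names)


def list_symbols(path):
    """List symbol names in `path` using `nm -P` (assumes binutils is installed).

    A stripped binary will typically yield no symbols at all.
    """
    if shutil.which("nm") is None:
        raise RuntimeError("nm not found on PATH")
    result = subprocess.run(
        ["nm", "-P", path], capture_output=True, text=True, check=True
    )
    # In POSIX output format, the symbol name is the first field on each line.
    return [line.split()[0] for line in result.stdout.splitlines() if line.split()]
```

Something like `mangled_fraction(list_symbols("./a.out"))` (with `./a.out` standing in for whatever binary you are inspecting) then gives a number between 0 and 1. What counts as "high enough" is a judgment call.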
That said, some decompilers do a pretty reasonable job of a very difficult task. Inferring likely high-level constructs from a binary is not easy. In my (very limited) experience, they tend to work for fairly trivial programs or for software compiled with a narrow range of versions of the original compiler, but choke on anything substantial: it's very difficult for the author of a decompiler to keep up with changes in the compilers, and there may be very little incentive for her to do so.
Even in cases where decompilation is very successful, the result is essentially completely uncommented code with meaningless variable names that is extremely difficult to understand. Decompilation is one thing; extracting the intended semantic meaning from the result is another. Remember that many variables, branches, loops, and functions will have been completely optimized away, many functions will have been inlined, etc. So the “source code”, even if you can obtain it in this way, may not be a whole lot of use to you.
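The loss of comments is easy to see even with a much gentler compiler than a native one. The sketch below uses CPython's built-in `compile` as a stand-in (an analogy only, since Python bytecode keeps far more information than a native binary does): two sources that differ in comments and spacing produce the same bytecode, so nothing working from the bytecode could tell them apart. The helper name `bytecode_fingerprint` is made up for this example.

```python
def bytecode_fingerprint(source):
    """Compile `source` and return the parts of the code object that
    describe what it does: raw bytecode, constants, and referenced names."""
    code = compile(source, "<example>", "exec")
    return code.co_code, code.co_consts, code.co_names


commented = "total = price * qty  # apply quantity before tax\n"
bare = "total=price*qty\n"

# The comment and the spacing leave no trace in the compiled result.
same = bytecode_fingerprint(commented) == bytecode_fingerprint(bare)
print(same)
```

A native optimizing compiler discards much more than this, including local variable names and the boundaries of inlined functions.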