You can't recover original source code - the process of compilation is inherently lossy and some detail will inevitably be lost. How much is lost will depend on the source language, target language and choices made by developers.
Let's start with the easy cases - a high-level language compiled to its own bytecode. For example, Python to .pyc, C# to .NET IL (.dll), Java to .class/.dex. In each of these examples, the bytecode contains direct representations of high-level concepts in the language such as classes, methods, virtual function calls, class layouts, etc. Decompilers exist that will restore shockingly accurate source code from the compiled code.
Here's a brief example in Python. Original source:
class MyClass:
    def function(self, a, b):
        print("Hello, world:", a, b)

MyClass().function("test", 1234.5678)
Compiled with Python 3.6 and decompiled again using uncompyle6:
# uncompyle6 version 3.3.5
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.6.4 (v3.6.4:d48ecebad5, Dec 18 2017, 21:07:28)
# [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
# Embedded file name: /private/tmp/test.py
# Compiled at: 2019-12-23 16:34:01
# Size of source mod 2**32: 121 bytes
class MyClass:
    def function(self, a, b):
        print('Hello, world:', a, b)

MyClass().function('test', 1234.5678)
# okay decompiling __pycache__/test.cpython-36.pyc
Aside from some extra comments and spaces, the output is basically 1:1 with the original. Java and C# are similarly easy to decompile. Many games are written in Java (e.g. Android) and C# (e.g. Unity), and there are a lot of modders/hackers using decompilers to obtain usable source code for games written in these languages.
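To see why the round trip works so well, it helps to look at what a compiled Python code object still carries. The snippet below is only a rough sketch: `find_code` is a hypothetical helper written for this illustration, and the exact contents and ordering of the tuples vary between Python versions. It compiles the class in memory and prints the names and constants that survive compilation.

```python
import dis

SOURCE = '''
class MyClass:
    def function(self, a, b):
        print("Hello, world:", a, b)
'''

def find_code(code, name):
    """Hypothetical helper: search nested code objects for one called `name`."""
    for const in code.co_consts:
        if hasattr(const, "co_name"):  # nested code objects live in co_consts
            if const.co_name == name:
                return const
            found = find_code(const, name)
            if found is not None:
                return found
    return None

# Compile in memory; this produces the same kind of code object a .pyc stores.
module_code = compile(SOURCE, "test.py", "exec")
func_code = find_code(module_code, "function")

print(func_code.co_varnames)  # argument and local variable names
print(func_code.co_names)     # global and attribute names the body refers to
print(func_code.co_consts)    # literal constants, including the string
dis.dis(func_code)            # the bytecode instructions themselves
```

The argument names, the call to `print`, and the string literal are all still present, which is most of what a decompiler needs to rebuild the function.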
A developer can choose to defend against a decompiler by using obfuscation, where they deliberately mangle the compiled output in some way (e.g. renaming variables/functions/classes to gibberish names) to make this type of reverse engineering harder.
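As a toy illustration of the renaming idea, here is a minimal sketch using Python's `ast` module (it assumes Python 3.9+ for `ast.unparse`). The `Renamer` class and the `area` function are made up for this example, and real obfuscators do far more than this, but it shows how the names can be thrown away while the behaviour stays the same.

```python
import ast

SOURCE = '''
def area(width, height):
    result = width * height
    return result
'''

class Renamer(ast.NodeTransformer):
    """Toy obfuscator: replaces function, argument and variable names with _0, _1, ..."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        # Hand out a new meaningless name the first time each identifier is seen.
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)  # visits the arguments first, then the body
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename anything being assigned, plus any name already renamed.
        # Builtins such as print are left alone because they are never assigned here.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node

obfuscated = ast.unparse(Renamer().visit(ast.parse(SOURCE)))
print(obfuscated)
```

A decompiler run against the compiled form of the renamed code would recover the structure just as well as before, but the descriptive names would be gone for good.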
The harder case is when you take code and compile it all the way down to machine code (code that runs directly on the CPU). Languages like Rust, Go, C++ and Swift all compile straight to machine code by default. CPU instructions don't correspond 1-to-1 to concepts in the high-level language, so details such as local variable names, class structure and many types are simply gone from the compiled output. Now, there are decompilers - the NSA's recently open-sourced Ghidra decompiler is one of the best out there - but they can only give you a very crude approximation of the original source, and most only decompile to C (not all the way to Rust/Go/C++/Swift/etc.). Here's a simple C++ program:
#include <iostream>

class MyClass {
public:
    void function(const char *a, const double b) {
        std::cout << "Hello, world: " << a << " " << b << std::endl;
    }
};

int main() {
    MyClass m;
    m.function("test", 1234.5678);
}
Here's how Ghidra 9.1 decompiles it:
// MyClass::function(char const*, double)
void __thiscall MyClass::function(MyClass *this,char *param_1,double param_2)
{
  char cVar1;
  basic_ostream *pbVar2;
  size_t sVar3;
  long *plVar4;
  long *plVar5;
  undefined local_20 [8];
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>
                     ((basic_ostream *)__ZNSt3__14coutE,"Hello, world: ",0xe);
  sVar3 = __stubs::_strlen(param_1);
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>
                     (pbVar2,param_1,sVar3);
  pbVar2 = std::__1::__put_character_sequence<char,std--__1--char_traits<char>>(pbVar2," ",1);
  plVar4 = (long *)__stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEElsEd(param_2,pbVar2);
  __stubs::__ZNKSt3__18ios_base6getlocEv(local_20,*(long *)(*plVar4 + -0x18) + (long)plVar4);
  plVar5 = (long *)__stubs::__ZNKSt3__16locale9use_facetERNS0_2idE(local_20,__ZNSt3__15ctypeIcE2idE)
  ;
  cVar1 = (**(code **)(*plVar5 + 0x38))(plVar5,10);
  __stubs::__ZNSt3__16localeD1Ev(local_20);
  __stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE3putEc(plVar4,(ulong)(uint)(int)cVar1);
  __stubs::__ZNSt3__113basic_ostreamIcNS_11char_traitsIcEEE5flushEv(plVar4);
  return;
}

undefined8 entry(void)
{
  MyClass local_10 [8];
  MyClass::function(local_10,"test",1234.56780000);
  return 0;
}
An experienced reverse engineer can make sense of this - the string literals and the mangled library function names are still there to go on - but it's a lot less nice.
So there you have it. If you're reverse engineering a program compiled to native CPU code, you can get source but it's going to be pretty rough. If you're reverse engineering a program compiled to some intermediate bytecode, you'll have a better time. In all cases, you can't get exactly the original source code, but you might be able to get pretty close.