What is the difference between compilation and interpretation?

I just had a conversation with a colleague and we were talking about the V8 JavaScript engine. According to Wikipedia,

V8 compiles JavaScript to native machine code [...] before executing it, instead of more traditional techniques such as interpreting bytecode or compiling the whole program to machine code and executing it from a filesystem.

where (correct me if I'm wrong) "interpreting bytecode" is the way Java works, and "compiling the whole program" would apply to languages like C or C++. We then started wondering and debating, trading assertions and presumptions about the differences and similarities. To settle it, I suggested asking the experts on SO.

So, who is able to

  1. name, explain and/or reference all major methods (e.g. precompiling vs. runtime interpretation),
  2. visualize or provide a scheme of the relations between source code, compilation and interpretation, and
  3. give examples (i.e. name programming languages) of the major methods from #1?

Notes:

  • I am not looking for a long prose essay about the different paradigms, but for a visually supported, quick overview.
  • I know that Stack Overflow is not intended to be an encyclopedia for programmers (but rather a Q&A platform for more specific questions). But since I can find a lot of popular questions that provide a somewhat encyclopedic view of certain topics (e.g. [1], [2], [3], [4], [5]), I started this question.
  • If this question would fit better on another Stack Exchange site (e.g. cstheory), please let me know or flag this question for moderation.
Swinger answered 19/6, 2014 at 12:40 Comment(1)
Surprised this doesn't have more votes as it's an important question with some fantastic answers.Floranceflore

It's near-impossible to answer your question, for one simple reason: there aren't just a few approaches; they form a continuum. The actual work involved is largely the same across this continuum; the only differences are when things happen, and whether intermediate results are saved in some way or not. The various points across this continuum (which is not a single line or progression, but rather more of a rectangle with different corners you can be close to) are:

  1. Reading source code
  2. Understanding the code
  3. Executing what you understood
  4. Caching various intermediate data along the way, or even persistently saving them to disk.

For example, a purely interpreted programming language pretty much doesn't do #4, and #2 sort of happens implicitly between #1 and #3, so you'd barely notice it: it just reads sections of the code and immediately reacts to them. This means there is low overhead before execution actually starts, but, e.g. in a loop, the same lines of text get read and re-read again.

[Diagram: the balance of an interpreter (not much caching going on)]
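
To make "read and immediately react" concrete, here is a minimal sketch in C# (the same language as the interpreter example in the answer further down). The two-statement toy language and the ToyInterpreter name are made up for illustration; the point is that inside "repeat", the same text is parsed again on every pass:

using System;

static class ToyInterpreter
{
    // A made-up one-statement-per-line language with exactly two forms:
    // "print <text>" and "repeat <n> <statement>". Each statement is
    // parsed at the moment it runs.
    public static void Run(string[] lines)
    {
        foreach (var line in lines)
            Execute(line);
    }

    static void Execute(string statement)
    {
        var parts = statement.Split(new[] { ' ' }, 2);
        if (parts[0] == "print")
        {
            Console.WriteLine(parts[1]);
        }
        else if (parts[0] == "repeat")
        {
            var rest = parts[1].Split(new[] { ' ' }, 2);
            int count = int.Parse(rest[0]);
            for (int i = 0; i < count; i++)
                Execute(rest[1]);   // the same text is re-parsed every pass
        }
    }
}

Running ToyInterpreter.Run(new[] { "repeat 3 print hi" }) prints "hi" three times, parsing the text "print hi" anew on every iteration.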

In another corner of the rectangle there are the traditionally compiled languages, where #4 usually consists of permanently saving actual machine code to a file, which can then be run at a later time. This means you wait a comparatively long time at the beginning until the entire program is translated (even if you're only calling a single function in it), but OTOH loops are faster because the source doesn't need to be read again.

[Diagram: the balance of a compiler (mostly caching)]
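
For contrast, here is a sketch of this corner using the same toy print-language as above: the whole text is translated once into a list of delegates (the persistently cached "understanding" of step #4, with delegates standing in for machine code), and running the result never touches the source again. ToyCompiler is again an illustrative name, not a real API:

using System;
using System.Collections.Generic;

static class ToyCompiler
{
    // Translate the entire program up front. The returned delegate can
    // be run any number of times without re-reading the source text.
    public static Action Compile(string[] lines)
    {
        var steps = new List<Action>();
        foreach (var line in lines)
        {
            var parts = line.Split(new[] { ' ' }, 2);
            if (parts[0] == "print")
            {
                string text = parts[1];                  // parsed exactly once
                steps.Add(() => Console.WriteLine(text));
            }
        }
        return () => { foreach (var step in steps) step(); };
    }
}

Compile is the slow up-front translation you pay once; every later invocation of the returned delegate skips parsing entirely, which is why loops and repeated runs are fast in this corner.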

And then there are things in between, e.g. a virtual machine: for portability, many programming languages don't compile to actual machine code, but to a bytecode. There is then a compiler that generates the bytecode, and an interpreter that takes this bytecode and actually runs it (effectively "turning it into machine code"). While this is generally slower than compiling directly to machine code, it is easier to port such a language to another platform: you only have to port the bytecode interpreter, which is often written in a high-level language. That means you can use an existing compiler to do this "effective translation to machine code" and don't have to make and maintain a backend for each platform you want to run on. It can also be faster if you perform the compilation to bytecode once and then only distribute the compiled bytecode: other people then don't have to spend CPU cycles on e.g. running the optimizer over your code, and only pay for the bytecode-to-native translation, which may be negligible in their use case. Also, you're not handing out your source code.
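
Here is a minimal sketch of such a bytecode interpreter, again in C#: the four opcodes are invented for illustration, and the compiler half that would produce the byte array is omitted. The dispatch loop below is the only part that would have to be ported to a new platform:

using System;
using System.Collections.Generic;

static class ToyVm
{
    // An invented instruction set; a real VM has many more opcodes.
    public const byte Push = 0, Add = 1, Print = 2, Halt = 3;

    // The dispatch loop: a "virtual CPU" decoding one opcode at a time.
    public static void Execute(byte[] code)
    {
        var stack = new Stack<int>();
        for (int pc = 0; ; pc++)                        // pc = program counter
        {
            switch (code[pc])
            {
                case Push:  stack.Push(code[++pc]); break;   // operand follows
                case Add:   stack.Push(stack.Pop() + stack.Pop()); break;
                case Print: Console.WriteLine(stack.Pop()); break;
                case Halt:  return;
            }
        }
    }
}

ToyVm.Execute(new byte[] { ToyVm.Push, 2, ToyVm.Push, 3, ToyVm.Add, ToyVm.Print, ToyVm.Halt }) prints 5, and the same byte array runs unchanged on any platform that has this loop.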

Another thing in between would be a Just-in-Time compiler (JIT), which is effectively an interpreter that keeps around code it has run once, in compiled form. This 'keeping around' makes it slower than a pure interpreter (e.g. the added overhead and RAM use can lead to swapping and disk access), but makes it faster when a stretch of code is executed repeatedly. It can also be faster than a pure compiler for code where e.g. only a single function is called repeatedly, because it doesn't waste time compiling the rest of the program if that isn't used.
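
A sketch of that hybrid, using .NET's real System.Linq.Expressions API for the "compile" step (Expression.Lambda(...).Compile() genuinely emits code that the runtime turns into machine code); the two-fragment toy "language" and the ToyJit name are placeholders. What matters is the cache that keeps compiled code around:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class ToyJit
{
    // Compiled fragments are kept around (the RAM cost mentioned above)
    // so that repeated executions skip translation entirely.
    static readonly Dictionary<string, Func<int, int>> cache =
        new Dictionary<string, Func<int, int>>();

    public static int Run(string fragment, int argument)
    {
        if (!cache.TryGetValue(fragment, out var compiled))
        {
            // First execution: pay the compilation cost once.
            var x = Expression.Parameter(typeof(int), "x");
            Expression body;
            if (fragment == "double")
                body = Expression.Multiply(x, Expression.Constant(2));
            else if (fragment == "square")
                body = Expression.Multiply(x, x);
            else
                throw new ArgumentException(fragment);
            compiled = Expression.Lambda<Func<int, int>>(body, x).Compile();
            cache[fragment] = compiled;
        }
        return compiled(argument);   // later calls are direct invocations
    }
}

The first Run("square", 12) is slower than a pure interpreter would be, because it builds and compiles an expression tree; the ten-thousandth call is much faster, because it is just a delegate invocation.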

And finally, you can find other spots on this rectangle, e.g. by not saving compiled code permanently but purging it from the cache again. That way you can e.g. save disk space or RAM on embedded systems, at the cost of possibly having to compile a seldom-used piece of code a second time. Many JIT compilers do this.
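
The purging variant only changes the cache policy of a sketch like ToyJit above. The cap and the pick-any-victim eviction here are arbitrary stand-ins for the heuristics (least recently used, code size, hotness counters) a real JIT would apply:

using System;
using System.Collections.Generic;
using System.Linq;

static class BoundedJitCache
{
    const int MaxFragments = 128;   // arbitrary illustrative cap

    // Call this before inserting a newly compiled fragment: if the cache
    // is full, throw an entry away. That fragment will simply be compiled
    // again if it is ever needed, trading CPU time for memory.
    public static void EvictIfFull(Dictionary<string, Func<int, int>> cache)
    {
        if (cache.Count >= MaxFragments)
            cache.Remove(cache.Keys.First());
    }
}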

Euripides answered 19/6, 2014 at 13:52 Comment(8)
“This 'keeping around' makes it slower than a pure interpreter” – on the contrary: JITs are usually (and were originally designed to be) substantially faster than pure interpreters.Delayedaction
@konrad-rudolph See the explanation below that. For translating an entire program and running every expression once, it is slower (in the absolute), but since that pretty much never happens in practice, as at least some part is usually executed twice or not at all, JIT is faster. But for an example of a JIT being slower than a pure interpreter, look no farther than the "Hello World" example.Euripides
I’m not convinced. By the same token compiled code is slower than interpreted code but that statement is obviously misleading and simply not true except for trivial cases (your “Hello World” example). In neither case is “slower than a pure interpreter” an informative statement.Delayedaction
@konrad-rudolph I'm explaining the theory behind the different approaches and the trade-offs here, which is needed to choose the weighting of any hybrid approach (which e.g. JITs are, they are hybrid compiler/interpreter/cache constructs). It is exactly the non-practical, theoretical case you need to know here. Only then do you look at your concrete practical use case (which differs for a web site popup, a web input field filter, a one-off batch operation or systems programming) and decide which combination will be best for you.Euripides
@konrad-rudolph I've added a qualifier behind the "slower than a pure interpreter" to give an example of which part is actually slower. Also, look at Safari on Mac for an example that uses almost all approaches. The host itself is compiled, which is fastest execution. JavaScript is usually interpreted for one-offs like onLoad (least overhead), bytecode interpreted for stuff that gets run a few times (more overhead but no repeated parsing), and stuff that's run repeatedly (like an input field filter) is compiled using the new FTL feature.Euripides
Good comments, they should almost be part of the answer though, I feel.Delayedaction
@Euripides Saying, as you have, that a JIT compiler "makes the code run slower" than a pure interpreter is absolutely wrong. The reason we have JIT compilers in the first place is that they perform code optimizations to make code run faster than a pure interpreter. And it is up to the developer and programmer to adhere to best coding practices so that the JIT compiler can optimize our code.Alectryomancy
@ImranRafiqRather "Slower" is a fuzzy word. As the comment in brackets mentions, the code starts running more slowly ("overhead"). I am not saying that the code runs more slowly once it actually gets around to executing the individual instructions. For more detail, you can read a more exhaustive version of this post here: orangejuiceliberationfront.com/…Euripides

Many execution environments nowadays use bytecode (or something similar) as an intermediate representation of the code. So the source code is first compiled into an intermediate language, which is then either interpreted by a virtual machine (which decodes the bytecode instruction set) or is compiled further into machine code, and executed by the hardware.

There are very few production languages which are interpreted without being precompiled into some intermediate form. However, it's easy to conceptualise such an interpreter: just think of a class hierarchy with a subclass for every type of language element (if statement, for loop, etc.), where each class has an Evaluate method that evaluates the corresponding node. This is also commonly known as the interpreter design pattern.

As an example, consider the following code fragment implementing an if statement in a hypothetical interpreter (implemented in C#):

// AstNode is the assumed abstract base class of all syntax tree nodes;
// Value and EvaluationContext are likewise parts of the surrounding
// (hypothetical) interpreter.
class IfStatement : AstNode {
    private readonly AstNode condition, truePart, falsePart;

    public IfStatement(AstNode condition, AstNode truePart, AstNode falsePart) {
        this.condition = condition;
        this.truePart = truePart;
        this.falsePart = falsePart;
    }

    // Evaluating an `if` evaluates the condition, then exactly one branch.
    public override Value Evaluate(EvaluationContext context) {
        bool yes = condition.Evaluate(context).IsTrue();
        if (yes)
            truePart.Evaluate(context);
        else
            falsePart.Evaluate(context);
        return Value.None; // `if` statements have no value.
    }
}

This is a very simple but fully functional interpreter.
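
To actually run it, you build the tree by hand (or let a parser build it). The Literal and Print classes below are hypothetical sibling nodes following the same pattern, and Value.True / Value.Of are assumed helpers on the Value type, sketched only to show Evaluate bottoming out:

// Hypothetical leaf node: evaluates to a fixed value.
class Literal : AstNode {
    private readonly Value value;
    public Literal(Value value) { this.value = value; }
    public override Value Evaluate(EvaluationContext context) {
        return value;
    }
}

// Hypothetical side-effecting node: prints its argument's value.
class Print : AstNode {
    private readonly AstNode expression;
    public Print(AstNode expression) { this.expression = expression; }
    public override Value Evaluate(EvaluationContext context) {
        System.Console.WriteLine(expression.Evaluate(context));
        return Value.None;
    }
}

// Hand-built tree for: if (true) print "yes" else print "no"
// (e.g. inside Main)
AstNode program = new IfStatement(
    new Literal(Value.True),
    new Print(new Literal(Value.Of("yes"))),
    new Print(new Literal(Value.Of("no"))));
program.Evaluate(new EvaluationContext());   // prints "yes"

Each Evaluate call walks one node of the tree, which is why this style is called a tree-walking interpreter, and why it is slower than executing cached bytecode or machine code.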

Delayedaction answered 19/6, 2014 at 12:47 Comment(11)
"True interpreters" is an unfortunate choice of words. It's a matter of layers and POV. It can be argued that even a bytecode interpreter is not a "true interpreter" because it needs someone to compile the source code to bytecode before it interprets it. A "true interpreter" reads a bit of source code line-by-line or statement-by-statement and immediately executes each as it is recognized. A bytecode interpreter is an interpreter, too, but only if you ignore the previous compilation step, which turns it into a VM or virtual CPU.Euripides
@Euripides I don’t agree with this assessment. There are simply two different phases here which should not be conflated: the Python interpreter is a true interpreter, even though it interprets bytecode rather than Python code – either statement by statement (that’s known as a REPL) or one file at a time. The original source code is also compiled into an intermediate format but this doesn’t detract from the fact that that intermediate format is then interpreted, and that the implementation of this step is classified as an interpreter.Delayedaction
But where do you stop this distinction? Intel CPUs these days contain an interpreter that translates the Intel instruction set into the actual RISC-like hardware instructions. At that POV, all code is interpreted. We need to distinguish terminology here. Traditionally, "interpreted programming language" has meant the source code is interpreted, not bytecode. There is both the algorithm called "interpreter" (which a bytecode interpreter uses) and the higher-level concept of an "interpreted programming language" which the OP asked about. We shouldn't conflate them.Euripides
Konrad: following that reasoning, (classic, pre-JIT) Java is also interpreted, since it compiles to bytecode, and the VM is an interpreter. It is dangerous to name the whole process after one phase of it. A good working definition of compilation is translation to a level closer to the machine. That applies to bytecode generation too.Quintic
@Marco But Java’s JIT does not interpret the bytecode. It compiles it to machine code.Delayedaction
@Euripides “Where to stop this distinction?” – I never spoke of stopping anywhere. But you’ve already made the (mostly used, and usually meaningful) distinction yourself: at the software/hardware interface. But on the other hand, as you’ve said: “At that POV, all code is interpreted” – that is correct, and at that POV it’s an entirely meaningful definition to work with.Delayedaction
@KonradRudolph, the dominant JVM implementation, HotSpot, is indeed an interpreter. It only JIT-compiles the hot paths; the single-use paths are interpreted directly.Logroll
SK-logic stole my line, though I didn't know if it still held for current versions. OTOH this is all nitpicking. C# doesn't do interpretation, and is only marginally different. My point was more that putting a hard dividing line between compilation and interpretation purely on whether machine code is generated at some point is quite arbitrary. Systems that are nearly equivalent (like Java and C#) are suddenly separated by a wide rift.Quintic
@Marco I’ve removed the inaccurate portion. That said, I stand by the rest of what I’ve said. I actually almost agreed with your last comment until I thought about it some more: you are trying to make Java and C# seem almost identical – and from the user perspective they may well be, but this isn’t what we are talking about here. We are talking about how they are executed, and as SK-logic has mentioned, they are treated in quite fundamentally different ways, even if some phases overlap. This – no more, no less – is what I’m saying. (continued …)Delayedaction
@Marco In particular, contrary to what your first comment stated, I’ve never claimed, nor implied, that bytecode generation wasn’t compilation, and it was not my intention to “name the whole process after one phase of it” – instead, I responded to one particular point in OP’s question (about which, it turns out, I was wrong, since I thought that Java byte code is always JIT-compiled same as the .NET Intermediate Language).Delayedaction
As said I don't know the exact current situation, since I don't track Java (or C#) anymore. I can vaguely remember they wanted to remove the interpretation with Java 1.5. It is also possible they left it in J2ME, since there it is a great saver of memory. (always JITting has a memory footprint impact IIRC). And I stand by my case that if a definition creates a wide rift between Java and C#, the definition is flawed :_)Quintic
