How can an implementation of a language in the same language be faster than the language?

Asked 21/2, 2012 at 14:48 Answered 16/4, 2012 at 14:11

Solved python performance jvm pypy language-implementation

If I make a JVM in Java, for example, is it possible to make the implementation I made actually faster than the original implementation I used to build this implementation, even though my implementation is built on top of the original implementation and may even be dependant on that implementation?

( Confusing... )

Look at PyPy. It's a JIT Compiler for Python made in Python. That's alright, but how can it claim to be faster than the original implementation of Python which it is using and is dependent on?

Marquise answered 21/2, 2012 at 14:48 Comment(2)

I don't know pyton, but to do a JVM in java you'll need to write a lot of native methods in another language (say, C++). – Pestalozzi 21/2, 2012 at 14:52

Sorry. I just did it to consider the latest answer for a possible accept. Looking through the new answer right now, if better than I will accept, otherwise I'll reaccept the previous one. – Marquise 23/2, 2012 at 9:16

You are confused between a language and the execution apparatus for that language.

One of the reasons why PyPy can be faster than CPython is because PyPy is compiled to a completely separate native executable, and does not depend on, nor execute in, CPython.

Nevertheless, it would be possible for an inefficient implementation of one language to be surpassed by an interpreter written in that same language, and hosted in the inefficient interpreter, if the higher-level interpreter made use of more efficient execution strategies.

Hermaherman answered 21/2, 2012 at 14:56 Comment(0)

Absolutely, it is possible. Your JVM implementation could compile Java bytecodes to optimized machine code. If your optimizer was more sophisticated that that in the JVM implementation which you run your Java compiler on, then the end result could be faster.

In that case, you could run your Java compiler on its own source code, and benefit from faster compilation speeds from then on.

You said that PyPy is a JIT compiler for Python (I'm not familiar with it myself). If that's the case, then it converts a Python program to machine code, and then runs the machine code. Another poster said that the PyPy compiler runs as a standalone executable, separate from CPython. But even if it was to run on CPython, once your program is JIT'd to machine code, and the compiled machine code is running, the performance of the compiler no longer matters. The speed of the compiler only has an effect on startup time.

Melisent answered 21/2, 2012 at 14:52 Comment(0)

PyPy isn't Python interpreter implemented in Python, it's Python interpreter and compiler implemented in RPython, which is a restricted statically typed subset of Python:

RPython is a restricted subset of Python that is amenable to static analysis. Although there are additions to the language and some things might surprisingly work, this is a rough list of restrictions that should be considered. Note that there are tons of special cased restrictions that you’ll encounter as you go.

The real speed difference comes from the fact, that unlike CPython which is interpreting whole program as bytecode, PyPy uses just-in-time (JIT) compilation (into machine code) for RPython parts.

Unanswerable answered 16/4, 2012 at 14:11 Comment(0)

I don't think it's possible to implement an interpreter for a language in that language (call this A), then run it on top of another existing interpreter (call this B) for that language and execute a program (call this P), and have P running on (A running on B) be faster than P running on B.

Every operation of A is going to have to be implemented with at least one operation of B. So even if B is atrociously bad and A, is optimally good, the fact that A is being run on B means that B's badness will slow down A.

It could be possible to implement an interpreter + JIT compiler for a language in the language itself, where the JIT compiler produces some other faster code at runtime, and have P running on (A running on B) be faster than P running on B. The part of P's runtime that isn't JIT compiled will be slower (much slower, normally) but if the JIT compiler successfully identifies the "hot" parts of P and executes them more quickly than B would then the whole system might run faster overall.

But that's not really interesting. It's also possible to implement a compiler for a language in that language (C), compile it with an existing compiler (D), and have the new compiler language produce code that is faster than what the original compiler would have produced. I hope that doesn't startle you; it should be clear that the speed of the code emitted by D will only have an effect on the execution time of C, not on the execution time of other programs compiled with C.

Writing compilers in the languages they compile has been done for decades (GCC is written in C, for example), and isn't really relevant to the real question I think you're asking; neither is JIT-compiling a language using itself. In both cases the underlying execution is something other than the language you're considering; usually machine code.

However, the source of your question is a misconception. PyPy's Python interpreter isn't actually implemented in Python. The PyPy project has an interpreter for Python written in RPython. RPython is a subset of Python, chosen so that it can be efficiently compiled down to machine code; as a language RPython is much more like Java with type inference and indented blocks instead of braces. The PyPy project also has a compiler for RPython which is written in Python, and is capable of (mostly) automatically adding a JIT compiler to any interpreter it compiles.

When you're actually using the PyPy interpreter in production, you're using a machine-code interpreter compiled from the RPython sources, just as when you're using the CPython interpreter you use a machine-code interpreter compiled from C source code. If you execute the PyPy interpreter on top of another Python interpreter (which you can do because valid RPython code is also valid Python code; but not the other way around), then it runs hugely slower than the CPython interpreter.

Vitiate answered 23/2, 2012 at 6:38 Comment(8)

If the person who down-voted this answer would like to tell the issues they've found in it, I will endeavour to address them. – Vitiate 24/2, 2012 at 1:56

Your first paragraph is incorrect. In practice, it would be difficult to contrive a counter-example without contrivance, but nevertheless. – Hermaherman 10/3, 2012 at 14:15

@Hermaherman If it's wrong, I'd like to understand why it's wrong and amend the answer. Even a contrived counter-example would be useful. I strongly suspect my first statement is true for all practical cases even if it's not theoretically always true. – Vitiate 10/3, 2012 at 23:16

@Vitiate Marcin is right. Interpreter A can optimize program P in such a way that when this optimized version of P is being interpreted (run) by interpreter B all suboptimal parts of B itself (involved in interpreting/running original program P) are avoided. Super contrived example would be this; interpreter A has a lookup table which maps programs (their finite subset of course) to their outputs. If program P happens to be present in this table then interpreter A gives interpreter B already computed results. This of course can be much faster than interpreting (running) program P itself by B. – Circumscissile 13/10, 2012 at 21:15

@PiotrDobrogost Isn't that a lot like the JIT compiler case? The underlying execution isn't really interpreter B. It also requires special-casing your interpreter to the program you're interpreting. But yes, I'll accept it as a counter-example. I still think that having a JVM implemented on a JVM, or a Python implemented in Python, running generally faster than the base (where the underlying JVM/Python is actually used, not ignored by using compilation) is probably impossible, though I don't have a proof. – Vitiate 15/10, 2012 at 6:33

The underlying execution isn't really interpreter B. In my contrived example there's close to nothing to interpret for B so there's almost no execution, yes. But in general the whole point is that interpreter A can be much smarter than interpreter B leaving very little work for B and this reduction of B's work is the source of speedup. – Circumscissile 15/10, 2012 at 8:32

@PiotrDobrogost But the "smartness" of interpreter A has to be implemented in interpreter B, which will slow it down. Every time a "primitive action" occurs in program P, it generally correspond to a number of "primitive action"s in interpreter A for its implementation. Even if that number is substantially less than it would have been with interpreter B, you still lose because each of those primitive actions then correspond to multiple primitive actions in interpreter B. The only way you possibly win is by using an implementation that avoids the bad parts of B. – Vitiate 15/10, 2012 at 22:8

@PiotrDobrogost JIT compilation (to another language) or your lookup table trick are an example of this. I discounted them because although they contradict my overly strong statement (for all P, A, B ...), they are not examples of "double interpretation" leading to a speed up. The only way you could do that was if, e.g. B's list implementation is so bad that you can implement lists faster in B by using dictionaries. But things like that still require at least one primitive op of overhead on every operation to decide whether to pass-through to B or to use a "smarter" implementation. – Vitiate 15/10, 2012 at 22:11

-1

The pypy translation process runs on CPython, but the output is a list of .c files (19 files last time I checked) which are then compiled to a binary : pypy-c. At runtime pypy-c does not have any relation with CPython, that's why it can be faster.

Pilgarlic answered 25/2, 2012 at 18:38 Comment(0)

Recommended topics

Hot tags