Byte code stack versus three address

S

5

9

When designing a byte code interpreter, is there a consensus these days on whether stack or three address format (or something else?) is better? I'm looking at these considerations:

The objective language is a dynamic language fairly similar to Javascript.
Performance is important, but development speed and portability are more so for the moment.
Therefore the implementation will be strictly an interpreter for the time being; a JIT compiler may come later, resources permitting.
The interpreter will be written in C.

Soupspoon answered 16/6, 2011 at 0:39 Comment(0)

R

0

Take a look at the OCaml bytecode interpreter - it's one of the fastest of its kind. It is pretty much a stack machine, translated into a threaded code on loading (using the GNU computed goto extension). You can generate a Forth-like threaded code as well, should be relatively easy to do.

But if you're keeping a future JIT compilation in mind, make sure that your stack machine is not really a full-featured stack machine, but an expression tree serialisation form instead (like .NET CLI) - this way you'd be able to translate your "stack" bytecode into a 3-address form and then into an SSA.

Redbreast answered 16/6, 2011 at 11:8 Comment(0)

D

8

Read The evolution of Lua and The implementation of Lua 5.0 for how Lua changed from a stack-based virtual machine to a register-based virtual machine and why it gained performance doing it.

Diogenes answered 16/6, 2011 at 1:42 Comment(1)

Also, the Lua interpreter is written in C. – Jasun 16/6, 2011 at 16:9

P

6

Experiments done by David Gregg and Roberto Ierusalimschy have shown that a register-based bytecode works better than a stack-based bytecode because fewer bytecode instructions (and therefore less decoding overhead) are required to do the same tasks. So three-address format is a clear winner.

Pouliot answered 14/7, 2011 at 0:43 Comment(0)

I

1

I don't have much (not really any) experience in this area, so you might want to verify some of the following for yourself (or maybe someone else can correct me where necessary?).

The two languages I work with most nowadays are C# and Java, so I am naturally inclined to their methodologies. As most people know, both are compiled to byte code, and both platforms (the CLR and the JVM) utilize JIT (at least in the mainstream implementations). Also, I would guess that the jitters for each platform are written in C/C++, but I really don't know for sure.

All-in-all, these languages and their respective platforms are pretty similar to your situation (aside from the dynamic part, but I'm not sure if this matters). Also, since they are such mainstream languages, I'm sure their implementations can serve as a pretty good guide for your design.

With that out of the way, I know for sure that both the CLR and the JVM are stack-based architectures. Some of the advantages which I remember for stack-based vs register-based are

Smaller generated code
Simpler interpreters
Simpler compilers
etc.

Also, I find stack-based to be a little more intuitive and readable, but that's a subjective thing, and like I said before, I haven't seen too much byte code yet.

Some advantages of the register-based architecture are

Less instructions must be executed
Faster interpreters (follows from #1)
Can more readily be translated to machine code, since most commonplace hardwares are register based
etc.

Of course, there are always ways to offset the disadvantages for each, but I think these describe the obvious things to consider.

Iolenta answered 16/6, 2011 at 5:43 Comment(1)

CLR is much less "stack-based" as it appears. And it was never designed for intepretation. If you take a closer look at its instruction set, you won't find any stack operations besides trivial pop and dup. No swaps and alike, as in JVM. All the existing .NET JITs are translating this "stack" representation into a 3-address form first, followed by an SSA transform. And for JVM this step is significantly more complicated, due to a less predictable stack behaviour - see the HotSpot sources for details. – Redbreast 16/6, 2011 at 11:13

R

0

Take a look at the OCaml bytecode interpreter - it's one of the fastest of its kind. It is pretty much a stack machine, translated into a threaded code on loading (using the GNU computed goto extension). You can generate a Forth-like threaded code as well, should be relatively easy to do.

But if you're keeping a future JIT compilation in mind, make sure that your stack machine is not really a full-featured stack machine, but an expression tree serialisation form instead (like .NET CLI) - this way you'd be able to translate your "stack" bytecode into a 3-address form and then into an SSA.

Redbreast answered 16/6, 2011 at 11:8 Comment(0)

P

-1

If you have JIT in your mind then bytecodes is the only option.

Just in case you can take a look on my TIScript: http://www.codeproject.com/KB/recipes/TIScript.aspx and sources: http://code.google.com/p/tiscript/

Propensity answered 16/6, 2011 at 0:56 Comment(2)

The OP has already decided upon using byte code - he's asking for some implementation details for the interpreter – Iolenta 16/6, 2011 at 1:5

@Ken Wayne VanderLinde: Ah, that... the answer depends on what kind of memory model that language will have and requirement to overall complexity/size of the project. Stack based is simple and compact. TIScript is stack based with single value register (a.k.a. accumulator). I think it is reasonable speed/size compromise. Actually accumulator there is an array of 256 slots that are used to return multiple values from functions. – Propensity 16/6, 2011 at 2:47

Recommended topics

Hot tags