Ok, I'm posting this fearing that it might be closed before anyone ever reads it - I'm quite used to that - but I'll give it a try... even pointing me to the right direction or some existing answer that does contain a specific answer would definitely do...
So, after this brief intro... I'm currently writting a bytecode interpreter, in C, (stack-based VM) for a programming language I have designed.
If you want to have a look at the supported opcodes, feel free to check them out here: https://github.com/arturo-lang/arturo/blob/master/src/vm/opcodes.h
There is nothing really special about the stack machine. Values are being pushed and popped, and operators and functions work on them, pushing the evaluation result back to the stack. So far so good.
Now, I'm at the point where all the core functionality is in and I'm trying to give it an extra boost by doing further optimizations.
Here's an example (and hopefully a rather illustrative one).
Input:
fibo: $(x){
if x<2 {
return 1
} {
return [fibo x-1] + [fibo x-2]
}
}
i: 0
loop i<34 {
print "fibo(" + i + ") = " + [fibo i]
i: i+1
}
Bytecode produced:
|== Data Segment /======================>
0 : [Func ]= function <5,1>
1 : [Int ]= 34
2 : [String]= fibo(
3 : [String]= ) =
==/ Data Segment =======================|
|== Bytecode Listing /======================>
0 :0 JUMP [Dword] 31
1 :5 LLOAD0
2 :6 IPUSH2
3 :7 CMPLT
4 :8 JMPIFNOT [Dword] 20
5 :13 IPUSH1
6 :14 RET
7 :15 JUMP [Dword] 30
8 :20 LLOAD0
9 :21 IPUSH1
10 :22 SUB
11 :23 GCALL0
12 :24 LLOAD0
13 :25 IPUSH2
14 :26 SUB
15 :27 GCALL0
16 :28 ADD
17 :29 RET
18 :30 RET
19 :31 CPUSH0
20 :32 GSTORE0
21 :33 IPUSH0
22 :34 GSTORE1
23 :35 GLOAD1
24 :36 CPUSH1
25 :37 CMPLT
26 :38 JMPIFNOT [Dword] 61
27 :43 CPUSH2
28 :44 GLOAD1
29 :45 ADD
30 :46 CPUSH3
31 :47 ADD
32 :48 GLOAD1
33 :49 GCALL0
34 :50 ADD
35 :51 DO_PRINT
36 :52 GLOAD1
37 :53 IPUSH1
38 :54 ADD
39 :55 GSTORE1
40 :56 JUMP [Dword] 35
41 :61 END
==/ Bytecode Listing =======================|
For anyone who has worked with compilers, bytecode interpreters or even JVM, the code above should be familiar.
What I want?
Ideas - general or specific ones - about how to further optimize my bytecode.
For examples, every *2
(that is: IPUSH2
followed by a MUL
instruction) is converted to: IPUSH1, SHL
since it's a faster operation.
What else would you suggest? Is there anywhere a list of such things to optimize? Could you suggest something concrete?
Thanks in advance! :)
*2
by<<1
is a peephole optimization entirely independent to your bytecode. If that’s what you want, you’ll find tons of examples for possible optimizations in the www. – Kelmclang
to compile the IR and take advantage of all of its optimization and passes. – Ligamentous