In the second case + is faster because V8 actually moves it out of the benchmarking loop, which leaves the benchmarking loop empty.
This happens due to certain peculiarities of the current optimization pipeline. But before we get to the gory details I would like to remind you how Benchmark.js works.
To measure the test case you wrote, it takes the Benchmark.prototype.setup that you also provided and the test case itself, and dynamically generates a function that looks approximately like this (I am skipping some irrelevant details):
    function (n) {
      var start = Date.now();
      /* Benchmark.prototype.setup body here */
      while (n--) {
        /* test body here */
      }
      return Date.now() - start;
    }
Once the function is created, Benchmark.js calls it to measure your op for a certain number of iterations n. This process is repeated several times: generate a new function, then call it to collect a measurement sample. The number of iterations is adjusted between samples to ensure that the function runs long enough to give a meaningful measurement.
The important things to notice here are that
- both your case and Benchmark.prototype.setup are textually inlined;
- there is a loop around the operation you want to measure.
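To make the textual inlining concrete, here is a minimal sketch of how a harness could assemble such a function from source strings. The helper name compileBenchmark and its parameters are hypothetical, and this is not Benchmark.js's actual implementation; it only illustrates that the setup code and the test body end up inside one generated function, with the test body inside the timing loop.

```javascript
// Hypothetical helper (not part of Benchmark.js): builds a timing function
// by pasting the setup source and the test source into one function body.
function compileBenchmark(setupSource, testSource) {
  var body =
    'var start = Date.now();\n' +
    setupSource + '\n' +          // setup is inlined before the loop
    'while (n--) {\n' +
    testSource + '\n' +           // test body is inlined inside the loop
    '}\n' +
    'return Date.now() - start;';
  return new Function('n', body);
}

// The setup declares x with var, so inside the generated function x is a
// local variable, exactly as in the first case discussed here.
var run = compileBenchmark('var x = "5555";', 'var y = +x;');
var elapsed = run(1e5); // elapsed milliseconds for 1e5 iterations
```

Because the setup is pasted into the same function, a var declaration in the setup becomes a local of the generated function, while an assignment without var becomes a global.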
Essentially we are discussing why the code below, with a local variable x,
    function f(n) {
      var start = Date.now();
      var x = "5555";
      while (n--) {
        var y = +x;
      }
      return Date.now() - start;
    }
runs slower than the code with a global variable x
    function g(n) {
      var start = Date.now();
      x = "5555";
      while (n--) {
        var y = +x;
      }
      return Date.now() - start;
    }
(Note: this case is called local variable in the question itself, but that is not accurate: x is global.)
What happens when you execute these functions with a large enough value of n, for example f(1e6)?
The current optimization pipeline implements OSR (on-stack replacement) in a peculiar fashion. Instead of generating an OSR-specific version of the optimized code and discarding it later, it generates a version that can be used for both OSR and normal entry, and that can even be reused if we need to perform OSR at the same loop. This is done by injecting a special OSR entry block into the right spot in the control flow graph.
The OSR entry block is injected while the SSA IR for the function is built, and it eagerly copies all local variables out of the incoming OSR state. As a result V8 fails to see that local x is actually a constant and even loses any information about its type. For subsequent optimization passes, the value that x has inside the loop (x2 in the IR) looks like it can be anything.
Because x2 can be anything, the expression +x2 can also have arbitrary side effects (e.g. x2 can be an object with valueOf attached to it). This prevents the loop-invariant code motion (LICM) pass from moving +x2 out of the loop.
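A small sketch of why hoisting would be unsafe when the operand's type is unknown. The names tricky, calls and loop are made up for illustration. If the operand is an object with valueOf, each evaluation of unary plus runs user code, so evaluating it once instead of n times would change observable behavior.

```javascript
// Counts how many times the conversion actually runs.
var calls = 0;

// An object whose numeric conversion has a visible side effect.
var tricky = {
  valueOf: function () {
    calls++;
    return 5555;
  }
};

// Same shape as the benchmark loop, but x arrives with an unknown type.
function loop(x, n) {
  var y;
  while (n--) {
    y = +x; // calls x.valueOf() on every iteration when x is an object
  }
  return y;
}

var fromObject = loop(tricky, 3); // valueOf runs three times
var fromString = loop("5555", 3); // no user code runs for a primitive string
```

After these calls, calls is 3. Had the compiler moved +x out of the loop, it would have been 1, which is why the optimization is only legal once the operand is known to be a primitive.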
Why is g faster? V8 pulls a trick here. It tracks global variables that contain constants: e.g. in this benchmark global x always contains "5555", so V8 just replaces the x access with its value and marks the optimized code as dependent on the value of x. If somebody replaces the value of x with something different, then all dependent code will be deoptimized. Global variables are also not part of the OSR state and do not participate in SSA renaming, so V8 is not confused by "spurious" φ-functions merging OSR and normal entry states. That's why, when V8 optimizes g, it ends up generating the following IR in the loop body (red stripe on the left shows the loop):
Note: +x is compiled to x * 1, but this is just an implementation detail.
Later LICM would just take this operation and move it out of the loop, leaving nothing of interest in the loop itself. This becomes possible because V8 now knows that both operands of the * are primitives, so there can be no side effects.
And that's why g is faster: an empty loop is quite obviously faster than a non-empty one.
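The net effect on g can be pictured at the source level, as a rough sketch. The real transformation happens on the compiler's IR, not on JavaScript source, and the function names below are hypothetical.

```javascript
// Step 1 (sketch): the global read of x is replaced by the constant it is
// known to hold, and +x is expressed as a multiplication by 1.
function gAfterConstantReplacement(n) {
  var start = Date.now();
  var y;
  while (n--) {
    y = "5555" * 1; // both operands are primitives: no side effects possible
  }
  return Date.now() - start;
}

// Step 2 (sketch): LICM hoists the loop-invariant multiplication.
function gAfterLicm(n) {
  var start = Date.now();
  var y = "5555" * 1; // evaluated once, before the loop
  while (n--) {
    // nothing left in the loop body to measure
  }
  return Date.now() - start;
}

var t1 = gAfterConstantReplacement(1e5);
var t2 = gAfterLicm(1e5);
```

The second function is what the benchmark effectively times for g: the loop counter and nothing else.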
This also means that the second version of the benchmark does not actually measure what you would like it to measure. While the first version did actually grasp some of the differences between parseInt(x) and +x performance, that was more by luck: you hit a limitation in V8's current optimization pipeline (Crankshaft) that prevented it from eating the whole microbenchmark away.
parseInt vs +. I think your title is a little misleading. – Damson