Disclaimer: I'm not a PHP Internals expert (yet?) so this is all from my understanding, and not guaranteed to be 100% correct or complete. :)
So, firstly, the PHP 7 behaviour - which, I note, is also followed by HHVM - appears to be correct, and PHP 5 has a bug here. There should be no extra assign by reference behaviour here, because regardless of execution order, the result of the two calls to ++$i should never be the same.
The opcodes look fine; crucially, we have two temp variables, $2 and $3, to hold the two increment results. But somehow, PHP 5 is acting as though we'd written this:
$i = 2;
$i++; $temp1 =& $i;
$i++; $temp2 =& $i;
echo $temp1 + $temp2; // both temps alias $i, so this is 4 + 4 = 8
Rather than this:
$i = 2;
$i++; $temp1 = $i;
$i++; $temp2 = $i;
echo $temp1 + $temp2; // independent copies, so this is 3 + 4 = 7
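A rough C analogy may make the difference between the two snippets concrete. This is my own illustration, not how the Zend engine actually works: a pointer stands in for a PHP reference, a plain int copy stands in for by-value assignment, and the function names are made up for the example.

```c
#include <assert.h>

/* Mirrors: $i++; $temp1 =& $i; $i++; $temp2 =& $i; echo $temp1 + $temp2; */
int sum_by_reference(void)
{
    int i = 2;
    i++;
    int *temp1 = &i;                /* temp1 aliases i (now 3) */
    i++;
    int *temp2 = &i;                /* temp2 aliases i too (now 4) */
    assert(temp1 == temp2);         /* one variable, two names */
    return *temp1 + *temp2;         /* both read the final value: 4 + 4 */
}

/* Mirrors: $i++; $temp1 = $i; $i++; $temp2 = $i; echo $temp1 + $temp2; */
int sum_by_value(void)
{
    int i = 2;
    i++;
    int temp1 = i;                  /* snapshot: 3 */
    i++;
    int temp2 = i;                  /* snapshot: 4 */
    return temp1 + temp2;           /* 3 + 4 */
}
```

sum_by_reference() returns 8 and sum_by_value() returns 7, the same two totals the PHP snippets print.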
Edit: It was pointed out on the PHP Internals mailing list that using multiple operations that modify a variable within a single statement is generally considered "undefined behaviour", and ++ is used as an example of this in C/C++.
As such, it's reasonable for PHP 5 to return the value it does for implementation or optimisation reasons, even if it is logically inconsistent with a sane serialisation into multiple statements.
The (relatively new) PHP language specification contains similar language and examples:
Unless stated explicitly in this specification, the order in which the operands in an expression are evaluated relative to each other is unspecified. [...] (For example, [...] in the full expression $j = $i + $i++, whether the value of $i is the old or new $i, is unspecified.)
Arguably, this is a weaker claim than "undefined behaviour", since it implies they are evaluated in some particular order, but we're into nit-picking now.
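C draws the same line between "unspecified" and "undefined". Writing ++i + ++i directly is undefined behaviour in C, but since C11 two function calls in one expression are indeterminately sequenced: they cannot interleave, and only their order is unspecified. The sketch below (hypothetical helper names) shows that when each increment hands back a copy of its value, either order gives 3 + 4, which is exactly why the two ++$i results should never be equal.

```c
#include <assert.h>

static int counter;                 /* plays the role of $i */

/* Hypothetical helper: one pre-increment, returning a copy of the new value. */
static int pre_increment_copy(void)
{
    return ++counter;
}

/* The two calls may run in either order (unspecified), but each call runs
 * to completion before the other starts, so one returns start + 1 and the
 * other start + 2. The sum is the same whichever order the compiler picks. */
int sum_of_two_increments(int start)
{
    counter = start;
    return pre_increment_copy() + pre_increment_copy();
}
```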
phpdbg investigation (PHP 5)
I was curious, and wanted to learn more about the internals, so I did some playing around using phpdbg.
No references
Running the code with $j = $i in place of $j =& $i, we start with 2 variables sharing an address, with a refcount of 2 (but no is_ref flag):
Address Refs Type Variable
0x7f3272a83be8 2 (integer) $i
0x7f3272a83be8 2 (integer) $j
But as soon as you pre-increment, the zvals are separated, and only one temp var is sharing with $i, giving a refcount of 2:
Address Refs Type Variable
0x7f189f9ecfc8 2 (integer) $i
0x7f189f859be8 1 (integer) $j
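The separation shown in these tables is copy-on-write. Below is a toy model of it in C; the toy_cell type and its helpers are hypothetical and ignore everything about real zvals except a value and a refcount.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy copy-on-write cell (hypothetical): just a value and a refcount. */
typedef struct {
    long value;
    int refcount;
} toy_cell;

static toy_cell *toy_new(long value)
{
    toy_cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->value = value;
    c->refcount = 1;
    return c;
}

/* $j = $i: no copy yet, both names share the cell. */
static toy_cell *toy_assign(toy_cell *src)
{
    src->refcount++;
    return src;
}

/* ++$var with the result kept in a temp. */
static toy_cell *toy_pre_inc(toy_cell **var)
{
    toy_cell *c = *var;
    if (c->refcount > 1) {          /* shared: split off a private copy */
        c->refcount--;              /* the other holder keeps the old cell */
        c = toy_new(c->value);
        *var = c;
    }
    c->value++;
    c->refcount++;                  /* the temp holds the cell as well */
    return c;
}
```

In the toy, after $j = $i both names share one cell with refcount 2; after the pre-increment, $i has a private cell held by the variable and the temp (refcount 2) while $j keeps the old cell (refcount 1), which is the pattern in the phpdbg tables.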
With reference assignment
When the variables have been bound together, they share an address, with a refcount of 2, and a by-ref marker:
Address Refs Type Variable
0x7f9e04ee7fd0 2 (integer) &$i
0x7f9e04ee7fd0 2 (integer) &$j
After the pre-increments (but before the addition), the same address has a refcount of 4, showing the 2 temp vars erroneously bound by reference:
Address Refs Type Variable
0x7f9e04ee7fd0 4 (integer) &$i
0x7f9e04ee7fd0 4 (integer) &$j
The source of the issue
Digging into the source on http://lxr.php.net, we can find the implementation of the ZEND_PRE_INC opcode:
PHP 5
The crucial line is this:
SEPARATE_ZVAL_IF_NOT_REF(var_ptr);
So we create a new zval for the result value only if it is not currently a reference. Further down, we have this:
if (RETURN_VALUE_USED(opline)) {
PZVAL_LOCK(*var_ptr);
EX_T(opline->result.var).var.ptr = *var_ptr;
}
So if the return value of the increment is actually used, we need to "lock" the zval, which, after following a whole series of macros, basically means "increment its refcount", before assigning it as the result.
If we created a new zval earlier, that's fine: our refcount is now 2 (1 for the actual variable, plus 1 for the operation result). But if we decided not to, because we needed to hold a reference, we're just incrementing the existing reference count and pointing at a zval which may be about to be changed again.
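The PHP 5 steps quoted above can be sketched as a toy model. Everything here is hypothetical (toy_zval, toy_pre_inc and php5_style_sum are invented for illustration, and real zvals carry far more state), but it follows the same three steps: separate only if not a reference, increment, then lock the zval and point the result at it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy PHP 5 style zval (hypothetical, heavily simplified). */
typedef struct {
    long value;
    int refcount;
    int is_ref;
} toy_zval;

static toy_zval *toy_new(long value)
{
    toy_zval *z = malloc(sizeof *z);
    if (z == NULL)
        abort();
    z->value = value;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $j =& $i: mark the zval as a reference and share it. */
static void toy_make_ref(toy_zval *z)
{
    z->is_ref = 1;
    z->refcount++;
}

/* Roughly SEPARATE_ZVAL_IF_NOT_REF: give the variable its own copy only
 * when the zval is shared and is NOT a reference. */
static void toy_separate_if_not_ref(toy_zval **var_ptr)
{
    toy_zval *z = *var_ptr;
    if (z->refcount > 1 && !z->is_ref) {
        z->refcount--;              /* the other holder keeps the old zval */
        *var_ptr = toy_new(z->value);
    }
}

/* Toy ZEND_PRE_INC; the returned pointer plays the role of the result slot. */
static toy_zval *toy_pre_inc(toy_zval **var_ptr)
{
    toy_separate_if_not_ref(var_ptr);
    (*var_ptr)->value++;
    (*var_ptr)->refcount++;         /* PZVAL_LOCK: the temp holds the zval */
    return *var_ptr;                /* the result aliases the variable's zval */
}

/* ++$i + ++$i with $i starting at `start`, optionally after $j =& $i. */
long php5_style_sum(long start, int bind_reference, int *refcount_after)
{
    toy_zval *i = toy_new(start);
    if (bind_reference)
        toy_make_ref(i);            /* $j shares the same zval */
    toy_zval *t1 = toy_pre_inc(&i);
    toy_zval *t2 = toy_pre_inc(&i);
    *refcount_after = i->refcount;
    long sum = t1->value + t2->value;
    if (t1 != i)
        free(t1);                   /* t1 was left holding the old zval */
    free(i);
    return sum;
}
```

With bind_reference set, both temps alias the one zval, the refcount ends at 4 (as in the phpdbg output) and the sum is 8. Without the reference, the second increment separates the zval the first temp is holding, so the temps keep 3 and 4 and the sum is 7.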
PHP 7
So what's different in PHP 7? Several things!
Firstly, the phpdbg output is rather boring, because integers are no longer reference counted in PHP 7; instead, a reference assignment creates an extra pointer to the same address in memory (the actual integer), and that pointer itself has a refcount of 1. The phpdbg output looks like this:
Address Refs Type Variable
0x7f175ca660e8 1 integer &$i
int (2)
0x7f175ca660e8 1 integer &$j
int (2)
Secondly, there is a special code path in the source for integers:
if (EXPECTED(Z_TYPE_P(var_ptr) == IS_LONG)) {
fast_long_increment_function(var_ptr);
if (UNEXPECTED(RETURN_VALUE_USED(opline))) {
ZVAL_COPY_VALUE(EX_VAR(opline->result.var), var_ptr);
}
ZEND_VM_NEXT_OPCODE();
}
So if the variable is an integer (IS_LONG) and not a reference to an integer (IS_REFERENCE), then we can just increment it in place. If we then need the return value, we can copy its value into the result (ZVAL_COPY_VALUE).
If it's a reference, we won't hit that fast path; but rather than keeping references bound together, we have these two lines:
ZVAL_DEREF(var_ptr);
SEPARATE_ZVAL_NOREF(var_ptr);
The first line says "if it's a reference, follow it to its target"; this takes us from our "reference to an integer" to the integer itself. The second - I think - says "if it's something refcounted, and has more than one reference, create a copy of it"; in our case, this will do nothing, because the integer doesn't care about refcounts.
So now we have an integer we can increment, which will affect all by-reference associations, but not by-value ones for refcounted types. Finally, if we want the return value of the increment, we again copy it rather than just assigning it; and this time with a slightly different macro, which will increase the refcount of our new zval if necessary:
ZVAL_COPY(EX_VAR(opline->result.var), var_ptr);
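The reference path can be sketched the same way. In this toy (all names hypothetical), $j =& $i moves the integer into a shared box that both variables point at; the pre-increment follows the reference to the box (like ZVAL_DEREF), increments the integer there, and copies the plain value into the result slot (like ZVAL_COPY, which for an integer has no refcount to bump). SEPARATE_ZVAL_NOREF is left out because, as noted above, it does nothing for an integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy PHP 7 style value (hypothetical, heavily simplified). */
typedef enum { TOY_IS_LONG, TOY_IS_REFERENCE } toy_type;

struct toy_reference;

typedef struct {
    toy_type type;
    union {
        long lval;                  /* used when type == TOY_IS_LONG */
        struct toy_reference *ref;  /* used when type == TOY_IS_REFERENCE */
    } u;
} toy_zval;

/* The shared box both variables point at after $j =& $i. */
typedef struct toy_reference {
    toy_zval val;
} toy_reference;

/* $dst =& $src: box the value on first use, then share the box. */
static void toy_make_ref(toy_zval *src, toy_zval *dst)
{
    if (src->type != TOY_IS_REFERENCE) {
        toy_reference *box = malloc(sizeof *box);
        if (box == NULL)
            abort();
        box->val = *src;            /* move the integer into the box */
        src->type = TOY_IS_REFERENCE;
        src->u.ref = box;
    }
    *dst = *src;                    /* dst now points at the same box */
}

/* Slow path of ++$var: deref, increment in place, copy the value out. */
static void toy_pre_inc(toy_zval *var, toy_zval *result)
{
    if (var->type == TOY_IS_REFERENCE)
        var = &var->u.ref->val;     /* like ZVAL_DEREF */
    assert(var->type == TOY_IS_LONG);
    var->u.lval++;                  /* visible through every alias */
    if (result != NULL)
        *result = *var;             /* like ZVAL_COPY: a plain integer copy */
}

/* ++$i + ++$i after $j =& $i, with $i starting at `start`. */
long php7_style_sum(long start, long *final_i)
{
    toy_zval i = { .type = TOY_IS_LONG, .u.lval = start };
    toy_zval j, t1, t2;
    toy_make_ref(&i, &j);           /* $j =& $i */
    toy_pre_inc(&i, &t1);           /* t1 gets its own copy of start + 1 */
    toy_pre_inc(&i, &t2);           /* t2 gets its own copy of start + 2 */
    *final_i = j.u.ref->val.u.lval; /* read $i back through $j */
    free(i.u.ref);
    return t1.u.lval + t2.u.lval;
}
```

php7_style_sum(2, &final_i) returns 7 and leaves final_i at 4: the increments are visible through $j, but the two temps are independent copies.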
++$i will first increment $i and then return $i, as per the documentation. Whether the conversion from the variable $i to an integer value (necessary for the plus operation) happens before or after the second increment operation is not defined, though. – Diecious
ADD and ECHO. – Freeload