Why is in_array strict mode on integers slower than non-strict mode?
Asked Answered
M

2

11

I always thought that in_array strict mode will be faster or at least the same speed as non-strict mode. But after some benchmarks I noticed there is a huge difference in execution time between them while searching for integers. The string and array tests indicate that strict mode is faster. Why?

Test code - (PHP 7.2.1):

<?php

$array = array_fill(0, 10000, 12345);

for ($i=0; $i<100000; $i++) {
    in_array($i, $array, true);
}

time php test.php

php -c test.php 12.98s user 0.04s system 98% cpu 13.234 total


<?php

$array = array_fill(0, 10000, 12345);

for ($i=0; $i<100000; $i++) {
    in_array($i, $array, false);
}

time php test.php

php -c test.php 6.44s user 0.04s system 99% cpu 6.522 total

Mothering answered 3/7, 2019 at 12:49 Comment(10)
According to the docs: If the third parameter strict is set to TRUE then the in_array() function will also **check the types** of the needle in the haystack. - So there will be adittional operations (type comparison)Drape
@B001ᛦ Not sure about the actual implementation in C under the hood but one would imagine that strict checking simply enables === instead of == and === is objectively faster in PHP code.Exclude
Yes, but I always thought that for example if it's set to false and for example we have bool (false) argument it should be checked against int (0), float(0) string ('', '0'), bool(false), array([]), etc. So the strict version should have less work to do.Mothering
In your example, all of them match type. What if none of the types match? Try searching for a string instead of an integer? Perhaps in_array checks type first, then if it matches, checks for equality. So if the types match, it takes slightly longer, and if they don't it goes slightly faster.Anon
@Exclude PHP 7.2.1 NTS x64 (MacBook Pro 2015 15")Mothering
For anyone good with C programming, see github.com/php/php-src/blob/master/ext/standard/array.c#L1616 and github.com/php/php-src/blob/master/ext/standard/array.c#L1529Exclude
@Anon I have tested many cases (pasted only two in question)... in all my tests strict mode is more than twice slower than non strict modeMothering
Albeit interesting and potentially worthy as a bug report, I would argue that if you need to perform in_array() such a large number of times then it would be immensely faster to just do an array_flip() and isset() assuming all of your array elements are scalar.Exclude
I have reviewed one more time my tests and noticed that I had some logic errors there in conditions making that there was not equal checks between strict and non strict mode tests. After another check the result is that the slowdown is only while searching for integers. I have updated the question.Mothering
Fixed in github.com/php/php-src/commit/….Commission
C
7

I can offer some small insight from tracing through the C source for in_array.

It turns out, when comparing integers, the path to reach the actual equality check for non-strict mode involves fewer operations than strict mode.

Strict mode

In the case where the strict flag to in_array is true, the following occurs:

  1. We call fast_is_identical_function for each element in the array

  2. The fast_is_identical_function first tests that the types of each operand are different (Z_TYPE_P(op1) != Z_TYPE_P(op2)) in hopes of being able to return false early; this is comparison #1.

  3. If the types are the same (they are, in your test case), we then test (Z_TYPE_P(op1) <= IS_TRUE; I've no idea what this does, but it's comparison #2.

  4. After both comparisons have evaluated to false, we jump into zend_is_identical, our first function invocation.

  5. zend_is_identical starts out by again testing Z_TYPE_P(op1) != Z_TYPE_P(op2), another attempt to fail early. This is comparison #3.

  6. If the types are the same, we can descend through the switch (Z_TYPE_P(op1)) statement, comparison #4

  7. Finally we reach the comparison Z_LVAL_P(op1) == Z_LVAL_P(op2) which actually tests the equality of the two values, comparison #5.

In total, to test whether each element of the array is equal to the value we're searching for, there are 5 comparisons and 1 function invocation.

Non-strict mode

By comparison, the non-strict flow for integers specifically (really LONGs) is much simpler, as follows:

  1. Instead of fast_is_identical_function, we instead use fast_equal_check_function for each element in the array.

  2. The method fast_equal_check_function starts a much more complicated process of comparing the two values with all kinds of type-casting logic. However, the very first test it does happens to be optimized for integers, as follows:

    if (EXPECTED(Z_TYPE_P(op1) == IS_LONG)) {
        if (EXPECTED(Z_TYPE_P(op2) == IS_LONG)) {
            return Z_LVAL_P(op1) == Z_LVAL_P(op2);
    

    We can see that it...

    1. immediately tests whether type of op1 is LONG, which it is, then
    2. immediately tests whether the type of op2 is LONG, which it is, then
    3. immediately returns the result of Z_LVAL_P(op1) == Z_LVAL_P(op2)

A total of 3 simple equality comparisons and 0 function invocations for the non-strict case, vs at least 5 comparisons and 1 jump for the strict case.

This appears to be a case where attempted early optimization makes the strict check slower (by repeatedly testing the types of the operands in hopes that we can find inequality sooner) than the specific non-strict case of comparing two integers.

Convert answered 3/7, 2019 at 13:58 Comment(4)
So this might be the place for the optimization in next PHP releases, right?Mothering
@tom I don't think any of this is really a problem. The code is already optimized pretty heavily, and you would risk introducing some other kind of performance problem for some other type of input by changing it. I think it's better to think of the current strict performance as normal, and the non-strict long-long case as a nice bonus.Convert
yup. You're right. I thought about focusing on the strict mode at the expense of the non-strict mode (if it is needed to improve speed of the strict mode)Mothering
@tom this has been "fixed" in github.com/php/php-src/commit/… :)Jordan
E
1

It seems to have something to do with the type of element in the needle and/or haystack, observe:

PHP 7.3.5 from http://sandbox.onlinephpfunctions.com/

$iterations = 10000000;
$needle = false;
$haystack = [ true ];

$start = microtime( true );
for( $i = 0; $i < $iterations; ++$i )
{
    in_array( $needle, $haystack, true );
}
echo ( microtime( true ) - $start ).' strict'.PHP_EOL;

$start = microtime( true );
for( $i = 0; $i < $iterations; ++$i )
{
    in_array( $needle, $haystack, false );
}
echo ( microtime( true ) - $start ).' not strict';

produces:

0.29996585845947 strict
0.40397191047668 not strict

but if we use:

$needle = 1;
$haystack = [ 2 ];

then we get:

0.34480714797974 strict
0.28275084495544 not strict

However, PHP 5.6.29 produces a negligible discrepancy and running the same test multiple times can put strict ahead of non-strict and vice versa.

Exclude answered 3/7, 2019 at 13:42 Comment(1)
I have noticed the same results after reviewing my tests. The "int" is the only affected one I found.Mothering

© 2022 - 2024 — McMap. All rights reserved.