Why is array_key_exists 1000x slower than isset on referenced arrays?
Asked Answered
B

4

24

I have found that array_key_exists is over 1000x slower than isset at check if a key is set in an array reference. Does anyone that has an understanding of how PHP is implemented explain why this is true?

EDIT: I've added another case that seems to point to it being overhead required in calling functions with a reference.

Benchmark Example

function isset_( $key, array $array )
{
    return isset( $array[$key] );
}

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array[$i] );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array );
    $my_array[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset( $my_array_ref[$i] );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    isset_( $i, $my_array_ref );
    $my_array_ref[$i] = 0;
}
$stop = microtime( TRUE );
print "isset_( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );

Output

array_key_exists( $my_array ) 0.0056459903717
isset( $my_array ) 0.00234198570251
isset_( $my_array ) 0.00539588928223
array_key_exists( $my_array_ref ) 3.64232587814 // <~ what on earth?
isset( $my_array_ref ) 0.00222992897034
isset_( $my_array_ref ) 4.12856411934 // <~ what on earth?

I'm on PHP 5.3.6.

Codepad example.

Buatti answered 14/6, 2011 at 0:29 Comment(5)
Search the internet for isset vs. array_key_exists and you will find more resources than you're probably able to read.Cointreau
@Cointreau I understand the differences between isset and array_key_exists (how they work), my question is why this performance hit is happening when they are used on references.Buatti
array_key_exists must check for a concrete key value, isset must not. If you must look for the concrete value of a reference, the reference must be resolved additionally. So there is more work to do which takes more time. Why it's so much more time I can not tell you, but you're as well unsetting references, and this will influence the amount of work.Cointreau
@Cointreau I've updated my example with testing function( $key, $array ) { return isset( $array[$key] ); }. It is also slow, this seems to point to overhead required in calling a function w/ a reference.Buatti
I don't think your test is really showing you what you think. Userspace functions (like the one you created) is always going to be significantly slower than in-built functions.Sw
S
8

At work I've got a VM instance of PHP that includes a PECL extension called VLD. This lets you execute PHP code from the commandline and rather than execute it, it returns the generated opcode instead.

It's brilliant at answering questions like this.

http://pecl.php.net/package/vld

Just in case you go this route (and if you're generally curious about how PHP works internally, i think you should) you should definitely install it on a virtual machine (that is, i wouldn't install it on a machine i'm trying to develop on or deploy to). And this is the command you'll use to make it sing:

php -d vld.execute=0 -d vld.active=1 -f foo.php

Looking at the opcodes will tell you a more complete story, however, I have a guess.... Most of PHP's built-ins make a copy of an Array/Object and act upon that copy (and not a copy-on-write either, an immediate copy). The most widely known example of this is foreach(). When you pass an array into foreach(), PHP is actually making a copy of that array and iterating on the copy. Whis is why you'll see a significant performance benefit by passing an array as a reference into foreach like this:

foreach($someReallyBigArray as $k => &$v)

But this behavior -- that passing in an explicit reference like that -- is unique to foreach(). So I would be very surprised if it made an array_key_exists() check any faster.

Ok, back to what I was getting at..

Most the built-ins take a copy of an array and act upon that copy. I am going to venture a completely unqualified guess that isset() is highly optimized and that one of those optimizations is perhaps to not do an immediate copy of an Array when its passed-in.

I'll try to answer any other questions you may have but you could probably read a lot of you google for "zval_struct" (which is the data structure in the PHP internals which stores each variable. It's a C struct (think.. an associative array) that has keys like "value", "type", "refcount".

Sw answered 18/6, 2011 at 12:43 Comment(3)
I don't think this really answers the question. While that tool looks very interesting and I'll have to give it a try, you answer is a guess at best. BTW isset isn't a function, it's a language construct (why it probably doesn't have the overhead showing up here, until you wrap it in a function).Buatti
Kendall, this is only a "guess" because I didn't grep the php-src tree to answer your question. It's a very informed guess. I've lately worked on more PHP Extenstion development (in C) than PHP code itself. And your comment about "language construct"... You just got that from the PHP manual? But you seem to not understand what that really even means. All it means is that, in the C code, it's not implemented as a global ZEND_FUNCTION. The "overhead" for calling isset() is no different than the overhead of calling, say, "is_a" which IS a ZEND_FUNCTION.Sw
Anyhow, believe what you'd like. You seem to have a hypothesis that you're attached to regardless of facts. And more than once you seem to think that a userspace function (that is, a function you create in PHP code) is somehow comparable to a language built-in. The only thing they have in common is that they're both named "function."Sw
S
3

Here is the source of the array_key_exists function for 5.2.17. You can see that even if the key is null, PHP attempts to compute a hash. Although it's interesting that if you remove

// $my_array_ref[$i] = NULL;

then it performs better. There must be multiple hash lookups occuring.

/* {{{ proto bool array_key_exists(mixed key, array search)
   Checks if the given key or index exists in the array */
PHP_FUNCTION(array_key_exists)
{
    zval **key,                 /* key to check for */
         **array;               /* array to check in */

    if (ZEND_NUM_ARGS() != 2 ||
        zend_get_parameters_ex(ZEND_NUM_ARGS(), &key, &array) == FAILURE) {
        WRONG_PARAM_COUNT;
    }

    if (Z_TYPE_PP(array) != IS_ARRAY && Z_TYPE_PP(array) != IS_OBJECT) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "The second argument should be either an array or an object");
        RETURN_FALSE;
    }

    switch (Z_TYPE_PP(key)) {
        case IS_STRING:
            if (zend_symtable_exists(HASH_OF(*array), Z_STRVAL_PP(key), Z_STRLEN_PP(key)+1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_LONG:
            if (zend_hash_index_exists(HASH_OF(*array), Z_LVAL_PP(key))) {
                RETURN_TRUE;
            }
            RETURN_FALSE;
        case IS_NULL:
            if (zend_hash_exists(HASH_OF(*array), "", 1)) {
                RETURN_TRUE;
            }
            RETURN_FALSE;

        default:
            php_error_docref(NULL TSRMLS_CC, E_WARNING, "The first argument should be either a string or an integer");
            RETURN_FALSE;
    }

}
Sirocco answered 1/7, 2011 at 1:32 Comment(3)
I don't think a few more lookups could cause a 1000x slow down. I'm pretty sure the problem is a bit more than just the array_key_exists function anyway. As you can see in the example, it happens for any function (see isset_).Buatti
I think it's internal to zend_hash_index_exists() et al; accessing an unset element in general is slower than a set element. I've optimized production code in the past by testing first for truthy (!!$hash->$value) and if still not sure only then with array_key_exists(), it ran much faster.Elaboration
the issue is the separation (aka duplication of the array…)Atmolysis
H
3

Since PHP >= 7.4 array_key_exists is now as fast as isset.

See https://www.php.net/manual/en/function.array-key-exists.php#125313

And some perfomance tests:

<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);

    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$isset = function () use ($assocItems) {
    isset($items[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);

Output:

Function call isset took 0.00561 seconds
Function call array_key_exists took 0.00547 seconds
Hilar answered 23/2, 2022 at 18:24 Comment(1)
Reference: bugs.php.net/bug.php?id=71239Ahlgren
C
1

Not array_key_exists, but the removal of the reference (= NULL) causes this. I commented it out from your script and this is the result:

array_key_exists( $my_array ) 0.0059430599212646
isset( $my_array ) 0.0027170181274414
array_key_exists( $my_array_ref ) 0.0038740634918213
isset( $my_array_ref ) 0.0025200843811035

Only removed the unsetting from the array_key_exists( $my_array_ref ) part, this is the modified part for reference:

$my_array = array();
$my_array_ref = &$my_array;
$start = microtime( TRUE );
for( $i = 1; $i < 10000; $i++ ) {
    array_key_exists( $i, $my_array_ref );
    // $my_array_ref[$i] = NULL;
}
$stop = microtime( TRUE );
print "array_key_exists( \$my_array_ref ) ".($stop-$start).PHP_EOL;
unset( $my_array, $my_array_ref, $start, $stop, $i );
Cointreau answered 14/6, 2011 at 0:50 Comment(16)
Then it's useless. The = NULL is just in place a larger value. I'll replace it w/ 0 in my example to make that more clear.Buatti
By removing that line, that function can probably return early (since there is no keys in the array). Besides it doesn't reflect the real-life problem I ran into (an ref array with keys in it).Buatti
Hmm, probably each time a reference changes, the hash is triggered to re-index the keys.Cointreau
Hmm, why do you need a reference at all?Cointreau
I think your right about the = <something> is doing something to kill the performance of the reference. I don't understand how it can require 1000x the work to do it though.Buatti
I found this bug while passing an array into a closure by reference because I wanted to change it in the closure. I simplified the case down so that closures wouldn't confuse people.Buatti
Look inside the PHP source code what is triggered by this. Then you probably find out. I do not know that, but I think this is related to memory management and copy on write plus the hash array structure.Cointreau
I know there is ways to get around this, but I'm more interested in the WHY that this is happening, not how to fix it (though that would be nice also).Buatti
Hence why I was hoping with someone w/ knowledge of PHP internals would shed some light on this problem.Buatti
I think you can fix this by not using a reference. There is not much you can do else in an assignment on the PHP level I think, so no clue next to not using a reference for the how to fix. The why lies within the source, probably konforce knows that.Cointreau
PHP references have been the source of much pain. I've had problems with segfaults, memory corruption and now this...Buatti
Why are you making use of those? You have not written much to that, normally those are not needed at all.Cointreau
Sometimes they are nice to use when you need to temporally treat an array like an object or if you need to make an array reference itself. But I'll agree, most of the time there is a better way.Buatti
The part about temporarily treat an array like an object sounds interesting. How do you do that with references?Cointreau
I just meant that arrays default to pass by value and objects pass by references and sometimes it's preferable to switch their behavior.Buatti
You should checkout ArrayObject and probably make use of the ArrayAccess interface. I think that's worth to try out. An object that behaves similar to an array.Cointreau

© 2022 - 2024 — McMap. All rights reserved.