How are associative arrays implemented in PHP?
Asked Answered
M

5

26

Can someone explain how PHP implements associative arrays? What underlying data structure does PHP use? Does PHP hash the key and store it in some kind of hash map? I am curious because I was wondering what the performance of associative arrays where when inserting and searching for keys.

Mindimindless answered 29/10, 2008 at 16:26 Comment(1)
I'll leave this link for someone else to grind through, but you can view the actual C source for PHP at http://svn.php.net/viewvc/php/php-src/Lait
U
8

The highest voted answer link is broken and doesn't give that much explanation.

PHP is written in C and the underlying structure is just a C array. C arrays are just chunks of memory. The indexes in C arrays must be continuous, you can't have an index 0 and an index 1000 that comes after it. To make associative array keys work, before they are added to the C array, they are converted to proper C indices via a hash function.

For a full explanation, I found this link to be much more informative.

http://nikic.github.io/2012/03/28/Understanding-PHPs-internal-array-implementation.html

Unproductive answered 21/4, 2014 at 23:26 Comment(6)
What is the size of the underlying C array? Is the size increased and the keys rehashed if the array grows overtime like e.g. in Java's HashMap? Thank you!Elaterite
@Elaterite you can look at the sourcecode yourself - github.com/php/php-src/blob/master/Zend/zend_hash.c. It used to just use HashTable datatype in php5, but everything is zen engine now so they use zend_hash which still does use hashtables. You can read up more on it at: phpinternalsbook.com/php5/hashtables.htmlUnproductive
@Elaterite In short, yes. As with most hash tables, if an inserted element increases the container's load factor beyond the threshold defined by the implementation, the table allocates memory for a larger array and rehashes the keys.Trabeated
@CyRossignol Thank you for your reply! Rehashing all the keys sounds like an expensive O(n) operation.Elaterite
@Elaterite You're right, it's a relatively expensive operation. Most generic hash tables amortize this cost by allocating a bigger array than needed for one insertion so that subsequent insertions don't incur the overhead. From an algorithm analysis point-of-view, the cost of insertions approaches O(1).Trabeated
@CyRossignol Thank for your replies! Using the information I read here on SO and in other places, I came up with my own implementation of a linked hash map in PHP: github.com/tonix-tuft/linked-hash-map. In my case, I never rehash keys, but I use a multidimensional array to identify the bucket for a given key, and each dimension of this array has a length of a different prime number, in order to minimize collisions. This implementation allows using any PHP data type as key (whereas builtin PHP arrays allow only ints and strings). Hope someone can benefit from this data structure.Elaterite
C
7

It's a hash table. The type declaration and hashing function are here:
http://svn.php.net/viewvc/php/php-src/trunk/Zend/zend_hash.h?view=markup

There is a light weight array and a linked list within the spl (standard php lib)

Columbary answered 22/4, 2010 at 18:54 Comment(1)
Source has moved to GitHub: github.com/php/php-src/blob/master/Zend/zend_hash.hDandy
K
6

Well, for what it is worth, all PHP arrays are Associative arrays.

Katabasis answered 29/10, 2008 at 16:28 Comment(0)
M
3

@EBGreen is correct.

Which gives you some interesting performance problems, especially when treating an array as a list and using the [] (array add) operator. PHP doesn't seem to cache the largest numeric key and add one to it, instead it seems to traverse all of the keys to find what the next numeric key should be. I've rewritten scripts in python because of PHP's dismal array-as-a-list performance.

Associative arrays have the standard dict/hash performance overhead.

Mord answered 29/10, 2008 at 17:14 Comment(4)
Are you sure about this? I've just ran benchmarks on a test array of 1000 entries (copying to a new array, one by one), and if you don't specify the key for the new array, it's consistently 7% faster (on PHP 5.2.6)Impediment
It's possible they've changed it recently. I was using 5.1 when I was doing the work. PHP's array was AWFUL when you're talking about 10k entries or more.Mord
AFAIK this is not the case, please compare: A zend hash table has an element nNextFreeElement ...Aylesbury
@RickyMason. You probably wouldn't normally, but for thorough testing, calculating per item times for 10, 100, 1k and 10k would really highlight scalability performance issues, especially if there is a chance that 10k may have to be handled.Remittee
R
2

It's all hash tables, according to sources in various web forums: http://www.usenet-forums.com/php-language/15348-zend-engine-array-implementation.html

If you want to be sure, read the source, then compile it, but make sure you can trust your compiler (Warning: PDF, and unrelated, but very cool).

Raphael answered 29/10, 2008 at 16:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.