PHP array performance
Asked Answered
L

7

39

I'm testing an algorithm for 2d bin packing and I've chosen PHP to mock it up as it's my bread-and-butter language nowadays.

As you can see on http://themworks.com/pack_v0.2/oopack.php?ol=1 it works pretty well, but you need to wait around 10-20 seconds for 100 rectangles to pack. For some hard to handle sets it would hit the php's 30s runtime limit.

I did some profiling and it shows that most of the time my script goes through different parts of a small 2d array with 0's and 1's in it. It either checks if certain cell equals to 0/1 or sets it to 0/1. It can do such operations million times and each times it takes few microseconds.

I guess I could use an array of booleans in a statically typed language and things would be faster. Or even make an array of 1 bit values. I'm thinking of converting the whole thing to some compiled language. Is PHP just not good for it?

If I do need to convert it to let's say C++, how good are the automatic converters? My script is just a lot of for loops with basic arrays and objects manipulations.

Edit. This function gets called more than any other. It reads few properties of a very simple object, and goes through a very small part of a smallish array to check if there's any element not equal to 0.

function fits($bin, $w, $h, $x, $y) {

    $w += $x;
    $h += $y;

    for ($i = $x; $i < $w; $i++) {

        for ($j = $y; $j < $h; $j++) {

            if ($bin[$i][$j] !== 0) {
                return false;
            }
        }
    }

    return true;    
}

Update: I've tried using 1d array instead of 2d as one of the answers suggested. Since I needed to always have current bin width accessible, I decided to wrap everything in the object. Also, now in every loop the index needs to be calculated. Now the script takes even more time to run. Other techniques didn't bring much performance boost, rather made code less readable. Time for HipHop I guess.

Update: since hiphop php only runs on linux, and I don't have one, I've decided to rewrite the whole thing in C++. It's nice to freshen up the old skills. Also, if I do find a way to use hiphop, it'll be interesting to compare hand-written C++ code and the one hiphop would generate.

Update: I rewrote this thing in c++, on average it works 20 times faster and uses much less memory. Let me see if I can make it even faster.

Lower answered 4/2, 2011 at 23:50 Comment(9)
You could have written your code very inefficiently, but we can't know that unless we see it. While PHP is generally slower than compiled langauges, that doesn't mean that it's the slowness of the language that's causing your code to take so long to executeFabian
Don't write anything this performance sensitive in PHP. Best bet would be to write in C++ (try hip-hop).Shorter
Hey dfo, if possible can we see some code for the part thats taking up all the time? Maybe someone can suggest some improvements to the algorithm or a better way of storing your data.Scutari
php.net/manual/en/function.set-time-limit.php reset the 30 second time limit run time. you may also want to use binary strings or integers with bitwise operators instead of arrays with 1s ands 0s.Manx
Thanks for all the comments, stackoverflow rocks.Lower
+1 for taking the time to profile your code and a generally well-written question.Promotion
I changed added a quick look-through of your code to my answer ;)Vondavonni
Once again, since comment can't be downvoted: PHP's relatively poor performance is well known fact. I'd recommend you to try Java since it's syntax is similar, after all, it was one of PHP 5's inspirations.Conlan
if u will convert it in c++ paste here the time comparisonHerbart
V
26

Array access in PHP can certainly be slow. PHP uses hash tables to implement arrays, i.e. in order to access an element in an array it has to calculate a hash and traverse a linked list. Using a compiled language with real arrays will definitely improve performance, because there a direct memory access is made. For the interested: Code for hash access with string and with integer.

Concerning your code, there are several points I would optimize:

  • return directly, don't break twice.
  • put $file->get_width() and $file->get_height into simple variables. I assume that the height or width doesn't change throughout the process. Remember: Functions in PHP are slow.
  • Use a one-dimensional array, instead of nested arrays. You save one hash lookup per iteration that way. Actually a one-dimensional array is only marginally faster or even slightly slower. Comparison of several ways of saving the data concerning performance and memory usage.

.

function fits($bin, $x, $y, $w, $h) {
    $w += $x;
    $h += $y;

    for ($i = $x; $i < $w; ++$i) {
        for ($j = $y; $j < $h; ++$j) {
            if ($bin[$i][$j] !== 0) {
                return false;
            }
        } 
    }

    return true;   
}

Though I'm not sure, why you add $x to the $width / $y to the $height. Don't you want to iterate from the current coordinates to the image boundaries?

Vondavonni answered 4/2, 2011 at 23:54 Comment(26)
As always, I would appreciate an explanation of the downvote.Vondavonni
@nikic: I did not downvote you myself. But PHP does not only access arrays associatively via hash, it has sequential access as well.Yovonnda
@Orbling: Oh, I didn't know that. Do you have some reading on that topic?Vondavonni
@nikic: Yes, it has both forms of access internally. From a code perspective too, see the set of array access functions: current()/each(); next(); prev(); end(); reset(); and foreach() { } obv. -- all use the internal pointers == sequential access.Yovonnda
@Orbling: Well, that's a little bit different. That's for iterating. I'm talking more about accessing an element by key.Vondavonni
@nikic: If you keep the elements default numerically indexed, then it'll be more efficient than the hash.Yovonnda
@Orbling: Well, could you provide some reference / source code? I can't really imagine how this is supposed to work ;)Vondavonni
@nikic: Like what? $arr[0]? You really do not want to look at the PHP source code, it is not pleasant.Yovonnda
@Orbling: I checked against lxr and found that for sequential, numerical arrays the hash table is still used. Obviously there are several simplifications if the index is numeric, for example the hash doesn't need to be calculated, but only the table mask applied. And when traversing the linked list obviously the check is much simpler, as a number instead of a string is compared. But basically it functions in the same way. It's still a hash table lookup, with some optimizations, but still a hash table lookup. It isn't a direct memory access like in C.Vondavonni
@nikic: Well it couldn't be a direct memory access, because the data is variable length.Yovonnda
@Orbling: Well, but it does use the hash table, only the hash is calculated a litte bit simpler. So my answer was, in the end, correct.Vondavonni
Interesting with some code to look at +1. The flattened array seems a pretty good optimizitation regardless of the internal Zend structures. Since it is a bit-array anyway, it might (very eventually) help to turn it into a string even (depending on size and if proportions don't change). So doing if ($bin{$i * $w + $j} == "\1") might be feasible. (Even though less readable..)Ripleigh
@mario: Actually one wouldn't even need to change anything apart from the === to a == ;)Vondavonni
@nikic: nice hints on optimization! I don't break any more and passing object properties instead of object seems to speed up things a tiny bit. Your code doesn't make sense though, if $w is width of an array, adding $x to it would make it to check wrong elements of an array - if it was 2d array, it would check beyond the boundaries.Lower
@dfo: That way exactly the thing I wondered about. In your code above you are adding $x to the width, too. I don't really understand why.Vondavonni
@dfo: Maybe you can put your whole code on gist, so we can help you ;)Vondavonni
@nikic: $file object represents the small rectangles, and I'm checking if it fits in the big one - for that I need the array to have zeros from $x to $x + width of $file and from $y to $y + height of $file.Lower
@dfo - Building on nikic's modifications, this suggestion falls under the category of micro-optimization... but sometimes (especially when working with tight loops) micro-optimization can make a noticeable difference. You may find that a while loop is faster than a for loop: $i = $x-1; while (++$i < $w) { $j = $y-1; while (++$j < $h) { if ($bin[$i * $w + $j] !== 0) { return false; } } }Fabian
@Mark Baker: yeah, I saw it on phpbench.com. I hate rewriting them all... may be magical HipHop will fix it?Lower
I did a test, [$i][$j] works faster than [$i * $w + $j]. No matter what $w is.Lower
@dfo: As I couldn't believe what you just said, I ran a set of benchmarks. Code + Results. And it seems you are right. Multidimensional and flat arrays execute equally fast, with the multidimensional being a little bit faster. The string is slowest. The performance of the multidimensional array can be improved by storing the result of the outer loop.Vondavonni
What I really do not understand, is why the string access is so damn slow.Vondavonni
@nikic: it's only 50% slower, but look at the memory usage. 50x improvement over arrays!Lower
@dfo: Yes, that's the good side about it. PHP needs to store only a single byte of data per coordinate, not a big zval struct. But as your main issue is performance, the memory savings probably aren't such a big comfort.Vondavonni
I've been working on the benchmarks over @ alioth.debian and PHP's array performance is completely horrible when it comes to heavy lifting. fankuch-redux and binary-trees are excellent examples of how broken slow PHP array is. Still trying to make that work with strings though, as they are true arrays underneath. It may be acceptable in your case to use strings.Garcia
PHP7 FIXES ARRAY PERFORMANCE. If you use PHP 7 packed arrays (arrays where the keys must be integers only and appear in ascending order), performance can be improved 10fold. Packed arrays behave similarly to true arrays instead of slow hash tables.Friary
F
15

UPDATED ANSWER NEEDED AS OF 2018.

This question is old and the answers given aren't completely true in PHP 7 if you use packed arrays. Because the question shows up first hit on google, I'm adding a new answer

If you use only integers as array keys in PHP 7 and make sure that you insert them into the array in ascending order, you can see improvements of 10x faster array operations.

Read here: Blackfire Blog on PHP 7 Array Improvements

Friary answered 16/1, 2018 at 19:25 Comment(0)
R
14

The solution to your problem might be https://github.com/facebook/hiphop-php/wiki/

As said by everyone else, PHP is not the optimal language for calculation intensive tasks. It also doesn't really have an array type. What's described as array() in PHP is really a dictionary / hash map. It has some optimizations to double as list, but as you've already discovered it doesn't provide the same runtime behaviour as C pointers and arrays.

HipHop can transform PHP code into optimized C++. It was targetted at string manipulation as well, but it could very well offer a proper array/list transformation.

Disclaimer: I've never tried it. Just wanted to contribute a smart sounding answer here.

Ripleigh answered 5/2, 2011 at 0:11 Comment(0)
P
7

To suggest another PHP alternative:

Have you looked into SplFixedArray ?

Depending on how your arrays are structured (linear 0 to x) arrays this can perform quite a bit faster

For a benchmark see: http://www.slideshare.net/tobias382/new-spl-features-in-php-53 Slide 15 & 16 (sorry, didn't find a better one)

Palpitate answered 5/2, 2011 at 19:55 Comment(2)
I updated my test to include the fixed array: gist.github.com/813289 For the multidimensional, cached array it's slightly faster, for the flat array it's slower. The good thing about it, is that it uses less memory, because it saves all the hash table/buckets related stuff.Vondavonni
it's very much slower in multidimensional arrays because of object creation overhead actually - this disappears when the array start containing a lot of stuff though.Garcia
L
1

A more recent alternative is the QB extension to PHP that is specifically designed to help with this kind of problem.

While PHP is an excellent language for building complex web application, it imposes certain limitations. Writing code that performs low-level, computationally intensive tasks in PHP is generally impractical--it'd simply be too slow. The QB extension addresses this particular weakness of PHP. By translating Zend opcodes and executing them through a statically typed virtual machine, QB offers an order-of-magnitude gain in performance. The added power allows PHP programmers do things they were unable to do before, such a complex, pixel-level image manipulation.

See: http://php-qb.net/

^ This website no longer contains the PHP-QB project (2022)

Liturgist answered 11/6, 2014 at 9:53 Comment(0)
Z
0

Arrays in PHP indeed seem to be quite slow, especially looping through multidimensional arrays. Another option would be to try Quercus. It's an implementation of PHP in Java. I suppose it uses Java arrays. I haven't made a comparison though.

Zoophyte answered 8/9, 2016 at 8:1 Comment(0)
M
-7

The question is nearly qualifiable as "primarily opinion-based". With that in mind:

"Is PHP just not good for it?"

PHP was originally just a web templating language, and simplicity was higher concern than performance when it was designed. It has evolved over time and many optimizations were added, but still, PHP's performance is relatively poor to the other platforms. So if your criteria is performance, then PHP is not good for it.

"I'm thinking of converting the whole thing to some compiled language."

Technically, can PHP also be compiled. There is a PHP to C++ compiler by Facebook. There is a just-in-time kind-of-compiler by Zend. There used to be a PHP on Java interpreter (although not active anymore if I remember correctly).

I'd recommend you to try Java since it's syntax is similar, after all, it was one of PHP 5's inspirations. Java bytecode is compiled into native code since JDK 1.5. The performance should arise cca 4x for the same structure of code (assuming you use the community PHP distribution).

Mundy answered 15/2, 2011 at 18:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.