Performance comparison: call explode() in foreach() signature versus passing exploded data as a variable to foreach()

4

9
foreach(explode(',', $foo) as $bar) { ... }

vs

$test = explode(',', $foo);
foreach($test as $bar) { ... }

In the first example, does it explode the $foo string for each iteration or does PHP keep it in memory exploded in its own temporary variable? From an efficiency point of view, does it make sense to create the extra variable $test or are both pretty much equal?

Haff answered 2/5, 2011 at 19:54 Comment(1)
@tomalak-geretkal thanks, it's a control structure. - Haff
25

I could make an educated guess, but let's try it out!

I figured there were three main ways to approach this.

  1. explode and assign before entering the loop
  2. explode within the loop, no assignment
  3. string tokenize

My hypotheses:

  1. probably consume more memory due to assignment
  2. probably identical to #1 or #3, not sure which
  3. probably both quicker and much smaller memory footprint

Approach

Here's my test script:

<?php

ini_set('memory_limit', '1024M');

$listStr = 'text';
$listStr .= str_repeat(',text', 9999999);

$timeStart = microtime(true);

/*****
 * {INSERT LOOP HERE}
 */

$timeEnd = microtime(true);
$timeElapsed = $timeEnd - $timeStart;

printf("Memory used: %s kB\n", memory_get_peak_usage()/1024);
printf("Total time: %s s\n", $timeElapsed);

And here are the three versions:

1)

// explode separately 
$arr = explode(',', $listStr);
foreach ($arr as $val) {}

2)

// explode inline-ly 
foreach (explode(',', $listStr) as $val) {}

3)

// tokenize
$tok = strtok($listStr, ',');
while ($tok !== false) {  // strict check, so a token such as "0" does not end the loop early
    $tok = strtok(',');
}
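For anyone who wants to try the comparison locally, here is a minimal sketch that times all three variants in one run. The `timeIt()` helper, the `$variants` array and the reduced list size are assumptions made for this illustration and are not part of the original test script. It measures time only, since peak memory cannot be separated per variant within a single process.

```php
<?php
// Hypothetical helper: runs a callable once and returns the elapsed seconds.
function timeIt(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Much smaller list than in the original test, so this runs quickly anywhere.
$listStr = 'text' . str_repeat(',text', 99999);

$variants = [
    'explode, then loop' => function () use ($listStr) {
        $arr = explode(',', $listStr);
        foreach ($arr as $val) {}
    },
    'explode inline' => function () use ($listStr) {
        foreach (explode(',', $listStr) as $val) {}
    },
    'strtok' => function () use ($listStr) {
        $tok = strtok($listStr, ',');
        while ($tok !== false) {
            $tok = strtok(',');
        }
    },
];

$timings = [];
foreach ($variants as $name => $fn) {
    $timings[$name] = timeIt($fn);
    printf("%-20s %.6f s\n", $name, $timings[$name]);
}
```

Timings will vary with PHP version, OS and hardware, so treat any single run as indicative only.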

Results

[Image: explode() benchmark results (chart not reproduced here)]

Conclusions

Looks like some assumptions were disproven. Don't you love science? :-)

  • In the big picture, any of these methods is fast enough for a list of "reasonable size" (a few hundred or a few thousand items).
  • If you're iterating over something huge, the time difference is relatively minor, but memory usage could differ by an order of magnitude!
  • When you explode() inline without pre-assignment, it's a fair bit slower for some reason.
  • Surprisingly, tokenizing is a bit slower than explicitly iterating a declared array. Working on such a small scale, I believe that's due to the overhead of calling strtok() on every iteration. More on this below.

In terms of the number of function calls, explode() really beats tokenizing: explode() is called once, O(1), whereas strtok() is called once per token, O(n).
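The call-count difference can be made concrete with a small sketch. The counter variables `$explodeCalls` and `$strtokCalls` and the five-item list are assumptions added purely for illustration.

```php
<?php
$listStr = 'a,b,c,d,e';

// explode(): a single call, no matter how many items the list holds.
$explodeCalls = 0;
$items = explode(',', $listStr);
$explodeCalls++;
foreach ($items as $val) {
    // iterating the resulting array needs no further function calls
}

// strtok(): one call per token, plus a final call that returns false.
$strtokCalls = 0;
$tok = strtok($listStr, ',');
$strtokCalls++;
while ($tok !== false) {
    $tok = strtok(',');
    $strtokCalls++;
}

printf("explode calls: %d, strtok calls: %d\n", $explodeCalls, $strtokCalls);
// prints "explode calls: 1, strtok calls: 6"
```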

I added a bonus to the chart where I run method 1) with a function call in the loop. I used strlen($val), thinking it would be a relatively similar execution time. That's subject to debate, but I was only trying to make a general point. (I only ran strlen($val) and ignored its output. I did not assign it to anything, for an assignment would be an additional time-cost.)

// explode separately 
$arr = explode(',', $listStr);
foreach ($arr as $val) {strlen($val);}

As you can see from the results table, it then becomes the slowest method of the three.

Final thought

This is interesting to know, but my suggestion is to do whatever you feel is most readable/maintainable. Only if you're really dealing with a significantly large dataset should you be worried about these micro-optimizations.

Dithyramb answered 2/5, 2011 at 22:16 Comment(3)
It is fairly obvious that $x=...;foreach($x will use more memory after the loop than foreach(..., simply because the $x variable is still intact afterwards. But this doesn't matter much. The variable will be destroyed on a return or, if memory is critical, can just be unset. What is interesting is the peak memory usage, because only that counts towards the memory_limit. And here I'm very confident that it will give you pretty similar results for both foreach variants. Remember: If you are "benchmarking" memory you normally want to use memory_get_peak_usage, not ... - Douceur
... memory_get_usage. The fact that #2 and #3 are identical in memory also follows from the fact that you are just measuring memory usage after the loop. Peak memory would probably be smaller for strtok. This has nothing to do with PHP transforming the explode to tokenization when in a loop. And concerning your last example: Function calls are expensive in PHP ;) Your strlen($val); would involve executing five opcodes. - Douceur
Thanks, also it is important to mention version and OS for these results. - Concinnous
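To illustrate the distinction between current and peak usage raised in the comments above, here is a rough sketch. The variable names and the list size are assumptions chosen for the example. The absolute numbers depend on the PHP version and build, so none are shown.

```php
<?php
$before = memory_get_usage();

// Build a large temporary array, then discard it again.
$tmp = explode(',', str_repeat('text,', 200000));
$during = memory_get_usage();
unset($tmp);

// Current usage drops once the array is freed, but the peak remembers it.
$after = memory_get_usage();
$peak  = memory_get_peak_usage();

printf(
    "before: %d, during: %d, after: %d, peak: %d bytes\n",
    $before, $during, $after, $peak
);
```

A measurement taken with memory_get_usage() after the loop only sees what is still alive at that point, which is why it can hide the cost of a temporary array.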
5

In the first case, PHP explodes it once and keeps it in memory.

The impact of creating a separate variable versus exploding inline would be negligible. Either way, the PHP interpreter needs to maintain a pointer to the next item, whether or not the array is stored in a user-defined variable.
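One way to check that the expression in the foreach header is evaluated only once is to wrap explode() in a counting closure. The `$countingExplode` wrapper and the `$calls` counter are hypothetical names introduced just for this sketch.

```php
<?php
$calls = 0;

// Hypothetical wrapper around explode() that counts its invocations.
$countingExplode = function (string $delimiter, string $input) use (&$calls): array {
    $calls++;
    return explode($delimiter, $input);
};

$seen = [];
foreach ($countingExplode(',', 'a,b,c') as $item) {
    $seen[] = $item;
}

printf("items: %s, calls: %d\n", implode(' ', $seen), $calls);
// prints "items: a b c, calls: 1"
```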

Finer answered 2/5, 2011 at 19:57 Comment(0)
4

From a memory standpoint it will not make a difference, because PHP uses copy-on-write.
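A rough sketch of copy-on-write in action, assuming nothing beyond built-in functions. The variable names are made up for the example, and the exact byte counts depend on the PHP version, so none are claimed here.

```php
<?php
$a = range(1, 100000);
$m0 = memory_get_usage();

$b = $a;      // no copy yet: both variables refer to the same array
$m1 = memory_get_usage();

$b[] = 0;     // first write: PHP now gives $b its own copy
$m2 = memory_get_usage();

printf(
    "after assignment: +%d bytes, after write: +%d bytes\n",
    $m1 - $m0,
    $m2 - $m1
);
```

The assignment itself should cost next to nothing, and the memory only grows once one of the two variables is modified.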

Apart from that, I personally would opt for the first option - it's a line less, but not less readable (imho!).

Douceur answered 2/5, 2011 at 20:1 Comment(1)
@Ryan: Ooops, I meant the first one ;) - Douceur
1

Efficiency in what sense: memory or processor? For the processor it wouldn't make a difference. For memory, you can always reuse the variable: $foo = explode(',', $foo)

Tantara answered 2/5, 2011 at 19:57 Comment(0)
