How does PHP 'foreach' actually work?
Asked Answered
D

7

2279

Let me prefix this by saying that I know what foreach is, does and how to use it. This question concerns how it works under the bonnet, and I don't want any answers along the lines of "this is how you loop an array with foreach".


For a long time I assumed that foreach worked with the array itself. Then I found many references to the fact that it works with a copy of the array, and I have since assumed this to be the end of the story. But I recently got into a discussion on the matter, and after a little experimentation found that this was not in fact 100% true.

Let me show what I mean. For the following test cases, we will be working with the following array:

$array = array(1, 2, 3, 4, 5);

Test case 1:

foreach ($array as $item) {
  echo "$item\n";
  $array[] = $item;
}
print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 2 3 4 5 1 2 3 4 5 */

This clearly shows that we are not working directly with the source array - otherwise the loop would continue forever, since we are constantly pushing items onto the array during the loop. But just to be sure this is the case:

Test case 2:

foreach ($array as $key => $item) {
  $array[$key + 1] = $item + 2;
  echo "$item\n";
}

print_r($array);

/* Output in loop:    1 2 3 4 5
   $array after loop: 1 3 4 5 6 7 */

This backs up our initial conclusion, we are working with a copy of the source array during the loop, otherwise we would see the modified values during the loop. But...

If we look in the manual, we find this statement:

When foreach first starts executing, the internal array pointer is automatically reset to the first element of the array.

Right... this seems to suggest that foreach relies on the array pointer of the source array. But we've just proved that we're not working with the source array, right? Well, not entirely.

Test case 3:

// Move the array pointer on one to make sure it doesn't affect the loop
var_dump(each($array));

foreach ($array as $item) {
  echo "$item\n";
}

var_dump(each($array));

/* Output
  array(4) {
    [1]=>
    int(1)
    ["value"]=>
    int(1)
    [0]=>
    int(0)
    ["key"]=>
    int(0)
  }
  1
  2
  3
  4
  5
  bool(false)
*/

So, despite the fact that we are not working directly with the source array, we are working directly with the source array pointer - the fact that the pointer is at the end of the array at the end of the loop shows this. Except this can't be true - if it was, then test case 1 would loop forever.

The PHP manual also states:

As foreach relies on the internal array pointer changing it within the loop may lead to unexpected behavior.

Well, let's find out what that "unexpected behavior" is (technically, any behavior is unexpected since I no longer know what to expect).

Test case 4:

foreach ($array as $key => $item) {
  echo "$item\n";
  each($array);
}

/* Output: 1 2 3 4 5 */

Test case 5:

foreach ($array as $key => $item) {
  echo "$item\n";
  reset($array);
}

/* Output: 1 2 3 4 5 */

...nothing that unexpected there, in fact it seems to support the "copy of source" theory.


The Question

What is going on here? My C-fu is not good enough for me to able to extract a proper conclusion simply by looking at the PHP source code, I would appreciate it if someone could translate it into English for me.

It seems to me that foreach works with a copy of the array, but sets the array pointer of the source array to the end of the array after the loop.

  • Is this correct and the whole story?
  • If not, what is it really doing?
  • Is there any situation where using functions that adjust the array pointer (each(), reset() et al.) during a foreach could affect the outcome of the loop?
Denigrate answered 7/4, 2012 at 19:33 Comment(16)
@Denigrate There's a php-internals tag this should probably go with, but I'll leave it to you to decide which if any of the other 5 tags to replace.Arte
try also unset($array[$key + 1]);Junitajunius
looks like COW, without delete handleJunitajunius
At first I thought »gosh, another newbie question. Read the docs… hm, clearly undefined behavior«. Then I read the complete question, and I must say: I like it. You've put quite some effort in it and writing all the testcases. ps. are testcase 4 and 5 the same?Bobodioulasso
@Bobodioulasso Difference between 4 + 5 is each() vs reset() - trying to force it to skip an element or start from the beginning, respectively. Behavior it seems is identical for both though, the pointer of the source array is ignored.Denigrate
@eicto Still no effect on the execution of the loop - codepad.org/y4hPzEw6. what do you mean by looks like COW... - can you elaborate?Denigrate
@eicto Oh I see what you mean, thanks for the clarification. I think that is probably a sensible suggestion, although I'm still not sure how that would affect the source array's pointer. I didn't think you meant it looked like a black and white bovine, but it's as well to be sure ;-)Denigrate
Just a thought about why it does make sense that the array pointer gets touched: PHP needs to reset and move the internal array pointer of the original array along with the copy, because the user may ask for a reference to the current value (foreach ($array as &$value)) - PHP needs to know the current position in the original array even though it's actually iterating over a copy.Kristianson
this one looks more confusing codepad.org/RKFka0tdKoch
On a side note: Its not C++-fu, you need C-fu. :)Eade
Actually, @OliCharlesworth, I would argue that PHP has the best documentation of just about any programming language in the world. There are plenty of reasons to hate on PHP but its documentation is not one of them.Octoroon
@Sean: IMHO, the PHP documentation is really quite bad at describing the nuances of core language features. But that is, perhaps, because so many ad-hoc special cases are baked into the language...Mcclain
You are iterating over an array with refcount = 1 by reference, so it's immediately clear that a) no copy will be made and b) the array will be made a reference (removing the ref will cause the changes to not be visible inside the loop).Linder
@monocell yes, I've re-read that again. But the thing is - it's not so clear for me that same mechanics of code evaluation (and pointer moving) order works in different ways.Gamages
@AlmaDo: this question is being discussed on Meta: meta.stackexchange.com/questions/229549/…Extranuclear
(technically, any behavior is unexpected since I no longer know what to expect) -- brilliantSame
F
1831

foreach supports iteration over three different kinds of values:

In the following, I will try to explain precisely how iteration works in different cases. By far the simplest case is Traversable objects, as for these foreach is essentially only syntax sugar for code along these lines:

foreach ($it as $k => $v) { /* ... */ }

/* translates to: */

if ($it instanceof IteratorAggregate) {
    $it = $it->getIterator();
}
for ($it->rewind(); $it->valid(); $it->next()) {
    $v = $it->current();
    $k = $it->key();
    /* ... */
}

For internal classes, actual method calls are avoided by using an internal API that essentially just mirrors the Iterator interface on the C level.

Iteration of arrays and plain objects is significantly more complicated. First of all, it should be noted that in PHP "arrays" are really ordered dictionaries and they will be traversed according to this order (which matches the insertion order as long as you didn't use something like sort). This is opposed to iterating by the natural order of the keys (how lists in other languages often work) or having no defined order at all (how dictionaries in other languages often work).

The same also applies to objects, as the object properties can be seen as another (ordered) dictionary mapping property names to their values, plus some visibility handling. In the majority of cases, the object properties are not actually stored in this rather inefficient way. However, if you start iterating over an object, the packed representation that is normally used will be converted to a real dictionary. At that point, iteration of plain objects becomes very similar to iteration of arrays (which is why I'm not discussing plain-object iteration much in here).

So far, so good. Iterating over a dictionary can't be too hard, right? The problems begin when you realize that an array/object can change during iteration. There are multiple ways this can happen:

  • If you iterate by reference using foreach ($arr as &$v) then $arr is turned into a reference and you can change it during iteration.
  • In PHP 5 the same applies even if you iterate by value, but the array was a reference beforehand: $ref =& $arr; foreach ($ref as $v)
  • Objects have by-handle passing semantics, which for most practical purposes means that they behave like references. So objects can always be changed during iteration.

The problem with allowing modifications during iteration is the case where the element you are currently on is removed. Say you use a pointer to keep track of which array element you are currently at. If this element is now freed, you are left with a dangling pointer (usually resulting in a segfault).

There are different ways of solving this issue. PHP 5 and PHP 7 differ significantly in this regard and I'll describe both behaviors in the following. The summary is that PHP 5's approach was rather dumb and lead to all kinds of weird edge-case issues, while PHP 7's more involved approach results in more predictable and consistent behavior.

As a last preliminary, it should be noted that PHP uses reference counting and copy-on-write to manage memory. This means that if you "copy" a value, you actually just reuse the old value and increment its reference count (refcount). Only once you perform some kind of modification a real copy (called a "duplication") will be done. See You're being lied to for a more extensive introduction on this topic.

PHP 5

Internal array pointer and HashPointer

Arrays in PHP 5 have one dedicated "internal array pointer" (IAP), which properly supports modifications: Whenever an element is removed, there will be a check whether the IAP points to this element. If it does, it is advanced to the next element instead.

While foreach does make use of the IAP, there is an additional complication: There is only one IAP, but one array can be part of multiple foreach loops:

// Using by-ref iteration here to make sure that it's really
// the same array in both loops and not a copy
foreach ($arr as &$v1) {
    foreach ($arr as &$v) {
        // ...
    }
}

To support two simultaneous loops with only one internal array pointer, foreach performs the following shenanigans: Before the loop body is executed, foreach will back up a pointer to the current element and its hash into a per-foreach HashPointer. After the loop body runs, the IAP will be set back to this element if it still exists. If however the element has been removed, we'll just use wherever the IAP is currently at. This scheme mostly-kinda-sort of works, but there's a lot of weird behavior you can get out of it, some of which I'll demonstrate below.

Array duplication

The IAP is a visible feature of an array (exposed through the current family of functions), as such changes to the IAP count as modifications under copy-on-write semantics. This, unfortunately, means that foreach is in many cases forced to duplicate the array it is iterating over. The precise conditions are:

  1. The array is not a reference (is_ref=0). If it's a reference, then changes to it are supposed to propagate, so it should not be duplicated.
  2. The array has refcount>1. If refcount is 1, then the array is not shared and we're free to modify it directly.

If the array is not duplicated (is_ref=0, refcount=1), then only its refcount will be incremented (*). Additionally, if foreach by reference is used, then the (potentially duplicated) array will be turned into a reference.

Consider this code as an example where duplication occurs:

function iterate($arr) {
    foreach ($arr as $v) {}
}

$outerArr = [0, 1, 2, 3, 4];
iterate($outerArr);

Here, $arr will be duplicated to prevent IAP changes on $arr from leaking to $outerArr. In terms of the conditions above, the array is not a reference (is_ref=0) and is used in two places (refcount=2). This requirement is unfortunate and an artifact of the suboptimal implementation (there is no concern of modification during iteration here, so we don't really need to use the IAP in the first place).

(*) Incrementing the refcount here sounds innocuous, but violates copy-on-write (COW) semantics: This means that we are going to modify the IAP of a refcount=2 array, while COW dictates that modifications can only be performed on refcount=1 values. This violation results in user-visible behavior change (while a COW is normally transparent) because the IAP change on the iterated array will be observable -- but only until the first non-IAP modification on the array. Instead, the three "valid" options would have been a) to always duplicate, b) do not increment the refcount and thus allowing the iterated array to be arbitrarily modified in the loop or c) don't use the IAP at all (the PHP 7 solution).

Position advancement order

There is one last implementation detail that you have to be aware of to properly understand the code samples below. The "normal" way of looping through some data structure would look something like this in pseudocode:

reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
    code();
    move_forward(arr);
}

However foreach, being a rather special snowflake, chooses to do things slightly differently:

reset(arr);
while (get_current_data(arr, &data) == SUCCESS) {
    move_forward(arr);
    code();
}

Namely, the array pointer is already moved forward before the loop body runs. This means that while the loop body is working on element $i, the IAP is already at element $i+1. This is the reason why code samples showing modification during iteration will always unset the next element, rather than the current one.

Examples: Your test cases

The three aspects described above should provide you with a mostly complete impression of the idiosyncrasies of the foreach implementation and we can move on to discuss some examples.

The behavior of your test cases is simple to explain at this point:

  • In test cases 1 and 2 $array starts off with refcount=1, so it will not be duplicated by foreach: Only the refcount is incremented. When the loop body subsequently modifies the array (which has refcount=2 at that point), the duplication will occur at that point. Foreach will continue working on an unmodified copy of $array.

  • In test case 3, once again the array is not duplicated, thus foreach will be modifying the IAP of the $array variable. At the end of the iteration, the IAP is NULL (meaning iteration has done), which each indicates by returning false.

  • In test cases 4 and 5 both each and reset are by-reference functions. The $array has a refcount=2 when it is passed to them, so it has to be duplicated. As such foreach will be working on a separate array again.

Examples: Effects of current in foreach

A good way to show the various duplication behaviors is to observe the behavior of the current() function inside a foreach loop. Consider this example:

foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 2 2 2 2 */

Here you should know that current() is a by-ref function (actually: prefer-ref), even though it does not modify the array. It has to be in order to play nice with all the other functions like next which are all by-ref. By-reference passing implies that the array has to be separated and thus $array and the foreach-array will be different. The reason you get 2 instead of 1 is also mentioned above: foreach advances the array pointer before running the user code, not after. So even though the code is at the first element, foreach already advanced the pointer to the second.

Now lets try a small modification:

$ref = &$array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

Here we have the is_ref=1 case, so the array is not copied (just like above). But now that it is a reference, the array no longer has to be duplicated when passing to the by-ref current() function. Thus current() and foreach work on the same array. You still see the off-by-one behavior though, due to the way foreach advances the pointer.

You get the same behavior when doing by-ref iteration:

foreach ($array as &$val) {
    var_dump(current($array));
}
/* Output: 2 3 4 5 false */

Here the important part is that foreach will make $array an is_ref=1 when it is iterated by reference, so basically you have the same situation as above.

Another small variation, this time we'll assign the array to another variable:

$foo = $array;
foreach ($array as $val) {
    var_dump(current($array));
}
/* Output: 1 1 1 1 1 */

Here the refcount of the $array is 2 when the loop is started, so for once we actually have to do the duplication upfront. Thus $array and the array used by foreach will be completely separate from the outset. That's why you get the position of the IAP wherever it was before the loop (in this case it was at the first position).

Examples: Modification during iteration

Trying to account for modifications during iteration is where all our foreach troubles originated, so it serves to consider some examples for this case.

Consider these nested loops over the same array (where by-ref iteration is used to make sure it really is the same one):

foreach ($array as &$v1) {
    foreach ($array as &$v2) {
        if ($v1 == 1 && $v2 == 1) {
            unset($array[1]);
        }
        echo "($v1, $v2)\n";
    }
}

// Output: (1, 1) (1, 3) (1, 4) (1, 5)

The expected part here is that (1, 2) is missing from the output because element 1 was removed. What's probably unexpected is that the outer loop stops after the first element. Why is that?

The reason behind this is the nested-loop hack described above: Before the loop body runs, the current IAP position and hash is backed up into a HashPointer. After the loop body it will be restored, but only if the element still exists, otherwise the current IAP position (whatever it may be) is used instead. In the example above this is exactly the case: The current element of the outer loop has been removed, so it will use the IAP, which has already been marked as finished by the inner loop!

Another consequence of the HashPointer backup+restore mechanism is that changes to the IAP through reset() etc. usually do not impact foreach. For example, the following code executes as if the reset() were not present at all:

$array = [1, 2, 3, 4, 5];
foreach ($array as &$value) {
    var_dump($value);
    reset($array);
}
// output: 1, 2, 3, 4, 5

The reason is that, while reset() temporarily modifies the IAP, it will be restored to the current foreach element after the loop body. To force reset() to make an effect on the loop, you have to additionally remove the current element, so that the backup/restore mechanism fails:

$array = [1, 2, 3, 4, 5];
$ref =& $array;
foreach ($array as $value) {
    var_dump($value);
    unset($array[1]);
    reset($array);
}
// output: 1, 1, 3, 4, 5

But, those examples are still sane. The real fun starts if you remember that the HashPointer restore uses a pointer to the element and its hash to determine whether it still exists. But: Hashes have collisions, and pointers can be reused! This means that, with a careful choice of array keys, we can make foreach believe that an element that has been removed still exists, so it will jump directly to it. An example:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
$ref =& $array;
foreach ($array as $value) {
    unset($array['EzFY']);
    $array['FYFY'] = 4;
    reset($array);
    var_dump($value);
}
// output: 1, 4

Here we should normally expect the output 1, 1, 3, 4 according to the previous rules. How what happens is that 'FYFY' has the same hash as the removed element 'EzFY', and the allocator happens to reuse the same memory location to store the element. So foreach ends up directly jumping to the newly inserted element, thus short-cutting the loop.

Substituting the iterated entity during the loop

One last odd case that I'd like to mention, it is that PHP allows you to substitute the iterated entity during the loop. So you can start iterating on one array and then replace it with another array halfway through. Or start iterating on an array and then replace it with an object:

$arr = [1, 2, 3, 4, 5];
$obj = (object) [6, 7, 8, 9, 10];

$ref =& $arr;
foreach ($ref as $val) {
    echo "$val\n";
    if ($val == 3) {
        $ref = $obj;
    }
}
/* Output: 1 2 3 6 7 8 9 10 */

As you can see in this case PHP will just start iterating the other entity from the start once the substitution has happened.

PHP 7

Hashtable iterators

If you still remember, the main problem with array iteration was how to handle removal of elements mid-iteration. PHP 5 used a single internal array pointer (IAP) for this purpose, which was somewhat suboptimal, as one array pointer had to be stretched to support multiple simultaneous foreach loops and interaction with reset() etc. on top of that.

PHP 7 uses a different approach, namely, it supports creating an arbitrary amount of external, safe hashtable iterators. These iterators have to be registered in the array, from which point on they have the same semantics as the IAP: If an array element is removed, all hashtable iterators pointing to that element will be advanced to the next element.

This means that foreach will no longer use the IAP at all. The foreach loop will be absolutely no effect on the results of current() etc. and its own behavior will never be influenced by functions like reset() etc.

Array duplication

Another important change between PHP 5 and PHP 7 relates to array duplication. Now that the IAP is no longer used, by-value array iteration will only do a refcount increment (instead of duplication the array) in all cases. If the array is modified during the foreach loop, at that point a duplication will occur (according to copy-on-write) and foreach will keep working on the old array.

In most cases, this change is transparent and has no other effect than better performance. However, there is one occasion where it results in different behavior, namely the case where the array was a reference beforehand:

$array = [1, 2, 3, 4, 5];
$ref = &$array;
foreach ($array as $val) {
    var_dump($val);
    $array[2] = 0;
}
/* Old output: 1, 2, 0, 4, 5 */
/* New output: 1, 2, 3, 4, 5 */

Previously by-value iteration of reference-arrays was special cases. In this case, no duplication occurred, so all modifications of the array during iteration would be reflected by the loop. In PHP 7 this special case is gone: A by-value iteration of an array will always keep working on the original elements, disregarding any modifications during the loop.

This, of course, does not apply to by-reference iteration. If you iterate by-reference all modifications will be reflected by the loop. Interestingly, the same is true for by-value iteration of plain objects:

$obj = new stdClass;
$obj->foo = 1;
$obj->bar = 2;
foreach ($obj as $val) {
    var_dump($val);
    $obj->bar = 42;
}
/* Old and new output: 1, 42 */

This reflects the by-handle semantics of objects (i.e. they behave reference-like even in by-value contexts).

Examples

Let's consider a few examples, starting with your test cases:

  • Test cases 1 and 2 retain the same output: By-value array iteration always keep working on the original elements. (In this case, even refcounting and duplication behavior is exactly the same between PHP 5 and PHP 7).

  • Test case 3 changes: Foreach no longer uses the IAP, so each() is not affected by the loop. It will have the same output before and after.

  • Test cases 4 and 5 stay the same: each() and reset() will duplicate the array before changing the IAP, while foreach still uses the original array. (Not that the IAP change would have mattered, even if the array was shared.)

The second set of examples was related to the behavior of current() under different reference/refcounting configurations. This no longer makes sense, as current() is completely unaffected by the loop, so its return value always stays the same.

However, we get some interesting changes when considering modifications during iteration. I hope you will find the new behavior saner. The first example:

$array = [1, 2, 3, 4, 5];
foreach ($array as &$v1) {
    foreach ($array as &$v2) {
        if ($v1 == 1 && $v2 == 1) {
            unset($array[1]);
        }
        echo "($v1, $v2)\n";
    }
}

// Old output: (1, 1) (1, 3) (1, 4) (1, 5)
// New output: (1, 1) (1, 3) (1, 4) (1, 5)
//             (3, 1) (3, 3) (3, 4) (3, 5)
//             (4, 1) (4, 3) (4, 4) (4, 5)
//             (5, 1) (5, 3) (5, 4) (5, 5) 

As you can see, the outer loop no longer aborts after the first iteration. The reason is that both loops now have entirely separate hashtable iterators, and there is no longer any cross-contamination of both loops through a shared IAP.

Another weird edge case that is fixed now, is the odd effect you get when you remove and add elements that happen to have the same hash:

$array = ['EzEz' => 1, 'EzFY' => 2, 'FYEz' => 3];
foreach ($array as &$value) {
    unset($array['EzFY']);
    $array['FYFY'] = 4;
    var_dump($value);
}
// Old output: 1, 4
// New output: 1, 3, 4

Previously the HashPointer restore mechanism jumped right to the new element because it "looked" like it's the same as the removed element (due to colliding hash and pointer). As we no longer rely on the element hash for anything, this is no longer an issue.

Fluffy answered 13/2, 2013 at 13:21 Comment(7)
@Baba It does. Passing it to a function is the same as doing $foo = $array before the loop ;)Fluffy
foreach seems to save the position of the array in a HashPointer struct after an iteration and restores it before an iteration. The struct carries a HashPosition and the h of the bucket pointed to by the HashPosition. The hash is used to verify that the bucket is still in the table, so a HashPointer is safe to use even after array modification. That's probably why we can see the modified array position (each() returning null after foreach), and also why attempting to modify it doesn't affect foreach.Gooden
@arnaud576875 Good point, I forgot to mention that. Though I don't see how it is related to "see the modified array position (each() returning null after foreach)". That's just an effect of it using the IAP. Afaik it's just there to prevent next/prev calls in the foreach body from changing the position (in the ref case).Fluffy
For those of you who don't know what a zval is, please refer to Sara Goleman's blog.golemon.com/2007/01/youre-being-lied-to.htmlEy
@arnaud576875 I added some explanation (and even more weirdness!) regarding HashPointer at the end (starting with the last heading)Fluffy
Minor correction: what you call Bucket isn't what is normally called Bucket in a hashtable. Normally Bucket is a set of entries with the same hash%size. You seem to use it for what is normally called an entry. Linked list isn't on buckets, but on entries.Carolinacaroline
@Carolinacaroline I'm using the terminology used internally by PHP. The Buckets are part of a doubly linked list for hash collisions and also part of a doubly linked list for order ;)Fluffy
K
128

In example 3 you don't modify the array. In all other examples you modify either the contents or the internal array pointer. This is important when it comes to PHP arrays because of the semantics of the assignment operator.

The assignment operator for the arrays in PHP works more like a lazy clone. Assigning one variable to another that contains an array will clone the array, unlike most languages. However, the actual cloning will not be done unless it is needed. This means that the clone will take place only when either of the variables is modified (copy-on-write).

Here is an example:

$a = array(1,2,3);
$b = $a;  // This is lazy cloning of $a. For the time
          // being $a and $b point to the same internal
          // data structure.

$a[] = 3; // Here $a changes, which triggers the actual
          // cloning. From now on, $a and $b are two
          // different data structures. The same would
          // happen if there were a change in $b.

Coming back to your test cases, you can easily imagine that foreach creates some kind of iterator with a reference to the array. This reference works exactly like the variable $b in my example. However, the iterator along with the reference live only during the loop and then, they are both discarded. Now you can see that, in all cases but 3, the array is modified during the loop, while this extra reference is alive. This triggers a clone, and that explains what's going on here!

Here is an excellent article for another side effect of this copy-on-write behaviour: The PHP Ternary Operator: Fast or not?

Kurtiskurtosis answered 7/4, 2012 at 20:43 Comment(2)
seems your right, I made some example which demonstrate that: codepad.org/OCjtvu8r one difference from your example - it does not copy if you change value, only if change keys.Junitajunius
This does indeed explain all the behavior show above, and it can be nicely illustrated by calling each() at the end of the first test case, where we see that the array pointer of the original array points to the second element, since the array was modified during the first iteration. This also seems to demonstrate that foreach moves the array pointer on before executing the code block of the loop, which I was not expecting - I would have thought it would do this at the end. Many thanks, this clears it up for me nicely.Denigrate
K
57

Some points to note when working with foreach():

a) foreach works on the prospected copy of the original array. It means foreach() will have SHARED data storage until or unless a prospected copy is not created foreach Notes/User comments.

b) What triggers a prospected copy? A prospected copy is created based on the policy of copy-on-write, that is, whenever an array passed to foreach() is changed, a clone of the original array is created.

c) The original array and foreach() iterator will have DISTINCT SENTINEL VARIABLES, that is, one for the original array and other for foreach; see the test code below. SPL , Iterators, and Array Iterator.

Stack Overflow question How to make sure the value is reset in a 'foreach' loop in PHP? addresses the cases (3,4,5) of your question.

The following example shows that each() and reset() DOES NOT affect SENTINEL variables (for example, the current index variable) of the foreach() iterator.

$array = array(1, 2, 3, 4, 5);

list($key2, $val2) = each($array);
echo "each() Original (outside): $key2 => $val2<br/>";

foreach($array as $key => $val){
    echo "foreach: $key => $val<br/>";

    list($key2,$val2) = each($array);
    echo "each() Original(inside): $key2 => $val2<br/>";

    echo "--------Iteration--------<br/>";
    if ($key == 3){
        echo "Resetting original array pointer<br/>";
        reset($array);
    }
}

list($key2, $val2) = each($array);
echo "each() Original (outside): $key2 => $val2<br/>";

Output:

each() Original (outside): 0 => 1
foreach: 0 => 1
each() Original(inside): 1 => 2
--------Iteration--------
foreach: 1 => 2
each() Original(inside): 2 => 3
--------Iteration--------
foreach: 2 => 3
each() Original(inside): 3 => 4
--------Iteration--------
foreach: 3 => 4
each() Original(inside): 4 => 5
--------Iteration--------
Resetting original array pointer
foreach: 4 => 5
each() Original(inside): 0=>1
--------Iteration--------
each() Original (outside): 1 => 2
Kobe answered 7/4, 2012 at 21:3 Comment(6)
Your answer is not quite correct. foreach operates on a potential copy of the array, but it does not make the actual copy unless it is needed.Kurtiskurtosis
would you like to demonstrate how and when that potential copy is created through code ? My code demonstrates that foreach is copying the array 100% of the time. I am eager to know. Thanks for you commentsKobe
Copying an array costs a lot. Try counting the time it takes to iterate an array with 100000 elements using either for or foreach. You will not see any significant difference between the two of them, because an actual copy does not take place.Kurtiskurtosis
Then I would assume that there is SHARED data storage reserved until or unless copy-on-write , but (from my code snippet) its evident that there will always be TWO set of SENTINEL variables one for the original array and other for foreach. Thanks that makes senseKobe
"prospected"? Do you mean "protected"?Undervalue
yes that is "prospected" copy i.e "potential" copy.Its not protected as you suggestedKobe
P
41

NOTE FOR PHP 7

To update on this answer as it has gained some popularity: This answer no longer applies as of PHP 7. As explained in the "Backward incompatible changes", in PHP 7 foreach works on copy of the array, so any changes on the array itself are not reflected on foreach loop. More details at the link.

Explanation (quote from php.net):

The first form loops over the array given by array_expression. On each iteration, the value of the current element is assigned to $value and the internal array pointer is advanced by one (so on the next iteration, you'll be looking at the next element).

So, in your first example you only have one element in the array, and when the pointer is moved the next element does not exist, so after you add new element foreach ends because it already "decided" that it it as the last element.

In your second example, you start with two elements, and foreach loop is not at the last element so it evaluates the array on the next iteration and thus realises that there is new element in the array.

I believe that this is all consequence of On each iteration part of the explanation in the documentation, which probably means that foreach does all logic before it calls the code in {}.

Test case

If you run this:

<?
    $array = Array(
        'foo' => 1,
        'bar' => 2
    );
    foreach($array as $k=>&$v) {
        $array['baz']=3;
        echo $v." ";
    }
    print_r($array);
?>

You will get this output:

1 2 3 Array
(
    [foo] => 1
    [bar] => 2
    [baz] => 3
)

Which means that it accepted the modification and went through it because it was modified "in time". But if you do this:

<?
    $array = Array(
        'foo' => 1,
        'bar' => 2
    );
    foreach($array as $k=>&$v) {
        if ($k=='bar') {
            $array['baz']=3;
        }
        echo $v." ";
    }
    print_r($array);
?>

You will get:

1 2 Array
(
    [foo] => 1
    [bar] => 2
    [baz] => 3
)

Which means that array was modified, but since we modified it when the foreach already was at the last element of the array, it "decided" not to loop anymore, and even though we added new element, we added it "too late" and it was not looped through.

Detailed explanation can be read at How does PHP 'foreach' actually work? which explains the internals behind this behaviour.

Palaearctic answered 15/4, 2014 at 8:46 Comment(10)
Well did you read the rest of the answer? It makes perfect sense that foreach decides if it will loop another time before it even runs the code in it.Palaearctic
No, the array is modified, but "too late" since foreach already "thinks" that it is at the last element (which it is at the start of iteration) and will not loop anymore. Where in the second example, it is not at the last element at the start of iteration and evaluates again on the start of next iteration. I am trying to prepare a test case.Palaearctic
@AlmaDo Look at lxr.php.net/xref/PHP_TRUNK/Zend/zend_vm_def.h#4509 It's always set to the next pointer when it iterates. So, when it reaches the last iteration, it'll be marked as finished (via NULL pointer). When you then add a key in last iteration, foreach won't notice it.Hotchpot
@AlmaDo if you want internal reasons read them from accepted answer at https://mcmap.net/q/40550/-how-does-php-39-foreach-39-actually-work and I will update my answer to include this.Palaearctic
@DKasipovic no. There is no complete & clear explanation there (at least for now - may be I'm wrong)Gamages
I don't agree, however it is up to you. In the other answer and in my answer there is everything explained. Even my example proves what I said. I don't see what the problem is here. Maybe I dont understand you very well.Palaearctic
Actually it seems that @AlmaDo has a flaw in understanding his own logic… Your answer is fine.Hotchpot
@Damir can you update you answer, cause your second example give the same output like first for PHP version >= 7Electroscope
@circleandsquare in the "Backward incompatible changes" ( secure.php.net/manual/en/migration70.incompatible.php ) there is an explanation on how foreach works in PHP 7, I think it explains it pretty well. Basically, in PHP 7 foreach works on copy of the array, so any changes on the array itself are not reflected on foreach loop.Palaearctic
@Damir I agree with you about answer, just because this is one of top visited question I suggest you to improve your answer with mention about PHP version, cause this will confuse new ones.Electroscope
B
18

As per the documentation provided by PHP manual.

On each iteration, the value of the current element is assigned to $v and the internal
array pointer is advanced by one (so on the next iteration, you'll be looking at the next element).

So as per your first example:

$array = ['foo'=>1];
foreach($array as $k=>&$v)
{
   $array['bar']=2;
   echo($v);
}

$array have only single element, so as per the foreach execution, 1 assign to $v and it don't have any other element to move pointer

But in your second example:

$array = ['foo'=>1, 'bar'=>2];
foreach($array as $k=>&$v)
{
   $array['baz']=3;
   echo($v);
}

$array have two element, so now $array evaluate the zero indices and move the pointer by one. For first iteration of loop, added $array['baz']=3; as pass by reference.

Brame answered 15/4, 2014 at 9:32 Comment(0)
P
15

Great question, because many developers, even experienced ones, are confused by the way PHP handles arrays in foreach loops. In the standard foreach loop, PHP makes a copy of the array that is used in the loop. The copy is discarded immediately after the loop finishes. This is transparent in the operation of a simple foreach loop. For example:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    echo "{$item}\n";
}

This outputs:

apple
banana
coconut

So the copy is created but the developer doesn't notice, because the original array isn’t referenced within the loop or after the loop finishes. However, when you attempt to modify the items in a loop, you find that they are unmodified when you finish:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    $item = strrev ($item);
}

print_r($set);

This outputs:

Array
(
    [0] => apple
    [1] => banana
    [2] => coconut
)

Any changes from the original can't be notices, actually there are no changes from the original, even though you clearly assigned a value to $item. This is because you are operating on $item as it appears in the copy of $set being worked on. You can override this by grabbing $item by reference, like so:

$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
    $item = strrev($item);
}
print_r($set);

This outputs:

Array
(
    [0] => elppa
    [1] => ananab
    [2] => tunococ
)

So it is evident and observable, when $item is operated on by-reference, the changes made to $item are made to the members of the original $set. Using $item by reference also prevents PHP from creating the array copy. To test this, first we’ll show a quick script demonstrating the copy:

$set = array("apple", "banana", "coconut");
foreach ( $set AS $item ) {
    $set[] = ucfirst($item);
}
print_r($set);

This outputs:

Array
(
    [0] => apple
    [1] => banana
    [2] => coconut
    [3] => Apple
    [4] => Banana
    [5] => Coconut
)

As it is shown in the example, PHP copied $set and used it to loop over, but when $set was used inside the loop, PHP added the variables to the original array, not the copied array. Basically, PHP is only using the copied array for the execution of the loop and the assignment of $item. Because of this, the loop above only executes 3 times, and each time it appends another value to the end of the original $set, leaving the original $set with 6 elements, but never entering an infinite loop.

However, what if we had used $item by reference, as I mentioned before? A single character added to the above test:

$set = array("apple", "banana", "coconut");
foreach ( $set AS &$item ) {
    $set[] = ucfirst($item);
}
print_r($set);

Results in an infinite loop. Note this actually is an infinite loop, you’ll have to either kill the script yourself or wait for your OS to run out of memory. I added the following line to my script so PHP would run out of memory very quickly, I suggest you do the same if you’re going to be running these infinite loop tests:

ini_set("memory_limit","1M");

So in this previous example with the infinite loop, we see the reason why PHP was written to create a copy of the array to loop over. When a copy is created and used only by the structure of the loop construct itself, the array stays static throughout the execution of the loop, so you’ll never run into issues.

Pyknic answered 21/4, 2017 at 8:44 Comment(0)
E
10

PHP foreach loop can be used with Indexed arrays, Associative arrays and Object public variables.

In foreach loop, the first thing php does is that it creates a copy of the array which is to be iterated over. PHP then iterates over this new copy of the array rather than the original one. This is demonstrated in the below example:

<?php
$numbers = [1,2,3,4,5,6,7,8,9]; # initial values for our array
echo '<pre>', print_r($numbers, true), '</pre>', '<hr />';
foreach($numbers as $index => $number){
    $numbers[$index] = $number + 1; # this is making changes to the origial array
    echo 'Inside of the array = ', $index, ': ', $number, '<br />'; # showing data from the copied array
}
echo '<hr />', '<pre>', print_r($numbers, true), '</pre>'; # shows the original values (also includes the newly added values).

Besides this, php does allow to use iterated values as a reference to the original array value as well. This is demonstrated below:

<?php
$numbers = [1,2,3,4,5,6,7,8,9];
echo '<pre>', print_r($numbers, true), '</pre>';
foreach($numbers as $index => &$number){
    ++$number; # we are incrementing the original value
    echo 'Inside of the array = ', $index, ': ', $number, '<br />'; # this is showing the original value
}
echo '<hr />';
echo '<pre>', print_r($numbers, true), '</pre>'; # we are again showing the original value

Note: It does not allow original array indexes to be used as references.

Source: http://dwellupper.io/post/47/understanding-php-foreach-loop-with-examples

Estriol answered 13/11, 2017 at 14:8 Comment(1)
Object public variables is wrong or at best misleading. You cannot use an object in an array without the correct interface (eg, Traversible) and when you do foreach((array)$obj ... you are in fact working with a simple array, not an object anymore.Osterhus

© 2022 - 2024 — McMap. All rights reserved.