Generator expression uses list assigned after the generator's creation

5

25

I found this example and I can't understand why it behaves so unpredictably. I expected it to output [1, 8, 15] or [2, 8, 22].

array = [1, 8, 15]
g = (x for x in array if array.count(x) > 0)
array = [2, 8, 22]
print(list(g))
# >>> [8]
Grandmamma answered 24/10, 2018 at 11:52 Comment(1)
Aside: replacing "if array.count(x) > 0" with "if x in array" is smarter and faster :) - Different
26

The reason is that, at creation time, the generator (a for b in c if d) only evaluates c (which sometimes makes b predictable as well). But a, b, d are evaluated at consumption time (at each iteration). Here, it uses the current binding of array from the enclosing scope when evaluating d (array.count(x) > 0).

You can for instance do:

g = (x for x in [] if a)

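This works without a having been defined in advance. With a non-empty iterable, however, you have to make sure a exists by the time the generator is consumed, because the condition is evaluated for every element.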

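But you cannot do the same with the iterable. It is evaluated immediately, so this raises a NameError at creation time if a is not defined: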

g = (x for x in a if True)
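
Both cases can be checked directly. The following is a minimal sketch; the names undefined_cond and undefined_iterable are made up for illustration and are assumed not to be defined anywhere in the program.

```python
# Case 1: an undefined name in the condition is accepted at creation time.
g = (x for x in [1, 2, 3] if undefined_cond)
try:
    list(g)
    lazy_error = None
except NameError as exc:
    lazy_error = exc  # raised only once the generator is consumed

# With an empty iterable the condition is never evaluated at all.
empty_result = list(x for x in [] if undefined_cond)

# Case 2: an undefined name as the leftmost iterable fails immediately.
try:
    h = (x for x in undefined_iterable if True)
    eager_error = None
except NameError as exc:
    eager_error = exc  # raised on the line that creates the generator

print(type(lazy_error).__name__, empty_result, type(eager_error).__name__)
```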

Upon request:

You can observe similar (though not identical) behavior with an ordinary generator function:

def yielder():
    for x in array:
        if array.count(x) > 0:
            yield x

array = [1, 8, 15]
y = yielder()
array = [2, 8, 22]
list(y)
# [2, 8, 22]

The generator function does not execute any of its body ahead of consumption. Hence, even the array in the for-loop header is bound late. An even more disturbing example occurs where we "switch out" array during iteration:

array = [1, 8, 15]
y = yielder()
next(y)
# 1
array = [3, 7]
next(y)  # still iterating [1, 8, 15], but evaluating condition on [3, 7]
# StopIteration raised
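
One way to picture the difference between the two is to write the generator expression as a generator function that receives the leftmost iterable as an argument at creation time. This is only a rough sketch of the behavior, not the code the interpreter actually generates, and the name genexpr_equivalent is hypothetical.

```python
array = [1, 8, 15]

def genexpr_equivalent(iterable):
    # `iterable` was fixed when the function was called ...
    for x in iterable:
        # ... but `array` is looked up again on every iteration.
        if array.count(x) > 0:
            yield x

# Roughly what g = (x for x in array if array.count(x) > 0) does:
g = genexpr_equivalent(iter(array))

array = [2, 8, 22]
result = list(g)
print(result)  # [8]
```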
Solenne answered 24/10, 2018 at 11:58 Comment(5)
Can you explain why the generator expression seems to behave differently to the generator function def yielder(): \ for x in array: \ if array.count(x) > 0: \ yield x. Using list(yielder) exhausts so you get [1, 8, 15], while list(g) only gives [8]. - Clergyman
@Clergyman You cannot call list on a function object. But nitpicking aside =) I added some explanation to that end. - Solenne
Thank you, very helpful. Of course list(yielder()) is what I meant :) - Clergyman
"And since a generator does not open its own namespace" - yes it does. That's why the loop variables don't leak into the outer scope. What it doesn't do is eagerly copy the bindings from the namespace where it was created; it looks up closure variables upon use. - Mckown
@Mckown Thx for commenting. I updated that section. According to most of the documentation I found on closures in Python, I am not sure if the generator expression really contains closures in the strict sense as there is no nested function. - Solenne
10

From the docs on Generator expressions:

Variables used in the generator expression are evaluated lazily when the __next__() method is called for the generator object (in the same fashion as normal generators). However, the iterable expression in the leftmost for clause is immediately evaluated, so that an error produced by it will be emitted at the point where the generator expression is defined, rather than at the point where the first value is retrieved.

So when you run

array = [1, 8, 15]
g = (x for x in array if array.count(x) > 0)

only the first array in the generator expression is evaluated. x and array.count(x) will only be evaluated when you call next(g). Since you make array point to another list [2, 8, 22] before consuming the generator you get the 'unexpected' result.

array = [2, 8, 22]
print(list(g))  # [8]
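
If the goal is to get [1, 8, 15], one possible approach is to create the generator inside a function, so that both occurrences of the list refer to a local name that later assignments to array cannot touch. This is a minimal sketch; the helper make_gen is a hypothetical name, not something from the docs.

```python
def make_gen(arr):
    # Both uses of `arr` refer to this function's local variable,
    # which is unaffected by rebinding the outer name `array`.
    return (x for x in arr if arr.count(x) > 0)

array = [1, 8, 15]
g = make_gen(array)
array = [2, 8, 22]
result = list(g)
print(result)  # [1, 8, 15]
```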
Bract answered 24/10, 2018 at 12:14 Comment(0)
1

When the generator is created, it keeps a reference to the list object that array names at that moment and iterates over that object. Assigning a new list to array afterwards does not change the old list, but the condition array.count(x) looks up array again at each step and therefore sees the new list.

Of the old elements 1, 8 and 15, only 8 also occurs in the new list, so only 8 passes the filter. In the example below, 8 also has the same id in both lists, but that is only because CPython caches small integers; it is not what makes the generator yield 8.

Look at the example below for a better understanding:

array = [1, 8, 15]
for i in array:
    print(id(i))

g = (x for x in array if array.count(x) > 0)

print('<======>')

array = [2, 8, 22]
for i in array:
    print(id(i))

print(array)
print(list(g))

Output

140208067495680
140208067495904
140208067496128
<======>
140208067495712
140208067495904 # memory location is still same
140208067496352
[2, 8, 22]
[8]
Mimesis answered 24/10, 2018 at 12:15 Comment(0)
0

Actually, it is not really crazy if you look more carefully. Look at

g = (x for x in array if array.count(x) > 0)

It creates a generator that iterates over the array and checks whether the count of each already existing value is greater than zero. So your generator only looks for 1, 8 and 15; when you assign other values to array, the generator still looks for the previous values, not the new ones, because it was created when the array held them.

So even if you later assign a list with thousands of values to array, the generator still looks only for those three.
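
Note that this holds for assigning a new list to the name, not for changing the original list in place. A small sketch of the two cases, using the list from the question (the variable names rebound_result and mutated_result are only for illustration):

```python
# Rebinding: the generator keeps iterating over the old three-element list.
array = [1, 8, 15]
g = (x for x in array if array.count(x) > 0)
array = list(range(1000))   # new list object with a thousand values
rebound_result = list(g)
print(rebound_result)       # [1, 8, 15]

# In-place mutation: the generator iterates over the same list object,
# so an appended element is visited as well.
array = [1, 8, 15]
g = (x for x in array if array.count(x) > 0)
array.append(22)
mutated_result = list(g)
print(mutated_result)       # [1, 8, 15, 22]
```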

Dougdougal answered 24/10, 2018 at 12:0 Comment(1)
It is not clear to me whether this answer says that the condition or the array is instantly evaluated. - Mond
0

The confusion, and also the answer, lies in the line: g = (x for x in array if array.count(x) > 0)
If we give the two uses of array different names, the line becomes: g = (x for x in array1 if array2.count(x) > 0)

When the generator is created, it keeps a reference to the array1 object. So even if I change array1 to any other value (i.e. set it to a new array object), it will not affect the generator's copy of array1, because only the name array1 changes its object reference. But array2 is looked up dynamically, so if we change its value, the change is reflected.

You can see this in the output of the following code:

array1 = [1, 8, 15] #Set value of `array1`
array2 = [2, 3, 4, 5, 8] #Set value of `array2`
print("Old `array1` object ID: " + repr(id(array1)))
print("Old `array2` object ID: " + repr(id(array2)))
g = (x for x in array1 if array2.count(x) > 0)
array1 = [0, 9] #Changed value of `array1`
array2 = [2, 8, 22, 1] #Changed value of `array2`
print("New `array1` object ID: " + repr(id(array1)))
print("New `array2` object ID: " + repr(id(array2)))
print(list(g))

Output:

Old `array1` object ID: 47770072262024
Old `array2` object ID: 47770072263816
New `array1` object ID: 47770072263944
New `array2` object ID: 47770072264008
[1, 8]
Kopaz answered 24/10, 2018 at 12:18 Comment(1)
The way you're using the word "copy" here is pretty misleading. The generator expression doesn't copy anything. It simply holds a reference to the original value of array. - Marrow