yield from vs yield in for-loop
My understanding of yield from is that it is similar to yielding every item from an iterable. Yet, I observe different behavior in the following example.

I have Class1:

class Class1:
    def __init__(self, gen):
        self.gen = gen
        
    def __iter__(self):
        for el in self.gen:
            yield el

and Class2, which differs only in replacing the yield inside a for loop with yield from:

class Class2:
    def __init__(self, gen):
        self.gen = gen
        
    def __iter__(self):
        yield from self.gen

The code below reads the first element from an instance of a given class and then reads the rest in a for loop:

a = Class1((i for i in range(3)))
print(next(iter(a)))
for el in iter(a):
    print(el)

This produces different outputs for Class1 and Class2. For Class1 the output is

0
1
2

and for Class2 the output is

0


What is the mechanism behind yield from that produces different behavior?

Imogen answered 26/12, 2022 at 16:48 Comment(4)
Not specifically an answer to your question, but https://mcmap.net/q/63515/-in-practice-what-are-the-main-uses-for-the-quot-yield-from-quot-syntax-in-python-3-3 provides more ways in which yield from is different from a loop over yield.Pentheas
Very weirdly enough, with Class2, if you extract iter(a) into a variable (b = iter(a); print(next(b))), this will work the same as Class1, i.e. prints all the numbers. That's confusing and very interesting.Mcchesney
Yep, and if you do del b, it only prints the first one @YevhenKuzmovychImogen
@YevhenKuzmovych There are too many bogus issues already, better ask at discuss "Python Help" instead.Parcae

What Happened?

When you use next(iter(instance_of_Class2)), the anonymous iterator created by iter() goes out of scope (and is deleted) as soon as the statement finishes, and closing it calls .close() on the inner generator it delegates to. With Class1, only the __iter__ generator itself is closed; the inner generator is untouched.

>>> g = (i for i in range(3))
>>> b = Class2(g)
>>> i = iter(b)     # hold iterator open
>>> next(i)
0
>>> next(i)
1
>>> del(i)          # closes g
>>> next(iter(b))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
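The close propagation can be reproduced without any classes; a minimal sketch (the helper names delegate and loop are mine, not from the question):

```python
# Two wrappers around the same kind of inner generator: one delegates
# with 'yield from', the other re-yields items in a for loop.
def delegate(gen):
    yield from gen

def loop(gen):
    for item in gen:
        yield item

inner1 = (i for i in range(3))
d = delegate(inner1)
next(d)               # advance to 0
d.close()             # GeneratorExit is forwarded into inner1
print(next(inner1, "closed"))    # -> "closed": inner1 was closed too

inner2 = (i for i in range(3))
l = loop(inner2)
next(l)               # advance to 0
l.close()             # only the wrapping generator is closed
print(next(inner2, "closed"))    # -> 1: inner2 is still usable
```

The difference is exactly the one seen with Class1 and Class2: close() on a yield from delegate is forwarded to the sub-generator, while close() on a plain for/yield wrapper stops only the wrapper.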

This behavior follows from PEP 342, which introduced the close() method and specified that it is called when a generator iterator is garbage-collected.

What happens is a little clearer (if perhaps surprising) when multiple generator delegations occur: only the generator currently being delegated to is closed when the wrapping iterator is deleted

>>> g1 = (a for a in range(10))
>>> g2 = (a for a in range(10, 20))
>>> def test3():
...     yield from g1
...     yield from g2
... 
>>> next(test3())
0
>>> next(test3())
10
>>> next(test3())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Fixing Class2

What options are there to make Class2 behave more the way you expect?

Notably, the other strategies don't have the visually pleasing sugar of yield from or some of its potential benefits, but they do give you a way to interact with the values as they pass through, which seems like a primary benefit

  • avoid creating a structure like this at all ("just don't do that!")
    if you don't interact with the generator and don't intend to keep a reference to the iterator, why bother wrapping it at all? (see above comment about interacting)
  • create the iterator yourself internally (this may be what you expected)
    >>> class Class3:
    ...     def __init__(self, gen):
    ...         self.iterator = iter(gen)
    ...         
    ...     def __iter__(self):
    ...         return self.iterator
    ... 
    >>> c = Class3((i for i in range(3)))
    >>> next(iter(c))
    0
    >>> next(iter(c))
    1
    
  • make the whole class a "proper" Generator
    while testing this, it plausibly highlights an iter() inconsistency; see the comments in the transcript below (i.e. why isn't e closed?)
    also an opportunity to pass multiple generators with itertools.chain.from_iterable
    >>> import collections.abc
    >>> class Class5(collections.abc.Generator):
    ...     def __init__(self, gen):
    ...         self.gen = gen
    ...     def send(self, value):
    ...         return next(self.gen)
    ...     def throw(self, value):
    ...         raise StopIteration
    ...     def close(self):          # optional, but more complete
    ...         self.gen.close()
    ... 
    >>> e = Class5((i for i in range(10)))
    >>> next(e)        # NOTE iter is not necessary!
    0
    >>> next(e)
    1
    >>> next(iter(e))  # but still works
    2
    >>> next(iter(e))  # doesn't close e?? (should it?)
    3
    >>> e.close()
    >>> next(e)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.9/_collections_abc.py", line 330, in __next__
        return self.send(None)
      File "<stdin>", line 5, in send
    StopIteration
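    
A further variant (my own sketch, not from the original answer) keeps the yield from sugar but creates the delegating generator once and holds a reference to it on the instance, so anonymous iter() calls can never cause it to be collected:

```python
class Class4:
    """Hypothetical fix: one shared delegating generator, kept alive
    by the instance so anonymous iter() calls can't collect it."""
    def __init__(self, gen):
        self._it = self._wrap(gen)

    def _wrap(self, gen):
        yield from gen        # sugar preserved; only one delegate ever exists

    def __iter__(self):
        return self._it       # always the same, referenced, generator


c = Class4(i for i in range(3))
print(next(iter(c)))   # 0
print(next(iter(c)))   # 1 -- the delegate survived between calls
```

The trade-off is that all iter() calls share one cursor, the same behavior Class3 above exhibits.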
    

Hunting the Mystery

A better clue is that if you immediately try again, next(iter(instance)) raises StopIteration, indicating that the inner generator is permanently closed (either through exhaustion or .close()); that is also why iterating over it with a for loop yields no more values

>>> a = Class1((i for i in range(3)))
>>> next(iter(a))
0
>>> next(iter(a))
1
>>> b = Class2((i for i in range(3)))
>>> next(iter(b))
0
>>> next(iter(b))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

However, if we name the iterator, it works as expected

>>> b = Class2((i for i in range(3)))
>>> i = iter(b)
>>> next(i)
0
>>> next(i)
1
>>> j = iter(b)
>>> next(j)
2
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

To me, this suggests that when the iterator isn't bound to a name, it is garbage-collected as soon as it goes out of scope, and its .close() is called, which propagates to the generator it delegates to

>>> def gen_test(iterable):
...     yield from iterable
... 
>>> g = gen_test((i for i in range(3)))
>>> next(iter(g))
0
>>> g.close()
>>> next(iter(g))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
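The garbage-collection step can be made explicit (the names here are mine; gc.collect() only matters on implementations without reference counting, such as PyPy):

```python
import gc

def wrapper(iterable):
    yield from iterable

g = (i for i in range(3))
w = wrapper(g)
next(w)            # 0
del w              # drop the only reference to the delegating generator
gc.collect()       # force collection on non-refcounting implementations
print(next(g, "closed"))   # -> "closed": g was closed during collection
```

Collecting the delegating generator calls its close(), and yield from forwards that close() to g, exactly as in the anonymous iter() case above.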

Disassembling the two __iter__ methods, we find the internals are a little different

>>> a = Class1((i for i in range(3)))
>>> dis.dis(a.__iter__)
  6           0 LOAD_FAST                0 (self)
              2 LOAD_ATTR                0 (gen)
              4 GET_ITER
        >>    6 FOR_ITER                10 (to 18)
              8 STORE_FAST               1 (el)

  7          10 LOAD_FAST                1 (el)
             12 YIELD_VALUE
             14 POP_TOP
             16 JUMP_ABSOLUTE            6
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE
>>> b = Class2((i for i in range(3)))
>>> dis.dis(b.__iter__)
  6           0 LOAD_FAST                0 (self)
              2 LOAD_ATTR                0 (gen)
              4 GET_YIELD_FROM_ITER
              6 LOAD_CONST               0 (None)
              8 YIELD_FROM
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE

Notably, the yield from version has GET_YIELD_FROM_ITER

If TOS is a generator iterator or coroutine object it is left as is. Otherwise, implements TOS = iter(TOS).

(subtly, the YIELD_FROM opcode was removed in 3.11)

So if the iterable given to the class is already a generator iterator, it is handed off directly, producing the result we (might) expect
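A related detail, from PEP 380 rather than the disassembly above: when a delegating generator is closed, yield from only forwards close() if the sub-iterator has a close() method. A plain iterator such as a map object has none and is left untouched (sketch; names are mine):

```python
def delegate(it):
    yield from it

m = map(str, range(3))   # an iterator, but not a generator: no close()
d = delegate(m)
next(d)                  # '0'
d.close()                # nothing to forward: m has no close() method
print(next(m))           # -> '1': m is unaffected and still usable
```

So the surprising early termination is specific to delegating to generator iterators (or other iterators that define close()).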


Extras

Passing an iterable which isn't a generator (iter() creates a fresh iterator each time in both cases)

>>> a = Class1([i for i in range(3)])
>>> next(iter(a))
0
>>> next(iter(a))
0
>>> b = Class2([i for i in range(3)])
>>> next(iter(b))
0
>>> next(iter(b))
0

Expressly closing Class1's internal generator

>>> g = (i for i in range(3))
>>> a = Class1(g)
>>> next(iter(a))
0
>>> next(iter(a))
1
>>> a.gen.close()
>>> next(iter(a))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

the inner generator is only closed when the deleted iterator has actually been started (next() has been called on it at least once)

>>> g = (i for i in range(10))
>>> b = Class2(g)
>>> i = iter(b)
>>> next(i)
0
>>> j = iter(b)
>>> del(j)        # next() not called on j
>>> next(i)
1
>>> j = iter(b)
>>> next(j)
2
>>> del(j)        # generator closed
>>> next(i)       # now fails, despite range(10) above
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Marillin answered 26/12, 2022 at 19:38 Comment(6)
You can also look at the PEP, you can see the close call there.Parcae
That's explanatory, but still fairly opaque as to why! .. but from that, PEP 342 looks like it brings both the new close method and sneaks in 5. Add support to ensure that close() is called when a generator iterator is garbage-collected.Marillin
@Marillin Why is the generator iterator garbage collected if it’s still referenced by self.gen?Imogen
it's not that there's no leftover reference (there is, which is how it raises StopIteration rather than AttributeError or NameError, etc.); rather, it happens when the iterator from iter() is collected and explicitly calls .close() on the generator it's wrappingMarillin
This is just my subjective opinion, but this feels like a bug even though it's in accordance with the spec. If the spec mandates this then I would say the spec mandates the wrong behaviour. That said, I am not 100% sure that the spec does mandate this behaviour, because it depends on when an object is garbage-collected, and as far as I know the spec doesn't say exactly when that should happen. It should be consistent with the spec to delay garbage-collection until the generator is exhausted in the normal way.Themis
it's hard for me to pick a side - honestly, I'm mostly in the "don't do this" camp, even if this turns out to be a decade-old bug.. the behavior should almost-certainly be more consistent here, but perhaps iter(generator) should raise RuntimeError and/or yield from (or simply yield) should raise SyntaxError in some reserved dunder methods, forcing return and preferring next() (or await) internally! PEP 525 for Asynchronous Generators also hints usefulness imo by not even implementing delegation and suggesting async for peps.python.org/pep-0525/#asynchronous-yield-fromMarillin
updated

I don't see it as that complicated, and the resulting behavior can be seen as actually unsurprising.

When the iterator goes out of scope, Python will throw a "GeneratorExit" exception in the (innermost) generator.

In the "classic" for form, the exception is raised inside the user-written __iter__ method, is not caught there, and is suppressed by the generator machinery as it bubbles up.

In the yield from form, the same exception is thrown into the inner self.gen, thus "killing" it, and then bubbles up to the user-written __iter__.

Writing another intermediate generator can make this easily visible:


def inner_gen(gen):
    try:
        for item in gen:
            yield item
    except GeneratorExit:
        print("Generator exit thrown in inner generator")

class Class1:
    def __init__(self, gen):
        self.gen = inner_gen(gen)
        
    def __iter__(self):
        try:
            for el in self.gen:
                yield el
        except GeneratorExit:
            print("Generator exit thrown in outer generator for 'classic' form")
            
    
class Class2(Class1):
    def __iter__(self):
        try:
            yield from self.gen
        except GeneratorExit as exit:
            print("Generator exit thrown in outer generator for 'yield from' form" )
        
first = lambda g:next(iter(g))

And now:

In [324]: c1 = Class1((i for i in range(2)))

In [325]: first(c1)
Generator exit thrown in outer generator for 'classic' form
Out[325]: 0

In [326]: first(c1)
Generator exit thrown in outer generator for 'classic' form
Out[326]: 1

In [327]: c2 = Class2((i for i in range(2)))

In [328]: first(c2)
Generator exit thrown in inner generator
Generator exit thrown in outer generator for 'yield from' form
Out[328]: 0

In [329]: first(c2)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[329], line 1
(...)

StopIteration: 


update I had a previous answer text speculating on how the call to close would take place, skipping the intermediate generator - it is not that simple regarding close, though: Python will always call __del__, not close, which is only called by the user, or in certain circumstances that were hard to pin down. But it will always throw the GeneratorExit exception in a generator-function body (not in a class with explicit __next__ and throw, though - let's leave that for another question :-D )

Graff answered 3/1, 2023 at 15:51 Comment(5)
Excellent simple explanation, thank you for that!Pentheas
despite my answer here being text only, I did make some tests in an interactive environment. I will replicate them and paste the snippets here. (it is possible that I did not explicitly test for this bypassing, but I will do so now).Graff
Testing this stuff correctly is even more intricate :-) and it does not seem to "fit in mind all at once" - but the final finding can be even simpler: Python throws a GeneratorExit in the innermost generator when it goes out of scope and let it propagate. In the "for" form, this is in the user-written __iter__ method,Graff
@Pentheas : it ended up being even simpler, but I had to completely rewrite it.Graff
I've added the code I used to check calls for __del__ and __close__ in each case here: gist.github.com/jsbueno/e4378521ead8f9dbb40565fb5cacd0b9Graff

© 2022 - 2024 — McMap. All rights reserved.