yield in list comprehensions and generator expressions

The following behaviour seems rather counterintuitive to me (Python 3.4):

>>> [(yield i) for i in range(3)]
<generator object <listcomp> at 0x0245C148>
>>> list([(yield i) for i in range(3)])
[0, 1, 2]
>>> list((yield i) for i in range(3))
[0, None, 1, None, 2, None]

The intermediate values of the last line are actually not always None; they are whatever we send into the generator, which makes it equivalent (I guess) to the following generator:

def f():
   for i in range(3):
      yield (yield i)
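The guess above can be checked on any Python 3 version with an ordinary generator function, since only the comprehension forms are affected. This is a minimal sketch; the driver names (plain, gen, seen) are illustrative and not from the original post.

```python
def f():
    for i in range(3):
        # The inner yield hands out i; whatever is sent back in becomes
        # its value, which the outer yield then hands out in turn.
        yield (yield i)

# Plain iteration sends nothing, so every inner yield evaluates to None.
plain = list(f())

# Driving the generator by hand shows a sent value coming back out.
gen = f()
seen = [next(gen)]           # runs to the inner yield, which yields 0
seen.append(gen.send('a'))   # inner yield evaluates to 'a'; outer yields it
seen.append(next(gen))       # outer yield resumes, loop advances, yields 1
seen.append(gen.send('b'))   # inner yield evaluates to 'b'; outer yields it

print(plain)  # [0, None, 1, None, 2, None]
print(seen)   # [0, 'a', 1, 'b']
```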

It strikes me as funny that those three lines work at all. The Reference says that yield is only allowed in a function definition (though I may be reading it wrong and/or it may simply have been copied from the older version). The first two lines produce a SyntaxError in Python 2.7, but the third line doesn't.

Also, it seems odd

  • that a list comprehension returns a generator and not a list
  • and that the generator expression converted to a list and the corresponding list comprehension contain different values.

Could someone provide more information?

Wow answered 21/8, 2015 at 12:5 Comment(0)

Note: this was a bug in CPython's handling of yield in comprehensions and generator expressions. It was fixed in Python 3.8, with a deprecation warning in Python 3.7. See the Python bug report and the What's New entries for Python 3.7 and Python 3.8.

Generator expressions, as well as set and dict comprehensions, are compiled to (generator) function objects. In Python 3, list comprehensions get the same treatment; each of them is, in essence, a new nested scope.
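One visible consequence of the nested scope is that the loop variable does not leak into the surrounding code in Python 3. A small sketch with made-up variable names follows. Note that CPython 3.12 and later inline list, set and dict comprehensions (PEP 709) instead of creating a function, but the scoping shown here is preserved.

```python
i = 'outer'

squares = [i * i for i in range(3)]      # list comprehension
doubled = {i: 2 * i for i in range(3)}   # dict comprehension
lazy = list(i for i in range(3))         # generator expression, consumed

# None of the three constructs rebound the outer name.
print(i)        # outer
print(squares)  # [0, 1, 4]
```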

You can see this if you try to disassemble a generator expression:

>>> dis.dis(compile("(i for i in range(3))", '', 'exec'))
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f7530c0, file "", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               3 (None)
             26 RETURN_VALUE
>>> dis.dis(compile("(i for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

The above shows that a generator expression is compiled to a code object, loaded as a function (MAKE_FUNCTION creates the function object from the code object). The .co_consts[0] reference lets us see the code object generated for the expression, and it uses YIELD_VALUE just like a generator function would.
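The same observation can be made without reading the disassembly by hand. The sketch below looks up the nested code object by type rather than by position, then checks the generator flag and the opcode names. The variable names are illustrative, and exact bytecode layouts differ between CPython versions.

```python
import dis
import inspect
import types

module_code = compile("(i for i in range(3))", "<demo>", "exec")

# Search co_consts by type rather than relying on index 0.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
genexpr_code = nested[0]

# The compiler flags the nested code object as a generator...
is_generator = bool(genexpr_code.co_flags & inspect.CO_GENERATOR)

# ...and its bytecode contains a YIELD_VALUE instruction.
opnames = {ins.opname for ins in dis.get_instructions(genexpr_code)}

print(genexpr_code.co_name, is_generator, 'YIELD_VALUE' in opnames)
```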

As such, the yield expression works in that context, as the compiler sees these as functions-in-disguise.

This is a bug; yield has no place in these expressions. The Python grammar before Python 3.7 allows it (which is why the code is compilable), but the yield expression specification shows that using yield here should not actually work:

The yield expression is only used when defining a generator function and thus can only be used in the body of a function definition.

This has been confirmed to be a bug in issue 10544. The resolution is that using yield or yield from inside a comprehension or generator expression raises a SyntaxError in Python 3.8; in Python 3.7 it raises a DeprecationWarning to ensure code stops using this construct. You'll see the same warning in Python 2.7.15 and up if you use the -3 command line switch, which enables Python 3 compatibility warnings.

The 3.7.0b1 warning looks like this; turning warnings into errors gives you a SyntaxError exception, as you would get in 3.8:

>>> [(yield i) for i in range(3)]
<stdin>:1: DeprecationWarning: 'yield' inside list comprehension
<generator object <listcomp> at 0x1092ec7c8>
>>> import warnings
>>> warnings.simplefilter('error')
>>> [(yield i) for i in range(3)]
  File "<stdin>", line 1
SyntaxError: 'yield' inside list comprehension
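To see which behaviour a given interpreter has, the constructs can be passed to compile() as strings so that a SyntaxError can be caught. This is only a sketch: the helper name compiles is made up, and the third case reflects the exception noted in the comments and in the What's New entry, namely that the iterable of the leftmost for clause is still evaluated in the enclosing scope.

```python
import warnings

def compiles(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # hide the 3.7 DeprecationWarning
        try:
            compile(source, "<demo>", "exec")
        except SyntaxError:
            return False
    return True

results = {
    "listcomp": compiles("[(yield i) for i in range(3)]"),
    "genexpr": compiles("((yield i) for i in range(3))"),
    # yield in the outermost iterable belongs to the enclosing function g
    "outermost iterable": compiles("def g():\n    return [x for x in (yield)]\n"),
}
print(results)
```

On Python 3.8 and later the first two entries should be False and the third True.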

The differences between how yield in a list comprehension and yield in a generator expression operate stem from the differences in how these two expressions are implemented. In Python 3 a list comprehension uses LIST_APPEND calls to add the top of the stack to the list being built, while a generator expression instead yields that value. Adding in (yield <expr>) just adds another YIELD_VALUE opcode to either:

>>> dis.dis(compile("[(yield i) for i in range(3)]", '', 'exec').co_consts[0])
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                13 (to 22)
              9 STORE_FAST               1 (i)
             12 LOAD_FAST                1 (i)
             15 YIELD_VALUE
             16 LIST_APPEND              2
             19 JUMP_ABSOLUTE            6
        >>   22 RETURN_VALUE
>>> dis.dis(compile("((yield i) for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 YIELD_VALUE
             14 POP_TOP
             15 JUMP_ABSOLUTE            3
        >>   18 LOAD_CONST               0 (None)
             21 RETURN_VALUE

The YIELD_VALUE opcode at bytecode index 15 (list comprehension) and index 12 (generator expression) is the extra one, a cuckoo in the nest. So for the list-comprehension-turned-generator you have one yield producing the top of the stack each time (replacing the top of the stack with the yield return value), and for the generator expression variant you yield the top of the stack (the integer) and then yield again, but now the stack contains the return value of the first yield, so you get None that second time.
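Since the comprehension forms no longer compile on Python 3.8+, the two bytecode sequences can be modelled with ordinary generator functions. The function names below are hypothetical stand-ins, written to mirror the opcode order described above rather than taken from CPython.

```python
def listcomp_like():
    # Models [(yield i) for i in range(3)]: one yield per item, and the
    # value of the yield expression is what LIST_APPEND would store.
    result = []
    for i in range(3):
        result.append((yield i))
    return result

def genexp_like():
    # Models ((yield i) for i in range(3)): the explicit yield hands out i,
    # then the implicit yield hands out the explicit yield's value.
    for i in range(3):
        yield (yield i)

print(list(listcomp_like()))  # [0, 1, 2]
print(list(genexp_like()))    # [0, None, 1, None, 2, None]
```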

For the list comprehension then, the intended list object output is still returned, but Python 3 sees this as a generator so the return value is instead attached to the StopIteration exception as the value attribute:

>>> from itertools import islice
>>> listgen = [(yield i) for i in range(3)]
>>> list(islice(listgen, 3))  # avoid exhausting the generator
[0, 1, 2]
>>> try:
...     next(listgen)
... except StopIteration as si:
...     print(si.value)
... 
[None, None, None]

Those None objects are the return values from the yield expressions.
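That the list elements really are the yield return values can be shown by sending something other than None. The sketch below uses a hand-written generator function (a hypothetical stand-in for the list comprehension, which no longer compiles on 3.8+) and drives it with send().

```python
def listcomp_like():
    # Stand-in for [(yield i) for i in range(3)].
    result = []
    for i in range(3):
        result.append((yield i))
    return result

gen = listcomp_like()
yielded = [next(gen), gen.send('a'), gen.send('b')]

returned = None
try:
    gen.send('c')  # the last send finishes the loop
except StopIteration as si:
    returned = si.value  # the list built from the sent values

print(yielded)   # [0, 1, 2]
print(returned)  # ['a', 'b', 'c']
```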

To reiterate: the same issue applies to dictionary and set comprehensions in both Python 2 and Python 3. In Python 2 the yield return values are still added to the intended dictionary or set object, and the return value is 'yielded' last instead of being attached to the StopIteration exception:

>>> list({(yield k): (yield v) for k, v in {'foo': 'bar', 'spam': 'eggs'}.items()})
['bar', 'foo', 'eggs', 'spam', {None: None}]
>>> list({(yield i) for i in range(3)})
[0, 1, 2, set([None])]
Dib answered 21/8, 2015 at 12:8 Comment(11)
Note that according to the language specification the yield-atom is allowed inside an expression (inside a generator function). This could be even more problematic if the yield-atom is somehow misimplemented. - Frechette
@skyking: that's what I'm saying; the grammar allows it. The bug I refer to is trying to use a yield as part of a generator expression inside a generator function, where the expectation is that the yield applies to the generator function, not the generator expression nested scope. - Dib
Wow. Very informative indeed. So, if I understood correctly, the following happened: a function that contains both yield and return should, as is documented, become a generator function whose returned value should land in the StopIteration exception, and the bytecode for a list comprehension with yield inside looks (although it was not intended) just like the bytecode of such a function. - Wow
@zabolekar: something like that; the steps are something like: the compiler comes across a list comprehension so builds a code object; the compiler comes across a yield expression so marks the current code object as a generator. Voila, we have a generator function. - Dib
and your last comment also explains the other oddity here: [(yield i) for i in range(3)] returns a <generator object at ....> instead of a list. Quirky. - Canaliculus
This answer could do with an update, see the more recent discussion on the bug report bugs.python.org/issue10544; this syntax will sometimes raise a DeprecationWarning in Py 3.7 and then a SyntaxError from Py 3.8 docs.python.org/3.7/whatsnew/3.7.html#deprecated I was going to post something myself but I think Martijn would do a much better job explaining all this! - Yowl
Best quote from the bug report: "I think we all need to calm down a bit. How about not posting about this topic for 24 hours." - Guido - Yowl
@Chris_Rands: thanks for the heads-up! I noticed that the implementation of the deprecation warning was accidentally committed to the 2.7 branch though. - Dib
It's a mistake rather than backported is it? Also seems that even for Py 3.7+ yield is still allowed in the outer iterator for [func(x, y) for x in <outermost iterator> for y in <something else>] Guido says - Yowl
@Yowl no, I misread the commit info attached to the bug report; I forgot 3.7 was still the master branch when the first patch landed. - Dib
@Yowl the 2.7 changes are there for when you use the -3 compatibility warnings. - Dib
