Possible bug in pdb module in Python 3 when using list comprehensions

After running this code in Python 3:

import pdb

def foo():
    nums = [1, 2, 3]
    a = 5
    pdb.set_trace()

foo()

The following expressions work:

(Pdb) print(nums)
[1, 2, 3]

(Pdb) print(a)
5

(Pdb) [x for x in nums]
[1, 2, 3]

but the following expression fails:

(Pdb) [x*a for x in nums]
*** NameError: global name 'a' is not defined

The above works fine in Python 2.7.

Is this a bug, or am I missing something?

Update: See the new accepted answer. This was indeed a bug (or a problematic design) which has been addressed now by introducing a new command and mode in pdb.

Plaque answered 25/6, 2013 at 6:13 Comment(4)
Strange, this does work for me with ipdb==0.7 and ipython==0.13.2. - Idiographic
It failed in IPython3 0.12.1 and Python 3.2.3. - Plaque
To surface this tip: try pdb's interactive mode. - Blocked
Bizarrely, this also fails in Python 2.7. - Mammoth

If you type interact in your [i]pdb session, you get an interactive session, and list comprehensions work as expected in that mode.

source: http://bugs.python.org/msg215963
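A minimal sketch of why a merged namespace helps, assuming (this is my reading, not something stated in the answer) that interactive mode evaluates code against a single dictionary containing both the frame's globals and its locals. The helper name eval_in_frame is hypothetical, and sys._getframe() merely stands in for the frame pdb would be stopped in:

```python
import sys

def eval_in_frame(expression, frame):
    """Evaluate `expression` with the frame's globals and locals merged.

    Hypothetical helper: with one dictionary serving as the global
    namespace, a comprehension's hidden function can find every name.
    """
    namespace = dict(frame.f_globals)   # copy, so the real globals stay untouched
    namespace.update(frame.f_locals)    # locals shadow globals of the same name
    return eval(expression, namespace)

def foo():
    nums = [1, 2, 3]
    a = 5
    # sys._getframe() returns foo's own frame (stand-in for pdb's current frame)
    return eval_in_frame("[x*a for x in nums]", sys._getframe())

result = foo()
print(result)  # [5, 10, 15]
```

Because the merged dictionary is a copy, assignments made through it do not reach the real local variables of the frame.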

Blocked answered 3/6, 2016 at 22:48 Comment(1)
How do I exit interactive mode? - Confectioner

It works perfectly fine:

>>> import pdb
>>> def f(seq):
...     pdb.set_trace()
... 
>>> f([1,2,3])
--Return--
> <stdin>(2)f()->None
(Pdb) [x for x in seq]
[1, 2, 3]
(Pdb) [x in seq for x in seq]
[True, True, True]

Without showing what you are actually doing, nobody can tell you why you got a NameError in your specific case.


TL;DR In Python 3, list comprehensions are actually functions with their own stack frame, and from that inner frame you cannot access a variable such as seq, which is local to the function being debugged. It is instead treated as a global (and, hence, not found).


What you see is the difference between how list comprehensions are implemented in Python 2 and in Python 3. In Python 2, a list comprehension is just shorthand for a for loop, and you can clearly see this in the bytecode:

>>> import dis
>>> def test(): [x in seq for x in seq]
... 
>>> dis.dis(test)
  1           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (seq)
              6 GET_ITER            
        >>    7 FOR_ITER                18 (to 28)
             10 STORE_FAST               0 (x)
             13 LOAD_FAST                0 (x)
             16 LOAD_GLOBAL              0 (seq)
             19 COMPARE_OP               6 (in)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE            7
        >>   28 POP_TOP             
             29 LOAD_CONST               0 (None)
             32 RETURN_VALUE        

Note how the bytecode contains a FOR_ITER loop. In Python 3, on the other hand, list comprehensions are actually functions with their own stack frame:

>>> def test(): [x in seq2 for x in seq]
... 
>>> dis.dis(test)
  1           0 LOAD_CONST               1 (<code object <listcomp> at 0xb6fef160, file "<stdin>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_GLOBAL              0 (seq) 
              9 GET_ITER             
             10 CALL_FUNCTION            1 
             13 POP_TOP              
             14 LOAD_CONST               0 (None) 
             17 RETURN_VALUE      

As you can see, there is no FOR_ITER here; instead there are MAKE_FUNCTION and CALL_FUNCTION bytecodes. If we examine the code object of the list comprehension, we can see how the bindings are set up:

>>> test.__code__.co_consts[1]
<code object <listcomp> at 0xb6fef160, file "<stdin>", line 1>
>>> test.__code__.co_consts[1].co_argcount   # it has one argument
1
>>> test.__code__.co_consts[1].co_names      # global variables
('seq2',)
>>> test.__code__.co_consts[1].co_varnames   # local variables
('.0', 'x')

Here .0 is the only argument of the function, x is the local variable of the loop, and seq2 is a global variable. Note that .0, the list-comprehension argument, is the iterator obtained from seq, not seq itself (see the GET_ITER opcode in the output of dis above). This is clearer with a more complex example:

>>> def test():
...     [x in seq for x in zip(seq, a)]
... 
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0xb7196f70, file "<stdin>", line 2>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_GLOBAL              0 (zip) 
              9 LOAD_GLOBAL              1 (seq) 
             12 LOAD_GLOBAL              2 (a) 
             15 CALL_FUNCTION            2 
             18 GET_ITER             
             19 CALL_FUNCTION            1 
             22 POP_TOP              
             23 LOAD_CONST               0 (None) 
             26 RETURN_VALUE 
>>> test.__code__.co_consts[1].co_varnames
('.0', 'x')

Here you can see that the only argument to the list-comprehension, always denoted by .0, is the iterable obtained from zip(seq, a). seq and a themselves are not passed to the list-comprehension. Only iter(zip(seq, a)) is passed inside the list-comprehension.

Another observation we must make is that, when you run pdb, functions you define at the prompt cannot access the local variables of the function being debugged. For example, the following code fails on both Python 2 and Python 3:

>>> import pdb
>>> def test(seq): pdb.set_trace()
... 
>>> test([1,2,3])
--Return--
> <stdin>(1)test()->None
(Pdb) def test2(): print(seq)
(Pdb) test2()
*** NameError: global name 'seq' is not defined

It fails because when defining test2 the seq variable is treated as a global variable, but it's actually a local variable inside the test function, hence it isn't accessible.

The behaviour you see is similar to the following scenario:

#python 2 no error
>>> class A(object):
...     x = 1
...     L = [x for _ in range(3)]
... 
>>> 

#python3 error!
>>> class A(object):
...     x = 1
...     L = [x for _ in range(3)]
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in A
  File "<stdin>", line 3, in <listcomp>
NameError: global name 'x' is not defined

The first one doesn't give an error because it is mostly equivalent to:

>>> class A(object):
...     x = 1
...     L = []
...     for _ in range(3): L.append(x)
... 

This is because the list comprehension is "expanded" inline in the bytecode. In Python 3 it fails because you are actually defining a function, and you cannot access the class scope from a nested function scope:

>>> class A(object):
...     x = 1
...     def test():
...             print(x)
...     test()
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in A
  File "<stdin>", line 4, in test
NameError: global name 'x' is not defined
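A runnable version of the class-scope case, together with one common way around it. This is a sketch, not part of the original answer: binding the value as a default argument is my own illustration, and it works because defaults are evaluated in the class body, where x is visible.

```python
# A function defined in a class body cannot see class-level names.
try:
    class A(object):
        x = 1
        def test():
            return x      # looked up as a global, not in the class scope
        test()
    outcome = "ok"
except NameError:
    outcome = "NameError"

# Illustrative workaround: capture the value when the function is defined.
class B(object):
    x = 1
    def test(x=x):        # the default is evaluated in the class scope
        return x
    value = test()

print(outcome, B.value)  # NameError 1
```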

Note that generator expressions are implemented as functions in Python 2 as well, and in fact you see similar behaviour with them (on both Python 2 and Python 3):

>>> import pdb
>>> def test(seq): pdb.set_trace()
... 
>>> test([1,2,3])
--Return--
> <stdin>(1)test()->None
(Pdb) list(x in seq for x in seq)
*** Error in argument: '(x in seq for x in seq)'

Here pdb doesn't give you more details, but the failure happens for the same exact reason.


In conclusion: it's not a bug in pdb but a consequence of the way Python implements scopes. AFAIK, changing this to allow what you are trying to do in pdb would require some big changes in how functions are treated, and I don't know whether this can be done without modifying the interpreter.


Note that in a list comprehension with more than one for clause, the inner loops are expanded in the bytecode, like list comprehensions in Python 2:

>>> import dis
>>> def test(): [x + y for x in seq1 for y in seq2]
... 
>>> dis.dis(test)
  1           0 LOAD_CONST               1 (<code object <listcomp> at 0xb71bf5c0, file "<stdin>", line 1>) 
              3 MAKE_FUNCTION            0 
              6 LOAD_GLOBAL              0 (seq1) 
              9 GET_ITER             
             10 CALL_FUNCTION            1 
             13 POP_TOP              
             14 LOAD_CONST               0 (None) 
             17 RETURN_VALUE         
>>> # The only argument to the listcomp is iter(seq1)
>>> import types
>>> func = types.FunctionType(test.__code__.co_consts[1], globals())
>>> dis.dis(func)
  1           0 BUILD_LIST               0 
              3 LOAD_FAST                0 (.0) 
        >>    6 FOR_ITER                29 (to 38) 
              9 STORE_FAST               1 (x) 
             12 LOAD_GLOBAL              0 (seq2) 
             15 GET_ITER             
        >>   16 FOR_ITER                16 (to 35) 
             19 STORE_FAST               2 (y) 
             22 LOAD_FAST                1 (x) 
             25 LOAD_FAST                2 (y) 
             28 BINARY_ADD           
             29 LIST_APPEND              3 
             32 JUMP_ABSOLUTE           16 
        >>   35 JUMP_ABSOLUTE            6 
        >>   38 RETURN_VALUE        

As you can see, the bytecode for the listcomp has an explicit FOR_ITER over seq2. This explicit FOR_ITER is inside the listcomp function, and thus the scoping restrictions still apply (e.g. seq2 is loaded as a global).

And in fact we can confirm this using pdb:

>>> import pdb
>>> def test(seq1, seq2): pdb.set_trace()
... 
>>> test([1,2,3], [4,5,6])
--Return--
> <stdin>(1)test()->None
(Pdb) [x + y for x in seq1 for y in seq2]
*** NameError: global name 'seq2' is not defined
(Pdb) [x + y for x in non_existent for y in seq2]
*** NameError: name 'non_existent' is not defined

Note how the NameError is about seq2 and not seq1 (which is passed as a function argument), and note how changing the first iterable's name to something that doesn't exist changes the NameError (which means that in the first case seq1 was passed successfully).

Tuberculin answered 25/6, 2013 at 6:34 Comment(14)
Indeed it works in Python 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] but it fails in Python 3.2.3 (default, Oct 19 2012, 20:10:41) [GCC 4.6.3]. Hope this is detailed enough. I will add these details to the question. - Plaque
@Plaque Updated my answer. The different behaviour is due to a scoping issue with how list comprehensions are implemented in Python 3. - Tuberculin
@Baruriu Thanks for the answer. It certainly shed a lot of light on the problem. I edited the question and added what I think is a better example of the situation. Having read your answer, there are two follow-up questions: 1) In the example given in the updated question both 'nums' and 'a' are local variables. It seems that only local variables that appear before 'for' in a list comprehension in pdb are problematic. Does this mean that 'nums' is passed to the list comprehension function as an argument but 'a' is considered global? - Plaque
and 2) Python supports closures; would it not be possible for pdb to pass a reference to the current stack frame (assuming that it is not really on the stack) so that the defined function can look up non-local variables from that frame? - Plaque
@Plaque Regarding your first point, I believe Python sees that you are iterating over nums and hence makes it the parameter of the list comprehension. In my answer I also show that, internally, the name of this argument is .0 (which, purposefully, isn't a normal Python identifier and is used to avoid clashing with other variables). Since it is passed as an argument, there are no problems referencing it for iteration. Other occurrences of any variable are all considered globals (even if you use the same identifier). About 2, I have no idea, but I think it's not that easy to implement. - Tuberculin
Actually, if my first hypothesis were correct then the function generated for [x + y for x in seq1 for y in seq2] should have two parameters, but (using your code to inspect) it still has one parameter! - Plaque
@Plaque No, the only argument to the list comprehension is the first iterable. Nested loops are expanded in a manner similar to Python 2's list comprehensions, and nested iterables are loaded as globals. See the last update at the end of my answer. - Tuberculin
You are right, and for that reason [x + y for x in seq1 for y in seq2] fails because it cannot find the global seq2. Thanks for the answer. I find this a very strange way of implementing list comprehensions though; I wonder what the motivation is. I personally don't see it as a feature, even if it is not a bug. - Plaque
@Plaque The problem with Python 2's implementation is that local variables leak into the outer scope. E.g. x = 1; [x for x in range(10)]; print(x) prints 9 in Python 2 and 1 in Python 3. I think the latter is more sensible, and hence they decided to give list comprehensions their own scope. To provide a scope it was decided to do it the easy way: building a function (hence all the behaviour you see). - Tuberculin
Sure, the function part makes sense, but why in [x+y for x in seq1 for y in seq2] is seq1 a parameter and seq2 non-local? It could have been a function with no parameters; x and y would have remained local, so scope leaking would still be solved, and both seq1 and seq2 could have been non-local, which to me (with my limited knowledge) seems a more consistent solution. - Plaque
@Plaque I believe the problem is that seq2 could be dependent on seq1, as in [x + y for x in range(10) for y in range(x)]. If it were an argument to the list comprehension it wouldn't be possible to execute that list comprehension. Also keep in mind that the full syntax of list comprehensions is [expression(X, Y, ...) for X in iterable if condition(X) for Y in iterable(X) if condition(X, Y) ... ]. This makes it even harder to pass the nested iterators as arguments. Think of doing that with: [y for x in range(10) if x % 2 != 0 for y in range(x) if y % 2 == 0]. - Tuberculin
I was actually arguing for passing no arguments at all; that is, having x, y ... as local so they don't leak and then having seq1, seq2 ... as non-local or global. Of course if seq2 can be computed as a function of seq1 or x then it does not need to be global anymore; it will be an expression using only local variables. - Plaque
@Plaque Performance may be a valid reason. Most list comprehensions are simple (e.g. they use a single for and only local variables in the expression/condition), hence using a global for the iterable would add a lot of overhead (looking up a global is much slower than a local in CPython). Anyway, you should ask a developer to get real answers about this (maybe try the comp.lang.python Google group?). - Tuberculin
It's a shame this "feature" makes pdb almost unusable in Python 3. - Obidiah

I just can't understand why you would need to do the above. If you are looking to produce a list of Trues for each element in seq, then why not [True for x in seq]? I would guess that you need to assign a local copy first before trying this sort of thing.

Forbore answered 25/6, 2013 at 6:26 Comment(1)
I am trying to understand why an expression that looks perfectly fine to me fails in pdb. The given code example does not have any purpose other than helping understand what is going on. - Plaque
