Start python debugger in oldest stack frame after an exception occurs

I use the --pdb flag with IPython, so when an error occurs while I'm debugging code, I get a stack trace and a debugger prompt. A lot of these errors come from calling NumPy or pandas functions with bad inputs, and the debugger starts in the newest frame, inside the code of those libraries. Five to ten repetitions of the up command later I can actually see what I did wrong, which is immediately obvious 90% of the time (e.g., calling a function with a list instead of an array).

Is there any way to specify which stack frame the debugger initially starts in? Either the oldest stack frame, or the newest stack frame in the Python file that was initially run, or something similar. That would make debugging much more productive.

Here's a simple example (saved as /test/stack_frames.py):

import pandas as pd

def test(df):
    df[:,0] = 4  # (B) bad indexing on a DataFrame, raises an error
    return df

if __name__ == '__main__':
    df = test(pd.DataFrame(range(3)))  # (A)

Resulting traceback, with markers (A), (B), and (C) added for clarity:

In [6]: ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-66730543fac0> in <module>()
----> 1 import codecs, os;__pyfile = codecs.open('''/tmp/py29142W1d''', encoding='''utf-8''');__code = __pyfile.read().encode('''utf-8''');__pyfile.close();os.remove('''/tmp/py29142W1d''');exec(compile(__code, '''/test/stack_frames.py''', 'exec'));

/test/stack_frames.py in <module>()
      6 
      7 if __name__ == '__main__':
(A)----> 8     df = test(pd.DataFrame(range(3)))

/test/stack_frames.py in test(df)
      2 
      3 def test(df):
(B)----> 4     df[:,0] = 4
      5     return df
      6 

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in __setitem__(self, key, value)
   2355         else:
   2356             # set column
-> 2357             self._set_item(key, value)
   2358 
   2359     def _setitem_slice(self, key, value):

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _set_item(self, key, value)
   2421 
   2422         self._ensure_valid_index(value)
-> 2423         value = self._sanitize_column(key, value)
   2424         NDFrame._set_item(self, key, value)
   2425 

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in _sanitize_column(self, key, value)
   2602 
   2603         # broadcast across multiple columns if necessary
-> 2604         if key in self.columns and value.ndim == 1:
   2605             if (not self.columns.is_unique or
   2606                     isinstance(self.columns, MultiIndex)):

/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.pyc in __contains__(self, key)
   1232 
   1233     def __contains__(self, key):
-> 1234         hash(key)
   1235         # work around some kind of odd cython bug
   1236         try:

TypeError: unhashable type
> /usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py(1234)__contains__()
   1232 
   1233     def __contains__(self, key):
(C)-> 1234         hash(key)
   1235         # work around some kind of odd cython bug
   1236         try:

ipdb> 

Ideally, I would like the debugger to start in the second-oldest frame at (B), or even at (A), but definitely not at (C), where it starts by default.

Arther answered 4/11, 2016 at 18:55
Decato: #37069823 may be related.

This is a long answer, partly to document the process for myself. A semi-working solution is at the bottom.

Here is a failed attempt:

import sys
import pdb
import pandas as pd

def test(df):  # (A)
    df[:,0] = 4  # bad indexing on a DataFrame, raises an error
    return df

mypdb = pdb.Pdb(skip=['pandas.*'])
mypdb.reset()

df = test(pd.DataFrame(range(3))) # (B) # fails.

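# Run afterwards in the same interactive session: this still opens the
# debugger in the innermost pandas frame, because skip is ignored here.
mypdb.interaction(None, sys.last_traceback)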

Pdb skip documentation:

The skip argument, if given, must be an iterable of glob-style module name patterns. The debugger will not step into frames that originate in a module that matches one of these patterns.
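The patterns are matched against each frame's module __name__ with fnmatch (see Bdb.is_skipped_module below), so 'pandas.*' covers submodules such as pandas.core.frame, but not code running in the top-level pandas package itself. A small standard-library sketch of that matching; matches_skip is a hypothetical helper that mirrors the check, not part of pdb:

```python
from fnmatch import fnmatch


def matches_skip(module_name, patterns):
    """Hypothetical helper mirroring Bdb.is_skipped_module:
    True if any glob pattern matches the module's __name__."""
    return any(fnmatch(module_name, pattern) for pattern in patterns)


names = ['pandas', 'pandas.core.frame', 'pandas_datareader.data', 'numpy.core.numeric']
for name in names:
    print(name, matches_skip(name, ['pandas.*']))
# pandas False
# pandas.core.frame True
# pandas_datareader.data False
# numpy.core.numeric False

# Listing the package name as well covers frames in pandas/__init__.py.
print(matches_skip('pandas', ['pandas', 'pandas.*']))  # True
```

Listing both 'pandas' and 'pandas.*' is therefore the safer choice when building a skip list.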

Pdb source code:

class Pdb(bdb.Bdb, cmd.Cmd):

    _previous_sigint_handler = None

    def __init__(self, completekey='tab', stdin=None, stdout=None, skip=None,
                 nosigint=False, readrc=True):
        bdb.Bdb.__init__(self, skip=skip)
        [...]

# Post-Mortem interface

def post_mortem(t=None):
    # handling the default
    if t is None:
        # sys.exc_info() returns (type, value, traceback) if an exception is
        # being handled, otherwise it returns None
        t = sys.exc_info()[2]
    if t is None:
        raise ValueError("A valid traceback must be passed if no "
                         "exception is being handled")

    p = Pdb()
    p.reset()
    p.interaction(None, t)

def pm():
    post_mortem(sys.last_traceback)

Bdb source code:

class Bdb:
    """Generic Python debugger base class.
    This class takes care of details of the trace facility;
    a derived class should implement user interaction.
    The standard debugger class (pdb.Pdb) is an example.
    """

    def __init__(self, skip=None):
        self.skip = set(skip) if skip else None
    [...]
    def is_skipped_module(self, module_name):
        for pattern in self.skip:
            if fnmatch.fnmatch(module_name, pattern):
                return True
        return False

    def stop_here(self, frame):
        # (CT) stopframe may now also be None, see dispatch_call.
        # (CT) the former test for None is therefore removed from here.
        if self.skip and \
               self.is_skipped_module(frame.f_globals.get('__name__')):
            return False
        if frame is self.stopframe:
            if self.stoplineno == -1:
                return False
            return frame.f_lineno >= self.stoplineno
        if not self.stopframe:
            return True
        return False
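That skip plays no part in choosing the post-mortem starting frame can be checked without pandas. The sketch below assumes Python 3.6+ (for the readrc argument) and uses the standard json module purely as a stand-in for library code: it builds the stack the same way post-mortem setup() does, via get_stack(None, tb), and shows that the starting frame is the innermost json frame even though that module matches the skip list.

```python
import json
import pdb
import sys


def parse_bad_json():
    # json.loads('{') raises json.JSONDecodeError from inside the json package.
    return json.loads('{')


try:
    parse_bad_json()
except ValueError:
    tb = sys.exc_info()[2]

# A Pdb instance with a skip list; readrc=False avoids reading any .pdbrc file.
debugger = pdb.Pdb(skip=['json', 'json.*'], readrc=False)

# Post-mortem setup() calls get_stack(None, tb); with no current frame the
# starting index is simply the last (innermost) entry of the traceback.
stack, curindex = debugger.get_stack(None, tb)
start_module = stack[curindex][0].f_globals.get('__name__')

print('starts in module:', start_module)
print('matched by skip:', debugger.is_skipped_module(start_module))
```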

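In the Bdb code above, the skip list is consulted in stop_here(), which decides where to stop while stepping through running code. Post-mortem debugging goes through interaction() and setup(), which pick the innermost frame of the traceback without ever calling stop_here(), so skip has no effect there (and post_mortem() creates a plain Pdb() without a skip list anyway). To work around this, I created a custom class that overrides the setup method.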

import pdb
import sys

class SkipPdb(pdb.Pdb):
    def setup(self, f, tb):
        # This is unchanged
        self.forget()
        self.stack, self.curindex = self.get_stack(f, tb)
        while tb:
            # when setting up post-mortem debugging with a traceback, save all
            # the original line numbers to be displayed along the current line
            # numbers (which can be different, e.g. due to finally clauses)
            lineno = pdb.lasti2lineno(tb.tb_frame.f_code, tb.tb_lasti)
            self.tb_lineno[tb.tb_frame] = lineno
            tb = tb.tb_next

        self.curframe = self.stack[self.curindex][0]
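        # This loop is new: walk outward past frames from skipped modules,
        # stopping at the outermost frame so the stack never runs empty.
        while (self.skip and self.curindex > 0 and
               self.is_skipped_module(self.curframe.f_globals.get('__name__'))):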
            self.curindex -= 1
            self.stack.pop()
            self.curframe = self.stack[self.curindex][0]
        # The rest is unchanged.
        # The f_locals dictionary is updated from the actual frame
        # locals whenever the .f_locals accessor is called, so we
        # cache it here to ensure that modifications are not overwritten.
        self.curframe_locals = self.curframe.f_locals
        return self.execRcLines()

    def pm(self):
        self.reset()
        self.interaction(None, sys.last_traceback)

If you use this as:

x = 42
df = test(pd.DataFrame(range(3))) # (B) # fails.
# fails. Then do:
mypdb = SkipPdb(skip=['pandas.*'])
mypdb.pm()
> <ipython-input-36-e420cf1b80b2>(2)<module>()
-> df = test(pd.DataFrame(range(3))) # (B) # fails.
(Pdb) l
  1    x = 42
  2  ->    df = test(pd.DataFrame(range(3))) # (B) # fails.
[EOF]

you are dropped into the right frame. Now you just need to figure out how IPython calls its pdb pm/post_mortem function and write something similar, which appears to be hard, so I pretty much gave up here.

Also, this is NOT a great implementation. It assumes that the frames you want to skip are all at the top (innermost end) of your stack, and it produces weird results otherwise. For example, an error in the function passed to df.apply will produce something very strange.
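A slightly less invasive variant, sketched under the assumption of Python 3 (it uses zero-argument super()) and checked only against the standard pdb: let the stock setup() run unchanged, then move the starting frame outward past skipped modules without popping anything, so the down command can still reach the library frames. It also tolerates an empty skip list and a stack where every frame is skipped. The helper name _is_skipped is hypothetical, not part of pdb. One caveat: the stock setup() runs any .pdbrc commands before the frame is moved, so those commands still see the innermost frame.

```python
import pdb
import sys


class SkipPdb(pdb.Pdb):
    """Post-mortem debugger that starts in the innermost frame whose module
    does not match any pattern in `skip` (Python 3 sketch)."""

    def setup(self, f, tb):
        # Let the stock implementation build self.stack and pick the innermost frame.
        result = super().setup(f, tb)
        if self.skip:
            index = self.curindex
            # Walk outward, but never past the outermost frame.
            while index > 0 and self._is_skipped(self.stack[index][0]):
                index -= 1
            if index != self.curindex:
                self.curindex = index
                self.curframe = self.stack[index][0]
                # Older pdb versions cache the frame's locals in an instance
                # attribute; refresh it only if this version has one.
                if 'curframe_locals' in vars(self):
                    self.curframe_locals = self.curframe.f_locals
        return result

    def _is_skipped(self, frame):
        # Hypothetical helper: some frames have no __name__ in their globals.
        name = frame.f_globals.get('__name__')
        return name is not None and self.is_skipped_module(name)

    def pm(self):
        # Debug the last uncaught exception of an interactive session.
        self.reset()
        self.interaction(None, sys.last_traceback)
```

Usage is the same as before: SkipPdb(skip=['pandas', 'pandas.*']).pm() after the failing call.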

TL;DR: This is not supported by the standard library. You can create your own debugger class, but getting it to work with IPython's debugger is nontrivial.
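A different route, not from the answer above and only a sketch: instead of subclassing the debugger, hand the stock pdb.post_mortem() a trimmed copy of the traceback that ends at the innermost frame outside the skipped modules. This relies on types.TracebackType being constructible, which requires Python 3.7 or later. The names truncate_traceback and post_mortem_skipping are hypothetical, and whether IPython's own debugger can be fed the same trimmed traceback is untested. The trade-off is that the library frames are gone from the trimmed stack, so down cannot reach them.

```python
import fnmatch
import pdb
import sys
import types


def truncate_traceback(tb, skip):
    """Return a new traceback chain ending at the innermost frame whose module
    name matches none of the glob patterns in `skip`. The original chain is
    left untouched; if every frame matches, the whole chain is copied."""
    nodes = []
    while tb is not None:
        nodes.append(tb)
        tb = tb.tb_next

    keep = len(nodes)
    for i in range(len(nodes) - 1, -1, -1):
        name = nodes[i].tb_frame.f_globals.get('__name__') or ''
        if not any(fnmatch.fnmatch(name, pattern) for pattern in skip):
            keep = i + 1
            break

    # Rebuild the chain back to front: each new node points at the next one.
    new_tb = None
    for node in reversed(nodes[:keep]):
        new_tb = types.TracebackType(new_tb, node.tb_frame, node.tb_lasti, node.tb_lineno)
    return new_tb


def post_mortem_skipping(tb=None, skip=('pandas', 'pandas.*', 'numpy', 'numpy.*')):
    """Open the standard pdb post-mortem debugger on a trimmed traceback."""
    if tb is None:
        # Set after an uncaught exception in an interactive session.
        tb = sys.last_traceback
    pdb.post_mortem(truncate_traceback(tb, skip))
```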

Utilitarian answered 14/11, 2016 at 14:8
Arther: It looks like calling set_trace after setting the skip list is the correct syntax, e.g. import pdb; pdb.Pdb(skip=['django.*']).set_trace()
Utilitarian: That would enter the debugger at the calling stack frame. But we want to take the last traceback, where it went wrong, and enter the debugger there.
