Getting better debug information when a C program crashes on Linux

We are running an embedded Linux kernel on a MIPS core. Our program runs a particular test suite. During one of the stress tests, which runs for about 12 hours, we get a segfault, which in turn generates a core dump.

Unfortunately, the core dump is not very useful. The crash is in a dynamically linked system library (probably pthread or glibc). The backtrace from the core dump shows only the crash point and no callers, even though our user-space application is built with -g -O0:

```
Cannot access memory at address 0x2aab1004
(gdb) bt
#0  0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.

    GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
```

Another problem is that we cannot use gdb/gdbserver in practice: it keeps breaking on __nptl_create_event. Since the test creates and destroys threads and timers every 5 s, it is almost impossible to sit there for hours hitting continue.

EDIT: Also, backtrace() and backtrace_symbols() are not supported by our toolchain.

Hence:

  1. Is there a way of trapping the segfault and generating more debug data (backtrace, stack pointers, call stack, etc.)?

  2. Is there a way of getting more data from a core dump when the crash is inside a .so file?
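Since backtrace() is unavailable, one possible workaround for question 1 is to have the compiler record the call stack itself. The sketch below assumes GCC's -finstrument-functions option, which makes every instrumented function call __cyg_profile_func_enter() on entry and __cyg_profile_func_exit() on exit, and it assumes the toolchain supports __thread thread-local storage. SHADOW_MAX, shadow_stack_depth(), shadow_stack_dump() and fmt_hex() are hypothetical names. Only application code compiled with the flag is recorded, so after a crash inside libc the innermost entry is the deepest application function that was still active, i.e. the one that called into the library.

```c
/* Hypothetical shadow call stack filled in by GCC's -finstrument-functions
 * hooks. Everything here is marked no_instrument_function so the hooks
 * never recurse into themselves. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SHADOW_MAX 256 /* assumed maximum call depth worth recording */

/* One stack per thread; assumes the toolchain supports __thread (TLS). */
static __thread void *shadow_fn[SHADOW_MAX];
static __thread void *shadow_site[SHADOW_MAX];
static __thread int shadow_depth;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    if (shadow_depth < SHADOW_MAX) {
        shadow_fn[shadow_depth] = this_fn;
        shadow_site[shadow_depth] = call_site;
    }
    shadow_depth++; /* keep counting past the cap so exits stay balanced */
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (shadow_depth > 0)
        shadow_depth--;
}

__attribute__((no_instrument_function))
int shadow_stack_depth(void)
{
    return shadow_depth;
}

/* Format v as 0x-prefixed hex into buf without stdio; returns the length. */
static __attribute__((no_instrument_function))
size_t fmt_hex(char *buf, uintptr_t v)
{
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return len;
}

/* Write "function from call_site" lines, innermost first, to fd. Uses only
 * write() and memcpy(), so it can be called from a SIGSEGV handler. */
__attribute__((no_instrument_function))
void shadow_stack_dump(int fd)
{
    int top = shadow_depth < SHADOW_MAX ? shadow_depth : SHADOW_MAX;
    for (int i = top - 1; i >= 0; i--) {
        char line[64];
        size_t len = fmt_hex(line, (uintptr_t)shadow_fn[i]);
        memcpy(line + len, " from ", 6);
        len += 6;
        len += fmt_hex(line + len, (uintptr_t)shadow_site[i]);
        line[len++] = '\n';
        ssize_t r = write(fd, line, len);
        (void)r; /* nothing useful to do on failure while crashing */
    }
}
```

To use it, compile the application with -finstrument-functions, link this file in, and call shadow_stack_dump(2) from a SIGSEGV handler. For a non-PIE executable the printed addresses can then be resolved offline with addr2line -f -e on the unstripped binary.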

Thanks.

Defoliate asked on 24/11/2011 at 4:46. Comment:
You could try handling SIGSEGV, if that's possible. It's not generally recommended, but I feel it could help you in this situation. (Incubation)
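A minimal sketch of that idea, assuming POSIX sigaction() with SA_SIGINFO: the handler prints the faulting address, the program counter and, on MIPS, the $ra (return address) register, copies /proc/self/maps to stderr so those addresses can be matched to a library, and then re-raises the signal so a core file is still written. On MIPS, if the crash is in a leaf function, $ra usually still points into its caller, which gives back one level of the call stack that the core-dump backtrace was missing. Reading registers from ucontext_t is architecture- and C-library-specific: the uc_mcontext.pc and gregs[31] fields used for MIPS are an assumption to check against your toolchain's <sys/ucontext.h>. install_crash_handler() and the helpers are hypothetical names.

```c
/* Sketch only: register field names below are assumptions per platform. */
#define _GNU_SOURCE /* exposes REG_RIP/REG_EIP on glibc x86 */
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

/* Format v as 0x-prefixed hex without stdio (async-signal-safe). */
static size_t fmt_hex(char *buf, uintptr_t v)
{
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return len;
}

static void write_all(const char *s, size_t len)
{
    ssize_t r = write(STDERR_FILENO, s, len);
    (void)r; /* nothing useful to do on failure inside a crash handler */
}

static void put_hex(const char *label, uintptr_t v)
{
    char buf[24];
    size_t n = fmt_hex(buf, v);
    buf[n++] = '\n';
    write_all(label, strlen(label));
    write_all(buf, n);
}

static void crash_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    uintptr_t pc = 0, ra = 0;
#if defined(__mips__)
    pc = (uintptr_t)uc->uc_mcontext.pc;
    ra = (uintptr_t)uc->uc_mcontext.gregs[31]; /* $ra is register 31 */
#elif defined(__x86_64__) && defined(REG_RIP)
    pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__) && defined(REG_EIP)
    pc = (uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#else
    (void)uc;
#endif
    put_hex("fault address: ", (uintptr_t)si->si_addr);
    put_hex("pc: ", pc);
    put_hex("ra: ", ra); /* stays 0 where it is not collected */

    /* Copy the memory map so pc/ra can be matched to a library later. */
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd >= 0) {
        char buf[512];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write_all(buf, (size_t)n);
        close(fd);
    }

    /* Restore the default action and re-raise to get the normal core dump. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Returns 0 on success, -1 on failure. */
int install_crash_handler(void)
{
    /* Alternate stack so a stack overflow can still be reported. It is
     * per thread: this only covers the thread that calls this function. */
    static char altstack[64 * 1024];
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = altstack;
    ss.ss_size = sizeof altstack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling install_crash_handler() early in main() is enough for the report. The handler deliberately uses only write(), open(), read(), close(), signal() and raise(), all of which are async-signal-safe, rather than printf().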

Regarding "GDB can't find the start of the function at 0x2ab05d18":

What is at that address at the time of the crash?

Run info shared, and check whether any library contains that address.
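The same lookup can also be done offline on a copy of /proc/self/maps saved at crash time (for example, one written to stderr by a crash handler). The sketch below, with the hypothetical names find_mapping() and struct mapping, parses the standard maps line format (start-end perms offset dev inode path) and returns the mapping that contains a given address.

```c
#include <stdio.h>
#include <string.h>

struct mapping {
    unsigned long start, end, offset;
    char path[256];
};

/* Scan maps-format text; return 1 and fill *out if addr is inside a
 * mapping, else 0. Lines longer than the local buffer are truncated. */
int find_mapping(const char *maps, unsigned long addr, struct mapping *out)
{
    const char *p = maps;
    while (*p != '\0') {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        char line[512];
        if (len >= sizeof line)
            len = sizeof line - 1;
        memcpy(line, p, len);
        line[len] = '\0';

        unsigned long lo, hi, off;
        char perms[5];
        int path_pos = 0; /* stays 0 for anonymous mappings with no path */
        /* start-end perms offset dev inode [path] */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &lo, &hi, perms, &off, &path_pos) >= 4
            && addr >= lo && addr < hi) {
            out->start = lo;
            out->end = hi;
            out->offset = off;
            snprintf(out->path, sizeof out->path, "%s",
                     path_pos > 0 ? line + path_pos : "");
            return 1;
        }
        if (!nl)
            break;
        p = nl + 1;
    }
    return 0;
}
```

For a crashing PC that falls in a library's first (file offset 0) executable mapping, which is the usual case for a shared library that has not been prelinked, addr - m.start is typically the address to pass to addr2line -f -e together with an unstripped copy of that library.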

The most likely cause of your troubles: did you run strip on libpthread.so.0 before uploading it to your target? Don't do that: GDB requires an unstripped libpthread.so.0. If your toolchain's libpthread.so.0 contains debug symbols (and is thus too large for the target), run strip -g on it rather than a full strip.

Update:

Regarding "info shared produced Cannot access memory at address 0x2ab05d18":

This means that GDB cannot access the shared library list (which would also explain the missing stack trace). The most common cause is that the binary that actually produced the core does not match the binary you gave to GDB. A less common cause is a truncated core dump (perhaps because ulimit -c is set too low).
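To rule out the truncation case, the program can raise its own core-file size limit at startup instead of relying on the shell's ulimit -c. This is a sketch using getrlimit()/setrlimit() with RLIMIT_CORE; enable_core_dumps() is a hypothetical name. An unprivileged process can only raise the soft limit up to the hard limit, so if the hard limit is itself low it has to be raised first (by root, or in the startup script). This only addresses the size limit, not other reasons a core could be incomplete, such as running out of space where it is written.

```c
#include <sys/resource.h>

/* Raise the soft core-file limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit). */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max; /* soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```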

Ability answered on 24/11/2011 at 6:29. Comments:
Hi, info shared produced "Cannot access memory at address 0x2ab05d18". We have not touched the .so files. (Defoliate)
Not sure if this counts, but another run shows the address to be in libc. Output of info shared:
From        To          Syms Read   Shared Object Library
0x2aaf7e70  0x2ab461f0  Yes         /opt/nfsroot_bcm97335_stblinux-2.6.18-7.7_be/lib/libc.so.0
(Defoliate)

If all else fails, run the program under the debugger!

Just put gdb --args in front of your normal start command and enter run to get the process running (and c to continue after any stop). When the task segfaults, gdb returns to the interactive prompt instead of dumping core. You should then be able to get more meaningful stack traces, etc.

Another option is to use truss, if it is available. It will tell you which system calls were being made at the time of the crash.

Hodson answered on 24/11/2011 at 4:52. Comments:
I'm guessing this is not possible. The program is running on an embedded system, and the asker has already tried using gdb with gdbserver. (Sinusitis)
Umm, as much as I enjoy pushing 'c', I would need to push it about 8640 times before it crashes :) The only thing I could find about truss was for Solaris; on Linux the equivalent is strace, which as far as I know would not really help here. Thanks. (Defoliate)
@Defoliate: c means continue until the next breakpoint. With no breakpoints, there are no prompts until an exception needs handling. (Hodson)
Hi, I'm not sure I follow that comment. __nptl_create_event is not a real breakpoint in the sense of one I set with "b <func>"; it is generated automatically by gdb/pthread. So what could I use instead of 'c'? (Defoliate)
Is it breaking on SIGTRAP (which you can suppress!) or just on the entry point? (Hodson)
Yes, it is a SIGTRAP, but if I use handle SIGTRAP nostop print pass, the executable just exits straight away with SIGTRAP. (Defoliate)
@Defoliate: Rats, back to the drawing board. (Hodson)
