Debugging a running daemon using gdb
C


I am developing a high-traffic network server application in C that runs as a daemon. Under some circumstances the app crashes (always without a core file). How can I debug the running daemon with gdb to find the place that generates the SIGSEGV?

Explanatory notes:

  1. I know how to attach gdb to a running process with the attach command.

  2. After attaching, the process stops. If I then run "continue", gdb blocks for as long as the program does not crash. If I press CTRL-C, the process exits and I am unable to simply detach gdb.

So the question is: is there a way to let the process continue without gdb being stuck, while still being able to detach if the process does not crash?

Cupric answered 23/4, 2013 at 12:10 Comment(2)
Have you tried changing the core dump settings, e.g. with the ulimit command? And/or running a debug version? Or possibly adding more logging to narrow down the possible locations of the crash? – Lowgrade
I have tried all of these. The process runs as an upstart service on an Ubuntu server and is setuid-ed to a certain user on service start. limits.conf contains unlimited values for both nofile and core for that user. I have set fs.suid_dumpable and kernel.core_uses_pid in /etc/sysctl.conf. I added more logging, but it is a high-traffic server and it generates far too much output. – Cupric

Try async mode and "continue &":

Save the following to non-stop.gdb:

set target-async on
set pagination off
set non-stop on

Then run:

$ gdb -x non-stop.gdb
(gdb) !pgrep YOUR-DAEMON
1234
(gdb) attach 1234
(gdb) continue -a &
(gdb)
Korn answered 23/4, 2013 at 14:20 Comment(1)
Thank you. I will try this and post feedback. – Cupric

This page on attach/detach says that the detach command works inside gdb.

If you want to catch a segmentation fault in an application, the application has to be running under the debugger when the fault occurs. When the signal is caught, you can use where or bt to see a stack trace of the application. Of course you cannot continue the application after it has faulted; how would it recover? If you expect to trigger the fault soon, you can instead attach to the running process and wait for the fault in the debugger.
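
If keeping gdb attached is not practical, one alternative (not part of the answer above) is to have the daemon print its own stack when SIGSEGV arrives. This is only a minimal sketch, assuming Linux with glibc (backtrace and backtrace_symbols_fd from <execinfo.h>) and a stderr that is redirected to a log file. The names install_crash_handler and crash_handler are hypothetical.

```c
#define _GNU_SOURCE
#include <execinfo.h>  /* backtrace, backtrace_symbols_fd (glibc-specific) */
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical handler: write the current stack frames to stderr, then
 * re-raise the signal so the default action (terminate, possibly with a
 * core dump) still happens. */
static void crash_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    /* backtrace_symbols_fd writes directly to the descriptor instead of
     * returning malloc'ed strings. */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    /* SA_RESETHAND has already restored the default disposition. */
    raise(sig);
}

/* Hypothetical helper: call once at startup. Returns 0 on success. */
int install_crash_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* The first backtrace() call may load libgcc; do that now rather
     * than inside the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  /* handler runs once, then default action */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The output consists of raw addresses, with symbol names where available. Linking with -rdynamic makes more function names visible, and addr2line can map the addresses to source lines if the binary has debugging symbols.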

If you want a stack trace after the fault has occurred, then you really need a core file, as there will be no process left to attach to. If your daemon is started as part of the system, it may be hard to configure it to dump core, and you may not want other applications to leave core dumps all over the place. In that case I'd advise stopping the system daemon and starting it again from your own user session, where you can allow it to dump core. If it is really essential that it starts up as part of the system, check whether the start-up of the daemon is confined to a single sub-shell and use ulimit -c in that sub-shell to set an appropriate maximum size for the core dump.
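
When editing the start-up script is not an option, the core limit can also be raised from inside the daemon itself. The following is a sketch under assumptions, not something from the original answer: it targets Linux, and enable_core_dumps is a hypothetical helper name. It raises the soft RLIMIT_CORE to the hard limit and sets the "dumpable" flag again, which Linux resets when a process changes its effective user or group ID.

```c
#include <stdio.h>
#include <sys/prctl.h>     /* prctl, PR_SET_DUMPABLE (Linux-specific) */
#include <sys/resource.h>  /* getrlimit, setrlimit, RLIMIT_CORE */

/* Hypothetical helper: call once in the daemon, after any
 * setuid()/setgid() call. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    /* Raise the soft core-size limit as far as the hard limit allows. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* A credential change clears the dumpable flag; turn it back on. */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return -1;
    }
    return 0;
}
```

The soft limit cannot exceed the hard limit, so if the hard limit inherited from the service manager is 0, no core file is written regardless of this call.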

Micahmicawber answered 23/4, 2013 at 12:18 Comment(4)
I know, but after running the "continue" command, the only way to get back to the gdb prompt is to press CTRL-C, which stops the running process. – Cupric
Use detach instead of continue and then use quit. Works for me. – Micahmicawber
I understand, but I want to be able to get a backtrace if the process crashes. – Cupric
I added some more to the answer; I hope it helps. – Micahmicawber

Another way to debug your application is to let it write a core file and examine that with GDB.

To generate a core file when a segmentation fault occurs, follow these steps:

1) Add the following commands to the script that starts the daemon.

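# Remove the size limit on core files for this shell and its children
ulimit -c unlimited
# Create <path_to_core_file>, e.g. /etc/user/ankit/corefiles
mkdir -p /etc/user/ankit/corefiles
chmod 777 /etc/user/ankit/corefiles
# Requires root; %e = executable name, %s = signal number
echo "/etc/user/ankit/corefiles/%e.%s.core" > /proc/sys/kernel/core_pattern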

2) Run your application using the script and wait for the core dump file to be created. Once you have the core dump, you can debug it with gdb as described in the following steps.

3) Open the core file in GDB

gdb -c <core_file>

where <core_file> is the file generated after the segmentation fault.

4) Backtrace

Next, we want to know what the stack looked like when the program crashed. Running bt at the gdb prompt gives you a backtrace. If gdb has not loaded symbols for the binary, the frames show up as question marks, similar to "??????". To fix this, you have to load the symbols.

Here’s how to load debugging symbols.

symbol-file /path/to/binary
sharedlibrary

5) Get a backtrace for all threads

thread apply all bt full

NOTE: Make sure the binary is compiled with debugging symbols.

Ectropion answered 23/7, 2019 at 8:1 Comment(0)
