Using `screen` together with `gdb` to debug MPI applications works nicely, especially if `xterm` is unavailable or you're dealing with more than a few processes. I hit many pitfalls along the way, each with its accompanying Stack Overflow search, so I'll reproduce my solution in full.
First, add code after `MPI_Init` that prints each process's PID and then halts the program, so that you have time to attach. The standard solution seems to be an infinite loop; I eventually settled on `raise(SIGSTOP);`, which requires one extra `continue` in gdb to escape.
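```c
#include <mpi.h>
#include <signal.h>  /* raise, SIGSTOP */
#include <stdio.h>   /* fprintf */
#include <unistd.h>  /* getpid */

int main(int argc, char **argv)
{
    int i, id, nid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &nid);
    /* Take turns so each rank prints one "PID <pid> rank <rank>" line. */
    for (i = 0; i < nid; i++) {
        MPI_Barrier(MPI_COMM_WORLD);
        if (i == id) {
            fprintf(stderr, "PID %d rank %d\n", (int)getpid(), id);
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }
    /* Stop here until gdb attaches; escaping needs one extra `continue`. */
    raise(SIGSTOP);
    /* ... the rest of the application ... */
    MPI_Finalize();
    return 0;
}
```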
After compiling, run the executable in the background and capture stderr in a file. You can then grep that file for a keyword (here the literal `PID`) to get the PID and rank of each process.
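```shell
MDRUN_EXE=../../Your/Path/To/bin/executable
MDRUN_ARG="-a arg1 -f file1 -e etc"
# Truncate (rather than append to) the output files so that stale PIDs
# from an earlier run are not picked up by the grep below.
mpiexec -n 1 $MDRUN_EXE $MDRUN_ARG > output 2> error &
sleep 2   # give every rank time to print its PID line
PIDFILE=pid.dat
grep PID error > "$PIDFILE"
# Each line reads "PID <pid> rank <rank>", so fields 2 and 4 hold the values.
PIDs=($(awk '{print $2}' "$PIDFILE"))
RANKs=($(awk '{print $4}' "$PIDFILE"))
```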
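The fixed `sleep 2` can be too short when many ranks start slowly. Below is a minimal sketch of a polling alternative. It assumes you know how many ranks to expect, i.e. the value passed to `mpiexec -n`. The `wait_for_pids` name and its timeout argument are hypothetical additions, not part of the workflow above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the stderr file until it contains at least
# NP lines matching "PID" (one per rank), or give up after TIMEOUT seconds.
# Usage: wait_for_pids ERRFILE NP [TIMEOUT]
wait_for_pids() {
    local errfile=$1 np=$2 timeout=${3:-60}
    local count waited=0
    while :; do
        # grep -c prints the number of matching lines; it prints nothing
        # if the file does not exist yet, so default the count to 0.
        count=$(grep -c PID "$errfile" 2>/dev/null) || true
        count=${count:-0}
        if [ "$count" -ge "$np" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Used in place of `sleep 2`, this might read `wait_for_pids error 4 || exit 1` for a four-rank run. Like the original grep, it counts any stderr line containing `PID`.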
A gdb session can be attached to each process with `gdb $MDRUN_EXE $PID`. Doing so inside a screen session gives easy access to any of the gdb sessions. The `-d -m` options start screen in detached mode, and `-S "P$RANK"` names the session for easy access later. The `-l` option makes bash a login shell, so your usual profile is read, and this keeps gdb from exiting immediately.
```shell
for ((i = 0; i < ${#PIDs[@]}; i++))
do
    PID=${PIDs[$i]}
    RANK=${RANKs[$i]}
    screen -d -m -S "P$RANK" bash -l -c "gdb $MDRUN_EXE $PID"
done
```
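To check that every gdb screen actually started, you can count the sessions named `P<rank>`. This is a minimal sketch under an assumption: that `screen -ls` uses its usual listing format, where each session appears as `<screen-pid>.<name>` followed by whitespace. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many screen sessions are named P<digits>.
# Assumes `screen -ls` lists sessions as e.g. "<tab>12345.P0<tab>(Detached)".
count_gdb_screens() {
    screen -ls | grep -cE '[0-9]+\.P[0-9]+[[:space:]]'
}
```

For example, `[ "$(count_gdb_screens)" -eq "${#PIDs[@]}" ] || echo "some gdb screens did not start" >&2`. Leftover `P` sessions from an earlier run are counted too.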
Once gdb has started in the screens, you can script input to them with screen's `-X stuff` command, so that you don't have to enter every screen and type the same thing. Each command must end with a newline. Here the screens are addressed with `-S "P$i"`, using the names given previously, and `debug.init` is a file of gdb commands of your own choosing. The `-p 0` option, which selects window 0, is critical: without it the command intermittently fails, depending on whether or not you have previously attached to the screen.
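```shell
for ((i = 0; i < ${#PIDs[@]}; i++))
do
    # $'\n' supplies the trailing newline that makes gdb execute each line.
    screen -S "P$i" -p 0 -X stuff "set logging file debug.$i.log"$'\n'
    screen -S "P$i" -p 0 -X stuff $'set logging overwrite on\n'
    # GDB 12 and later prefer 'set logging enabled on'; 'set logging on' is deprecated there.
    screen -S "P$i" -p 0 -X stuff $'set logging on\n'
    screen -S "P$i" -p 0 -X stuff $'source debug.init\n'
done
```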
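An alternative to stuffing the setup commands is to let gdb run them itself at startup with its standard `-ex` (run one command) and `-x` (run a command file) options. gdb executes these in command-line order, after attaching to the PID given as its second operand. The sketch below shows this approach; `launch_gdb_screen` is a hypothetical name, and the approach differs from the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to -X stuff: start gdb in a detached screen named
# P<rank> and let gdb run the logging setup (-ex) and debug.init (-x) itself.
# Usage: launch_gdb_screen EXE PID RANK
launch_gdb_screen() {
    local exe=$1 pid=$2 rank=$3
    local gdb_cmd="gdb -ex 'set logging file debug.$rank.log'"
    gdb_cmd+=" -ex 'set logging overwrite on'"
    # GDB 12 and later prefer 'set logging enabled on'; 'set logging on' is deprecated there.
    gdb_cmd+=" -ex 'set logging on'"
    gdb_cmd+=" -x debug.init $exe $pid"
    screen -d -m -S "P$rank" bash -l -c "$gdb_cmd"
}
```

Inside the launch loop, this would replace the `screen` line with `launch_gdb_screen "$MDRUN_EXE" "$PID" "$RANK"`.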
At this point you can attach to any screen with `screen -rS "P$i"` and detach with Ctrl+A followed by D. Commands may be sent to all gdb sessions in the same way as in the previous block of code.
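A small wrapper saves retyping that loop for each new command. This is a minimal sketch that reuses the same `-S "P$i" -p 0 -X stuff` pattern; the name `send_all` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send the same gdb command to sessions P0..P(N-1).
# Usage: send_all N COMMAND...
send_all() {
    local n=$1
    shift
    local cmd="$*" i
    for ((i = 0; i < n; i++)); do
        # -p 0 targets window 0; the trailing newline makes gdb execute the line.
        screen -S "P$i" -p 0 -X stuff "$cmd"$'\n'
    done
}
```

For example, `send_all "${#PIDs[@]}" bt` requests a backtrace from every rank, and `send_all "${#PIDs[@]}" continue` resumes them all.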