Trying to close all child processes when I interrupt my bash script
I have written a bash script to carry out some tests on my system. The tests run in the background and in parallel. The tests can take a long time and sometimes I may wish to abort the tests part way through.

If I press Control+C, it aborts the parent script but leaves the various children running. I want to be able to hit Control+C (or otherwise quit) and then kill all child processes running in the background. I have a bit of code that does the job if I'm running the background jobs directly from the terminal, but it doesn't work in my script.

I have a minimal working example, using trap in combination with pgrep -P $$:

#!/bin/bash

trap 'kill -n 2 $(pgrep -P $$)' 2
sleep 10 &
wait

I was hoping that hitting Control+C (SIGINT) would kill everything the script started, but instead it says:

./breakTest.sh: line 1: kill: (3220) - No such process

This number changes, but doesn't seem to apply to any running processes, so I don't know where it is coming from.

I guess that the contents of the trap command get evaluated when the signal is caught, which might explain the outcome: the 3220 PID might belong to the pgrep command substitution itself, which has already exited by the time kill runs.
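One way to test this guess (a sketch, not part of my original script): have the trap echo what pgrep sees instead of killing it, and compare against the PID of the sleep:

#!/bin/bash

# Sketch: print what pgrep -P $$ reports at signal time instead of killing it.
# The $(...) command substitution forks a subshell, which is itself a child
# of $$ and so can appear in pgrep's output alongside the sleep.
trap 'echo "sleep pid: $spid, pgrep sees: $(pgrep -P $$)"' 2
sleep 10 &
spid=$!
wait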

I'd appreciate some insight here

Thanks

Valval answered 30/12, 2018 at 19:9 Comment(0)

I have found a solution using pkill. This example also deals with many child processes.

#!/bin/bash
trap 'pkill -P $$' SIGINT SIGTERM
for i in {1..10}; do
   sleep 10 &
done
wait

This appears to kill all the child processes elegantly, though I don't fully understand what the issue was with my original code, apart from it not sending the right signal.
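A variant with an explicit exit (a sketch; see the comments below about loops respawning jobs and about signal choice):

#!/bin/bash
# Sketch: kill the children, then exit so the script itself stops too
# and a surrounding loop cannot spawn a fresh batch of jobs.
trap 'pkill -P $$; exit 1' SIGINT SIGTERM
for i in {1..10}; do
   sleep 10 &
done
wait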

Valval answered 30/12, 2018 at 23:40 Comment(4)
Also, I found that I needed both SIGTERM and SIGINT in the trap to kill the jobs most elegantly. SIGINT on its own would bring up warning messages about the jobs that were being killed. ThanksValval
elegant solution +1Dogwood
It's better to explicitly exit after killing child processes. Otherwise, if you have a loop that spawns a new batch of processes after the original ones exit, you'll still get new processes (and continue execution of the loop) after trap-killing the old ones. I.e. trap 'pkill -P $$; exit' SIGINT SIGTERMHort
'-9' is required to make sure child process is really killed: trap 'pkill -9 -P $$' SIGINT SIGTERMFernald

In bash, whenever you use & after a command, it runs that command as a background job (these background jobs are called job_specs), with the job number incrementing by one until you exit that terminal session. You can use the jobs command to get the list of background jobs running. To work with these jobs you have to use % with the job ID. The jobs command also accepts other options, such as jobs -p to see the process IDs of all jobs, and jobs -p %JOB_SPEC to see the process ID of that particular job.

#!/usr/bin/env bash

trap 'kill -9 %1' 2

sleep 10 &

wait

or

#!/usr/bin/env bash

trap 'kill -9 $(jobs -p %1)' 2

sleep 10 &

wait
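If there are many background jobs, as in the real script, jobs -p with no argument lists the PID of every job, so one kill covers all of them (a sketch, not part of the original answer):

#!/usr/bin/env bash

# Sketch: kill every background job, not just %1.
trap 'kill $(jobs -p) 2>/dev/null' 2

for i in {1..5}; do
   sleep 10 &
done

wait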

I implemented something like this a few years back; you can take a look at it: async bash

Arbiter answered 30/12, 2018 at 20:4 Comment(2)
Thanks 0.sh. I found that one problem with what I was doing was that sending the interrupt signal to the background jobs would not kill them in most circumstances, even though it seems to work in the terminal; using signal 15 (SIGTERM) is better. It's also preferable to 9 (SIGKILL) as it gives the jobs some time to shut themselves down. Could you explain the distinction between the processes that jobs shows and the ones that ps shows? Is this fundamentally what I was doing wrong? Also, the example I gave was minimal; I'd need to close down many jobs in the real script.Valval
As an aside - never use kill -9 unless you know you must. It's a widespread bad habit. Sometimes it becomes necessary, but many apps have well-behaved code that shuts down sockets and closes files and cleans up temp trash before exiting, and this totally prevents all that. Save -9's for when you really need them.Unicorn

You can try something like the following:

pkill -TERM -P <your_parent_id_here>
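
For example, from inside the script itself, $$ can serve as the parent ID (a sketch, not part of the original answer):

#!/bin/bash

# Sketch: on Ctrl+C, send SIGTERM to every direct child of this script.
trap 'pkill -TERM -P $$' INT
sleep 10 &
wait
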
Siena answered 30/12, 2018 at 19:15 Comment(0)

To kill a group of related processes, you need to obtain the process group ID (PGID) somehow and kill that.

On macOS, I was able to use this reliably.

#!/bin/sh
# Run a multi-step command line in the background and note its PID.
bash -c 'echo one $$
python -c "import time; time.sleep(3)"
echo two $$
python -c "import time; time.sleep(3)"
echo three $$
python -c "import time; time.sleep(3)"' &
pid=$!
ps -j $pid                # show the child's PID, PGID and SID
sleep $((RANDOM %9))
kill -- -$$               # a negative PID signals the whole process group

Properly speaking, you should obtain the PGID from ps -j instead of relying on the calling shell to be the process group leader, but as a quick hack this seems to work.

On Linux, you can use the /proc filesystem to obtain the PGID. See https://unix.stackexchange.com/questions/132224/is-it-possible-to-get-process-group-id-from-proc
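A sketch of that approach, looking the PGID up with ps -o pgid= (which works on both Linux and macOS) and assuming setsid is available to give the child its own group (it ships with Linux but not with stock macOS):

#!/bin/bash

# Sketch: give the child its own process group, look up its PGID with ps,
# then signal the whole group without hitting this script.
setsid sleep 30 &
pid=$!
pgid=$(ps -o pgid= -p "$pid" | tr -d '[:space:]')
kill -TERM -- "-$pgid"    # negative PID: signal every member of the group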

Featureless answered 29/2 at 16:34 Comment(0)