Situation
I have a daemon I wrote in PHP (not the best language for this, but work with me), and it is made to receive jobs from a queue and process them whenever a job needs to be done. For each new job, I use pcntl_fork() to fork the job off into a child process. Within this child process, I then use proc_open() to execute long-running system commands for audio transcoding, which returns directly to the child when finished. When the job is completely done, the child exits and is cleaned up by the parent process.
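A minimal sketch of this fork-then-proc_open() structure, assuming PHP 7.4+ with the pcntl extension. The helper names runJob() and forkJob() are hypothetical, and `true` stands in for the real transcoding command; this is not the daemon's actual code.

```php
<?php
// Hypothetical helper: runs one external command and returns its exit code.
function runJob(array $command): int
{
    // No pipes are opened, so nothing needs draining before proc_close().
    $descriptors = [
        0 => ['file', '/dev/null', 'r'],
        1 => ['file', '/dev/null', 'w'],
        2 => ['file', '/dev/null', 'w'],
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return 1;
    }
    // proc_close() blocks until the command finishes and returns its exit code.
    return proc_close($process);
}

// Hypothetical helper: forks a child for one job and returns the child's PID.
function forkJob(array $command): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('pcntl_fork() failed');
    }
    if ($pid === 0) {
        // Child: run the command, then exit with its status.
        exit(runJob($command));
    }
    // Parent: hand the child's PID back so it can be reaped later.
    return $pid;
}

// Usage: `true` is a placeholder for the real command line.
$pid = forkJob(['true']);
pcntl_waitpid($pid, $status);
$exitCode = pcntl_wexitstatus($status);
```

Here the parent reaps the child immediately; a real daemon would presumably keep the PID in a list and reap children as they finish.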
To keep this daemon always running, I use upstart. Here is my upstart configuration file:
description "Audio Transcoding Daemon"
start on startup
stop on shutdown
# kill signal SIGCHLD
kill timeout 1200 # Don't force kill the process until it runs over 20 minutes
respawn
exec audio-daemon.php
Goal
Because I want to use this daemon in a distributed environment, I want to be able to shut down the server at any time without disrupting any running jobs. To do this, I have already implemented signal handlers using pcntl_signal() for SIGTERM, SIGHUP, and SIGINT on the parent process; on receiving one of these, the parent waits for all children to exit normally before exiting itself. The children also have signal handlers, but they are made to ignore all kill signals.
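The signal handling described here could be sketched roughly as follows, assuming PHP 7.1+ (for pcntl_async_signals()) with the pcntl extension. The names $shutdownRequested, ignoreShutdownSignals(), and waitForChildren() are hypothetical and only illustrate the idea.

```php
<?php
pcntl_async_signals(true); // PHP 7.1+: handlers run without explicit dispatch calls

$shutdownRequested = false;
$shutdownSignals = [SIGTERM, SIGHUP, SIGINT];

// Parent: only record the request; the main loop would stop taking new jobs.
foreach ($shutdownSignals as $signal) {
    pcntl_signal($signal, function (int $signo) use (&$shutdownRequested): void {
        $shutdownRequested = true;
    });
}

// Child: meant to be called right after pcntl_fork() returns 0.
function ignoreShutdownSignals(array $signals): void
{
    foreach ($signals as $signal) {
        pcntl_signal($signal, SIG_IGN);
    }
}

// Parent: block until every child has been reaped.
function waitForChildren(): void
{
    while (true) {
        $pid = pcntl_wait($status);
        if ($pid === -1 && pcntl_get_last_error() === PCNTL_ECHILD) {
            return; // no children left
        }
        // Any other result (a reaped child, or an interrupted wait) retries.
    }
}
```

Once $shutdownRequested is set, the parent would call waitForChildren() and then exit. Note that this only covers the PHP processes; it says nothing about how the commands started through proc_open() react to the same signals.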
Problem
The problem is, according to the docs...
The signal specified by the kill signal stanza is sent to the process group of the main process. (such that all processes belonging to the jobs main process are killed). By default this signal is SIGTERM.
This is concerning because, in my child process, I run system commands through proc_open(), which spawns new child processes as well. So, whenever I run sudo stop audio-daemon, this sub-process (which happens to be sox) is killed immediately, and the job returns back with an error. Apparently, sox obeys SIGTERM and does what it's told...
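To see why the signal reaches the external command, it can help to check which process group it lands in. The following is a small diagnostic sketch, assuming PHP 7.4+ with the posix extension; processGroupOfCommand() is a hypothetical helper and `sleep 5` is a stand-in for sox.

```php
<?php
// Hypothetical helper: starts a command and returns its process group ID.
function processGroupOfCommand(array $command): int
{
    $descriptors = [
        0 => ['file', '/dev/null', 'r'],
        1 => ['file', '/dev/null', 'w'],
        2 => ['file', '/dev/null', 'w'],
    ];
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('proc_open() failed');
    }

    // With an array command there is no intermediate shell,
    // so this PID belongs to the command itself.
    $pid = proc_get_status($process)['pid'];
    $group = posix_getpgid($pid);

    // Clean up the stand-in command; it is only needed for the lookup.
    proc_terminate($process);
    proc_close($process);

    if ($group === false) {
        throw new RuntimeException('posix_getpgid() failed');
    }
    return $group;
}

$ownGroup = posix_getpgid(posix_getpid());
$commandGroup = processGroupOfCommand(['sleep', '5']);

echo $commandGroup === $ownGroup
    ? "command shares this process's group\n"
    : "command runs in a different group\n";
```

Since neither pcntl_fork() nor proc_open() creates a new process group on its own, the command should report the same group as the daemon, which is exactly the group upstart signals.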
Originally, I thought, "Fine. I'll just change kill signal to send something that is inherently ignored, and I'll just pick it up in the main process only." But according to the manual, there are only two signals that are ignored by default: SIGCHLD and SIGURG (and possibly SIGWINCH). But I'm afraid of getting false flags, since these can also be triggered other ways.
There are ways to create a custom signal using what the manual calls "Real-time Signals" but it also states...
The default action for an unhandled real-time signal is to terminate the receiving process.
So that doesn't help...
Can you think of any way that I can get upstart to keep all of my sub-processes open until they complete? I really don't want to go digging through sox's source code to modify its signal handlers, and while I could set SIGCHLD, SIGURG, or SIGWINCH as my upstart kill signal and pray nothing else sends them my way, I can't help but think there's a better way to do this... Any ideas?
Thanks for all your help! :)
system call. As happened to you, this second script is killed when I stop the service. I tried to use your solution (), but my service got stuck in a stop/killed status after I stopped it (or in the start/killed status if I try to start it). – Arundel