How to prevent upstart from killing child processes of a daemon?
Situation

I have a daemon I wrote in PHP (not the best language for this, but work with me), and it is made to receive jobs from a queue and process them whenever a job needs to be done. For each new job, I use pcntl_fork() to fork the job off into a child process. Within this child process, I then use proc_open() to execute long-running system commands for audio transcoding, which returns directly to the child when finished. When the job is completely done, the child exits and is cleaned up by the parent process.

To keep this daemon always running, I use upstart. Here is my upstart configuration file:

description "Audio Transcoding Daemon"

start on startup
stop on shutdown
# kill signal SIGCHLD
kill timeout 1200 # Don't force kill the process until it runs over 20 minutes
respawn

exec audio-daemon.php

Goal

Because I want to use this daemon in a distributed environment, I want to be able to shut down the server at any time without disrupting any running jobs. To do this, I have already implemented signal handlers using pcntl_signal() for SIGTERM, SIGHUP, and SIGINT on the parent process, which waits for all children to exit normally before exiting itself. The children also have signal handlers, but they are made to ignore all kill signals.
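
For readers who want to see the shape of that setup, here is a minimal sketch, not the actual daemon code. It assumes PHP 7.1 or later for pcntl_async_signals() (older versions would need declare(ticks=1) or pcntl_signal_dispatch()), and the names $shutdown and $children, as well as the sleep() that stands in for a job, are made up for illustration.

```php
<?php
// Sketch only: $shutdown, $children and the sleep() stand-in for a job are illustrative.
pcntl_async_signals(true); // PHP >= 7.1; on older versions use declare(ticks=1)

$shutdown = false; // set once a stop request arrives
$children = [];    // PIDs of running job processes

// Parent: remember the stop request instead of exiting immediately.
foreach ([SIGTERM, SIGHUP, SIGINT] as $signo) {
    pcntl_signal($signo, function () use (&$shutdown) {
        $shutdown = true;
    });
}

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "fork failed\n");
    exit(1);
}
if ($pid === 0) {
    // Child: ignore the stop signals so the job runs to completion.
    foreach ([SIGTERM, SIGHUP, SIGINT] as $signo) {
        pcntl_signal($signo, SIG_IGN);
    }
    sleep(1); // stand-in for the real transcoding work
    exit(0);
}
$children[$pid] = true;

// Simulate a stop request reaching the parent.
posix_kill(posix_getpid(), SIGTERM);

// Drain: reap every child before the parent itself exits.
while ($children) {
    $done = pcntl_wait($status);
    if ($done > 0) {
        unset($children[$done]);
    } elseif (pcntl_get_last_error() === PCNTL_ECHILD) {
        break; // nothing left to wait for
    }
}
```

A real daemon would also check $shutdown in its queue loop so that no new jobs are accepted once a stop has been requested.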

Problem

The problem is, according to the docs...

The signal specified by the kill signal stanza is sent to the process group of the main process. (such that all processes belonging to the jobs main process are killed). By default this signal is SIGTERM.

This is concerning because, in my child process, I run system commands through proc_open(), which spawns new child processes as well. So, whenever I run sudo stop audio-daemon, this sub-process (which happens to be sox) is killed immediately, and the job returns back with an error. Apparently, sox obeys SIGTERM and does what it's told...

Originally, I thought, "Fine. I'll just change kill signal to send something that is inherently ignored, and I'll just pick it up in the main process only." But according to the manual, there are only two signals that are ignored by default: SIGCHLD and SIGURG (and possibly SIGWINCH). But I'm afraid of getting false positives, since these can also be triggered in other ways.

There are ways to create a custom signal using what the manual calls "Real-time Signals" but it also states...

The default action for an unhandled real-time signal is to terminate the receiving process.

So that doesn't help...

Can you think of any way that I can get upstart to keep all of my sub-processes open until they complete? I really don't want to go digging through sox's source code to modify its signal handlers, and while I could set SIGCHLD, SIGURG, or SIGWINCH as my upstart kill signal and pray nothing else sends them my way, I can't help but think there's a better way to do this... Any ideas?

Thanks for all your help! :)

Takahashi answered 17/1, 2014 at 5:33 Comment(0)

Since I haven't received any other answers for how to do this a better way, this is what I ended up doing, and I hope it helps someone out there...

To stall shutdown/reboot of the system until the daemon is finished, I changed my start on and stop on in my upstart configuration. And to keep upstart from killing my children, I resorted to using SIGURG as my kill signal, which I then catch as a kill signal in my main daemon process only.

Here is my final upstart configuration:

description "Audio Transcoding Daemon"

start on runlevel [2345]
stop on starting rc RUNLEVEL=[016] # Block shutdown/reboot until the daemon ends

kill signal SIGURG # Kill the process group with SIGURG instead of SIGTERM so only the main process will pick it up (since SIGURG will be ignored by all children by default)

kill timeout 1200 # Don't force kill the process until it runs over 20 minutes

respawn

exec audio-daemon.php

Note that using stop on starting rc RUNLEVEL=[016] is necessary to stall shutdown/reboot. stop on runlevel [016] will not work.

Also note that if you use SIGURG in your application for any other reason, using it as a kill signal may cause problems. In my case, I wasn't, so this works fine as far as I can tell.
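
As a rough sketch of the PHP side (again not the actual daemon code), the main process installs a SIGURG handler. A process created with pcntl_fork() inherits that handler, so the sketch resets SIGURG to its default (ignored) disposition in the child. Programs started through proc_open() get default dispositions for caught signals anyway. PHP 7.1 or later is assumed, and $stopRequested and the `sleep 1` command are placeholders.

```php
<?php
// Sketch only: $stopRequested and the `sleep 1` command are illustrative.
pcntl_async_signals(true); // PHP >= 7.1

$stopRequested = false;
pcntl_signal(SIGURG, function () use (&$stopRequested) {
    $stopRequested = true; // main process treats SIGURG as "finish up and exit"
});

$pid = pcntl_fork();
if ($pid === -1) {
    exit(1);
}
if ($pid === 0) {
    // The forked child inherits the handler above, so put SIGURG back to its
    // default disposition (ignored) before starting the external command.
    pcntl_signal(SIGURG, SIG_DFL);
    $proc = proc_open('sleep 1', [], $pipes); // stand-in for the sox command
    if (!is_resource($proc)) {
        exit(1);
    }
    exit(proc_close($proc) === 0 ? 0 : 1);
}

// Simulate the SIGURG kill signal reaching both processes.
posix_kill($pid, SIGURG);
posix_kill(posix_getpid(), SIGURG);

pcntl_waitpid($pid, $status); // the child finishes its command undisturbed
```

After this runs, $stopRequested is true in the main process while the child and its command exit normally.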

Ideally, it would be nice if the POSIX standard provided a user-defined signal like SIGUSR1 and SIGUSR2 that was ignored by default. But right now, it looks like it doesn't exist.

Feel free to chime in if you have a better answer, but for now, I hope this helps anyone else having this problem.

Takahashi answered 21/1, 2014 at 0:44 Comment(1)
I have a very similar situation here... I also have a PHP script executed in the daemon, but this script executes another PHP script using the system call. As happened to you, this second script is killed when I stop the service. I tried to use your solution (), but my service got stuck in a stop/killed status after I stopped it (or in the start/killed status if I try to start it). Arundel

Disclaimer: I don't know any PHP.

I solved a similar problem with my Ruby process by setting a new group ID for a launched subprocess. It looks like PHP has a similar facility.

You can start a new process group (detaching from your audio-daemon.php) by setting the child's group ID to its own process ID.

Something like:

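$chldPid = pcntl_fork();
// ... << error checks etc. (pcntl_fork() returns -1 on failure)
if ($chldPid > 0) {
    // parent: $chldPid is the child's PID
    ...
    // move the child into its own process group
    posix_setpgid($chldPid, $chldPid);
}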
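
A slightly fuller sketch of the same idea follows, calling posix_setpgid() on both sides of the fork so the new group exists no matter which process runs first. The variable names and the sleep() stand-in are illustrative only.

```php
<?php
// Sketch only: variable names and the sleep() stand-in are illustrative.
$pid = pcntl_fork();
if ($pid === -1) {
    exit(1);
}
if ($pid === 0) {
    posix_setpgid(0, 0); // child: become leader of a new process group
    sleep(1);            // stand-in for the proc_open() work
    exit(0);
}

// Parent: make the same call so the group exists whichever side runs first.
posix_setpgid($pid, $pid);

$childGroup  = posix_getpgid($pid);            // group of the job process
$daemonGroup = posix_getpgid(posix_getpid());  // group of the daemon itself
pcntl_waitpid($pid, $status);
```

Here $childGroup equals the child's PID and differs from $daemonGroup, so a signal sent only to the daemon's process group would not reach the child or anything it starts.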
Isooctane answered 6/6, 2014 at 17:48 Comment(6)
Notice that pcntl_fork returns a pid for the parent process, not for the child. So you are setting a different process group id for the parent, not the child. pcntl_fork() == 0 => child. pcntl_fork() > 0 => parent. pcntl_fork() < 0 => could not fork. To get the child's pid call posix_getpid. Dingdong
@Dingdong am I misreading this? php.net/manual/en/function.pcntl-fork.php "On success, the PID of the child process is returned in the parent's thread of execution" Isooctane
That's what I'm saying: pcntl_fork returns the child's pid in the parent process, and it returns 0 in the child process. Your condition if ($chldPid) evaluates to true for the parent process and false for the child process. That means you are setting the group id in the parent process, not in the child process. :-) Dingdong
Okay I take that back: It doesn't actually matter where you call posix_setpgid($pid, $gid) since you are specifying which $pid should get the new $gid. According to https://mcmap.net/q/1778239/-why-is-there-timing-problem-while-to-fork-child-processes you should call posix_setpgid in both the parent and child process to avoid a race condition. But I have no idea how accurate that information is. In the child you can simply call posix_setpgid(0, 0). That will set the process's group id to its process id. Cheers! Dingdong
I doubt that a race condition exists if you stick to setting it always from one place (always in parent or always in child). Note: I know no PHP. Isooctane
The race condition exists. That's not a question. The question is whether it's relevant (whether ignoring it can do harm to your particular application). PHP only uses the underlying C functions, so the race condition exists in any POSIX system, probably even in Ruby. But I digress. This has little to do with your answer. So I'll just leave my upvote and shut up now. :-) Dingdong
