Terminating zombie child processes forked from socket server

Disclaimer

I am well aware that PHP might not have been the best choice in this case for a socket server. Please refrain from suggesting different languages/platforms - believe me - I've heard it from all directions.

Working in a Unix environment with PHP 5.2.17, my situation is as follows: I have built a socket server in PHP that communicates with Flash clients. My first hurdle was that each incoming connection blocked subsequent connections until it had finished being processed. I solved this with PHP's pcntl_fork(). I was able to spawn numerous child processes (saving their PIDs in the parent) that took care of broadcasting messages to the other clients, thereby "releasing" the parent process and allowing it to move on to the next connection(s).

My main issue right now is collecting these dead/zombie child processes and getting rid of them. I have read (over and over) the relevant PHP manual pages for pcntl_fork() and realize that the parent process is in charge of cleaning up its children. The parent process receives a signal (SIGCHLD) from its child when the child executes an exit(0). I am able to "catch" that signal by using pcntl_signal() to set up a signal handler.

My signal handler looks like this:

declare(ticks = 1); 
function sig_handler($signo){ 
  global $forks; // this is an array that holds all the child PID's
  foreach($forks AS $key=>$childPid){
    echo "has my child {$childPid} gone away?".PHP_EOL;
    if (posix_kill($childPid, 9)){
      echo "Child {$childPid} has tragically died!".PHP_EOL;
      unset($forks[$key]);
    }
  }
}

I am indeed seeing both echoes, including the correct PID of the child that needs to be removed, but it seems that

posix_kill($childPid, 9)

which I understand to be synonymous with kill -9 $childPid, is returning TRUE even though it is in fact NOT removing the process...

Taken from the man pages of posix_kill :

Returns TRUE on success or FALSE on failure.


I am monitoring the child processes with the ps command. They appear like this on the system :

web5      5296  5234  0 14:51 ?        00:00:00 [php] <defunct>
web5      5321  5234  0 14:51 ?        00:00:00 [php] <defunct>
web5      5466  5234  0 14:52 ?        00:00:00 [php] <defunct>

As you can see, all of these are children of the parent process, which has the PID 5234.

Am I missing something in my understanding? I seem to have managed to get everything to work (and it does) but I am left with countless zombie processes on the system!

My plans for a zombie apocalypse are rock solid -
but what on earth can I do when even sudo kill -9 does not kill the zombie child processes?


Update 10 Days later

I've answered this question myself after some additional research; if you can still stand my ramblings, proceed at will.

Ebarta answered 2/4, 2012 at 12:28 Comment(8)
@jon - removing the image I can understand (if it REALLY bugged you) but we are dealing with what are called zombie processes: processes that have terminated but are still on the system waiting for their parent to clean them up. Don't remove text or edit a post if you are not 100% sure what it is about. - Ebarta
I am no expert on PHP, but it may have to do with the fact that your child processes are themselves PHP-based and will only cease to exist when the PHP runtime does, and the runtime is still in use by the parent... To test this idea, create non-PHP-based children (even ls should do fine for such a test). - Brooder
+1 for doing your own research, explaining your findings and only posting when you've run out of obvious questions. - Muco
@str - thanks :P I've been banging my head on the keyboard for a couple of hours :P Time to post a question.... - Ebarta
Don't kill -9. First kill gently, wait a bit (maybe loop), and as a last resort use kill -KILL. As others have said: you cannot kill zombies. They are already dead and wait to be reaped, either by your process, or by init. - Prickett
@Ebarta - Thanks, but I am quite familiar with zombie processes; I'm not sure you are. I normally wouldn't have removed it, but the image wasn't even loading for me - it was replaced with a placeholder image, usually used when sites are blocking hotlinked images. - Amen
@jon - I was referring to the title. I don't mind about the image - possibly an April Fools' slowplay :P - Ebarta
@Lix, ah, well that wasn't even me... I only removed the image. Err wait - the history shows I did change the title. Wow, no idea why that happened, I certainly didn't intend to. My apologies - it is of course appropriate. - Amen
E
22

I promise there is a solution at the end :P

Alright... so here we are, 10 days later and I believe that I have solved this issue. I didn't want to add onto an already longish post so I'll include in this answer some of the things that I tried.

Taking @sym's advice, and reading more into the documentation and the comments on the documentation, the pcntl_waitpid() description states :

If a child as requested by pid has already exited by the time of the call (a so-called
"zombie" process), the function returns immediately. Any system resources used by the child
are freed...

So I set up my pcntl_signal() handler like this -

function sig_handler($signo){ 
    global $childProcesses;
    $pid = pcntl_waitpid(-1, $status, WNOHANG);
    echo "Sound the alarm! ";
    if ($pid != 0){
        if (posix_kill($pid, 9)){
            echo "Child {$pid} has tragically died!".PHP_EOL;
            unset($childProcesses[$pid]);
        }
    }
}
// These define the signal handling
// pcntl_signal(SIGTERM, "sig_handler");
// pcntl_signal(SIGHUP,  "sig_handler");
// pcntl_signal(SIGINT, "sig_handler");
pcntl_signal(SIGCHLD, "sig_handler");
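
A minimal sketch of a handler that only reaps, assuming PHP 5.3+ for pcntl_signal_dispatch() (on 5.2, declare(ticks = 1) plays that role). The name reapChildren and the PID-keyed $childProcesses registry are illustrative, not the code from this server. Since pcntl_waitpid() is what frees a zombie, no posix_kill() is needed, and looping covers the case where several children exit before the handler gets to run:

```php
<?php
// Hypothetical registry of live children, keyed by PID.
$childProcesses = array();

function reapChildren($signo)
{
    global $childProcesses;
    // One SIGCHLD can stand for several exits, so reap until nothing is left.
    // WNOHANG makes each call return immediately instead of blocking.
    while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
        unset($childProcesses[$pid]);
    }
}

pcntl_signal(SIGCHLD, 'reapChildren');

for ($i = 0; $i < 3; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "Could not fork\n");
        exit(1);
    }
    if ($pid === 0) {
        exit(0); // child: pretend the work is done
    }
    $childProcesses[$pid] = true; // parent: remember the child
}

// Stand-in for the server loop: dispatch pending signals between iterations.
$deadline = microtime(true) + 5;
while (count($childProcesses) > 0 && microtime(true) < $deadline) {
    pcntl_signal_dispatch();
    usleep(50000);
}

echo count($childProcesses), " children still unreaped\n";
```

Keying the registry by PID is a design choice that makes the unset() in the handler line up with how the PID was stored.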

For completeness, I'll include the actual code I'm using for forking a child process -

function broadcastData($socketArray, $data){
        global $db,$childProcesses;
        $pid = pcntl_fork();
        if($pid == -1) {
                // Something went wrong (handle errors here)
                // Log error, email the admin, pull emergency stop, etc...
                echo "Could not fork()!!";
        } elseif($pid == 0) {
                // This part is only executed in the child
                foreach($socketArray AS $socket) {
                        // There's more happening here but the essence is this
                        socket_write($socket,$msg,strlen($msg));

                        // TODO : Consider additional forking here for each client. 
                }
                // This is where the signal is fired
                exit(0);
        }

        // If the child process did not exit above, then this code would be
        // executed by both parent and child. In my case, the child will 
        // never reach these commands. 
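        // Keyed by PID so that unset($childProcesses[$pid]) in the
        // signal handler removes the right entry.
        $childProcesses[$pid] = $pid;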
        // The child process is now occupying the same database 
        // connection as its parent (in my case mysql). We have to
        // reinitialize the parent's DB connection in order to continue using it. 
        $db = dbEngine::factory(_dbEngine); 
}

Yea... That's a ratio of 1:1 comments to code :P

So this was looking great and I saw the echo of :

Sound the alarm! Child 12345 has tragically died!

However, when the socket server loop did its next iteration, the socket_select() function failed, throwing this error :

PHP Warning: socket_select(): unable to select [4]: Interrupted system call...

The server would now hang and not respond to any requests other than manual kill commands from a root terminal.
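
One plausible reading of that warning (an assumption, not something verified against this server): once a PHP handler is installed for SIGCHLD, a child exiting while the parent sits in socket_select() interrupts the call with EINTR, which is not a real failure. A minimal sketch of retrying in that case, assuming PHP 5.3+ with the sockets and pcntl extensions; selectRetryingOnSignal and onChildExit are hypothetical names:

```php
<?php
// Hypothetical helper: call socket_select() again if a signal interrupted it.
function selectRetryingOnSignal(array $sockets, $timeoutSec)
{
    while (true) {
        $read = $sockets; // socket_select() overwrites these arrays
        $write = null;
        $except = null;
        $ready = @socket_select($read, $write, $except, $timeoutSec);
        if ($ready !== false) {
            return $read; // readable sockets (empty array on timeout)
        }
        if (socket_last_error() === SOCKET_EINTR) {
            socket_clear_error();
            continue; // interrupted by a signal: try again
        }
        throw new RuntimeException(socket_strerror(socket_last_error()));
    }
}

function onChildExit($signo)
{
    // Reap whatever has exited; WNOHANG keeps this from blocking.
    while (pcntl_waitpid(-1, $status, WNOHANG) > 0) {
    }
}
pcntl_signal(SIGCHLD, 'onChildExit');

$server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($server, '127.0.0.1', 0); // port 0: let the OS choose
socket_listen($server);

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "Could not fork\n");
    exit(1);
}
if ($pid === 0) {
    usleep(200000); // exit while the parent is inside socket_select()
    exit(0);
}

$readable = selectRetryingOnSignal(array($server), 1);
pcntl_signal_dispatch(); // run onChildExit() for the queued SIGCHLD
echo count($readable), " readable sockets\n";
socket_close($server);
```

Note that each retry starts the timeout over, which is fine for a sketch but may matter if the loop relies on precise timing.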


I'm not going to get into why this was happening or what I did after that to debug it... let's just say it was a frustrating week...

much coffee, sore eyes and 10 days later...

Drum roll please

TL;DR - The Solution :

As mentioned in a comment from 2007 in the PHP sockets documentation and in a tutorial on stuporglue (search for "good parenting"), one can simply "ignore" signals coming in from the child processes (SIGCHLD) by passing SIG_IGN to the pcntl_signal() function -

pcntl_signal(SIGCHLD, SIG_IGN);

Quoting from that linked blog post :

If we are ignoring SIGCHLD, the child processes will be reaped automatically upon completion.

Believe it or not - I included that pcntl_signal() line, deleted all the other handlers and things dealing with the children and it worked! There were no more <defunct> processes left hanging around!

In my case, I really wasn't interested in knowing exactly when a child process died or which one it was. All I cared about was that they didn't hang around and crash my entire server :P
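
A minimal sketch of that behaviour, assuming a Linux-like system with the pcntl extension. The variable names are illustrative. With SIGCHLD ignored, the kernel discards the child as soon as it exits, so a later pcntl_waitpid() finds nothing to collect:

```php
<?php
// Ignore SIGCHLD: exited children are reaped automatically by the kernel.
pcntl_signal(SIGCHLD, SIG_IGN);

$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "Could not fork\n");
    exit(1);
}
if ($pid === 0) {
    exit(0); // child: finish immediately
}

usleep(200000); // parent: give the child time to exit

// No zombie is left behind, so there is no child to wait for.
$result = pcntl_waitpid($pid, $status, WNOHANG);
echo "pcntl_waitpid() returned {$result}\n";
```

The trade-off is that the exit status of each child is lost, which matches the situation described above where the parent does not care how or when a child ended.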

Ebarta answered 11/4, 2012 at 22:35 Comment(3)
:) yes. Thanks to you now it is solved and working like a clock. - Gourmandise
You have just saved my day! Thank you very much! :) - Paralytic
This was well written and exactly my use case. When all processes are used up (that is, the maximum number of forks is reached), you may get a pcntl_fork(): Error 35. - Embus
T
4

Regarding your disclaimer: PHP is no better or worse than many other languages for writing a server. There are some things that are not possible (lightweight processes, asynchronous I/O), but these do not really apply to a forking server. If you're using OO code, make sure you have the circular-reference-checking garbage collector enabled.

Once a child process exits, it becomes a zombie until the parent process cleans it up. Your code seems to send a KILL signal to every child on receipt of any signal. It won't clean up the process entries. It will terminate processes which have not called exit. To get the child process reaped correctly you should call waitpid (see also this example on the pcntl_wait manual page).
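
A minimal sketch of the difference, assuming the pcntl and posix extensions on a Unix system. The variable names are illustrative. It shows that WNOHANG does not block while the child is running, that posix_kill() reports success on a zombie without removing it, and that pcntl_waitpid() is what finally frees the process entry:

```php
<?php
$pid = pcntl_fork();
if ($pid === -1) {
    fwrite(STDERR, "Could not fork\n");
    exit(1);
}
if ($pid === 0) {
    usleep(200000); // child: "work" for a moment
    exit(7);
}

// Child still running: WNOHANG returns 0 straight away instead of blocking.
$early = pcntl_waitpid($pid, $status, WNOHANG);

usleep(500000); // by now the child has exited and is a zombie

// The PID still exists, so the signal is "delivered" and TRUE comes back,
// but a zombie cannot be killed: it is already dead.
$killed = posix_kill($pid, SIGKILL);

// Reaping is what releases the entry; the original exit status is intact.
$reaped = pcntl_waitpid($pid, $status, WNOHANG);
$exitCode = pcntl_wexitstatus($status);

// Signal 0 only checks for existence: the process is gone now.
$gone = !posix_kill($pid, 0);

var_dump($early, $killed, $reaped === $pid, $exitCode, $gone);
```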

Trustworthy answered 2/4, 2012 at 12:49 Comment(10)
I am sending a SIGKILL to the child process because that is the only thing I would want to do with a child process - whether or not it was successful is not too important to me ATM... The sig_handler() function is triggered when I execute an exit(0) command from within the child. - Ebarta
I am not using OOP in this case. My understanding of pcntl_waitpid() and what is stated on the man pages is that it suspends execution of the current process - this is undesirable for my parent (server) as I need it to continue to process incoming connections - this was the main point of using pcntl_fork() in the first place. - Ebarta
re your disclaimer to my disclaimer - I was trying to prevent all the "PHP is not the best way to implement socket communication" comments. Seems like it worked :P - Ebarta
Regarding SIGKILL - you're sending this signal to all child processes, not just the ones which have exited (and it will have no effect on the ones which have exited). Read the rest of the pcntl_waitpid page - try replacing your foreach loop with pcntl_waitpid(-1, $returned_status, WNOHANG | WUNTRACED); This will reap child processes which have exited and leave the rest alone. Yes the parent process is blocked while this takes place - this is pretty much essential. There are lots of things which cause a process to be blocked. - Trustworthy
Does this not then defeat the object of forking a new process? By making the parent wait till the child process is terminated? - Ebarta
WNOHANG This flag specifies that waitpid should return immediately instead of waiting, if there is no child process ready to be noticed. | WUNTRACED This flag specifies that waitpid should report the status of any child processes that have been stopped as well as those that have terminated. - delorie.com/gnu/docs/glibc/libc_569.html - Muco
@Lix: It does not. With the WNOHANG flag, the function will return immediately with the status of the child process. It does not wait for it to exit. - Muco
@Str - thank you very much for your explanation. I'll have to test this in a real world scenario with "real" latency - for now you have definitely set me in the right direction! Thanks again! - Ebarta
Glad to help. Would appreciate if you could take a minute to update us if you find a solution. This was an interesting question. - Muco
@str - you bet! I'll be sure to update this post with my findings and possibly (hopefully) will post my own answer with the appropriate credits :) - Ebarta
M
2

http://www.linuxsa.org.au/tips/zombies.html

Zombies are dead processes. You cannot kill the dead. All processes eventually die, and when they do they become zombies. They consume almost no resources, which is to be expected because they are dead! The reason for zombies is so the zombie's parent (process) can retrieve the zombie's exit status and resource usage statistics. The parent signals the operating system that it no longer needs the zombie by using one of the wait() system calls.

When a process dies, its child processes all become children of process number 1, which is the init process. Init is "always" waiting for children to die, so that they don't remain as zombies.

If you have zombie processes it means those zombies have not been waited for by their parent (look at PPID displayed by ps -l). You have three choices: Fix the parent process (make it wait); kill the parent; or live with it. Remember that living with it is not so hard because zombies take up little more than one extra line in the output of ps.

Muco answered 2/4, 2012 at 13:5 Comment(1)
Depending on the amount of effort you feel worth putting into this, you might want to consider using the Gearman library. - Muco
P
1

I know only too well how hard you have to search for a solution to the problem of zombie processes. My concern with potentially having hundreds or thousands of them was running out of inodes (rightly or wrongly, as I don't know if this would actually be a problem), since all hell can break loose when that happens.

If only the pcntl_fork() manual page linked to posix_setsid(), many of us would have discovered years ago that the solution was so simple.

Poverty answered 6/2, 2014 at 21:56 Comment(0)
