Remove zombie processes using parallel package
After playing around with R's parallel package on my Debian-based machine for some time, I still can't find a way to remove all zombie child processes after a computation.

I'm looking for a general, OS-independent solution.

Below is a simple script illustrating the problem with 2 cores:

library(parallel)
testfun <- function() TRUE

# Fork-based clusters are only available on Unix-alikes; fall back to PSOCK on Windows
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
cl <- makeCluster(2, type = cltype)
p <- clusterCall(cl, testfun)  # run testfun once on each worker
stopCluster(cl)

Unfortunately, this script leaves two zombie processes in the process table, which are only reaped when R itself is shut down.
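On a Linux machine, the leftover children can be inspected directly from the R session. This is a sketch, not a fix: the `ps` flags are Linux-specific, the `FORK` cluster type only exists on Unix-alikes, and recent R versions may reap the children so that no `<defunct>` entries appear.

```r
library(parallel)

cl <- makeCluster(2, type = "FORK")      # fork two child processes
res <- clusterCall(cl, function() TRUE)  # run a trivial task on each worker
stopCluster(cl)

# On affected R versions, zombie children show up as "<defunct>"
# in the process table until the parent reaps them
system(sprintf("ps --ppid %d -o pid,stat,cmd", Sys.getpid()))
```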

Vicenary answered 28/2, 2012 at 17:39 Comment(1)
They are not zombies, they are just unemployed children. – Eccles

This only seems to be an issue with "FORK" clusters. If you create a "PSOCK" cluster instead, the worker processes will die when you call stopCluster(cl).

Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?
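A minimal sketch of the same workflow with a "PSOCK" cluster: the socket workers are separate R processes launched by makeCluster(), and they exit cleanly when the cluster is stopped.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")     # launch two worker R processes over sockets
res <- clusterCall(cl, function() TRUE)  # each worker evaluates the function once
stopCluster(cl)                          # workers exit; no zombies remain
```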

Truly answered 1/3, 2012 at 15:29 Comment(3)
Hi Josh, sorry for my late reply - you are right, this only seems to be a problem with FORK clusters. PSOCK clusters also work on my Debian machine - I just thought forking would be faster. Thanks a lot! – Vicenary
This seems to be a silly oversight with FORK clusters. I've posted a bug report at bugs.r-project.org/bugzilla3/show_bug.cgi?id=15471. Zombie processes are mostly harmless because they consume no resources; they just sit in the process table so that the parent process can examine their exit status. Examining their exit status with wait() from the fork package cleans up the zombies one at a time (and prints the exit status of each). – Allowedly
The fork package is no longer available. – Yours

The answer to your problem is probably in the help file for the makeCluster() command.

At the bottom of the file, it says: "It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies)."

The solution (it works for me) is to define a port for your cluster when you create it:

cl <- makeCluster(2, type = cltype, port = yourPortNumber)

Another (possibly less useful) solution is to set a timeout for your sockets; the timeout value is in seconds:

cl <- makeCluster(2, type = cltype, port = yourPortNumber, timeout = 50)

In any case, the aim should be to make the socket connection unavailable. Either closing the ports or closing the main R process would do this.

Edit: What I meant was to close the ports on which the process is listening; that should be OS-independent. You can try showConnections(all = TRUE) to list all the connections, and then closeAllConnections().

Sorry if this doesn't work either.
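The connection-closing suggestion above can be sketched like this, assuming the workers notice the closed socket and terminate themselves:

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")
print(showConnections(all = TRUE))  # lists the worker sockets among all connections
closeAllConnections()               # close every user connection, including the cluster sockets
```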

Dabber answered 29/2, 2012 at 14:37 Comment(2)
Specifying the port number doesn't work for me on Ubuntu. What version of Debian are you running? – Truly
Regarding your edit: stopCluster(cl) already closes the ports. That's what causes the processes to become zombie processes. – Truly
