Rserve server: how to terminate a blocking instance (eval taking forever)?

I need to run R evals in a multi-threaded way, which Rserve handles quite well. However, if the eval of one instance takes too long, I need to be able to shut down the instance that is computing the blocking eval. As far as I have tested, that instance refuses to shut down until the eval is done (apparently it needs to fetch the result before listening again). So here is my question:

Is there a way to get a Java handle on the blocking instance (something like a Process object), so that I can brute-force kill/terminate the eval (something like process.destroy())? In other words, when I ask for an eval (create a connection, send a command), how do I establish, from Java, the relationship between the eval being processed and the Rserve instance that is running it?

Or did I miss something in Rserve that already covers this kind of need?

Note: I already tried running everything (all evals) via serverEval() instead of the regular eval, which runs the computations on the main instance. This is, of course, not satisfactory, as it uses only one process (the main one). That one I can kill, but my goal is to shut down an individual blocking eval running on its own instance and, naturally, to keep the benefit of my 8 CPU cores, that is, to preserve the parallelism. There is no point in using Rserve otherwise (the JRI engine would be more than sufficient in that case).

Note: I would like to avoid this kind of thing (thread), that is, dealing with several instances of the main server itself on different ports. That is not an option.

I already asked on Rserve's mailing list but have not received an answer. I hope I made myself clear enough to get an answer or a helpful comment here. If not, please ask for details. Thanks in advance.

Edit: I also tested RCaller, which handles as many instances of R as one needs, but since it writes results into XML files for later parsing on the Java side (rather than using a communication protocol as Rserve does), it is far too slow for what I have to do.

Normally answered 21/11, 2014 at 17:3 Comment(0)

OK, this can be done as follows (I got it from a kind person who finally answered me on the Rserve devel mailing list):

In the thread running the eval that is expected to block or take too long, and assuming Rserve is started:

private RConnection rEngine = null;
private int rServePid = -1;

//...

// Keep the connection open and store the PID of the Rserve process serving it
RConnection rconn = new RConnection();
this.rServePid = rconn.eval("Sys.getpid()").asInteger();
this.rEngine = rconn;
LOG.info("Rserve: started instance with pid '" + this.rServePid + "'.");
//...
this.rEngine.eval("some consuming code...");

This keeps track of the PID of the instance serving that eval (R provides Sys.getpid()).
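With several evals running in parallel, the same idea extends to a small lookup table from each eval to the PID that serves it. The following is a minimal sketch only: EvalRegistry, its method names, and the string task identifiers are hypothetical and are not part of the Rserve client; the PID is assumed to come from Sys.getpid() as shown above.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of the Rserve client API): remembers which
// Rserve child process is serving which running eval.
final class EvalRegistry {
    // Thread-safe map, since workers and a supervising thread both touch it.
    private final Map<String, Integer> pidByTask = new ConcurrentHashMap<>();

    // Record the PID right after reading it with Sys.getpid().
    void started(String taskId, int pid) {
        if (pid <= 0) {
            // Reject sentinels such as -1 so they can never reach a kill call.
            throw new IllegalArgumentException("not a valid PID: " + pid);
        }
        pidByTask.put(taskId, pid);
    }

    // Look up the PID for a task; empty if the task is unknown or finished.
    OptionalInt pidOf(String taskId) {
        Integer pid = pidByTask.get(taskId);
        return pid == null ? OptionalInt.empty() : OptionalInt.of(pid);
    }

    // Forget the task once its eval has returned.
    void finished(String taskId) {
        pidByTask.remove(taskId);
    }
}
```

A worker thread would call started(...) right after reading the PID and finished(...) once its eval returns, so that a supervising thread can look up which process to terminate.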

Then, to stop / abort / cancel: since a simple this.rEngine.close() only closes the connection and does not stop the task being processed on the server side, we need to kill the targeted Rserve instance. This can be done by calling tools::pskill() (or any other system call, such as kill -9 my_pid on UNIX-like systems or TASKKILL /PID my_pid /F on Windows, depending on the platform), obviously from a thread other than the one above (which is waiting for the eval to return):

// Terminate.
RConnection c2 = new RConnection();
// SIGTERM (the pskill default) might not be honored everywhere, so also send SIGKILL.
c2.eval("tools::pskill("+ this.rServePid + ")");
c2.eval("tools::pskill("+ this.rServePid + ", tools::SIGKILL)");
c2.close();
LOG.info("Rserve: terminated instance with pid '" + this.rServePid + "'.");
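One way to wire the two threads together is a timeout around the eval. The sketch below uses only java.util.concurrent; EvalWatchdog and runWithTimeout are hypothetical names, and the eval and killer arguments stand in for the RConnection.eval(...) call and the pskill calls above. Returning null on timeout is a simplification chosen for brevity.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of the Rserve client API).
final class EvalWatchdog {

    // Runs `eval` on a worker thread and waits at most `timeoutMillis`.
    // On timeout, runs `killer` from the calling thread and returns null.
    static <T> T runWithTimeout(Callable<T> eval, Runnable killer, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<T> pending = worker.submit(eval);
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException tooSlow) {
            // The eval is still blocking: kill the process that serves it.
            killer.run();
            return null;
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", interrupted);
        } catch (ExecutionException failed) {
            throw new IllegalStateException("eval failed", failed.getCause());
        } finally {
            // Requests interruption of the worker; a thread blocked on a
            // socket read may not react until the connection breaks.
            worker.shutdownNow();
        }
    }
}
```

The killer would typically open its own connection and issue the two pskill calls, using the PID stored when the eval connection was created.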

This approach has the benefit of being platform independent.
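When the JVM and Rserve run on the same machine, and on Java 9 or later, the PID can also be turned into a handle close to the Process object asked about in the question. This is a sketch under those assumptions and is not needed for the pskill approach; LocalKill and killLocal are hypothetical names, while ProcessHandle.of and destroyForcibly are standard JDK methods. It does not apply when Rserve runs on a remote host.

```java
import java.util.Optional;

// Hypothetical helper: only meaningful when Rserve runs on the same host
// as this JVM (Java 9+).
final class LocalKill {

    // Forcibly terminates the local process with the given PID.
    // Returns false if no such process exists or termination was not requested.
    static boolean killLocal(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.map(ProcessHandle::destroyForcibly).orElse(false);
    }
}
```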

Hope this can help.

Normally answered 26/11, 2014 at 8:30 Comment(7)
This works. But if I try something like: rEngine = new RConnection(); // get the pid of rEngine and do the time-consuming eval using rEngine ... // Terminate RConnection c2 = new RConnection(); // SIGTERM might not be understood everywhere: so using SIGKILL signal, as well. c2.eval("tools::pskill("+ this.rServePid + ")"); c2.eval("tools::pskill("+ this.rServePid + ", tools::SIGKILL)"); c2.close(); the process is killed, but the CPU usage of Rserve goes up and Rserve doesn't allow any new connections. Do you have any idea why? - Incisive
Sounds like the latest eval doesn't return and keeps occupying a whole core. In some cases you'll have to get the entire process tree (you need to kill the parent process to really turn the thing off). Of course, this approach is platform dependent. I use Windows/OSX and *NUX batch scripts for this instead (invoked through system calls). Let me know if you need more specific material. Cheers. - Normally
I don't want to kill the entire process tree. Somebody else will be working on the same Rserve, and killing the whole process would kill their instance too. The problem is not with the eval, though: eval returns and code execution moves on to the next line. But the CPU usage peaks when an instance is killed. - Incisive
OK, I see. So it keeps the computer busy for a while. Is everything frozen, or do the attempts at new connections succeed at some point? Frankly, I don't see a reason for that unless you are addressing your 'killer connection' to an already busy Rserve server. Note: Rserve is not able to spawn child connections under Windows as it does under *NUX. So if you are working under Windows, you might have to deal with several instances of Rserve. See, for example, how things are handled in this Rserve wrapper (which I would use if I were you): Rsession. - Normally
Look at how spawning instances has to be emulated under Windows. What is your OS, BTW? - Normally
New connections are not accepted at any point after the 'kill' command until the Rserve process is restarted. Rserve is running on a Linux machine and connections are made from a Windows machine. The Java wrapper is really neat, but I have my own wrapper of sorts. I just want to know why Rserve freezes when the pskill command is sent from another Rserve instance. This doesn't happen if I run the same command from a local R session on the server where Rserve is running. - Incisive
OK, I am sorry I cannot help. My experience with Rserve is limited to local use (my only goal in using it is to run several R tasks in parallel, which is impossible with interfaces like JRI, ...). It seems you have identified the issue as a network protocol matter. I suggest you ask on the Rserve devel mailing list (those people gave me answers when I couldn't get any elsewhere). Please share here if you finally figure out what the issue was and how to fix it. Cheers. - Normally

How about

rcon.eval("system(\"echo $$\", intern = TRUE)");

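The intent is to get the PID of the Rserve child process serving this connection (not the main one), which you can then use to kill it. Note that $$ is expanded by the shell that system() spawns, so it may report that shell's PID rather than R's; Sys.getpid() avoids the ambiguity.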
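Because system(..., intern = TRUE) returns text rather than an integer, the Java side has to parse the result before using it as a PID. A minimal sketch of that step follows; PidParser and parsePid are hypothetical names, and the input is assumed to be the string obtained from the returned REXP (for example via asString()).

```java
// Hypothetical helper: converts shell output such as "12345\n" into a PID.
final class PidParser {

    // Returns the parsed PID, or -1 if the text is not a positive integer.
    static int parsePid(String shellOutput) {
        if (shellOutput == null) {
            return -1;
        }
        try {
            int pid = Integer.parseInt(shellOutput.trim());
            return pid > 0 ? pid : -1;
        } catch (NumberFormatException notANumber) {
            return -1;
        }
    }
}
```

Checking for -1 before issuing any kill call keeps malformed output from being used as a process ID.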

Madi answered 19/4, 2016 at 2:54 Comment(0)
