Load balancing R requests coming to Rserve
F

3

9

I have six Linux boxes running Rserve, all serving the same set of R scripts.

192.168.0.1 : 6311
192.168.0.2 : 6311
...
...
192.168.0.6 : 6311

I connect to these Rserve instances from Java using REngine (the Rserve Java client).

RConnection rServeConnection = new RConnection(R_SERVE_SERVER_ADDRESS, R_SERVE_SERVER_PORT);

How do I load balance these connections, preferably with Apache mod_proxy?

I've tried httpd's WebSocket load balancing settings, with no luck.

Update: I concluded that httpd doesn't load balance raw TCP traffic. (Rserve uses TCP; it has options to enable a WebSocket mode, but my use case doesn't need that extra layer.) I moved to HAProxy, using the configuration in the link below, and can now load balance R script requests to Rserve with fault tolerance.

HAProxy Loadbalancing TCP traffic: https://mcmap.net/q/797292/-haproxy-loadbalancing-tcp-traffic
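
For readers who want a starting point before following the link, TCP-mode balancing in HAProxy looks roughly like the fragment below. This is a minimal sketch, not the configuration from the linked answer: the section names (`rserve_in`, `rserve_pool`), the server labels, and the timeout values are all assumptions, and only the hosts spelled out in the question are listed.

```
# Sketch only: names and timeouts are illustrative assumptions.
defaults
    mode tcp                  # forward raw TCP instead of parsing HTTP
    timeout connect 5s
    timeout client  1h        # assumed: long enough for slow R scripts
    timeout server  1h

frontend rserve_in
    bind *:6311               # clients connect here instead of to a single box
    default_backend rserve_pool

backend rserve_pool
    balance roundrobin
    server rserve1 192.168.0.1:6311 check   # "check" enables a TCP connect health check
    server rserve2 192.168.0.2:6311 check
    # ... remaining hosts from the list in the question
    server rserve6 192.168.0.6:6311 check
```

With something like this in place, the Java client would pass the HAProxy host and port to `new RConnection(...)` instead of an individual Rserve address. Because Rserve sessions can be long-lived, `balance leastconn` may be worth comparing against `roundrobin`.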

Fishbein answered 20/7, 2016 at 10:46 Comment(4)
I don't think Rserve is HTTP? – Oscillograph
Yes, Rserve is not HTTP. It connects via a TCP/IP socket (IP:6311). @jeroen – Fishbein
It looks like more people are looking for the same solution. Here is a working one. Thumbs up if it helps. https://mcmap.net/q/797292/-haproxy-loadbalancing-tcp-traffic – Fishbein
Great to hear that! I think the lesson here was to find a proper TCP load balancer, not just an HTTP one. – Tiphane
F
0

It looks like more people are looking for a way to load balance R scripts. Here is a working solution that load balances R via Rserve and the HAProxy TCP load balancer.

Thumbs up if it helps.

https://mcmap.net/q/797292/-haproxy-loadbalancing-tcp-traffic

Fishbein answered 20/8, 2016 at 8:23 Comment(0)
T
1

I'm unsure whether this is achievable with Apache mod_proxy; I think it only works with the HTTP protocol. You could try a proof-of-concept setup with nginx, which supports load balancing of plain TCP and UDP connections and lets you choose the load balancing method (e.g. round-robin).

The configuration would be:

stream {
    upstream myapp1 {
        server 192.168.0.1:6311;
        server 192.168.0.2:6311;
        ...
        server 192.168.0.6:6311;
    }

    server {
        listen 80;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass myapp1;
    }
}

You can find more information in the nginx documentation: https://www.nginx.com/resources/admin-guide/tcp-load-balancing/ and here: https://nginx.org/en/docs/stream/ngx_stream_core_module.html

Tiphane answered 15/8, 2016 at 21:46 Comment(3)
TCP load balancing was added to nginx in version 1.9, but I can only use 1.6.3 (enterprise repo limitation). So I'm looking at HAProxy (1.5.14) as an alternative TCP reverse proxy. I have yet to find a basic config that load balances TCP traffic, and I'm not finding good documentation on it. Any input would be really helpful. – Fishbein
Any idea what's wrong with TCP load balancing with HAProxy? – Fishbein
I wasn't able to try this out because my nginx version doesn't support the TCP load balancing that Rserve needs, but I did something similar with a different proxy server, as described in the link in my answer. So I'm passing the bounty on to you, @haddr. Thumbs up, and thank you for the help. – Fishbein
S
0

If you haven't already done so, and since you are already working in Java, start by connecting to your Rserve servers from Java and running a simple "hello world" script on them, as shown in the CRAN examples.

Once the Rserve instances are working, you need to either load balance from Java or create one Java program per server and let Apache load balance between them. In either case, your Java programs will need to serve HTTP, because you still need a link between HTML and Rserve.
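
The "load balance from Java" option can be sketched with the standard library alone. The class below is a hypothetical helper (the name `RserveHostPicker` and its methods are not part of REngine): it rotates over the Rserve addresses round-robin and skips any host that refuses a TCP connection within a timeout. It is a minimal sketch under those assumptions, not a full connection pool.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: round-robin over Rserve hosts with a TCP reachability probe. */
final class RserveHostPicker {
    private final List<InetSocketAddress> hosts;
    private final AtomicInteger counter = new AtomicInteger();
    private final int connectTimeoutMillis;

    RserveHostPicker(List<InetSocketAddress> hosts, int connectTimeoutMillis) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = new ArrayList<>(hosts); // defensive copy
        this.connectTimeoutMillis = connectTimeoutMillis;
    }

    /** Plain round-robin: returns the next host in the rotation without probing it. */
    InetSocketAddress nextHost() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }

    /** Returns true if a TCP connection to the host opens within the timeout. */
    boolean isReachable(InetSocketAddress host) {
        try (Socket socket = new Socket()) {
            socket.connect(host, connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Tries each host at most once, continuing from the current rotation point. */
    InetSocketAddress nextReachableHost() throws IOException {
        for (int attempt = 0; attempt < hosts.size(); attempt++) {
            InetSocketAddress candidate = nextHost();
            if (isReachable(candidate)) {
                return candidate;
            }
        }
        throw new IOException("no Rserve host accepted a TCP connection");
    }
}
```

A caller would build the picker from the addresses in the question, then pass `host.getHostString()` and `host.getPort()` to `new RConnection(...)`. Note that the probe opens and closes a throwaway connection on the chosen server, so catching the connection failure from `RConnection` itself and retrying with `nextHost()` may be the cheaper variant.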

Suburbicarian answered 13/8, 2016 at 23:20 Comment(4)
Connecting from Java to Rserve, even for complex request/response types, is done, even with a remote connection. The only thing I need to know is why WebSocket load balancing is not working. I'd rather not put HTTP in front of Rserve, because Apache will keep sending requests to Java as long as Java/Tomcat is responding, even if the Rserve behind it is not. It also adds an extra hop to reach R. Why not load balance the WebSocket directly? Can you please share your ideas on that? – Fishbein
I mean, why not load balance the TCP/IP socket directly, without HTTP load balancing, which adds an extra hop and another point of failure? – Fishbein
If you want to connect to the Rserve instances directly, you are going to have to dig through the source code for Rserve, or one of the available adapters, and figure out the on-the-wire protocol. What actually goes on the wire when you do: RConnection c = new RConnection(); c.assign("x", dataX); c.assign("y", dataY); RList l = c.eval("lowess(x,y)").asList(); If you use one of the existing adapters (Java, C/C++, PHP) you don't have to worry about the on-the-wire format. – Suburbicarian
There are TCP reverse proxies available (like HAProxy), which I'm trying out. With one of those, we won't need the above hack. I've yet to figure out how to configure HAProxy for TCP traffic, though. – Fishbein
