What's the best way to run a gen_server on all nodes in an Erlang cluster?

Asked 3/4, 2011 at 19:39 Answered 3/4, 2011 at 22:54

Solved erlang erlang-otp erlang-supervisor

I'm building a monitoring tool in Erlang. When run on a cluster, it should run a set of data collection functions on all nodes and record that data using RRD on a single "recorder" node.

The current version has a supervisor running on the master node (rolf_node_sup) which attempts to run a 2nd supervisor on each node in the cluster (rolf_service_sup). Each of the on-node supervisors should then start and monitor a bunch of processes which send messages back to a gen_server on the master node (rolf_recorder).

This only works locally. No supervisor is started on any remote node. I use the following code to attempt to load the on-node supervisor from the recorder node:

rpc:call(Node, supervisor, start_child, [{global, rolf_node_sup}, [Services]])

I've found a couple of people suggesting that supervisors are really only designed for local processes. E.g.

What is the most OTP way to implement my requirement to have supervised code running on all nodes in a cluster?

A distributed application is suggested as one alternative to a distributed supervisor tree. These don't fit my use case. They provide for failover between nodes, but keeping code running on a set of nodes.
The pool module is interesting. However, it provides for running a job on the node which is currently the least loaded, rather than on all nodes.
Alternatively, I could create a set of supervised "proxy" processes (one per node) on the master which use proc_lib:spawn_link to start a supervisor on each node. If something goes wrong on a node, the proxy process should die and then be restarted by it's supervisor, which in turn should restart the remote processes. The slave module could be very useful here.
Or maybe I'm overcomplicating. Is directly supervising nodes a bad idea, instead perhaps I should architect the application to gather data in a more loosely coupled way. Build a cluster by running the app on multiple nodes, tell one to be master, leave it at that!

Some requirements:

The architecture should be able to cope with nodes joining and leaving the pool without manual intervention.
I'd like to build a single-master solution, at least initially, for the sake of simplicity.
I would prefer to use existing OTP facilities over hand-rolled code in my implementation.

Outdated answered 3/4, 2011 at 19:39 Comment(0)

Interesting challenges, to which there are multiple solutions. The following are just my suggestions, which hopefully makes you able to better make the choice on how to write your program.

As I understand your program, you want to have one master node where you start your application. This will start the Erlang VM on the nodes in the cluster. The pool module uses the slave module to do this, which require key-based ssh communication in both directions. It also requires that you have proper dns working.

A drawback of slave is that if the master dies, so does the slaves. This is by design as it probably fit the original use case perfectly, however in your case it might be stupid (you may want to still collect data, even if the master is down, for example)

As for the OTP applications, every node may run the same application. In your code you can determine the nodes role in the cluster using configuration or discovery.

I would suggest starting the Erlang VM using some OS facility or daemontools or similar. Every VM would start the same application, where one would be started as the master and the rest as slaves. This has the drawback of marking it harder to "automatically" run the software on machines coming up in the cluster like you could do with slave, however it is also much more robust.

In every application you could have a suitable supervision tree based on the role of the node. Removing inter-node supervision and spawning makes the system much simpler.

I would also suggest having all the nodes push to the master. This way the master does not really need to care about what's going on in the slave, it might even ignore the fact that the node is down. This also allows new nodes to be added without any change to the master. The cookie could be used as authentication. Multiple masters or "recorders" would also be relatively easy.

The "slave" nodes however will need to watch out for the master going down and coming up and take appropriate action, like storing the monitoring data so it can send it later when the master is back up.

Mic answered 3/4, 2011 at 21:17 Comment(3)

Reading the Riak source, I'm leaning this way. – Outdated 3/4, 2011 at 22:18

Leaning the way of doing it the Riak way, or leaning the way of doing it the way I suggested? Also, while Riak Core is very interesting, I fail to see how it would be useful in your case as Riak Core basically helps you route requests to the right server and move stuff around when necessary. – Mic 4/4, 2011 at 19:57

Leaning towards the way you suggest. riak_core has some nice principles which I'd like to borrow, but I agree that the full hashed ring is not the right solution for this app. – Outdated 5/4, 2011 at 10:10

I would look into riak_core. It provides a layer of infrastructure for managing distributed applications on top of the raw capabilities of erlang and otp itself. Under riak_core, no node needs to be designated as master. No node is central in an otp sense, and any node can take over other failing nodes. This is the very essence of fault tolerance. Moreover, riak_core provides for elegant handling of nodes joining and leaving the cluster without needing to resort to the master/slave policy.

While this sort of "topological" decentralization is handy, distributed applications usually do need logically special nodes. For this reason, riak_core nodes can advertise that they are providing specific cluster services, e.g., as embodied by your use case, a results collector node.

Another interesting feature/architecture consequence is that riak_core provides a mechanism to maintain global state visible to cluster members through a "gossip" protocol.

Basically, riak_core includes a bunch of useful code to develop high performance, reliable, and flexible distributed systems. Your application sounds complex enough that having a robust foundation will pay dividends sooner than later.

otoh, there's almost no documentation yet. :(

Here's a guy who talks about an internal AOL app he wrote with riak_core:

http://www.progski.net/blog/2011/aol_meet_riak.html

Here's a note about a rebar template:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-March/003632.html

...and here's a post about a fork of that rebar template:

https://github.com/rzezeski/try-try-try/blob/7980784b2864df9208e7cd0cd30a8b7c0349f977/2011/riak-core-first-multinode/README.md

...talk on riak_core:

http://www.infoq.com/presentations/Riak-Core

...riak_core announcement:

http://blog.basho.com/2010/07/30/introducing-riak-core/

Grunt answered 3/4, 2011 at 22:54 Comment(1)

Thanks for expanding your answer. I was aware of riak_core but not sure if it was a good fit. Advertising services on special nodes is very interesting. I think this app is still slightly outside riak_core's target problem space. I'm going to stick with plain OTP for now. As riak_core gets better documented and if the app is useful, I may switch over. – Outdated 5/4, 2011 at 10:9

Recommended topics

Hot tags