Automatically restarting Erlang applications

Asked 16/6, 2010 at 15:18 Answered 17/6, 2010 at 13:33

I recently ran into a bug where an entire Erlang application died, yielding a log message that looked like this:

=INFO REPORT==== 11-Jun-2010::11:07:25 ===
     application: myapp
     exited: shutdown
     type: temporary

I have no idea what triggered this shutdown, but the real problem I have is that it didn't restart itself. Instead, the now-empty Erlang VM just sat there doing nothing.

Now, from the research I've done, it looks like there are other "start types" you can give an application: 'transient' and 'permanent'.

If I start a supervisor within an application, I can tell it to make a particular process transient or permanent, and it will automatically restart that process for me. However, according to the documentation, making an application transient or permanent doesn't restart it when it dies; instead, all the other applications are terminated as well.

What I really want to do is somehow tell the Erlang VM that a particular application should always be running, and if it goes down, restart it. Is this possible to do?

(I'm not talking about implementing a supervisor on top of my application, because then it's a catch-22: what if my supervisor process crashes? I'm looking for some sort of API or setting that I can use to have Erlang monitor and restart my application for me.)

Thanks!

Sexagenary answered 16/6, 2010 at 15:18 Comment(0)

You should be able to fix this in the top-level supervisor: set the maximum restart intensity to allow one million restarts every second, so that the supervisor never hits its restart limit and the application never shuts down. Something like:

init(_Args) ->
    {ok, {{one_for_one, 1000000, 1},
          [{ch3, {ch3, start_link, []},
            permanent, brutal_kill, worker, [ch3]}]}}.

(Example adapted from the OTP Design Principles User Guide.)
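
A less extreme variant, offered only as a sketch: keep a bounded restart limit on the supervisor that owns the workers, and put a second supervisor above it. When the inner supervisor exceeds its limit it terminates, and the outer one restarts it along with its whole subtree. The module name myapp_sup and all of the limits below are assumptions chosen for illustration; ch3 is the worker from the example above.

```erlang
-module(myapp_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

%% Start the outer (top-level) supervisor, registered locally.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, outer).

%% Outer level: restarts the inner supervisor, and with it the whole
%% subtree, up to 3 times per hour before giving up (illustrative values).
init(outer) ->
    {ok, {{one_for_one, 3, 3600},
          [{inner_sup,
            {supervisor, start_link, [?MODULE, inner]},
            permanent, infinity, supervisor, [?MODULE]}]}};
%% Inner level: tolerates up to 5 worker restarts within 10 seconds.
init(inner) ->
    {ok, {{one_for_one, 5, 10},
          [{ch3, {ch3, start_link, []},
            permanent, brutal_kill, worker, [ch3]}]}}.
```

If the outer limit is exceeded as well, the top-level supervisor terminates and the application shuts down as before, so this delays the shutdown rather than preventing it.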

Remmer answered 16/6, 2010 at 16:51 Comment(5)

Great, thanks very much for your answer. I see now that the reason it died was indeed because the max restart limit was hit. I don't necessarily want to just disable that though, since if it actually gets into a restart loop then we may need to restart the entire app. Is there a way to have it restart the app if the AllowedRestarts/MaxSeconds limit is hit, instead of shutting down the app? – Sexagenary 16/6, 2010 at 18:6

In the case you describe, you would add a supervisor to your supervisor. OTP's behavior is that when an exit signal is sent to the process that made the start call to the application (i.e. when the top-level supervisor dies), it assumes that the application has failed to fix the error, and it will shut down the application and possibly the node, depending on the config. I guess the point is that your applications should not crash, and if they do, the error is serious enough to be resolved only by a node restart. – Barham 18/6, 2010 at 22:10

@jisaacstone Fixed the link. Apparently www.erlang.org needs to be changed to erlang.org. – Remmer 4/2, 2016 at 19:5

I do not agree with this answer, as it distorts the intention of the restart limit. The limit should be the maximum number of automatic attempts at a successful restart; if the application cannot recover by itself within that limit, manual intervention should be needed to start it again. Setting such a high limit on automatic retries will only waste time on many successive application restarts. A more complete solution is to let the entire node crash and have another tool outside Erlang (as inaka and GregRogers say) restart the node automatically. (Continues below) – Rabjohn 15/3, 2018 at 16:40

Subsequently, if even the outside tool cannot restart the Erlang node successfully, then human intervention is needed. – Rabjohn 15/3, 2018 at 16:44

You can use heart to restart the entire VM if it goes down, then use a permanent application type to make sure that the VM exits when your application exits.

Ultimately, you need something above your application that you can trust, whether it is a supervisor process, the Erlang VM, or some shell script you wrote. It will always be a problem if that happens to fail too.
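
A rough sketch of the permanent-application half of this, assuming a small boot module (myapp_boot is a hypothetical name) that is run when the node starts. heart itself is enabled by passing the -heart flag to erl, and it takes the command used to bring the node back from the HEART_COMMAND environment variable. Whether heart actually fires for a given kind of exit is worth testing on your own setup.

```erlang
%% Hypothetical boot module; could be run with something like
%%   erl -heart -s myapp_boot start
%% with HEART_COMMAND set in the environment beforehand.
-module(myapp_boot).
-export([start/0]).

start() ->
    %% permanent: if myapp terminates, all other applications and the
    %% runtime system are terminated as well, so the node goes down
    %% with the application instead of sitting idle.
    ok = application:start(myapp, permanent).
```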

Bestead answered 16/6, 2010 at 18:57 Comment(1)

Okay, thanks. That sort of solution will work fine for me in this case. However, what if I wanted to run more than one application at once, and have them restart independently as needed? With all the fancy process supervision features Erlang includes, I find it amazing that I can't seem to do something as simple as restart an application when it goes down.... – Sexagenary 18/6, 2010 at 14:29

Use Monit, then set up your application to terminate by using a supervisor for the whole application with a reasonable restart frequency. If the application terminates, the VM terminates, and Monit restarts everything.

I could never get heart to be reliable enough, as it only restarts the VM once, and it doesn't deal well with a kill -9 of the Erlang VM.
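
As an illustration only, a Monit entry for this could look roughly like the fragment below. The pidfile path and the start/stop script paths are assumptions; the Erlang VM does not write a pidfile on its own, so the start script would have to create one.

```
# Hypothetical paths; adjust to wherever your node's scripts and pidfile live.
check process myapp with pidfile /var/run/myapp.pid
    start program = "/usr/local/bin/myapp start"
    stop program  = "/usr/local/bin/myapp stop"
```

With an entry like this, Monit runs the start program whenever the process recorded in the pidfile is no longer running.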

Zaidazailer answered 17/6, 2010 at 13:33 Comment(0)
