Restarting an Erlang node after a segmentation fault

I'm running an Erlang application that calls C code through NIFs. If a segmentation fault occurs in the C code, the whole Erlang virtual machine crashes, so the entire node goes down along with the application running on it.

What is the best way to monitor the Erlang application and restart it if the virtual machine dies?

Wasteland answered 11/11, 2013 at 14:46

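You want to have a look at heart, the OTP heartbeat monitor that can restart the Erlang runtime system when it dies or stops responding.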
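
As a rough sketch of what starting a node under heart can look like from a shell: the node is launched with the `-heart` flag, and the `HEART_COMMAND` environment variable tells heart which command to run when the VM goes away. The function name `start_with_heart` and the node name `mynode` are placeholders that do not come from the question, and the timeout value is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: start a detached Erlang node that is watched by heart.
# start_with_heart and the default node name "mynode" are hypothetical.
start_with_heart() {
    node_name="${1:-mynode}"

    # Command heart runs to bring the node back after the VM has died.
    # Here it simply repeats the original start command.
    HEART_COMMAND="erl -heart -detached -sname ${node_name}"

    # Seconds without a heartbeat before heart considers the VM dead
    # (arbitrary value chosen for this sketch).
    HEART_BEAT_TIMEOUT=30

    export HEART_COMMAND HEART_BEAT_TIMEOUT

    # -heart enables the heartbeat monitor, -detached runs without a console.
    erl -heart -detached -sname "${node_name}"
}
```

Calling `start_with_heart mynode` would start the node. A real start command usually carries more flags (cookie, boot script, config), and `HEART_COMMAND` should repeat all of them so that the restarted node matches the original one.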

In addition, if you have NIF calls that are considered dangerous, it is recommended to isolate them, together with the Erlang code closest to them, on a separate node. There are several ways of monitoring and restarting a node (e.g. the slave module).

Generally, however, I would advise against using problematic NIFs. Depending on what you are using them for, there are more stable alternatives.

Reason for NIF -> replacement

Sequential speed -> better optimized Erlang code. The high sequential speed of NIFs often comes at the price of interfering with Erlang's schedulers, which can result in worse overall performance.

Interfacing with external libs/apps -> Erlang's ports are much better at failure isolation, because a port program runs in its own OS process and a crash there does not take down the VM.

Sugden answered 11/11, 2013 at 15:26

I've used something called supervisord. Some advantages over heart:

  1. It's not Erlang-specific, so if you have other services on the same box, you can use it to restart those too.
  2. Heart can have some weird behavior that prevents crash dumps.
  3. If you actually want to stop the Erlang process for some reason, supervisord makes this easier.
  4. If the segfault occurs at startup, heart will keep restarting Erlang indefinitely. Supervisord will stop trying to restart after a certain number of attempts.
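
The bounded-restart behavior described in point 4 can be illustrated with a small shell function. This is a minimal sketch of the idea and not how supervisord is implemented; the name `run_with_restarts` and the `RESTART_DELAY` variable are made up for this example.

```shell
#!/bin/sh
# Sketch only: rerun a command when it fails, but give up after a fixed
# number of restarts instead of looping forever.
# Usage: run_with_restarts MAX_RESTARTS COMMAND [ARGS...]
run_with_restarts() {
    max_restarts="$1"
    shift
    restarts=0

    while :; do
        if "$@"; then
            # Clean exit: treat it as an intentional stop, do not restart.
            return 0
        else
            status=$?
        fi

        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after ${restarts} restarts (last exit status ${status})" >&2
            return "$status"
        fi

        restarts=$((restarts + 1))
        echo "exit status ${status}, restart ${restarts} of ${max_restarts}" >&2

        # Short pause so a crash at startup does not spin the CPU.
        sleep "${RESTART_DELAY:-1}"
    done
}
```

For example, `run_with_restarts 3 erl -noshell -sname mynode -s myapp` would start the node in the foreground and restart it at most three times, where `myapp` is a hypothetical module with a `start/0` function. A VM killed by a segmentation fault shows up in the shell as exit status 139 (128 + signal 11), which the loop treats like any other failure.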
Physoclistous answered 11/11, 2013 at 16:45

Comment from Tenacious: To add to this, heart actually failed to start my node correctly, even though it used the same command line I used to start it initially. I don't know why, and I don't have time to track it down.

If you want to do it the Erlang way, you can go with any of the solutions mentioned above (heart, supervisord). If you want to do it the Unix way, you should first make your Erlang app behave as a Unix daemon.

Use erld for that. Then you can do the familiar thing: monitor and restart it like any other Unix daemon.

Ammo answered 14/11, 2013 at 12:13
