Supervisor.restart_child/2 or Process.exit(pid, :kill)?

I have a supervision tree, for the sake of simplicity let’s say one supervisor (S) holding one worker (W) with :one_for_one strategy.

Under some circumstances, I need to reinitialize W, and the easiest way to do that is to crash it. I see two options: Process.exit/2, or Supervisor.terminate_child/2 followed by Supervisor.restart_child/2:

Process.exit(W, :kill)

and

[{w_id, _, :worker, _}] = Supervisor.which_children(S)
Supervisor.terminate_child(S, w_id)
Supervisor.restart_child(S, w_id)

Since the latter exists, I suppose it has its uses, but I cannot see what the advantage of using it would be. My assumption is that with the :rest_for_one strategy and many children, the former restarts the whole tail of the workers’ list, while the latter restarts this particular worker only. I can find neither reasonable documentation on this nor the place in the codebase where the difference shows up.

So, the question would be: when using the :one_for_one strategy, does it make any sense to go through the terminate → restart cycle, or is Process.exit(pid, :kill) sufficient?

Apart from the worker having a say about the exit signal when using Process.exit/2 (a worker that traps exits can handle any reason other than :kill, so one might hook whatever reinitialization is necessary into the exit handler and avoid restarting the worker altogether), which may not be exactly related to your scenario, here is something that might be relevant.
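To illustrate the remark about trapping exits, here is a minimal sketch with a bare spawned process rather than a supervised worker. The reason :reinit and the message shapes are made up for the example. It only shows that a process trapping exits receives an ordinary exit reason as a message and stays alive, whereas :kill cannot be trapped.

```elixir
parent = self()

pid =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    # With trap_exit set, a trappable exit signal arrives as a message.
    receive do
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end

    # Keep the process alive after handling the signal.
    receive do
      :stop -> :ok
    end
  end)

# Wait until the flag is set before sending any signal.
receive do
  :ready -> :ok
end

# :reinit is a hypothetical reason; the process decides what to do with it.
Process.exit(pid, :reinit)

trapped =
  receive do
    {:trapped, reason} -> reason
  end

alive_after_trap = Process.alive?(pid)

# :kill is never converted into a message; the process simply dies.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

kill_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  end

IO.inspect({trapped, alive_after_trap, kill_reason})
```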

Approach A. When Process.exit(W, :kill) gets called, here is what happens:

The worker process gets terminated immediately, without a chance to clean up
The supervisor receives an 'EXIT' signal from the terminated worker
The supervisor calls restart_child/3, which restarts the child according to the specified restart strategy

Very lean and mean.
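Approach A can be sketched roughly as follows. The worker here is a stand-in Agent, and KillDemo.await_restart/4 is a hypothetical helper written for this example: since the supervisor restarts the child asynchronously, the caller has to poll (or otherwise synchronize) before it can talk to the new process.

```elixir
defmodule KillDemo do
  # Polls the supervisor until child `id` runs under a pid other than `old_pid`.
  def await_restart(sup, id, old_pid, retries \\ 100) when retries > 0 do
    case List.keyfind(Supervisor.which_children(sup), id, 0) do
      {^id, pid, _type, _modules} when is_pid(pid) and pid != old_pid ->
        pid

      _not_yet ->
        Process.sleep(10)
        await_restart(sup, id, old_pid, retries - 1)
    end
  end
end

# Stand-in for W: an Agent holding a counter that starts at 0.
children = [%{id: :w, start: {Agent, :start_link, [fn -> 0 end]}}]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

[{:w, old_pid, :worker, _modules}] = Supervisor.which_children(sup)
Agent.update(old_pid, &(&1 + 41))

# :kill cannot be trapped, so the worker gets no chance to clean up.
Process.exit(old_pid, :kill)

# The restart happens inside the supervisor, asynchronously to this process.
new_pid = KillDemo.await_restart(sup, :w, old_pid)

IO.inspect(new_pid != old_pid, label: "new process")
IO.inspect(Agent.get(new_pid, & &1), label: "state after restart")
```

Note that every such kill counts toward the supervisor's max_restarts limit, just like a genuine crash.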

Approach B. With the other, manual, approach here is what happens:

terminate_child/2 ends up calling shutdown/2
shutdown/2 by default attempts to shut down the child gracefully, giving it a chance to free up any system resources
If the child does not exit after a timeout, it gets killed
Since the child gets unlinked before being shut down, the supervisor does not receive the 'EXIT' signal and does not automatically restart the child
The next call, restart_child(S, w_id), restarts the child from its stored specification, bypassing the restart strategy in place
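The steps of Approach B can be sketched like this, again with a stand-in Agent as the worker. Both calls are synchronous, so no polling is needed, and the child specification stays registered (with an :undefined pid) between the two calls.

```elixir
# Stand-in for W: an Agent holding a counter that starts at 0.
children = [%{id: :w, start: {Agent, :start_link, [fn -> 0 end]}}]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

[{w_id, old_pid, :worker, _modules}] = Supervisor.which_children(sup)

# Graceful shutdown; returns once the child is down.
:ok = Supervisor.terminate_child(sup, w_id)

# The spec is kept, but no process is running for it.
[{^w_id, :undefined, :worker, _}] = Supervisor.which_children(sup)
old_alive = Process.alive?(old_pid)

# Any extra logic can run here, between termination and restart.

{:ok, new_pid} = Supervisor.restart_child(sup, w_id)

IO.inspect({old_alive, new_pid != old_pid}, label: "old alive?, new process?")
```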

For Approach A to be applicable, the child must not hold external resources that need releasing, because :kill gives it no chance to clean up. With the one_for_one strategy it is a decent shortcut, useful within its constraints. With other strategies it leads to potentially useless and/or expensive restarts of other children. This approach could be a reasonable optimization over a stable solution when its constraints are not an issue.

Approach B is a more general way to control individual children restarts. It does involve more complex logic, affording graceful behavior in exchange. It also allows for extra logic being placed between child termination and restart. The extra benefit is the precision of restarting just the target child even with rest_for_one or one_for_all strategies. In my opinion it is a better option for an evolving app for it is not limited by specific constraints, allowing for easier implementation change.
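The difference under rest_for_one can be checked with a small experiment. This is a sketch with three stand-in Agents (:a, :b, :c) and a hypothetical RestForOneDemo helper module; it records which children changed pid after each approach is applied to :b.

```elixir
defmodule RestForOneDemo do
  # Returns a map of child id => pid as currently reported by the supervisor.
  def pids(sup) do
    Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
  end

  # Polls until child `id` runs under a pid other than `old_pid`,
  # then returns the full id => pid map.
  def await_restart(sup, id, old_pid, retries \\ 100) when retries > 0 do
    case pids(sup) do
      %{^id => pid} = current when is_pid(pid) and pid != old_pid ->
        current

      _not_yet ->
        Process.sleep(10)
        await_restart(sup, id, old_pid, retries - 1)
    end
  end
end

spec = fn id -> %{id: id, start: {Agent, :start_link, [fn -> id end]}} end

{:ok, sup} =
  Supervisor.start_link([spec.(:a), spec.(:b), spec.(:c)], strategy: :rest_for_one)

initial = RestForOneDemo.pids(sup)

# Approach B: terminate and restart only :b.
:ok = Supervisor.terminate_child(sup, :b)
{:ok, _pid} = Supervisor.restart_child(sup, :b)
manual = RestForOneDemo.pids(sup)
manual_restarted = for id <- [:a, :b, :c], manual[id] != initial[id], do: id

# Approach A: kill :b and let the restart strategy do its work.
Process.exit(manual.b, :kill)
killed = RestForOneDemo.await_restart(sup, :b, manual.b)
kill_restarted = for id <- [:a, :b, :c], killed[id] != manual[id], do: id

IO.inspect(manual_restarted, label: "terminate + restart")
IO.inspect(kill_restarted, label: "Process.exit(:kill)")
```

With the manual approach only :b gets a new pid, while killing :b also restarts :c, the child started after it.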
