Supervisor.restart_child/2 or Process.exit(pid, :kill)?

I have a supervision tree, for the sake of simplicity let’s say one supervisor (S) holding one worker (W) with :one_for_one strategy.

Under some circumstances, I need to reinitialize W and the easiest way to do that is to crash it. I have two different options: Process.exit/2 and/or Supervisor.restart_child/2:

Process.exit(W, :kill)

and

[{w_id, _, :worker, _}] = Supervisor.which_children(S)
Supervisor.terminate_child(S, w_id)
Supervisor.restart_child(S, w_id)

Since the latter exists, I suppose it must be better for some purpose, but I cannot see what the advantage of using it would be. My assumption is that with the :rest_for_one strategy and many children, the former would restart the whole tail of the workers' list, while the latter would restart only this particular worker. I can neither find any reasonable documentation on that nor spot this difference in the codebase.

So the question is: when using the :one_for_one strategy, does it make any sense to go through the terminate/restart loop, or is Process.exit(pid, :kill) sufficient?

Ludie answered 2/8, 2018 at 10:48 Comment(0)

With Process.exit/2 the worker can have a say about the exit signal: if it traps exits, it could hook whatever reinitialization is necessary onto the exit signal and avoid a restart altogether. That applies only to exit reasons other than :kill, which cannot be trapped, so it may not be exactly related to your scenario. Apart from that, here is something that might be relevant.

Approach A. When Process.exit(W, :kill) gets called, here is what happens:

  1. The worker process gets terminated
  2. The supervisor receives 'EXIT' signal from the terminated worker
  3. The supervisor calls its internal restart_child/3, which restarts the child according to the configured restart strategy
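
The three steps above can be observed with a small script. This is only a sketch: KillDemo.Worker is a hypothetical Agent standing in for W, and the polling helper is an assumption added because the restart happens asynchronously inside the supervisor.

```elixir
# KillDemo.Worker is a hypothetical stand-in for W, used only in this sketch.
defmodule KillDemo.Worker do
  use Agent

  def start_link(_arg) do
    Agent.start_link(fn -> :fresh_state end, name: __MODULE__)
  end
end

{:ok, sup} = Supervisor.start_link([KillDemo.Worker], strategy: :one_for_one)
old_pid = Process.whereis(KillDemo.Worker)

# :kill cannot be trapped, so the worker gets no chance to clean up.
Process.exit(old_pid, :kill)

# The supervisor restarts the child on its own schedule, so poll until the
# child entry shows a pid other than the one that was killed.
wait_for_restart = fn wait, attempts ->
  case Supervisor.which_children(sup) do
    [{KillDemo.Worker, pid, :worker, _}] when is_pid(pid) and pid != old_pid ->
      pid

    _ when attempts > 0 ->
      Process.sleep(10)
      wait.(wait, attempts - 1)
  end
end

new_pid = wait_for_restart.(wait_for_restart, 100)
IO.inspect({old_pid, new_pid}, label: "killed / restarted")
```

The caller never talks to the supervisor to trigger the restart; it only queries it afterwards to find the new pid.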

Very lean and mean.

Approach B. With the other, manual, approach here is what happens:

  1. terminate_child/2 ends up calling shutdown/2
  2. shutdown/2 by default attempts to shut down the child gracefully, giving it a chance to free up any system resources
  3. If the child does not exit after a timeout, it gets killed
  4. Since the child gets unlinked before being shut down, the supervisor does not receive the 'EXIT' signal and does not automatically restart the child
  5. The next call, restart_child(S, w_id) restarts the child reusing its specification and circumventing the restart strategy in place
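
A minimal sketch of the manual sequence, assuming a hypothetical Agent named ManualDemo.Worker in place of W. The pattern matches double as checks: after terminate_child/2 the child specification is still listed, with :undefined instead of a pid, until restart_child/2 is called.

```elixir
# ManualDemo.Worker is a hypothetical stand-in for W, used only in this sketch.
defmodule ManualDemo.Worker do
  use Agent

  def start_link(_arg) do
    Agent.start_link(fn -> :fresh_state end, name: __MODULE__)
  end
end

{:ok, sup} = Supervisor.start_link([ManualDemo.Worker], strategy: :one_for_one)
old_pid = Process.whereis(ManualDemo.Worker)

# Look the child id up instead of hard-coding it.
[{w_id, ^old_pid, :worker, _modules}] = Supervisor.which_children(sup)

# Graceful shutdown; returns once the child is gone.
:ok = Supervisor.terminate_child(sup, w_id)

# The spec is kept, but no process is attached and nothing restarts it.
[{^w_id, :undefined, :worker, _}] = Supervisor.which_children(sup)

# Any extra logic could run here, between termination and restart.

{:ok, new_pid} = Supervisor.restart_child(sup, w_id)
IO.inspect({old_pid, new_pid}, label: "terminated / restarted")
```

Note that the specification is kept only for children that are not :temporary; a temporary child is removed on termination and cannot be restarted this way.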

For Approach A to be applicable, the children must not allocate external resources, since :kill gives them no chance to release those resources. With the one_for_one strategy it is a decent shortcut, useful within its constraints. With other strategies it leads to potentially useless and/or expensive restarts of other children. This approach could be a reasonable optimization over a stable solution when its constraints are not an issue.

Approach B is a more general way to control individual child restarts. It involves more complex logic, affording graceful behavior in exchange. It also allows extra logic to be placed between child termination and restart. An extra benefit is the precision of restarting just the target child, even with the rest_for_one or one_for_all strategies. In my opinion it is the better option for an evolving app, because it is not limited by specific constraints and so makes later implementation changes easier.
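
The difference under rest_for_one can be checked with a short experiment. The sketch below assumes two hypothetical Agent children with ids :first and :second (all names are made up for illustration) and compares which pids change after each approach.

```elixir
# RestDemo.Worker and the child names are hypothetical, for illustration only.
defmodule RestDemo.Worker do
  use Agent

  def start_link(name), do: Agent.start_link(fn -> nil end, name: name)
end

children = [
  Supervisor.child_spec({RestDemo.Worker, :rest_demo_first}, id: :first),
  Supervisor.child_spec({RestDemo.Worker, :rest_demo_second}, id: :second)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)

# Builds a map of child id => current pid.
pids = fn ->
  Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
end

initial = pids.()

# Approach B: terminate and restart only :first.
:ok = Supervisor.terminate_child(sup, :first)
{:ok, _pid} = Supervisor.restart_child(sup, :first)
after_manual = pids.()
second_restarted_manually = after_manual.second != initial.second

# Approach A: kill :first and let the strategy decide what to restart.
killed_pid = after_manual.first
Process.exit(killed_pid, :kill)

# Poll until the supervisor reports a new pid for :first.
wait = fn wait, attempts ->
  case pids.() do
    %{first: pid} = current when is_pid(pid) and pid != killed_pid ->
      current

    _ when attempts > 0 ->
      Process.sleep(10)
      wait.(wait, attempts - 1)
  end
end

after_kill = wait.(wait, 100)
second_restarted_by_kill = after_kill.second != after_manual.second

IO.inspect(second_restarted_manually, label: "second restarted (terminate/restart)")
IO.inspect(second_restarted_by_kill, label: "second restarted (kill)")
```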

Fortier answered 2/8, 2018 at 22:29 Comment(4)
OK, this is correct and it was stated in the question itself. Are you sure that with the rest_for_one strategy it won't restart subsequent children? If yes, where can I verify it myself? - Ludie
Yes, because the strategy-defined restart is invoked by restart_child/3, which calls restart at L854. It is only called from the 'EXIT' signal handler, and that signal does not arrive because of the unlinking. Instead you call restart_child/2, which fetches the existing specification and calls do_start_child without ever looking at the restart strategy. - Fortier
There is the Erlang VM behind it. It might take care of that. - Ludie
With everything else being explicitly programmed in supervisor.erl, I wouldn't expect that. Of course, one could just test. - Fortier
