What is the significance of a Mnesia master node in a cluster?

I am running two Erlang nodes with a replicated Mnesia database. Whenever I start one of them while Mnesia is not running on the other, mnesia:wait_for_tables(?TABS, ?TIMEOUT) hangs on the node it is called from. I need a setup where, if both nodes are down, I can start working with one while the other stays down, then bring the other up later and carry on normally. I need to be sure that the node that was running first brings the later one up to date when it starts. Does this necessarily require making one of them the master?

Edit:

Oh, I've got it. The database I was using had a couple of fragmented tables, and some of the fragments had been distributed across the network for load balancing. So Mnesia on one host would try to load them across the network and would fail, since Mnesia on the other host was down.

I guess this has nothing to do with a Mnesia master node. But I would still love to understand what master nodes are for, because I have never used them, even though I work with distributed schemas all the time.

Thanks again...

Lafountain answered 26/8, 2010 at 8:36 Comment(0)

Mnesia master nodes are used to resolve split-brain situations in a fairly brutal fashion. If mnesia discovers a split-brain situation, it will issue an event, "running partitioned network". One way to respond to this would be to set master nodes to the "island" that you want to keep, and then restart the other nodes. When they come back up, they will unconditionally load tables from the master nodes.
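A minimal sketch of that response, assuming you have already decided which nodes form the island to keep. The module name `partition_watch` and the `KeepNodes` argument are hypothetical; `mnesia:subscribe/1`, `mnesia:set_master_nodes/1`, `mnesia:stop/0` and `mnesia:start/0` are the standard Mnesia calls. It reacts only to the `running_partitioned_network` context and restarts Mnesia only on nodes outside the kept island.

```erlang
-module(partition_watch).
-export([start/1]).

%% KeepNodes (hypothetical parameter): the nodes whose data should win.
start(KeepNodes) ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop(KeepNodes)
          end).

loop(KeepNodes) ->
    receive
        {mnesia_system_event,
         {inconsistent_database, running_partitioned_network, Node}} ->
            error_logger:warning_msg("Partitioned from ~p~n", [Node]),
            case lists:member(node(), KeepNodes) of
                true ->
                    %% This node is on the island to keep: do nothing.
                    loop(KeepNodes);
                false ->
                    %% Losing side: load every table from the masters
                    %% on the next start, then restart Mnesia.
                    ok = mnesia:set_master_nodes(KeepNodes),
                    stopped = mnesia:stop(),
                    ok = mnesia:start(),
                    %% Subscribe again after the restart, on the
                    %% assumption that stopping Mnesia dropped it.
                    {ok, _} = mnesia:subscribe(system),
                    loop(KeepNodes)
            end;
        _Other ->
            loop(KeepNodes)
    end.
```

Any updates made on the losing side during the partition are discarded by this approach, which is what makes it brutal.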

There is another mechanism in Mnesia, called force_load. One should be very careful with it. Take the case where you have two nodes, A and B: terminate B (A logs B as down), then terminate A, then restart B. B has no information about when A went down, so it will refuse to load tables that have a copy on A. If you know that A is not coming back soon, you could choose to call mnesia:force_load_table(Tab) on B for each such table, which will cause B to run with its own copies. Once A comes back up, it will detect that B is up and will load tables from it, so any updates made on A after B went down are lost.

As you can see, there are several other scenarios where you can end up with an inconsistent database. Mnesia will not fix that, but it tries to provide tools to resolve the situation if it arises. In the scenario above, unfortunately, Mnesia will give you no hints, but it is possible to create an application that detects the problem.
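As a rough sketch of the force-load fallback, assuming the operator has already decided that the other node is not coming back soon: the module and function names here are hypothetical, while `mnesia:wait_for_tables/2` and `mnesia:force_load_table/1` (the per-table call in the Mnesia API) are real.

```erlang
-module(table_loader).
-export([ensure_loaded/2]).

%% Wait for Tabs up to Timeout milliseconds; force-load whatever is
%% still pending. Only safe if losing the other node's newer updates
%% is acceptable.
ensure_loaded(Tabs, Timeout) ->
    case mnesia:wait_for_tables(Tabs, Timeout) of
        ok ->
            ok;
        {timeout, Pending} ->
            %% force_load_table/1 returns 'yes' on success.
            lists:foreach(fun(Tab) ->
                                  yes = mnesia:force_load_table(Tab)
                          end, Pending),
            ok;
        {error, Reason} ->
            {error, Reason}
    end.
```

Called as `table_loader:ensure_loaded(?TABS, ?TIMEOUT)` in place of the bare `mnesia:wait_for_tables/2` from the question, this would return instead of hanging, at the price described above.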

Ademption answered 10/10, 2010 at 16:22 Comment(3)
uwiger, thank you. Do you think that in the future Mnesia will have a way of merging two replicas based on a most-recent-update mechanism or some kind of timestamps, especially when the "running partitioned network" fatal error is detected? Lafountain
This is possible to do today, although it is not terribly well documented or tested in all parts. github.com/esl/unsplit is a library for automatic merging of Mnesia tables after netsplits. Recent Mnesia versions have been carefully enhanced to support this, and R14B03 also adds a form of quorum checking ('majority') to reduce the risk of hard-to-resolve inconsistencies. Ademption
I'd love to use unsplit with ejabberd, but I have no idea where to start! Ulf, or anyone: are there any docs on that which I have not found? Carraway
