One replicated mnesia table has become out-of-sync

2

6

I have an Erlang application currently running on four nodes with a replicated Mnesia database that stores minimal data about connected clients. Replication has worked seamlessly in the past (as far as I know), but a client recently noticed that one of the nodes is missing some IDs related to his application.

I'm not really sure how this happened; our network may have had a hiccup at the time. Of more urgency at the moment is getting the data into a good state across all nodes. Is there a way to tell Mnesia to replicate from a known-good node?

Allergen answered 13/11, 2013 at 4:53 Comment(0)
3

Mnesia is legendary about this issue. It's a huge PITA.

Looking at it from the CAP theorem's point of view, most systems built with Mnesia end up being C-A (consistency and availability, with no partition tolerance) systems. Most of the time you have (and heavily rely on) its hard consistency. Then a network partition happens... Each side is still available for writes, but these writes destroy consistency. And later on, Mnesia has no mechanism for automatic data repair.
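Although Mnesia does not repair diverged replicas, it does report that a partition was detected. A minimal sketch of watching for that report, assuming Mnesia is already running on the local node; the module name `partition_watch` is hypothetical and the handler only logs:

```erlang
-module(partition_watch).
-export([start/0]).

%% Spawn a process that subscribes to Mnesia system events.
%% Assumes mnesia:start() has already succeeded on this node.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        %% Sent when Mnesia notices that another node's replicas may have diverged.
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            error_logger:warning_msg(
                "Mnesia reports inconsistent database (~p) with node ~p~n",
                [Context, Node]),
            loop();
        %% Ignore all other system events.
        {mnesia_system_event, _Other} ->
            loop()
    end.
```

What to do once the event arrives (wipe, reload, or alert an operator) remains an application decision.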

Everyone who uses Mnesia in a cluster should familiarize themselves with these tradeoffs. Your problem is a clear sign that using Mnesia was a poor choice. Doubly so if this data is critical to you.

I too use Mnesia in such a way (sometimes we all need speed, you know). But I make sure to only use it to store data that I can easily reconstruct. In general, if you need the data stored on disk, Mnesia is no good, except for toy projects.

I make sure to always have this function at hand:

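reinit_mnesia_cluster() ->
    %% Stop Mnesia on this node and every connected node;
    %% the schema can only be deleted while Mnesia is stopped.
    rpc:multicall(mnesia, stop, []),
    AllNodes = [node() | nodes()],
    %% WARNING: this erases every Mnesia replica on all nodes.
    ok = mnesia:delete_schema(AllNodes),
    ok = mnesia:create_schema(AllNodes),
    rpc:multicall(mnesia, start, []).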

Use it only after the network partition has been resolved and all nodes are reachable. This will erase all Mnesia replicas and start it anew. Again, if you can't live with what it does, then using Mnesia was a poor choice.
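Because the wipe removes the table definitions along with the data, the tables have to be created again before the application can repopulate them. A minimal sketch, assuming a hypothetical `client` table whose record fields are invented here for illustration:

```erlang
-module(client_table).
-export([recreate/0]).

%% Hypothetical record; the real table layout is not given in this thread.
-record(client, {id, node, connected_at}).

%% Recreate the table on every connected node after the cluster was wiped.
%% Assumes Mnesia has been restarted on all nodes.
recreate() ->
    Nodes = [node() | nodes()],
    {atomic, ok} = mnesia:create_table(client,
        [{attributes, record_info(fields, client)},
         {ram_copies, Nodes}]),
    %% Block until the table is usable locally (5 second timeout).
    ok = mnesia:wait_for_tables([client], 5000).
```

`ram_copies` is used here because the data is assumed to be easy to reconstruct; `disc_copies` would also work, since the function above creates a disc schema on all nodes.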

For important data that needs hard consistency, use SQL. For important data that needs availability, use Riak. For shared state that needs speed, use Redis. Mnesia is no replacement for these systems, although at first it does seem so.

Edit on 2014-11-16: Here is a much better article on the topic, explaining in detail what I said above https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850

Fusion answered 1/1, 2014 at 8:25 Comment(11)
I'm almost sure you are not answering the question: "Is there a way to tell mnesia to replicate from a known-good node?"Ween
Still, I'm trying to be helpful to someone who has a problem. In this case, answering the immediate question with "no" won't help as much.Fusion
Also, I'm answering the question "One replicated mnesia table has become out-of-sync"Fusion
I see you are trying to be helpful, but "One replicated mnesia table has become out-of-sync" is not a question but a declarative sentence. And "no" is not the right answer anyway. I'm convinced there is a way, but it is complicated and I don't know the exact procedure right now.Ween
I am (almost) sure that there is no good way. At least none that guarantees consistency and/or uninterrupted service while executing it. And if you don't need that, then you can just fetch all the records from that node, wipe the Mnesia cluster with the above function, and put everything back in. Obviously, the OP is not asking for such a hackish solution. And yes, there might be some very complicated solution, but not one that is already implemented as a "just call this function".Fusion
Your solution doesn't provide uninterrupted service either, so why do you complain about that?Ween
let us continue this discussion in chatFusion
Here is a much better article on the topic, explaining in detail what I said above medium.com/@jlouis666/mnesia-and-cap-d2673a92850Fusion
Nice article, but you are still not answering the question. You are answering a different question, not the one that was asked.Ween
When do you suggest someone should use Mnesia? You are saying that even for speed, Redis should be used.Ronel
As I said in the original answer, I too use Mnesia, mostly as a process registry. I recommend having a look at syn: github.com/ostinelli/syn This is a nice project that handles, out of the box, most of the hard things that come up when a network partition happens.Fusion
1

Honestly, I think the cleanest way to get an out-of-sync Mnesia to replicate from a known good node is to shut down the application on the bad node, and delete all its Mnesia database files, then do the following.

Write an escript that starts Mnesia up standalone using the "bad" node name and Mnesia directory, replicates the tables from a known good node, and shuts Mnesia down. Run that escript on the bad node.

The act of replicating the tables and shutting Mnesia down gracefully puts the node back in sync with the cluster. Then, when you start the application up on the bad node, it will join up and stay in sync with the cluster.

Of course, this description lacks precise details, but that's the gist of it. There are surely less brute force ways of doing this, but unless you have massive amounts of data to replicate, I think this way is the quickest and cleanest.
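One possible reading of the steps above, as an escript sketch. It has not been run against a real cluster. The node name `bad@host1`, the cookie `secret`, and the command-line arguments are placeholders, and it assumes the application on the bad node is stopped and its Mnesia directory has already been emptied:

```erlang
#!/usr/bin/env escript
%%! -name bad@host1 -setcookie secret

%% Usage (hypothetical): resync.escript good@host2 /path/to/mnesia/dir
main([GoodNodeStr, Dir]) ->
    GoodNode = list_to_atom(GoodNodeStr),
    %% Point Mnesia at the (emptied) directory of the bad node.
    _ = application:load(mnesia),
    ok = application:set_env(mnesia, dir, Dir),
    ok = mnesia:start(),
    %% Connect to the known-good node; this merges in the cluster schema.
    {ok, [_ | _]} = mnesia:change_config(extra_db_nodes, [GoodNode]),
    %% Keep the schema on disc so the node remembers the cluster.
    ok = ensure(mnesia:change_table_copy_type(schema, node(), disc_copies)),
    Tables = mnesia:system_info(tables) -- [schema],
    lists:foreach(fun(Tab) -> copy_table(Tab, GoodNode) end, Tables),
    %% Wait until every table has been loaded locally, then stop cleanly.
    ok = mnesia:wait_for_tables(Tables, 60000),
    stopped = mnesia:stop().

%% Add a local replica with the same storage type the good node uses.
copy_table(Tab, GoodNode) ->
    Type = rpc:call(GoodNode, mnesia, table_info, [Tab, storage_type]),
    ok = ensure(mnesia:add_table_copy(Tab, node(), Type)).

%% Treat "replica already registered for this node" as success.
ensure({atomic, ok}) -> ok;
ensure({aborted, Reason}) when element(1, Reason) =:= already_exists -> ok;
ensure(Other) -> erlang:error({mnesia_failed, Other}).
```

The `already_exists` case is tolerated because the cluster schema may still list the bad node as a replica holder, in which case Mnesia loads those tables on its own after the schema merge.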

Giffy answered 20/6, 2015 at 3:26 Comment(2)
Thanks for the answer, but one thing does not make sense to me: between shutting Mnesia down on the bad node and restarting the application there, the other nodes would be out of sync again, assuming there were insert/delete operations on them. Thanks!Tramway
It's not a problem for Mnesia if a node has been shut down cleanly and then comes up and joins the cluster. The node that just came up will get up to date with the good node. The problem just happens when Mnesia has detected a partition, that is, the tables on the nodes have diverged. If a node has not diverged but has been down, it is ok and gets itself up to date. While it is doing that, Mnesia is frozen on the bad node so that writes to the bad node can't happen. Hope that makes sense.Giffy
