How do I remove an extra node?

I have a group of Erlang nodes that replicate their data through Mnesia's "extra_db_nodes". I need to upgrade hardware and software, so I have to detach nodes one at a time as I work my way through the cluster.

How does one remove a node and still preserve the data that was inserted?

[update] Removing nodes is as important as adding them. A cluster that grows over time must also be able to contract. Otherwise Mnesia will be busy trying to send data to nonexistent nodes, filling up queues and keeping the network busy.

[final update] After poring over the Erlang/Mnesia source code, I determined that it is not possible to completely disassociate nodes. del_table_copy removes the linkage between tables, but the removal is incomplete. I would close this question, but none of the close descriptions are adequate.

Nugent answered 4/5, 2009 at 12:42 Comment(4)
chances of erlang hackers passing here? low. But I found the question intriguing, and am off looking at erlang and mnesia, so when I've learnt it I might pass back in a year or two and give answering a shot! thanks for the interesting post - Cinchonize
well, there are 19 mnesia questions, so the odds aren't THAT low. The more niche the question, the longer you may have to wait to get an answer, that's all. - Hanoi
It's just a matter of time before I crack open the code and look for myself. I will have plenty of time next week when my layoff is final. - Nugent
When you say that you're replicating data 'through Mnesia's "extra_db_nodes"' value, that's not really correct. extra_db_nodes just tells mnesia which other nodes to connect to - and it shouldn't be used except when you start a new empty database node. In normal operation (after copying the schema to a new node), extra_db_nodes is unnecessary because the schema also tells mnesia which nodes to connect to. - Sip
N
5

I wish I had found this a long time ago: http://weblambdazero.blogspot.com/2008/08/erlang-tips-and-tricks-mnesia.html

Basically, with a properly functioning cluster:

  • log in to the node to be removed

  • stop mnesia

    mnesia:stop().
    
  • log in to a different node in the cluster

  • delete the removed node's copy of the schema

    %% 'mynode@host' is a placeholder for the name of the removed node
    mnesia:del_table_copy(schema, 'mynode@host').
    
Nugent answered 5/10, 2012 at 3:6 Comment(0)
L
3

I'm extremely late to the party, but came across this info in the doc when looking for a solution to the same problem:

"The function call mnesia:del_table_copy(schema, mynode@host) deletes the node 'mynode@host' from the Mnesia system. The call fails if mnesia is running on 'mynode@host'. The other mnesia nodes will never try to connect to that node again. Note, if there is a disc resident schema on the node 'mynode@host', the entire mnesia directory should be deleted. This can be done with mnesia:delete_schema/1. If mnesia is started again on the node 'mynode@host' and the directory has not been cleared, mnesia's behaviour is undefined." (http://www.erlang.org/doc/apps/mnesia/Mnesia_chap5.html#id74278)

I think the following might do what you desire:

AllTables = mnesia:system_info(tables),
DataTables = lists:filter(fun(Table) -> Table =/= schema end,
                          AllTables),

RemoveTableCopy = fun(Table,Node) ->
  Nodes = mnesia:table_info(Table,ram_copies) ++
          mnesia:table_info(Table,disc_copies) ++
          mnesia:table_info(Table,disc_only_copies),
  case lists:member(Node,Nodes) of
    true -> mnesia:del_table_copy(Table,Node);
    false -> ok
  end
end,

[RemoveTableCopy(Tbl,'gone@gone_host') || Tbl <- DataTables].

rpc:call('gone@gone_host',mnesia,stop,[]),
%% mnesia:delete_schema/1 takes a list of nodes, not a directory
rpc:call('gone@gone_host',mnesia,delete_schema,[['gone@gone_host']]),
RemoveTableCopy(schema,'gone@gone_host').

Though, I haven't tested it since my scenario is slightly different.
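
Since the snippet above is untested, one way to check the result afterwards is to ask Mnesia what it still knows about the node. This is only a minimal sketch, assuming it runs on a surviving node with Mnesia started; the module name mnesia_node_check and its functions are made up for illustration and are not part of Mnesia.

```erlang
-module(mnesia_node_check).
-export([is_known/1, report/1]).

%% True if Node is still listed as a database node in the schema.
is_known(Node) ->
    lists:member(Node, mnesia:system_info(db_nodes)).

%% Summarises what the local Mnesia still records about Node.
%% Mnesia must be running on the node where this is called.
report(Node) ->
    Tables = [T || T <- mnesia:system_info(tables),
                   lists:member(Node, replicas(T))],
    #{known => is_known(Node),
      running => lists:member(Node, mnesia:system_info(running_db_nodes)),
      tables_with_copies => Tables}.

%% All nodes holding a replica of Table, regardless of storage type.
replicas(Table) ->
    mnesia:table_info(Table, ram_copies) ++
    mnesia:table_info(Table, disc_copies) ++
    mnesia:table_info(Table, disc_only_copies).
```

If the removal worked, I'd expect mnesia_node_check:report('gone@gone_host') to show known as false and an empty tables_with_copies list.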

Leaved answered 19/1, 2011 at 23:57 Comment(0)
J
1

I've used this method myself, so I can back up the mnesia:del_table_copy/2 approach. See removeNode/1 below:

-module(tool_bootstrap).

-export([bootstrapNewNode/1, closedownNode/0,
     finalBootstrap/0, removeNode/1]).

-include_lib("records.hrl").

-include_lib("stdlib/include/qlc.hrl").

bootstrapNewNode(Node) ->
    %% Make the given node part of the family and start the cloud on it
    mnesia:change_config(extra_db_nodes, [Node]),
    %% Now make the other node set things up
    rpc:call(Node, tool_bootstrap, finalBootstrap, []).

removeNode(Node) ->
    rpc:call(Node, tool_bootstrap, closedownNode, []),
    mnesia:del_table_copy(schema, Node).

finalBootstrap() ->
    %% Code removed to actually copy over my tables etc...
    application:start(cloud).

closedownNode() ->
    application:stop(cloud), mnesia:stop().
Joappa answered 8/6, 2009 at 23:11 Comment(3)
While this code may have appeared to work, it does not clean up all of the data. del_table_copy does not remove the node from the extra_db_nodes list. In fact there is no code in the source that completely removes the node. - Nugent
Yes you're right. I removed all of the code specific to my application for clarity... - Joappa
Ah - yes. I should read the original question. My solution certainly had the effect I desired - that of no longer having my (deleted) node host a replicated copy of the data. But I never checked what extra_db_nodes was set to after the change... - Joappa
S
0

If you have replicated the table (added table copies) on nodes other than the one you're removing, then you're already fine - just remove the node.

If you wanted to be slightly tidier you'd delete the table copies from the node you're about to remove first via mnesia:del_table_copy/2.

Generally, mnesia gracefully handles node loss and detects node rejoin (rebooted nodes obtain new table copies from nodes that kept running; nodes that didn't reboot are detected as a network partition event). Mnesia does not consume CPU or network traffic for nodes that have gone down. I think, though I haven't confirmed it in the source, that mnesia won't automatically reconnect to nodes that have gone down - the node that goes down is expected to restart mnesia and reconnect.
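
If you want to watch this happen, Mnesia publishes system events you can subscribe to. The following is a rough sketch, assuming Mnesia is already running locally; the module name mnesia_watch is made up, and printing with io:format is just a stand-in for whatever logging you use.

```erlang
-module(mnesia_watch).
-export([start/0]).

%% Spawns a process that subscribes to Mnesia system events and prints
%% node up/down and partition notifications.
start() ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, {mnesia_down, Node}} ->
            io:format("mnesia went down on ~p~n", [Node]),
            loop();
        {mnesia_system_event, {mnesia_up, Node}} ->
            io:format("mnesia came up on ~p~n", [Node]),
            loop();
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            %% Reported when a partitioned network is detected
            io:format("partition (~p) involving ~p~n", [Context, Node]),
            loop();
        _Other ->
            loop()
    end.
```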

mnesia:add_table_copy/3, mnesia:move_table_copy/3 and mnesia:del_table_copy/2 are the functions you should look at for live schema management.
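
As a rough illustration of how these fit together when retiring a node, here is a sketch that keeps at least one replica of every table alive. It assumes Mnesia is running on both the leaving node and the target node; the module name mnesia_evacuate is hypothetical and the policy (move only when the leaving node holds the last copy) is my own choice, not something Mnesia prescribes.

```erlang
-module(mnesia_evacuate).
-export([evacuate/2]).

%% For every non-schema table with a replica on Leaving: if that is the
%% only replica, move it to Target; otherwise just delete the copy.
%% Returns a list of {Table, Result}.
evacuate(Leaving, Target) ->
    Tables = mnesia:system_info(tables) -- [schema],
    [{T, evacuate_table(T, Leaving, Target)} || T <- Tables].

evacuate_table(Table, Leaving, Target) ->
    Holders = holders(Table),
    case lists:member(Leaving, Holders) of
        false ->
            not_present;
        true ->
            case Holders -- [Leaving] of
                [] -> mnesia:move_table_copy(Table, Leaving, Target);
                _  -> mnesia:del_table_copy(Table, Leaving)
            end
    end.

%% All nodes holding a replica of Table, regardless of storage type.
holders(Table) ->
    mnesia:table_info(Table, ram_copies) ++
    mnesia:table_info(Table, disc_copies) ++
    mnesia:table_info(Table, disc_only_copies).
```

Once every entry comes back as {atomic, ok} or not_present, the node no longer holds any table data and can be stopped.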

The extra_db_nodes parameter should only be used when initialising a new DB node - once a new node has a copy of the schema it doesn't need the extra_db_nodes parameter.
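
To make that concrete, here is roughly what a one-off join could look like. It is a sketch only, meant to run on the new, empty node; the module name mnesia_join, the choice of disc_copies and the error tuples are my assumptions.

```erlang
-module(mnesia_join).
-export([join/2]).

%% Run on the new node. ClusterNode is any node already in the cluster,
%% Tables is the list of tables this node should replicate.
join(ClusterNode, Tables) ->
    ok = mnesia:start(),
    %% extra_db_nodes is used once, only to find the existing cluster
    case mnesia:change_config(extra_db_nodes, [ClusterNode]) of
        {ok, [_ | _]} ->
            %% Keep the schema on disc so the node remembers the cluster
            %% after a restart without needing extra_db_nodes again
            _ = mnesia:change_table_copy_type(schema, node(), disc_copies),
            [{T, mnesia:add_table_copy(T, node(), disc_copies)} || T <- Tables];
        {ok, []} ->
            {error, {could_not_connect, ClusterNode}};
        {error, Reason} ->
            {error, Reason}
    end.
```

After this, the schema itself records the cluster membership, which is why extra_db_nodes should not be needed in normal operation.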

Sip answered 3/6, 2009 at 22:6 Comment(5)
I'm on the fence with this answer. I like the general information, however, it's not current. The three methods you mention are not included in the R13B release. A search of the R13A code did not reveal any similar methods. - Nugent
Continuing my search of the source I found some indication that there is a call to mnesia_controller:add_list/2 that is used when adding the extra node. There is a comment that suggests calling mnesia_recover:disconnect_nodes/1, however, that method does not exist anywhere and might simply be a typo; mnesia_recover:disconnect/1 exists. - Nugent
I said delete_table_copy instead of del_table_copy, but apart from that those methods are present, documented and current. You shouldn't have to disconnect nodes by hand - mnesia handles node disconnection by itself. Just turn off the unwanted nodes. Or use net_kernel:disconnect/1 to do it forcibly. - Sip
del_table_copy does not remove the node from the extra_db_nodes list. - Nugent
Sure, but why are they in the extra_db_nodes list in the first place? You only need extra_db_nodes while joining the cluster - after that it's more of a hindrance than anything else. You can change_config to alter extra_db_nodes at runtime, but why do that? You shouldn't be specifying extra_db_nodes in normal operation, so your problem is not how to delete things from extra_db_nodes when removing nodes, but how to avoid using extra_db_nodes at any point after a new node joins a cluster. - Sip
