WARNING: the background info is pretty long. Skip to the bottom if you think you need the question before the background info. Appreciate the time this is gonna take!
I've been all over the web (read: Google) and I have not found a good answer. Yes, there are plenty of links and references to the Mnesia documentation on the erlang.org site, but even those links suffer from version-itis.
In the simplest case, where the node() you are currently connected to is the same node that owns the table set, backup/restore just works. For example:
$ erl -sname mydatabase
> mnesia:create_schema(...).
> mnesia:start().
> mnesia:create_table(...).
> mnesia:backup("/tmp/backup.bup").
> mnesia:restore("/tmp/backup.bup", [{default_op, recreate_tables}]).
Hey this works great!
However, if the database is actually running on a remote node(), or on a remote node() on a remote machine, then you have to initiate the backup this way:
$ erl -sname mydbadmin
> rpc:call(mydatabase@host, mnesia, backup, ["/tmp/backup.bup"]).
> rpc:call(mydatabase@host, mnesia, restore, ["/tmp/backup.bup", [{default_op, recreate_tables}]]).
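One thing worth keeping in mind with the rpc:call form: the call runs on mydatabase@host, so "/tmp/backup.bup" is a path on that machine's filesystem, and a failed call comes back as {badrpc, Reason} rather than as a Mnesia error. A minimal sketch of a wrapper that keeps the two kinds of failure apart (the module name remote_bup and both function names are made up for illustration; only rpc:call/4 and mnesia:backup/1 are real library calls):

```erlang
-module(remote_bup).
-export([backup/2, classify/1]).

%% Runs mnesia:backup(File) on Node. File is written on Node's filesystem,
%% not on the machine where this function is called.
backup(Node, File) ->
    classify(rpc:call(Node, mnesia, backup, [File])).

%% Separates Mnesia failures from distribution/RPC failures.
classify(ok)               -> ok;
classify({error, Reason})  -> {error, {mnesia, Reason}};
classify({badrpc, Reason}) -> {error, {rpc, Reason}}.
```

Usage from the admin node would look something like remote_bup:backup(mydatabase@host, "/tmp/backup.bup").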
Of course this was simple too. Now here are the tricky things....
- Let's say that you are taking daily backups, and your mnesia database server dies and you are forced to replace the hardware. If you want to restore the DB as-is, then you need to give the NEW hardware the same hostname that it had previously, and you also need to give the nodes the same names.
- If you want to change the name of the hardware and/or the node(), or you want to restore on a different machine, then you need to go through the change_node process. (described here and in the mnesia docs)
But here is where things get complicated. Acquaintances of mine who are Erlang and Mnesia experts suggest that Mnesia's replication is severely flawed and that you should not use it. Then again, there are currently no alternatives that I know of, and the chances that you are going to implement a better version yourself are slim.
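For reference, the renaming step can be sketched with mnesia:traverse_backup/4, closely following the change_node_name example in the Mnesia User's Guide. The module name bup_rename and the helper functions convert/3, convert_opt/3 and switch/3 are made up for illustration; only mnesia:traverse_backup/4 is the library call. It assumes the backup was written with the default backup module, and it is a sketch rather than a tested procedure:

```erlang
-module(bup_rename).
-export([change_node_name/4, convert/3]).

%% Copies the backup file Source to Target, replacing node From with node To
%% in every schema record. Returns {ok, switched} or {error, Reason}.
change_node_name(From, To, Source, Target) ->
    Fun = fun(Item, Acc) -> {[convert(Item, From, To)], Acc} end,
    mnesia:traverse_backup(Source, Target, Fun, switched).

%% Rewrites one backup item; ordinary table records pass through untouched.
convert({schema, db_nodes, Nodes}, From, To) ->
    {schema, db_nodes, [switch(N, From, To) || N <- Nodes]};
convert({schema, version, _} = Item, _From, _To) ->
    Item;
convert({schema, cookie, _} = Item, _From, _To) ->
    Item;
convert({schema, Tab, CreateList}, From, To) when is_list(CreateList) ->
    {schema, Tab, [convert_opt(Opt, From, To) || Opt <- CreateList]};
convert(Other, _From, _To) ->
    Other.

%% Only the three replica lists hold node names that need rewriting.
convert_opt({Key, Nodes}, From, To)
  when Key =:= ram_copies; Key =:= disc_copies; Key =:= disc_only_copies ->
    {Key, [switch(N, From, To) || N <- Nodes]};
convert_opt(Other, _From, _To) ->
    Other.

switch(From, From, To) -> To;
switch(To, _From, To)  -> throw({error, already_exists});
switch(Node, _From, _To) -> Node.
```

A call would look something like bup_rename:change_node_name(mydatabase@oldhost, mydatabase@newhost, "/tmp/backup.bup", "/tmp/renamed.bup"), where both node names are placeholders. Note that this only renames a node; it does not remove any node from the replica lists.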
So say you have two node()s that are replicating RAM- and disc-based tables. You have been maintaining a policy of backing up the database regularly with the standard backup using the default BackupMod. And one day a manager asks you to verify the backups. Only when you attempt to restore the database, you get:
{atomic,[]}
And according to the documentation this means that there were no errors... and yet no tables were restored.
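To check which nodes a backup file actually refers to, the file can be read without writing anything, by calling mnesia:traverse_backup/6 with read_only as the target module (the same trick as the view example in the Mnesia User's Guide). A sketch, assuming the file was written by the default mnesia_backup module; bup_view, backup_nodes/1 and item_nodes/1 are hypothetical names:

```erlang
-module(bup_view).
-export([backup_nodes/1, item_nodes/1]).

%% Returns {ok, Nodes}, where Nodes is the sorted list of node names found
%% in the schema records of the backup file Source, or {error, Reason}.
backup_nodes(Source) ->
    Collect = fun(Item, Acc) ->
                      {[Item], lists:usort(item_nodes(Item) ++ Acc)}
              end,
    %% 'dummy' is an unused target; read_only means no new backup is written.
    mnesia:traverse_backup(Source, mnesia_backup, dummy, read_only, Collect, []).

%% Extracts the node names mentioned by a single backup item.
item_nodes({schema, db_nodes, Nodes}) ->
    Nodes;
item_nodes({schema, _Tab, CreateList}) when is_list(CreateList) ->
    Keys = [ram_copies, disc_copies, disc_only_copies],
    lists:append([Nodes || {Key, Nodes} <- CreateList, lists:member(Key, Keys)]);
item_nodes(_Other) ->
    [].
```

Comparing the result of bup_view:backup_nodes("/tmp/backup.bup") with node() on the machine doing the restore shows whether the local node appears in the backup at all.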
Not wanting to run the change_node procedure you remember that the node() and hostname must match so you change the hostname and the -sname param to match the machine where the data was backed up. This time however you get a strange error:
{aborted,{'EXIT',{aborted,{bad_commit,{missing_lock,mydatabase@otherhost}}}}}
Still not wanting to run the change_node procedure, I quickly clone my server so that I have two similar machines. I name them appropriately to match the production servers, and I begin the restore process. Eureka! I now have real working data on the restore servers.
I'd like to say that this was the end of the road... but I have not asked a question yet, and that's the point of SO... so here it is:
QUESTION: if I want to restore a backup which was taken from a cluster of replicated mnesia nodes, how do I modify the file (similar to the change_node procedure) so that the other nodes are either ignored or removed from the backup?
Asked slightly differently: How do I restore a replicated-multi-node() mnesia database on a single node()?