RabbitMQ inconsistent cluster

A few questions about RabbitMQ v3.1.5 clustering. I have a two-node cluster, and rabbitmq.config is the same on both nodes:

[
  {rabbit, [
    {cluster_nodes, {['rabbit@rmq01', 'rabbit@rmq02'], ram}},
    {tcp_listeners, [5674]}
  ]}
].

I have seen this issue before, and now I am seeing it again: when the whole cluster is shut down and the second node (rmq02) then starts before the first (rmq01), rmq02 'forgets' about rmq01:

[root@rmq2 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@rmq2' ...
[{nodes,[{disc,['rabbit@rmq2']}]},
 {running_nodes,['rabbit@rmq2']},
 {partitions,[]}]
...done.

After this, the first node (rmq01) cannot start because rmq02 disagrees about the clustering:

{"init terminating in do_boot",{rabbit,failure_during_boot,{error,{inconsistent_cluster,"Node 'rabbit@rmq1' thinks it's clustered with node 'rabbit@rmq2', but 'rabbit@rmq2' disagrees"}}}}

I tried to re-cluster rmq02 with rmq01, but it seems I have to run stop_app first:

[root@rmq2 rabbitmq]# rabbitmqctl join_cluster rabbit@rmq1
Clustering node 'rabbit@rmq2' with 'rabbit@rmq1' ...
Error: mnesia_unexpectedly_running

Here I can see that rmq02 has forgotten about rmq01:

[root@rmq2 ~]# cat /var/lib/rabbitmq/mnesia/rabbit\@rmq2/cluster_nodes.config 
{['rabbit@rmq2'],['rabbit@rmq2']}.

Meanwhile on rmq01 (correct configuration):

[root@rmq1 ~]# cat /var/lib/rabbitmq/mnesia/rabbit\@rmq1/cluster_nodes.config 
{['rabbit@rmq1','rabbit@rmq2'],['rabbit@rmq1']}.
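
A quick way to repeat this check on each node is to test whether its cluster_nodes.config mentions a given peer at all. This is only a minimal sketch: `mentions_node` is a hypothetical helper name, and it does a plain text match without interpreting what the two lists in the file mean.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): succeed if the given
# cluster_nodes.config file contains the quoted node name anywhere.
mentions_node() {
    file="$1"
    node="$2"
    # -F: fixed-string match, -q: no output, exit status only
    grep -qF "'$node'" "$file"
}

# Usage (path taken from the thread; adjust for your install):
#   mentions_node '/var/lib/rabbitmq/mnesia/rabbit@rmq2/cluster_nodes.config' 'rabbit@rmq1' \
#       || echo "rmq2 no longer lists rmq1"
```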

Questions:

  1. Is it normal for rmq02 to forget about rmq01, or do I have a misconfiguration? Why is this happening?
  2. If this is expected, is it possible to restore cluster health without rmq02 downtime (that is, without stop_app)?
Cryptozoite answered 9/1, 2014 at 10:54

I found a way to resolve question #2. To restore cluster health with no downtime on rmq02, remove all mnesia data on the inconsistent node (here rmq01, the one that fails to start) and then start it again:

[root@rmq01 ~]# rm -rf /var/lib/rabbitmq/mnesia/

[root@rmq01 ~]# service rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
[root@rmq01 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@rmq01' ...
[{nodes,[{disc,['rabbit@rmq02']},{ram,['rabbit@rmq01']}]},
 {running_nodes,['rabbit@rmq02','rabbit@rmq01']},
 {partitions,[]}]
...done.
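
Since `rm -rf` discards the node's local mnesia data for good, a more cautious variant is to move the directory aside before restarting. The following is a minimal sketch under assumptions: `backup_mnesia_dir` is a hypothetical helper name, the default path is the one used above, and the broker on that node is assumed to be stopped already.

```shell
#!/bin/sh
# Hypothetical helper: rename the mnesia directory to a timestamped backup
# instead of deleting it, and print the backup path on success.
backup_mnesia_dir() {
    dir="${1:-/var/lib/rabbitmq/mnesia}"
    dir="${dir%/}"                      # drop a trailing slash, if any
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "$backup"
}

# Usage on the node that refuses to start (as root, broker stopped):
#   backup_mnesia_dir /var/lib/rabbitmq/mnesia/
#   service rabbitmq-server start
#   rabbitmqctl cluster_status
```

Once the node has rejoined and the cluster looks healthy, the backup directory can be removed by hand.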

I still do not understand how to avoid this scenario (question #1); maybe some mnesia customisation would help.

Cryptozoite answered 10/1, 2014 at 8:48 Comment(2)
Note: for Windows, the mnesia folder is in C:\Users\<username>\AppData\Roaming\RabbitMQ\db. I deleted that folder on a node I couldn't get back up and it worked. Thanks! (Ferryman)
I was seeing error,corrupt_cluster_status_files in /var/log/rabbitmq/startup_log. Removing the mnesia directory and restarting the service fixed the issue. (Catnip)
