First off, let me briefly clarify the meaning of the flags for posterity. --registry does not influence leader election; it specifies the persistence strategy for the registry (where Mesos tracks data that should be carried over across failovers). The in_memory value should not be used in production, and it may even be removed in the future.
Leader election is performed by ZooKeeper. According to your log, you use the following ZooKeeper cluster: zk://10.1.69.172:2181,10.1.9.139:2181,10.1.79.211:2181/mesos.
Now, judging from your log, the cluster did not fail to elect a master; it actually elected one twice:
I0313 18:35:28.257139 3253 master.cpp:1710] The newly elected leader is master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79
...
I0313 18:35:36.074087 3257 master.cpp:1710] The newly elected leader is master@10.1.9.139:5050 with id c4fd7c4d-e3ce-4ac3-9d8a-28c841dca7f5
I can't say exactly why the leader was elected twice; for that I would need the logs from the other two masters as well. According to your log, the last elected master is on 10.1.9.139:5050, which is most probably not the one you provided the log from.
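To compare election events across all three masters, it may help to pull just the "newly elected leader" lines out of each log. Below is a minimal Python sketch of that idea. The function name `elected_leaders` and the variable names are hypothetical, and the `master@host:port` addresses in the sample lines are assumptions reconstructed from the surrounding text, so adjust the pattern if your log lines look different.

```python
import re

# Matches the message part of lines such as:
#   I0313 18:35:36.074087 3257 master.cpp:1710] The newly elected leader is <pid> with id <uuid>
# The exact line layout is assumed from the excerpts quoted in this answer.
LEADER_RE = re.compile(
    r"^I\d{4} (?P<time>[\d:.]+)\s+\d+ master\.cpp:\d+\] "
    r"The newly elected leader is (?P<pid>\S+) with id (?P<id>[0-9a-f-]+)"
)


def elected_leaders(lines):
    """Return (time, pid, master_id) for every leader-election line, in log order."""
    events = []
    for line in lines:
        match = LEADER_RE.match(line.strip())
        if match:
            events.append((match.group("time"), match.group("pid"), match.group("id")))
    return events


# Sample input; the master@... addresses are illustrative assumptions.
sample_log = [
    "I0313 18:35:28.257139 3253 master.cpp:1710] The newly elected leader is "
    "master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79",
    "...",
    "I0313 18:35:36.074087 3257 master.cpp:1710] The newly elected leader is "
    "master@10.1.9.139:5050 with id c4fd7c4d-e3ce-4ac3-9d8a-28c841dca7f5",
]

leader_events = elected_leaders(sample_log)
for time, pid, master_id in leader_events:
    print(time, pid, master_id)
```

Running the same function over the logs of all three masters and sorting the results by timestamp should show whether every master observed the same sequence of leaders.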
One suspicious thing I see in the log is that master IDs differ for the same IP:port. Do you have an idea why?
I0313 18:35:28.237251 3244 master.cpp:374] Master 24ecdfff-2c97-4de8-8b9c-dcea91115809 (10.1.69.172) started on 10.1.69.172:5050
...
I0313 18:35:28.257139 3253 master.cpp:1710] The newly elected leader is master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79
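The mismatch described above can also be checked mechanically. The following Python sketch records the ID each master reports at startup and flags any election line that names the same IP:port with a different ID. The helper name `id_mismatches` is hypothetical, and the `master@10.1.69.172:5050` address in the sample is an assumption inferred from the statement that both lines refer to the same IP:port.

```python
import re

# "Master <id> (<host>) started on <ip:port>"
STARTED_RE = re.compile(
    r"Master (?P<id>[0-9a-f-]+) \(\S+\) started on (?P<addr>\S+)"
)
# "The newly elected leader is <name>@<ip:port> with id <id>"
ELECTED_RE = re.compile(
    r"The newly elected leader is \S+@(?P<addr>\S+) with id (?P<id>[0-9a-f-]+)"
)


def id_mismatches(lines):
    """Return (addr, started_id, elected_id) where the two IDs differ for one address."""
    started = {}      # address -> ID reported in the most recent "started on" line
    mismatches = []
    for line in lines:
        match = STARTED_RE.search(line)
        if match:
            started[match.group("addr")] = match.group("id")
            continue
        match = ELECTED_RE.search(line)
        if match:
            addr, elected_id = match.group("addr"), match.group("id")
            if addr in started and started[addr] != elected_id:
                mismatches.append((addr, started[addr], elected_id))
    return mismatches


# Sample input; the master@... address is an illustrative assumption.
sample_lines = [
    "I0313 18:35:28.237251 3244 master.cpp:374] Master "
    "24ecdfff-2c97-4de8-8b9c-dcea91115809 (10.1.69.172) started on 10.1.69.172:5050",
    "...",
    "I0313 18:35:28.257139 3253 master.cpp:1710] The newly elected leader is "
    "master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79",
]

found = id_mismatches(sample_lines)
for addr, started_id, elected_id in found:
    print(f"{addr}: started as {started_id}, elected as {elected_id}")
```

A non-empty result only tells you that two different master IDs were seen for one address; it does not by itself explain why.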