Mesos cluster fails to elect a master when using replicated_log

  • Test environment: multi-node Mesos 0.27.2 cluster on AWS (3 x masters, 2 x slaves, quorum=2).
  • Tested ZooKeeper persistence with zkCli.sh; it works fine.
  • If I start the masters with --registry=in_memory, it works fine: a master is elected and I can start tasks via Marathon.
  • If I use the default (--registry=replicated_log), the cluster fails to elect a master:

https://gist.github.com/mitel/67acd44408f4d51af192

EDIT: apparently the problem was the firewall. I applied an allow-all rule to all my security groups and now I have a stable master. Once I figure out what was blocking the communication, I'll post it here.
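
One way to narrow down what a firewall is blocking, rather than opening everything, is to try plain TCP connections from each node to its peers. The helpers below (tcp_reachable and probe_peers are our own hypothetical names, not part of any Mesos tooling) are a minimal sketch using only the Python standard library; a timeout rather than a refusal usually means packets are being silently dropped.

```python
import socket
from typing import Dict, Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake with host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out (typical of a firewall that silently
        # drops packets) or no route to host.
        return False


def probe_peers(hosts: Iterable[str], ports: Iterable[int],
                timeout: float = 2.0) -> Dict[Tuple[str, int], bool]:
    """Probe every (host, port) combination from the node this runs on."""
    ports = list(ports)  # iterated once per host, so materialize it
    return {(h, p): tcp_reachable(h, p, timeout) for h in hosts for p in ports}


# Hypothetical usage, run on each master in turn (the IPs are the ones from the
# zk:// URL in the log, assuming the masters run on those same hosts):
#   probe_peers(["10.1.69.172", "10.1.9.139", "10.1.79.211"], [5050, 2181])
```

Running it from every master matters: a rule set can allow traffic in one direction and not the other, so a single vantage point can miss the problem.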

Principal answered 13/3, 2016 at 19:52

I discovered that Mesos masters also initiate connections to the other masters on port 5050. After adding the corresponding egress rule to the masters' security group, the cluster is stable and master election happens as expected.
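
The fix works because AWS security groups check both ends of a new connection: the source group needs an egress rule and the destination group an ingress rule, while replies pass automatically since security groups are stateful. If the groups are managed with Terraform, note that its aws_security_group resource removes AWS's default allow-all egress rule, so egress has to be declared explicitly. The model below is a simplified sketch (one port per rule, peers identified by hypothetical group names, no CIDR ranges) that only checks the two flows discussed in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Simplified rule: one TCP port and the name of the peer security group."""
    port: int
    peer: str


@dataclass
class SecurityGroup:
    name: str
    ingress: Set[Rule] = field(default_factory=set)
    egress: Set[Rule] = field(default_factory=set)


def flow_allowed(src: SecurityGroup, dst: SecurityGroup, port: int) -> bool:
    """A new connection src -> dst:port needs an egress rule on the source group AND
    an ingress rule on the destination group; replies are let through automatically
    because security groups are stateful."""
    return Rule(port, dst.name) in src.egress and Rule(port, src.name) in dst.ingress


# Flows named in this thread (the group names are hypothetical).
REQUIRED_FLOWS: List[Tuple[str, str, int]] = [
    ("masters", "masters", 5050),    # master -> master, incl. replicated-log traffic
    ("masters", "zookeeper", 2181),  # masters -> ZooKeeper (leader election)
]


def missing_flows(groups: Dict[str, SecurityGroup]) -> List[Tuple[str, str, int]]:
    """Return the required flows that the given groups would block."""
    return [(s, d, p) for s, d, p in REQUIRED_FLOWS
            if not flow_allowed(groups[s], groups[d], p)]
```

A masters group with ingress on 5050 from itself but no matching egress rule reports ("masters", "masters", 5050) as missing, which mirrors the situation described above.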

UPDATE: for anyone trying to build an internal firewall between the various components (Mesos/ZK/...): don't do it. It is better to design the security the way Mesosphere's DCOS does.

Principal answered 17/3, 2016 at 17:26
Yep, log replicas live in the same OS process as the master and communicate with each other over the same socket the master uses (TCP on 5050). – Wan
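
To see why blocking master-to-master connections on 5050 hurts only the replicated_log setup, here is a toy model (a sketch, not Mesos code). Each master hosts one log replica, and --quorum (2 here) is meant to be a strict majority of the masters. We assume, consistent with this thread, that a request from master A to B travels over a connection A opens to B:5050 and the reply over one B opens to A:5050; can_connect is a hypothetical stand-in for the firewall. The in_memory registry needs no peers at all, so it is unaffected.

```python
from typing import Callable, Sequence


def majority(n: int) -> int:
    """Smallest strict majority of n replicas (the value --quorum should have)."""
    return n // 2 + 1


def can_reach_quorum(node: str, nodes: Sequence[str],
                     can_connect: Callable[[str, str], bool], quorum: int) -> bool:
    """True if `node` can complete a round trip with enough replicas, itself
    included, to form a quorum.

    can_connect(a, b): hypothetical stand-in for "a may open TCP to b:5050".
    Assumption: a request a->b and its reply b->a each use a connection opened
    by the sender, so both directions must be allowed."""
    reachable = sum(
        1 for other in nodes
        if other != node and can_connect(node, other) and can_connect(other, node)
    )
    return 1 + reachable >= quorum  # +1: the local replica in the same process
```

With every master-to-master connection blocked, no node reaches a quorum of 2, even though each master can still take part in the ZooKeeper election; under this model that fits the symptom that leadership never stabilized.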

First off, let me briefly clarify the meaning of the flags for posterity. --registry does not influence leader election; it specifies the persistence strategy for the registry (where Mesos keeps the state that must be carried over across master failovers). The in_memory value should not be used in production; it may even be removed in the future.

Leader election is performed by ZooKeeper. According to your log, you use the following ZooKeeper cluster: zk://10.1.69.172:2181,10.1.9.139:2181,10.1.79.211:2181/mesos.
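
Since elections go through this ensemble, it can help to list exactly which servers and znode path the masters were given, for example to feed a connectivity check. A small sketch (parse_zk_url is our own helper, not part of Mesos) that handles the zk://host:port,.../path form quoted above plus the optional user:password@ prefix the --zk flag also accepts; file:// values and IPv6 literals are out of scope.

```python
from typing import List, Tuple

DEFAULT_ZK_PORT = 2181  # ZooKeeper's conventional client port


def parse_zk_url(url: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split a zk://[user:password@]host1:port1,host2:port2,.../path URL into
    its ZooKeeper servers and the znode path."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError(f"not a zk:// URL: {url!r}")
    authority, slash, path = url[len(prefix):].partition("/")
    if "@" in authority:
        authority = authority.rsplit("@", 1)[1]  # drop optional credentials
    servers = []
    for entry in authority.split(","):
        host, colon, port = entry.partition(":")  # IPv6 literals not handled
        servers.append((host, int(port) if colon else DEFAULT_ZK_PORT))
    return servers, ("/" + path) if slash else "/"
```

For the URL in the log this yields the three servers on port 2181 and the path /mesos.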

Now, from your log, the cluster did not fail to elect a master; it actually did so twice:


I0313 18:35:28.257139  3253 master.cpp:1710] The newly elected leader is master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79
...
I0313 18:35:36.074087  3257 master.cpp:1710] The newly elected leader is master@10.1.9.139:5050 with id c4fd7c4d-e3ce-4ac3-9d8a-28c841dca7f5

I can't say exactly why the leader was elected twice; for that I would need the logs from the other two masters as well. According to your log, the last elected master is on 10.1.9.139:5050, which is most probably not the one you provided the log from.
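
Comparing the logs of all three masters is easier if the election lines are pulled out first and lined up by timestamp. A minimal sketch, assuming the glog line format quoted above (the regex is ours and may need adjusting for other Mesos versions or log settings):

```python
import re
from typing import Iterable, List, Tuple

# glog-style line as quoted above:
#   I0313 18:35:28.257139  3253 master.cpp:1710] The newly elected leader is <leader> with id <id>
LEADER_RE = re.compile(
    r"^I(\d{4} [\d:.]+)\s.*The newly elected leader is (\S+) with id (\S+)"
)


def leader_changes(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, leader, master_id) for each leader-election line."""
    return [m.groups() for m in map(LEADER_RE.match, lines) if m]
```

Several leader changes within seconds across the three logs would point at leadership flapping rather than an election that never happened.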

One suspicious thing I see in the log is that the master IDs differ for the same IP:port. Do you have any idea why?

I0313 18:35:28.237251  3244 master.cpp:374] Master 24ecdfff-2c97-4de8-8b9c-dcea91115809 (10.1.69.172) started on 10.1.69.172:5050
...
I0313 18:35:28.257139  3253 master.cpp:1710] The newly elected leader is master@10.1.69.172:5050 with id edd3e4a7-ede8-44fe-b24c-67a8790e2b79
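
The same comparison can be automated: collect every master ID seen for each ip:port, from both the "started on" and the "newly elected leader" lines, and report addresses with more than one ID. A sketch assuming the two line formats quoted above (the regexes and the conflicting_ids name are ours); as noted in the comments, a restarted master picks a new ID, so a hit usually means a restart or a log spanning several runs.

```python
import re
from typing import Dict, Iterable, List

STARTED_RE = re.compile(r"Master (?P<id>\S+) \([^)]*\) started on (?P<addr>\S+)")
LEADER_RE = re.compile(r"The newly elected leader is \w+@(?P<addr>\S+) with id (?P<id>\S+)")


def conflicting_ids(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map ip:port -> distinct master IDs seen for it (in order of first
    appearance), keeping only addresses with more than one ID."""
    seen: Dict[str, List[str]] = {}
    for line in lines:
        for pattern in (STARTED_RE, LEADER_RE):
            m = pattern.search(line)
            if m:
                ids = seen.setdefault(m["addr"], [])
                if m["id"] not in ids:
                    ids.append(m["id"])
    return {addr: ids for addr, ids in seen.items() if len(ids) > 1}
```
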
Wan answered 13/3, 2016 at 21:5
(To the last question) That master probably restarted and gave itself a new master ID. That's why the same IP:port can show up with different master IDs. – Service
Thanks @rukletsov. Indeed, --registry should not influence leader election. I just found it odd that, with the same setup and in_memory persistence, the cluster manages to elect a stable master. I recreated the cluster; here are the logs: master1 master2 master3 – Principal
@mitelone, based on your logs, it looks like the problem is with the quorum and synchronization between the replicated_log replicas. Before I elaborate on the reasons, could you do a test for me? Could you start all masters simultaneously (not with a 4 s delay as in your logs) or put them under a supervisor, e.g. monit, and tell me whether the problem persists? Thanks! – Wan
@Wan I already tested with a supervisor and it behaves the same. Since it works with no firewall between the nodes, I suppose it's a communication issue. Weird, though, since I was very careful with all the standard ports I found documented for masters, slaves, ZooKeeper and Marathon. – Principal
@Wan here is a gist with the firewall rules: mesos_fw_terraform – Principal
This is a bit strange. IIRC, log replica actors live in the same process as the Mesos master and use the same sockets for communication, i.e. TCP on port 5050 in your case. Would you mind monitoring the traffic and finding out which packets are being filtered? – Wan
