Why does Hazelcast expect multicast when I enabled tcp-ip?

I'm trying to set up a two-machine Hazelcast cluster and can't use multicast. Here's the XML file I'm using for configuration:

<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.9.xsd" xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <network>
        <port auto-increment="true">5701</port>
        <join>
            <multicast enabled="false">
            </multicast>
            <tcp-ip enabled="true">
                <member>10.18.7.4</member>
                <member>10.18.14.63</member>
            </tcp-ip>
        </join>
    </network>
</hazelcast>

And I'm instantiating Hazelcast as such:

        Config config = new FileSystemXmlConfig(xmlConfigFile);
        HazelcastInstance hz  = Hazelcast.newHazelcastInstance(config);

When I start each one, I can see the connection being made, and then the node shuts down. Here are the relevant lines from the log file. The logs are the same on both machines; only the IPs differ. I've added comments (after ->) to make them easier to read.

WARNING: Name of the hazelcast schema location is incorrect, using default -> Presumably no issue here

INFO: [LOCAL] [dev] [3.12] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: [10.18.14.63, 10.18.7.4]

INFO: [LOCAL] [dev] [3.12] Picked [10.18.14.63]:5702, using socket ServerSocket[addr=/0.0.0.0,localport=5702], bind any local is true

INFO: [10.18.14.63]:5702 [dev] [3.12] Hazelcast 3.12 (20190409 - 915d83a) starting at [10.18.14.63]:5702 -> Great, we're starting Hazelcast on this machine. 

INFO: [10.18.14.63]:5702 [dev] [3.12] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks) -> Looking good

INFO: [10.18.14.63]:5702 [dev] [3.12] [10.18.14.63]:5702 is STARTING -> Ok looks like we've started. 

INFO: [10.18.14.63]:5702 [dev] [3.12] Connecting to /10.18.7.4:5702, timeout: 10000, bind-any: true -> Trying to connect to the other machine

INFO: [10.18.14.63]:5702 [dev] [3.12] Connecting to /10.18.7.4:5703, timeout: 10000, bind-any: true -> Still trying to connect to the other machine

INFO: [10.18.14.63]:5702 [dev] [3.12] Initialized new cluster connection between /10.18.14.63:44251 and /10.18.14.63:5701 -> Ok started a cluster connection on this machine. 

INFO: [10.18.14.63]:5702 [dev] [3.12] Initialized new cluster connection between /10.18.14.63:38941 and /10.18.7.4:5701 -> Great, started a connection with the other machine

SEVERE: [10.18.14.63]:5702 [dev] [3.12] Node could not join cluster. A Configuration mismatch was detected: Incompatible joiners! expected: multicast, found: tcp-ip Node is going to shutdown now! -> This is the error I don't understand. 

Apr 22, 2019 6:57:44 PM com.hazelcast.instance.Node
WARNING: [10.18.14.63]:5702 [dev] [3.12] Terminating forcefully...

Apr 22, 2019 6:57:44 PM com.hazelcast.instance.Node
INFO: [10.18.14.63]:5702 [dev] [3.12] Shutting down connection manager...

My first question is: if I set multicast enabled="false" in the XML config file, why do I get the message "Node could not join cluster. A Configuration mismatch was detected: Incompatible joiners! expected: multicast, found: tcp-ip Node is going to shutdown now!", after which the node shuts down?

My second question is: how do I properly configure the XML file to create a 2-node cluster using tcp-ip instead of multicast?

Thank you for your help.

Winnifredwinning answered 22/4, 2019 at 19:20 Comment(4)
The node on 10.18.14.63 is starting on port 5702. That makes me think there's another Hazelcast node listening on port 5701 on 10.18.14.63. Can you confirm that? - Churl
@Churl There should only be one Hazelcast node on 10.18.14.63. You raise an interesting question about ports, which I should look into. Perhaps setting auto-increment to false for the ports will yield some insights. - Winnifredwinning
I think @Churl is right: you seem to be running another Hazelcast member (with the multicast-enabled configuration) on the same machine. The members then try to form a cluster, but they have incompatible joining methods (one is "tcp-ip" and the other is "multicast"). That's why you see this error. - Interrupted
Thanks all for your comments and help. I just went ahead and implemented a solution in memcached. @RafałLeszko that thought crossed my mind, but a ps aux | grep hazelcast didn't show any other Hazelcast members running on the same machine. If I have time to look into this more I'll try to figure it out and post an answer here, but for now I need to move ahead and ship this. Thanks again. - Winnifredwinning
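
The comments above hinge on whether something is already listening on port 5701 on the same machine (the log line "Initialized new cluster connection between /10.18.14.63:44251 and /10.18.14.63:5701" suggests so). Since ps aux | grep hazelcast only matches process names, a member embedded in another JVM could be missed. A minimal sketch that probes the ports directly, using only the Java standard library; the class name PortProbe and the method isListening are hypothetical names chosen for illustration, and the port range 5701-5703 is taken from the log above:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on this port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host defaults to the local machine; pass an IP as the first argument to override.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        for (int port = 5701; port <= 5703; port++) {
            System.out.println(host + ":" + port + " listening: " + isListening(host, port, 1000));
        }
    }
}
```

Running this before starting your own member shows whether 5701 is already taken. It only tells you that some process accepts TCP connections there, not that the process is Hazelcast.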

You must make sure that all members of the cluster use the same discovery mechanism. Make sure that only the tcp-ip joiner is enabled on both nodes. That should be enough to solve your problem.

Alternatively, you can try removing the “multicast” section from the XML on both nodes and leaving only the tcp-ip section (I have not tried this, but I think it should be similar to disabling the multicast section).
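
To confirm what each node's file actually declares, here is a minimal sketch that lists the children of the join element whose enabled attribute is "true". It uses only the JDK's XML parser, not Hazelcast itself; the class name JoinerCheck and the method enabledJoiners are hypothetical names for illustration. It reports only attributes written explicitly in the file and does not model whatever defaults Hazelcast applies to omitted sections.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class JoinerCheck {
    /** Lists the child elements of <join> whose enabled attribute is "true". */
    static List<String> enabledJoiners(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the config declares a default namespace
        Document doc = factory.newDocumentBuilder().parse(xml);

        List<String> enabled = new ArrayList<>();
        // "*" matches <join> regardless of which namespace it is declared in.
        NodeList joins = doc.getElementsByTagNameNS("*", "join");
        for (int i = 0; i < joins.getLength(); i++) {
            NodeList children = joins.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child instanceof Element
                        && "true".equalsIgnoreCase(((Element) child).getAttribute("enabled").trim())) {
                    enabled.add(child.getLocalName());
                }
            }
        }
        return enabled;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the XML config file the node is started with.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("Enabled joiners: " + enabledJoiners(in));
        }
    }
}
```

Run it on both machines against the file passed to FileSystemXmlConfig. If both files report only tcp-ip and the error persists, the mismatching member is probably being started from a different configuration.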

You can read in this issue that this error is common when there is a configuration error in one of the cluster nodes.

Update added by Amit Kumar:

I noticed that a default Hazelcast configuration is also being loaded: the hazelcast.xml bundled in the Hazelcast .jar loads first, and my own hazelcast.xml loads after it. My custom hazelcast.xml specifies network port "5721", but an instance is still started on port "5701" and then "5702".

Members {size:1, ver:1} [
    Member [192.168.1.102]:5701 - ecdb61e1-ac24-45dc-826d-9d807fed5f71 this
]

Members {size:1, ver:1} [
    Member [192.168.1.102]:5721 - 96c55f82-eefd-401f-9aca-a51e288ccb2a this
]
Postprandial answered 27/6, 2020 at 15:29 Comment(0)