I'm trying to use Hibernate Search so that all writes to the Lucene index from jgroupsSlave nodes are sent to the jgroupsMaster node, and the Lucene index is then shared back to the slaves via Infinispan. Everything works locally, but on EC2, although the nodes discover each other, they don't seem to be communicating.
Both nodes are sending are-you-alive messages to each other:
# master output sample
86522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
86523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
87449 [Timer-4,luceneCluster,archlinux-37498] DEBUG org.jgroups.protocols.FD - sending are-you-alive msg to archlinux-57950 (own address=archlinux-37498)
87522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
87523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
# slave output sample
85499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
85503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
86190 [Timer-3,luceneCluster,archlinux-57950] DEBUG org.jgroups.protocols.FD - sending are-you-alive msg to archlinux-37498 (own address=archlinux-57950)
86499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
86503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable - About to cleanup completed transaction. Initial size is 0
Security Groups
I have two jars, one for the master and one for the slave, each running on its own EC2 instance. I can ping each instance from the other, and both are in the same security group, which defines the following rules for communication between any machines in the group.
ALL ports for ICMP
0-65535 for TCP
0-65535 for UDP
So I don't think it is a security group configuration problem.
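For what it's worth, a quick way to double-check raw TCP reachability on the JGroups port (7800, the bind_port in jgroups-ec2.xml below) would be something like the following from one instance to the other; the IP here is just a placeholder for the other node's private address:
# hypothetical connectivity check, not part of my actual setup
nc -zv 10.0.0.5 7800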
hibernate.properties
# there is also a corresponding jgroupsSlave
hibernate.search.default.worker.backend=jgroupsMaster
hibernate.search.default.directory_provider = infinispan
hibernate.search.infinispan.configuration_resourcename=infinispan.xml
hibernate.search.default.data_cachename=localCache
hibernate.search.default.metadata_cachename=localCache
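The slave's hibernate.properties is the same, apart from the worker backend:
# slave
hibernate.search.default.worker.backend=jgroupsSlave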
infinispan.xml
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
            xmlns="urn:infinispan:config:5.1">
    <global>
        <transport clusterName="luceneCluster" transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
            <properties>
                <property name="configurationFile" value="jgroups-ec2.xml" />
            </properties>
        </transport>
    </global>

    <default>
        <invocationBatching enabled="true" />
        <clustering mode="repl" />
    </default>

    <!-- this is just so that each machine doesn't have to store the index in memory -->
    <namedCache name="localCache">
        <loaders passivation="false" preload="true" shared="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="location" value="/tmp/infinispan/master" />
                    <!-- there is a corresponding /tmp/infinispan/slave in the slave config -->
                </properties>
            </loader>
        </loaders>
    </namedCache>
</infinispan>
jgroups-ec2.xml
<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.2.xsd">
<TCP
bind_addr="${jgroups.tcp.address:127.0.0.1}"
bind_port="${jgroups.tcp.port:7800}"
loopback="true"
port_range="30"
recv_buf_size="20000000"
send_buf_size="640000"
max_bundle_size="64000"
max_bundle_timeout="30"
enable_bundling="true"
use_send_queues="true"
sock_conn_timeout="300"
enable_diagnostics="false"
bundler_type="old"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="60000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="Discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="2"
oob_thread_pool.max_threads="30"
oob_thread_pool.keep_alive_time="60000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Discard"
/>
<S3_PING secret_access_key="removed_for_stackoverflow" access_key="removed_for_stackoverflow" location="jgroups_ping" />
<MERGE2 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3"/>
<VERIFY_SUSPECT timeout="1500"/>
<pbcast.NAKACK2
use_mcast_xmit="false"
xmit_interval="1000"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="10000"
xmit_table_max_compaction_time="10000"
max_msg_batch_size="100"
become_server_queue_size="0"/>
<UNICAST2
max_bytes="20M"
xmit_table_num_rows="20"
xmit_table_msgs_per_row="10000"
xmit_table_max_compaction_time="10000"
max_msg_batch_size="100"/>
<RSVP />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="400000"/>
<pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>
<UFC max_credits="2000000" min_threshold="0.10"/>
<MFC max_credits="2000000" min_threshold="0.10"/>
<FRAG2 frag_size="60000"/>
</config>
I copied this file directly from the most recent infinispan-core distribution (5.2.0.Beta3, though I also tried 5.1.4, I think it was). The only thing I changed was replacing their S3_PING details with my own, and the nodes do write to S3 and find each other there, so I don't think that is the problem. I'm also starting the master and slave with jgroups.tcp.address set to each instance's private IP address, roughly as shown below. I also tried a few greatly simplified configs, without any success.
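Each node is launched along these lines (the jar names and IPs are placeholders; the relevant part is the jgroups.tcp.address property that the TCP bind_addr picks up):
# master instance, 10.0.0.5 standing in for its private IP
java -Djgroups.tcp.address=10.0.0.5 -jar master.jar
# slave instance, 10.0.0.6 standing in for its private IP
java -Djgroups.tcp.address=10.0.0.6 -jar slave.jar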
Any ideas what the problem could be? I've spent a few days playing around with it, and it is driving me crazy. I think it must be something in the JGroups config, since everything works locally and the nodes just aren't able to talk to each other on EC2.
Is there any other information I can provide to help figure this out?