Apache Ignite hangs on startup

We use Apache Ignite v2.2 as the Hibernate second-level cache in a Grails application. We have a 4-node cluster with 10 GB of RAM per node. The first node starts fine, but one of the subsequent nodes hangs: sometimes the 2nd, sometimes the 3rd or 4th. Successful startups do happen, but very rarely. The app always hangs in the same place:

"host-startStop-1" #45 daemon prio=5 os_prio=0 tid=0x00007f7cac004800 nid=0x3d44 waiting on condition [0x00007f7cfdd81000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:216)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:158)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:150)
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.onKernalStart(GridCachePartitionExchangeManager.java:551)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStart(GridCacheProcessor.java:843)
        at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1040)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
        at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
        - locked <0x00000007890a1198> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
        at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:596)
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:520)
        at org.apache.ignite.Ignition.start(Ignition.java:322)

All other nodes are blocked while this node hangs. Configuration:

IgniteConfiguration configuration = new IgniteConfiguration()
        List<CacheConfiguration> cacheConfigurations = []
        for (String name : caches) {
            CacheConfiguration cacheConfiguration = new CacheConfiguration<>()
            cacheConfiguration.setCacheMode(CacheMode.REPLICATED)
            cacheConfiguration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
            cacheConfiguration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC)
            cacheConfiguration.setName(name)
            cacheConfiguration.onheapCacheEnabled =  true
            cacheConfiguration.evictionPolicy = new LruEvictionPolicy()
            cacheConfiguration.memoryPolicyName = MEMORY_POLICY
            cacheConfigurations.add(cacheConfiguration)
        }
        for (String name : ['org.hibernate.cache.spi.UpdateTimestampsCache',
                            'org.hibernate.cache.internal.StandardQueryCache']) {
            CacheConfiguration cacheConfiguration = new CacheConfiguration<>()
            cacheConfiguration.setCacheMode(CacheMode.REPLICATED)
            cacheConfiguration.setAtomicityMode(CacheAtomicityMode.ATOMIC)
            cacheConfiguration.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC)
            cacheConfiguration.setName(name)
            cacheConfiguration.onheapCacheEnabled =  true
            cacheConfiguration.evictionPolicy = new LruEvictionPolicy()
            cacheConfiguration.memoryPolicyName = MEMORY_POLICY
            cacheConfigurations.add(cacheConfiguration)
        }
        configuration.setCacheConfiguration(cacheConfigurations.toArray(new CacheConfiguration[cacheConfigurations.size()]))
        configuration.peerClassLoadingEnabled = true
        configuration.igniteInstanceName = Constants.IGNITE_GRID
        configuration.gridLogger = new Slf4jLogger()
        // Off-heap memory: a 1 GiB default policy plus a 4 GiB policy for the L2 caches
        MemoryConfiguration memoryConfiguration = new MemoryConfiguration()
        memoryConfiguration.defaultMemoryPolicySize = 1 * 1024 * 1024 * 1024L
        MemoryPolicyConfiguration l2CachePolicy = new MemoryPolicyConfiguration()
        l2CachePolicy.name = MEMORY_POLICY
        l2CachePolicy.setMaxSize(4 * 1024 * 1024 * 1024L)
        l2CachePolicy.pageEvictionMode = DataPageEvictionMode.RANDOM_LRU
        memoryConfiguration.setMemoryPolicies(l2CachePolicy)
        configuration.memoryConfiguration = memoryConfiguration
        int[] eventTypes = new int[1]
        eventTypes[0] = EventType.EVT_NODE_FAILED
        configuration.includeEventTypes = eventTypes
        Map<IgnitePredicate<? extends Event>, int[]> listeners = new HashedMap()
        listeners.put(new NodeFailedEventListener(), eventTypes)
        configuration.localEventListeners = listeners
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi()
        commSpi.slowClientQueueLimit = 1000
        commSpi.messageQueueLimit = 5000
        configuration.communicationSpi = commSpi
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi()
        configuration.discoverySpi = discoverySpi
        if (grailsApplication.config.grails?.plugin?.awssdk?.accessKey && Env.igniteS3Bucket) {
            TcpDiscoveryS3IpFinder awsIpFinder = new TcpDiscoveryS3IpFinder()
            awsIpFinder.setBucketName(Env.igniteS3Bucket)
            AWSCredentials awsCredentials = new BasicAWSCredentials(grailsApplication.config.grails.plugin.awssdk.accessKey,
                    grailsApplication.config.grails.plugin.awssdk.secretKey)
            awsIpFinder.setAwsCredentials(awsCredentials)
            discoverySpi.ipFinder = awsIpFinder
        } else {
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder()
            ipFinder.setAddresses(["127.0.0.1:47500"])
            discoverySpi.ipFinder = ipFinder
        }
        configuration.classLoader = grailsApplication.classLoader
        ignite = Ignition.start(configuration)

EDIT

Full thread dump of the failed node

Full thread dump of the successful node

Priority answered 22/12, 2017 at 7:32 Comment(2)
It's impossible to understand anything from a thread dump of a single thread. Please share full logs and thread dumps from all nodes. - Florentinoflorenza
I added full thread dumps of the failed and successful nodes. - Priority
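
For an embedded node like the one in this question, one way to capture the full thread dump requested above is from inside the JVM itself. The following is a minimal sketch using only the JDK's `ThreadMXBean`; the class and method names are hypothetical, and running `jstack <pid>` against the application process is an alternative that needs no code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Ignite: renders every live thread with its full stack.
class ThreadDumpSketch {
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" id=")
               .append(info.getThreadId()).append(' ')
               .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates the stack, so print each frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

Calling `dumpAllThreads()` on each node while one of them is hanging, and writing the result to a file or the application log, would give a per-node dump that can be compared side by side.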

If you want to run more than one node on one physical machine, I would recommend configuring MemoryConfiguration (by default, in version 2.2, Ignite will require 80% of physical RAM for one node), or updating to version 2.3 (where the default value was reduced to 20%).

Florentinoflorenza answered 26/12, 2017 at 8:30 Comment(5)
I have only 1 node per machine. And I have configured MemoryConfiguration. - Priority
Then please run Ignite in non-quiet mode (-DIGNITE_QUIET=false or "-v") and share full logs. - Florentinoflorenza
I don't know how to get full logs. -DIGNITE_QUIET=false doesn't produce any logs. - Priority
@DmitryS that's an argument to ignite.sh or ignite.bat. Also take a look at logs/ - Sefton
@Sefton We use embedded Ignite as the Hibernate 2nd level cache. - Priority
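
Since the node here is embedded rather than launched through ignite.sh, the `-DIGNITE_QUIET=false` flag would have to reach the application's own JVM, either as a JVM option of the application server or programmatically. Below is a minimal sketch of the programmatic route, assuming the property is read when the node starts, so it has to be set before `Ignition.start(...)`. The class and method names are hypothetical. With `Slf4jLogger` configured as in the question, the amount of output presumably also depends on the log level of the SLF4J backend.

```java
// Hypothetical helper, not part of Ignite: switches off quiet mode for an embedded node.
class VerboseIgniteStartup {
    /** System property named in the comments above. */
    static final String QUIET_PROPERTY = "IGNITE_QUIET";

    /** Must run before the node is started (assumption: the value is read at startup). */
    static void disableQuietMode() {
        System.setProperty(QUIET_PROPERTY, "false");
    }

    public static void main(String[] args) {
        disableQuietMode();
        // In the real application, Ignition.start(configuration) would follow here.
        System.out.println(QUIET_PROPERTY + "=" + System.getProperty(QUIET_PROPERTY));
    }
}
```

In the Grails setup from the question, the call would go immediately before the existing `ignite = Ignition.start(configuration)` line.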

Can you please try with caches that don't contain a period ('.') in their names? This is known to cause delayed problems.

Sefton answered 17/1, 2018 at 17:54 Comment(2)
No. We use Ignite as the Hibernate 2nd level cache. We can't choose cache names. - Priority
@DmitryS but can you at least check whether removing these fixes the problem? - Sefton
