Apache Curator distributed locking - performance

We are currently evaluating apache-curator for a distributed locking use case. Below is our test case:

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class Test {
    // retry policy: 10 ms base sleep, at most 5 retries
    private static CuratorFramework client = CuratorFrameworkFactory.newClient(
            "zook.company.com", new ExponentialBackoffRetry(10, 5));

    public static void main(String[] args) {
        client.start();
        int numLocks = 50000;
        int numThreads = 200;
        String[] keyPool = new String[numLocks];
        for (int i = 1; i <= numLocks; i++) {
            keyPool[i - 1] = String.valueOf(100 + i);
        }
        for (int i = 0; i < numThreads; i++) {
            Thread t = new Thread(new Job(numLocks, keyPool));
            t.setName("T" + (i + 1));
            t.start();
        }
    }

    private static class Job implements Runnable {
        private final int      numLocks;
        private final String[] keyPool;

        public Job(int numLocks, String[] keyPool) {
            this.numLocks = numLocks;
            this.keyPool = keyPool;
        }

        @Override
        public void run() {
            while (true) {
                // pick a random lock path; ThreadLocalRandom avoids allocating a new Random per iteration
                String lockKey = keyPool[ThreadLocalRandom.current().nextInt(numLocks)];
                InterProcessMutex lock = new InterProcessMutex(client, "/" + lockKey);
                boolean acquired = false;
                try {
                    long start = System.currentTimeMillis();
                    // zero timeout: try once and return immediately if the lock is unavailable
                    acquired = lock.acquire(0, TimeUnit.NANOSECONDS);
                    if (acquired) {
                        long end = System.currentTimeMillis();
                        System.out.println("lock acquired in " + (end - start) + " ms");
                    } else {
                        System.out.println("failed to get lock");
                    }
                } catch (Exception e) {
                    System.out.println(e);
                } finally {
                    if (acquired) {
                        try {
                            long start = System.currentTimeMillis();
                            lock.release();
                            long end = System.currentTimeMillis();
                            System.out.println("lock released in " + (end - start) + " ms");
                        } catch (Exception e) {
                            // release() throws a checked Exception, so it must be handled here
                            System.out.println(e);
                        }
                    }
                }
            }
        }
    }
}

The test runs on a 2-core/7.5G RAM machine with a 2G Xmx, while the ZooKeeper instance (zook.company.com) runs on a 4-core/15G RAM server with a 12G Xmx and maxClientCnxns=5000, tickTime=2000, initLimit=10, syncLimit=5 (see the zoo.cfg sketch below).

Both servers reside in the same AWS VPC.
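
For reference, a minimal zoo.cfg carrying the settings above might look like the following sketch (dataDir and clientPort are illustrative placeholders, not taken from the question):

tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=5000
# placeholders below, assumed for illustration only
dataDir=/var/lib/zookeeper
clientPort=2181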

Running the test for 10 minutes, we see lock acquisition times of around 80 ms for more than 95% of lock attempts, while the maximum acquisition time was 340 ms. We have been trying different combinations of thread counts and lock counts, but the times are always on the high side.

We are not able to find anything wrong anywhere, yet the times seem too high. Any clues?

Spevek asked 24/4/2015 at 16:07 (10 comments)
I modified the test so that it uses an in-memory testing server. If I run it on my MacBook Pro with nothing special, I get average lock times of 10-20 ms. Is the test in the same VPC as the server? Maybe it's a server config issue? gist.github.com/Randgalt/736a682d5b896f93ab69 (see the TestingServer sketch after these comments) – Opera
FYI - I just ran the same test with an external ZK instance running on the same MBP and got reasonable values as well. – Opera
Yes, the test and the ZooKeeper server are in the same VPC. I have not changed any config other than those mentioned in the question. Also, I tried running the test on the ZooKeeper server itself (providing localhost instead of zook.company.com), but that again did not give any significant performance improvement (75 ms). – Spevek
Are you able to reproduce my numbers on a local machine? Also, what versions of ZK and Curator are you using? – Opera
I'm afraid not; the numbers are around 50-60 ms on my local machine (MacBook Air). The versions I am using are ZooKeeper 3.4.6 and Curator 2.7.1 (framework and client both). Here are my configurations for the VPC test: gist.github.com/amdalal/42c8a98293dbeabac884 – Spevek
Well, the time came down to 30 ms when using ZooKeeper 3.3.6, everything else remaining the same! – Spevek
30 ms is a lifetime on a computer. @Opera I am seeing the same poor performance that Amit has observed. – Roswald
30 ms for a distributed database transaction? I'm not sure what you're thinking, but that seems pretty fast to me. – Opera
@Opera That was 30 ms for your in-memory server. In real life, we are seeing 80 ms to 350 ms to acquire a lock that should have no contention. What is the expected performance? – Roswald
It depends on how many instances you have. If you have the recommended 5, each write has to be confirmed by 3 instances. I forget ZK's latency numbers for that. It also depends on what else is going on with the ZK servers. Additionally, remember that all writes go through the leader instance in ZooKeeper (see the TestingCluster sketch below). – Opera
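
For context on the in-memory comparison mentioned in the comments, here is a minimal sketch of how the same test could be pointed at Curator's in-memory TestingServer, assuming the curator-test artifact is on the classpath; the class name InMemoryLockTest is illustrative, and the Job threads are the ones from the question:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.test.TestingServer;

public class InMemoryLockTest {
    public static void main(String[] args) throws Exception {
        // TestingServer starts an embedded ZooKeeper on a random free port
        try (TestingServer server = new TestingServer()) {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    server.getConnectString(), new ExponentialBackoffRetry(10, 5));
            client.start();
            // ... start the same Job threads as in the question, using this client ...
            client.close();
        }
    }
}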
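
And to illustrate the instance-count point in the last comment, a hedged sketch under the same assumptions (EnsembleLockTest is an illustrative name) that runs the test against a multi-instance in-memory ensemble using Curator's TestingCluster:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.curator.test.TestingCluster;

public class EnsembleLockTest {
    public static void main(String[] args) throws Exception {
        // TestingCluster starts an in-memory ensemble; with 3 instances,
        // each write is confirmed by a quorum of 2 after routing through the leader
        try (TestingCluster cluster = new TestingCluster(3)) {
            cluster.start();
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    cluster.getConnectString(), new ExponentialBackoffRetry(10, 5));
            client.start();
            // ... start the same Job threads as in the question, using this client ...
            client.close();
        }
    }
}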
