Astyanax client maximum connections per node?

I am reading data from a Cassandra database using the Astyanax client.

I have around one million unique rows in a Cassandra database. I have a single cross colocation centre cluster with four nodes.

These are my four nodes:

  node1:9160
  node2:9160
  node3:9160
  node4:9160

I have KeyCaching enabled, and the SizeTieredCompaction strategy is enabled as well.

I have a multithreaded client program that reads the data from Cassandra using the Astyanax client. When I run it with 20 threads, read performance degrades.

So the first thing that comes to mind is that there might be contention over connections to Cassandra. (Does the client use a pool, and if so, how many connections are maintained?) I am using the code below to create the connection with the Astyanax client.

private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        // Single configuration object: a repeated withAstyanaxConfiguration()
        // call replaces the earlier one and would drop the discovery type.
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
    )
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(1)
        .setSeeds("node1:9160,node2:9160,node3:9160,node4:9160")
    )
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY,
        StringSerializer.get(),
        StringSerializer.get());
}

Do I need to change anything in the above code to improve the performance?

What does this method do?

   setMaxConnsPerHost(1)

Do I need to increase that to improve the performance? I have four nodes, so should I change it to 4?

And what about the setMaxConns(20) method call? Do I need to add that as well to improve the performance, since I will be running my program with multiple threads?

Kolodgie answered 24/4, 2013 at 23:1 Comment(0)

For details on maxConnsPerHost/maxConns, you may check this answer: setMaxConns and setMaxConnsPerHost in Astyanax client

And yes, maxConnsPerHost should be increased to achieve good performance. The optimal value depends on network topology, request replication factor, storage configuration, caching, read/write ratio, etc.

I don't think it's possible to achieve optimal performance for a heavily loaded cluster without experiments and simulations.

For tasks with moderate load on Cassandra I usually use a rule of thumb:

maxConnsPerHost ~= <Number of cores per host>/<Replication factor> + 1

That is, for a cluster of 8-core boxes with replication factor 3, maxConnsPerHost should be around 4. This value is also a good starting point for experiments in heavy-load scenarios.

The motivation: a cluster of N nodes, each having C cores, has N * C cores in total. To process a request with replication factor R, R cores (on different nodes) are required. So, at any given moment the cluster can process up to N * C / R requests. It's a good idea to keep the number of concurrent connections around this figure. Divide it by N to get the number of connections per host, which is C / R. Add 1 spare connection per host for network latencies, etc. That's it.
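The arithmetic above can be written out as a small sketch. The class and method names here are hypothetical, and rounding to the nearest integer is an assumption (the rule only says the value should be "around" the result):

```java
// Illustrative sketch of the rule of thumb; names are hypothetical, not part of Astyanax.
final class MaxConnsRuleOfThumb {

    // Requests the whole cluster can work on at once: N * C / R.
    static double clusterConcurrency(int nodes, int coresPerNode, int replicationFactor) {
        return (double) nodes * coresPerNode / replicationFactor;
    }

    // Per-host share (C / R) plus one spare connection.
    // Rounding to the nearest integer is an assumption of this sketch.
    static int maxConnsPerHost(int coresPerNode, int replicationFactor) {
        return (int) Math.round((double) coresPerNode / replicationFactor + 1);
    }

    public static void main(String[] args) {
        // 8-core boxes, replication factor 3: 8 / 3 + 1 = 3.67, rounded to 4
        System.out.println(maxConnsPerHost(8, 3)); // prints 4
        // Cluster-wide concurrency for four such nodes
        System.out.println(clusterConcurrency(4, 8, 3));
    }
}
```

The result would then be passed to setMaxConnsPerHost(...) as a starting value and adjusted by experiment.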

Update: Simple client performance tuning:

  • Start with some maxConnsPerHost value
  • Simulate load and observe CPU usage and the org.apache.cassandra.request->***Stage->pendingTasks JMX attributes
  • Increase maxConnsPerHost until pendingTasks starts to increase rapidly. This is probably the optimal value.
  • CPU load on cluster nodes should be around 50-70%. If it's much lower, there's probably something wrong with the server configuration.
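
One way to read the tuning steps above, as a rough sketch: run the load test at several maxConnsPerHost values, note pendingTasks for each run, and keep the last value before pendingTasks jumps. The class name, the growthFactor threshold, and the sample numbers are all hypothetical, and choosing the value just before the jump (rather than at it) is an interpretation of the steps.

```java
// Illustrative sketch only; names, threshold and sample data are hypothetical.
final class PendingTasksKnee {

    /**
     * Returns the last tested maxConnsPerHost value before pendingTasks grows
     * by more than growthFactor between two consecutive runs.
     * Both arrays must be ordered by increasing maxConnsPerHost.
     */
    static int lastValueBeforeJump(int[] maxConnsPerHost, long[] pendingTasks, double growthFactor) {
        if (maxConnsPerHost.length == 0 || maxConnsPerHost.length != pendingTasks.length) {
            throw new IllegalArgumentException("need one pendingTasks sample per tested value");
        }
        int best = maxConnsPerHost[0];
        for (int i = 1; i < maxConnsPerHost.length; i++) {
            // Treat an idle stage (0 pending) as 1 to avoid dividing by zero.
            long previous = Math.max(pendingTasks[i - 1], 1);
            if ((double) pendingTasks[i] / previous > growthFactor) {
                break; // pendingTasks started to increase rapidly
            }
            best = maxConnsPerHost[i];
        }
        return best;
    }

    public static void main(String[] args) {
        int[] tested = {2, 4, 6, 8};      // maxConnsPerHost values tried (made-up data)
        long[] pending = {0, 1, 3, 40};   // pendingTasks observed per run (made-up data)
        System.out.println(lastValueBeforeJump(tested, pending, 4.0)); // prints 6
    }
}
```

CPU usage still has to be checked separately, since this sketch looks only at pendingTasks.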
Lifesaving answered 26/4, 2013 at 10:1 Comment(3)
Thanks Wildfire for the suggestion. I appreciate your help. And what about setMaxConns? What value should we set for that, and what logic do we usually follow to decide it? - Kolodgie
@FarhanJamal: setMaxConns is only used with ConnectionPoolType.BAG; it's simply ignored in other implementations. If you use the BAG connection pool, you may set this attribute to the maximum number of threads that might send requests to Cassandra simultaneously. - Lifesaving
Thanks for the suggestion. In general, which connection pool should I use? That is, which connection pool type will give me faster read performance? Currently, in my example above, I am using ConnectionPoolConfigurationImpl. Do you have any recommendation for that as well? - Kolodgie
