Optimal JVM settings for Cassandra

I have a 4-node cluster with a 16-core CPU and 100 GB RAM on each box (2 nodes on each rack).

As of now, all nodes are running with the default JVM settings of Cassandra (v2.1.4). With these settings, each node uses 13GB RAM and 30% CPU. It is a write-heavy cluster with occasional deletes or updates.

Do I need to tune the JVM settings of Cassandra to utilize more memory? What all things should I be looking at to make appropriate settings?

Without answered 13/5, 2015 at 7:0 Comment(0)

Do I need to tune the JVM settings of Cassandra to utilize more memory?

The DataStax Tuning Java Resources doc actually has some pretty sound advice on this:

Many users new to Cassandra are tempted to turn up Java heap size too high, which consumes the majority of the underlying system's RAM. In most cases, increasing the Java heap size is actually detrimental for these reasons:

  • In most cases, the capability of Java to gracefully handle garbage collection above 8GB quickly diminishes.
  • Modern operating systems maintain the OS page cache for frequently accessed data and are very good at keeping this data in memory, but can be prevented from doing its job by an elevated Java heap size.

If you have more than 2GB of system memory, which is typical, keep the size of the Java heap relatively small to allow more memory for the page cache.

Since your machines have 100GB of RAM, and assuming you are indeed running with the "default JVM settings", your JVM max heap size should already be capped at 8192M. I wouldn't deviate from that unless you are experiencing issues with garbage collection.
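To see where the 8192M figure comes from, here is a rough worked example of the default sizing rule from cassandra-env.sh in 2.1, max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)), plugged with the hardware from the question. The input values and the min/max helper functions are assumptions made for illustration (in particular, that 100GB is reported as 102400 MB); a real box will report slightly less.

```shell
#!/bin/bash
# Illustrative inputs taken from the question, not read from a real machine.
system_memory_in_mb=102400   # assumption: 100GB reported as 102400 MB
system_cpu_cores=16

# Hypothetical helpers, only here to make the formula readable.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }
max() { if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# Heap: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))
half_capped=$(min $(( system_memory_in_mb / 2 )) 1024)
quarter_capped=$(min $(( system_memory_in_mb / 4 )) 8192)
max_heap=$(max "$half_capped" "$quarter_capped")

# Young gen: min(100MB * cores, 1/4 * heap)
new_size=$(min $(( 100 * system_cpu_cores )) $(( max_heap / 4 )))

echo "MAX_HEAP_SIZE=${max_heap}M"
echo "HEAP_NEWSIZE=${new_size}M"
```

With these inputs the heap lands on the 8192MB cap, and the young generation is limited by the core count (16 x 100MB = 1600MB) rather than by a quarter of the heap (2048MB).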

JVM resources for Cassandra can be set in the cassandra-env.sh file. If you are curious, open cassandra-env.sh and look for the calculate_heap_sizes() function. That should give you some insight into how Cassandra computes your default JVM settings.
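If you ever do need to override the computed defaults, cassandra-env.sh ships with two commented-out variables for that purpose. A minimal sketch of what the override might look like is below; the values are purely illustrative (they simply mirror what the defaults would work out to on a 16-core box with plenty of RAM) and are not a recommendation.

```shell
# In cassandra-env.sh: uncomment and set BOTH variables, or neither.
# The values below are illustrative assumptions, not tuned numbers.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1600M"
```

The comments in cassandra-env.sh itself note that if you set one of these you should set the other, so treat them as a pair. The node has to be restarted for the change to take effect.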

What all things should I be looking at to make appropriate settings?

If you are running OpsCenter (and you should be), add a graph for "Heap Used" and "Non Heap Used."

[Image: OpsCenter graphing Heap Used and Non Heap Used together]

This will allow you to easily monitor JVM heap usage for your cluster. Another thing that helped me was to write a bash script in which I basically hijacked the JVM calculations from cassandra-env.sh. That way I can run it on a new machine and know right away what my MAX_HEAP_SIZE and HEAP_NEWSIZE are going to be:

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`

echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = "`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
    if [ "$half_system_memory_in_mb" -gt "1024" ]
    then
        half_system_memory_in_mb="1024"
    fi
    if [ "$quarter_system_memory_in_mb" -gt "8192" ]
    then
        quarter_system_memory_in_mb="8192"
    fi
    if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
    then
        max_heap_size_in_mb="$half_system_memory_in_mb"
    else
        max_heap_size_in_mb="$quarter_system_memory_in_mb"
    fi
    MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

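    # Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
    # Count the CPU cores first; the multiplication below depends on it.
    system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`
    max_sensible_yg_per_core_in_mb="100"
    # The * must be escaped so the shell does not expand it as a glob.
    max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb \* $system_cpu_cores`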

    desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
    if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
    then
        HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
    else
        HEAP_NEWSIZE="${desired_yg_in_mb}M"
    fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE

Update 20160212:

Also, be sure to check out Amy Tobey's 2.1 Cassandra Tuning Guide. She has some great tips on how to get your cluster running optimally.

Alcibiades answered 13/5, 2015 at 13:13 Comment(0)

system_cpu_cores is never set in the script above, so the young generation calculation fails. Here is an edited version that runs:

#!/bin/bash
clear
echo "This is how Cassandra will determine its default Heap and GC Generation sizes."

system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
half_system_memory_in_mb=`expr $system_memory_in_mb / 2`
quarter_system_memory_in_mb=`expr $half_system_memory_in_mb / 2`
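# Match only the "processor : N" lines; a case-insensitive grep for
# "processor" can also count "model name" lines that contain the word.
system_cpu_cores=`egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`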
echo "   memory = $system_memory_in_mb"
echo "     half = $half_system_memory_in_mb"
echo "  quarter = $quarter_system_memory_in_mb"

echo "cpu cores = `egrep -c 'processor([[:space:]]+):.*' /proc/cpuinfo`"

#cassandra-env logic duped here
#this should help you to see how much memory is being allocated
#to the JVM
if [ "$half_system_memory_in_mb" -gt "1024" ]
then
    half_system_memory_in_mb="1024"
fi
if [ "$quarter_system_memory_in_mb" -gt "8192" ]
then
    quarter_system_memory_in_mb="8192"
fi
if [ "$half_system_memory_in_mb" -gt "$quarter_system_memory_in_mb" ]
then
    max_heap_size_in_mb="$half_system_memory_in_mb"
else
    max_heap_size_in_mb="$quarter_system_memory_in_mb"
fi
MAX_HEAP_SIZE="${max_heap_size_in_mb}M"

# Young gen: min(max_sensible_per_modern_cpu_core * num_cores, 1/4 * heap size)
max_sensible_yg_per_core_in_mb="100"
# The * must be escaped so the shell does not expand it as a glob.
max_sensible_yg_in_mb=`expr $max_sensible_yg_per_core_in_mb \* $system_cpu_cores`
desired_yg_in_mb=`expr $max_heap_size_in_mb / 4`
if [ "$desired_yg_in_mb" -gt "$max_sensible_yg_in_mb" ]
then
    HEAP_NEWSIZE="${max_sensible_yg_in_mb}M"
else
    HEAP_NEWSIZE="${desired_yg_in_mb}M"
fi

echo "Max heap size = " $MAX_HEAP_SIZE
echo " New gen size = " $HEAP_NEWSIZE
Katabasis answered 25/9, 2016 at 20:39 Comment(0)
