Low write performance of Cassandra

I am new to NoSQL and Cassandra. I am experimenting with settings to achieve an in-memory, cache-only solution. I read a 100000-line file line by line and use Hector to insert each line into Cassandra. I am seeing a very low throughput of around 6000 inserts per second. The whole write operation takes about 20.5 seconds, which is unacceptable for our application. We need something like 100000 inserts per second. I am testing on a Windows 7 computer with 4GB RAM.

I am doing an insert only test.

Kindly let me know where I am going wrong and how I can improve the inserts per second.

Keyspace: Keyspace1
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 177042
        Write Latency: 0.003106884242157228 ms.
        Pending Tasks: 0
                Column Family: user
                SSTable count: 3
                Space used (live): 17691
                Space used (total): 17691
                Number of Keys (estimate): 384
                Memtable Columns Count: 100000
                Memtable Data Size: 96082090
                Memtable Switch Count: 1
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 177042
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 150000
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache capacity: 150000
                Row cache size: 0
                Row cache hit rate: NaN
                Compacted row minimum size: 73
                Compacted row maximum size: 924
                Compacted row mean size: 784

I have tried a couple of methods for setting the row cache and key cache:

  1. Through Cassandra CLI

  2. Through NodeCmd: java org.apache.cassandra.tools.NodeCmd -p 7199 setcachecapacity Keyspace1 user 150000 150000

Chalybite answered 6/12, 2011 at 14:9 Comment(4)
What sort of disk storage are you using? Is it an SSD, an HDD, or a memory file system? How much CPU (User/System) is your application using while this is running, as shown in Task Manager? - Allele
The disk storage is a hard disk. Total CPU is around 40%. - Chalybite
When we did some tests a year ago, we found Cassandra was slower than PostgreSQL up until Cassie had 4+ servers. So I am not surprised. - Enclave
Are you using a single database server? - Newland

I wouldn't describe 6000 writes per second as "slow", but Cassandra can do much better. Note that Cassandra is designed for durable writes, so it may give lower performance than memory-only caching solutions.

As sbridges says, you cannot get full performance out of Cassandra using a single client. Try using multiple client threads, or processes, or machines.

I don't think you will get 100,000 writes per second on a single node. I have only obtained around 20,000-25,000 writes per second on modest hardware (although Cassandra has got significantly faster since I did that benchmarking). 6000 per second seems about right for a single client against a single commodity node.

With a cluster of nodes, you can definitely get 100,000 per second (See http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html for a recent benchmark of 1,000,000 writes per second!)

Row cache and key cache are to help read performance, not write performance.

Also, make sure you are batching the writes (if appropriate) - this will reduce the network overhead.
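
As a rough illustration of batching, here is a minimal sketch that groups rows and sends each group in one call. The `batchWriter` callback, the class name and the batch size are all hypothetical stand-ins, not Hector API; with Hector, the callback would presumably queue the rows on a `Mutator` and call `execute()` once per batch, but the code below does not call Hector at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: batchWriter is a hypothetical stand-in for "send this group of
// rows to the server in one round trip". Nothing here talks to Cassandra.
class BatchInsertSketch {

    // Hands the rows to batchWriter in groups of at most batchSize.
    // Returns the number of batches sent.
    static int insertInBatches(List<String> rows, int batchSize,
                               Consumer<List<String>> batchWriter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the slice so the callback owns an independent list.
            batchWriter.accept(new ArrayList<>(rows.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add("row" + i);
        }
        // Assumed batch size of 100; the best value has to be measured.
        int batches = insertInBatches(rows, 100, batch -> { /* send batch here */ });
        System.out.println(batches + " round trips instead of " + rows.size());
    }
}
```

The batch size is something to tune by measurement: very large batches can hurt as well, so it is worth trying a few values rather than assuming bigger is better.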

Dasteel answered 6/12, 2011 at 21:54 Comment(1)
Batch inserts improve performance a lot. I went from 5k inserts/second to 20-25k inserts/second. I have 3 nodes (6 CPUs, 32 GB RAM). - So

How many threads/processes are you using to perform inserts? Hector calls are synchronous, so if you are only using 1 thread on the client side, that may be your bottleneck.
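
To make the idea concrete, here is a minimal sketch of spreading blocking inserts over several client threads. The `writer` callback is a hypothetical stand-in for whatever synchronous insert call you make per line (it is not Hector API), and the class name and thread count are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Sketch only: writer is a hypothetical stand-in for one blocking insert call.
class ParallelInsertSketch {

    // Splits the lines into one contiguous slice per thread and writes the
    // slices concurrently. Returns the number of lines written.
    static int insertAll(List<String> lines, int threads, Consumer<String> writer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (lines.size() + threads - 1) / threads; // ceiling division
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> slice =
                        lines.subList(start, Math.min(start + chunk, lines.size()));
                results.add(pool.submit(() -> {
                    for (String line : slice) {
                        writer.accept(line); // blocking write, one per line
                    }
                    return slice.size();
                }));
            }
            int written = 0;
            for (Future<Integer> f : results) {
                try {
                    written += f.get(); // wait for this slice to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while inserting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("insert failed", e.getCause());
                }
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add("row" + i);
        }
        AtomicInteger calls = new AtomicInteger();
        // Assumed 8 threads; replace the lambda with the real insert call.
        int written = insertAll(lines, 8, line -> calls.incrementAndGet());
        System.out.println(written + " lines written in " + calls.get() + " calls");
    }
}
```

Whatever you pass as `writer` has to be safe to call from several threads at once, so check that for the client objects you share between threads. The right thread count depends on your hardware and is best found by measuring.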

Phooey answered 6/12, 2011 at 16:34 Comment(1)
I am using only one thread. I will try with multiple threads. - Chalybite
