Cassandra batch query performance on tables having different partition keys

I have a test case in which I receive 150k requests per second from a client.

My test case requires sending an UNLOGGED batch of updates to multiple tables that have different partition keys:

BEGIN UNLOGGED BATCH
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Country' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('US');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='City' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Dallas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='State' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Texas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='SSN' and ptype='text' and date='2017-03-20' and pvalue=decimalAsBlob(000000000);
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Gender' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Female');
APPLY BATCH;

Is there a better way to do this than the approach I am currently following?

Currently, I am batching writes to multiple tables whose data may live on different nodes of the cluster because they have different partition keys, and as far as I know, batching queries against tables with different partition keys carries an extra cost.

Alfons answered 21/3, 2017 at 14:27 Comment(0)

First, it is important to understand what batches are for.

Batches are often mistakenly used in an attempt to optimize performance.

Batches are meant to keep data consistent across multiple tables. If atomicity is needed, use a logged batch. In your case this is a counter table, so if the counts across tables do not need to be consistent with each other, do not use a batch. If your cluster is healthy, Cassandra will apply all of the writes successfully.

Unlogged batches require the coordinator to manage the inserts, which can place a heavy load on the coordinator node. If other nodes own the partition keys involved, the coordinator has to forward those writes over an extra network hop, which makes delivery inefficient. Use unlogged batches only when all of the updates target the same partition key.
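As a rough illustration of that last point, here is a minimal Python sketch that groups pending counter updates by partition key, so that only updates sharing a key would go into one unlogged batch and everything else would be sent as an individual statement. The question does not show the table schema, so the partition key columns used here (`source_id`, `name`, `date`) are an assumption, as are the helper name `group_by_partition` and the second sample source. The sketch only does the grouping and does not talk to Cassandra.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: the partition key of count_table is (source_id, name, date).
# The real schema is not shown in the question; adjust this to match it.
PARTITION_KEY_COLUMNS = ("source_id", "name", "date")

Row = Dict[str, object]
PartitionKey = Tuple[object, ...]


def group_by_partition(rows: Iterable[Row]) -> Dict[PartitionKey, List[Row]]:
    """Group pending updates so that each group targets a single partition."""
    groups: Dict[PartitionKey, List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in PARTITION_KEY_COLUMNS)
        groups[key].append(row)
    return dict(groups)


# Sample data: the first two rows mirror the question, the third is made up
# to show an update that lands in a different partition.
updates = [
    {"source_id": 1, "name": "source_name", "date": "2017-03-20", "pname": "Country"},
    {"source_id": 1, "name": "source_name", "date": "2017-03-20", "pname": "City"},
    {"source_id": 2, "name": "other_name", "date": "2017-03-20", "pname": "State"},
]

groups = group_by_partition(updates)
for key, rows in groups.items():
    # Same-partition groups are candidates for one unlogged batch;
    # a group of one is better sent as a plain single statement.
    mode = "unlogged batch" if len(rows) > 1 else "single statement"
    print(key, mode, len(rows))
```

Each resulting group could then be sent with whatever driver you use, ideally asynchronously, so that no single coordinator has to fan out writes to partitions it does not own.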

For more detail, see the following articles:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.npmx2cnsq

Gastrin answered 22/3, 2017 at 8:55 Comment(0)
