batch size of prepared statement in spring data cassandra

I'm getting this warning in the log:

WARN [Native-Transport-Requests:17058] 2014-07-29 13:58:33,776 BatchStatement.java (line 223) Batch of prepared statements for [keyspace.tablex] is of size 10924, exceeding specified threshold of 5120 by 5804.

Is there a way in Spring Data Cassandra to specify the batch size?

Cassandra 2.0.9 and spring data cassandra 1.0.0-RELEASE

Muckraker answered 29/7, 2014 at 14:53 Comment(3)
Is that log statement on the client or the server? – Mo
It's on the server, in server.log. – Muckraker
I am using the DataStax Python driver and I am getting the exact same warning in the server logs. I wonder what it is. – Tundra

This is just a warning, informing you that the size of the batch exceeds a configured threshold.

The query is still processed. The reasoning behind the warning is that large batches are expensive and may cause cluster imbalance, so Cassandra warns you (the developer) before that becomes a problem.

Look for batch_size_warn_threshold_in_kb in cassandra.yaml to adjust the size at which this warning is produced.

Here is the ticket where it was introduced: https://issues.apache.org/jira/browse/CASSANDRA-6487

Backsaw answered 27/8, 2014 at 9:23 Comment(0)

I have done extensive performance testing and tuning on Cassandra, working closely with DataStax Support.

That is why I created the ingest() methods in SDC*, which are super fast in 1.0.4.RELEASE and higher.

This method caches the PreparedStatement for you, then loops over the individual sets of bind values and calls executeAsync for each insert. This sounds counterintuitive, but it is the fastest (and most balanced) way to insert into Cassandra.
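
To make the pattern concrete, here is a minimal sketch in plain Java. It is not the SDC* ingest() implementation: `AsyncIngest`, `ingest`, and the `executeAsync` function are hypothetical names, with `executeAsync` standing in for whatever call runs one already-prepared statement with one row's bind values. The `maxInFlight` cap is an added assumption, there to avoid flooding the cluster with unbounded concurrent requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical sketch of "one async request per row" instead of one large batch.
final class AsyncIngest {

    /**
     * Sends each row as its own asynchronous request, keeps at most
     * maxInFlight requests outstanding, and returns the number of rows
     * sent once every request has completed.
     */
    static <R> int ingest(List<R> rows,
                          Function<R, CompletableFuture<Void>> executeAsync,
                          int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>(rows.size());
        for (R row : rows) {
            // Block here while too many requests are still in flight.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> request = executeAsync.apply(row)
                    // Free the slot whether the request succeeded or failed.
                    .whenComplete((ok, err) -> permits.release());
            pending.add(request);
        }
        // Wait for all requests; join() rethrows if any of them failed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return pending.size();
    }
}
```

Because every row travels as its own small request, no single request comes near the batch size threshold, so a loop of this shape should not trigger the warning at all.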

Mo answered 28/8, 2014 at 14:28 Comment(3)
But does this mean we're OK with the warning, or can we make adjustments to the code to stay below the threshold? I don't think it's wise to just start increasing the threshold in cassandra.yaml just to get rid of the message. – Muckraker
What's the best way to change the batch size? Just limit the number of items in the array passed to template.insertAsynchronously()? – Muckraker
What's the appropriate equivalent method to use on a query? I'm seeing a substantial number of batch size warnings and slow performance using CassandraOperations.stream() with Cassandra 3.0.7 and SDC* 1.5.0.M1. – Rapids
