batch size of prepared statement in spring data cassandra

Asked 29/7, 2014 at 14:53 Answered 28/8, 2014 at 14:28

cassandra cassandra-2.0 spring-data-cassandra

I'm getting this warning in the log:

WARN [Native-Transport-Requests:17058] 2014-07-29 13:58:33,776 BatchStatement.java (line 223) Batch of prepared statements for [keyspace.tablex] is of size 10924, exceeding specified threshold of 5120 by 5804.

Is there a way in spring data cassandra to specify the size?

Cassandra 2.0.9 and spring data cassandra 1.0.0-RELEASE

Muckraker answered 29/7, 2014 at 14:53 Comment(3)

Is that log statement on the client or the server? – Mo 29/7, 2014 at 15:47

It's on the server in server.log – Muckraker 29/7, 2014 at 16:36

I am using datastax python driver and I am getting the exact same error in server logs. Wonder what it is – Tundra 21/8, 2014 at 0:57

This is just a warning informing you that the query size exceeds a certain limit.

The query is still processed. The reasoning is that large batched queries are expensive and may cause cluster imbalance, so the warning alerts you (the developer) beforehand.

Look for batch_size_warn_threshold_in_kb in cassandra.yaml to adjust when this warning is produced.

Here is the ticket where it was introduced: https://issues.apache.org/jira/browse/CASSANDRA-6487
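To make the numbers in the warning concrete, here is a minimal sketch of the arithmetic behind it. It assumes the threshold in effect is 5 KB, which is inferred from the 5120 in the log line; the class and method names are hypothetical and are not part of Cassandra.

```java
// Sketch only: reproduces the arithmetic in the server-side warning.
// BatchWarnThreshold and its methods are hypothetical, not Cassandra code.
final class BatchWarnThreshold {

    // batch_size_warn_threshold_in_kb is set in kilobytes; the log reports bytes.
    static long thresholdBytes(int thresholdInKb) {
        return thresholdInKb * 1024L;
    }

    // Bytes by which a batch exceeds the threshold; 0 means at or below it.
    static long excessBytes(long batchSizeBytes, int thresholdInKb) {
        return Math.max(0L, batchSizeBytes - thresholdBytes(thresholdInKb));
    }

    public static void main(String[] args) {
        // Values taken from the log line in the question; 5 KB is an inference.
        System.out.println(thresholdBytes(5));     // 5120
        System.out.println(excessBytes(10924, 5)); // 5804
    }
}
```

The 10924-byte batch from the question is roughly twice the threshold, so halving the number of statements per batch would bring it close to the limit.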

Backsaw answered 27/8, 2014 at 9:23 Comment(0)

I have done extensive performance testing and tuning on Cassandra, working closely with DataStax Support.

That is why I created the ingest() methods in SDC*, which are super fast in 1.0.4.RELEASE and higher.

This method caches the PreparedStatement for you, then loops over the individual bind values and calls executeAsync for each insert. This sounds counterintuitive, but it is the fastest (and most balanced) way to insert into Cassandra.
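The pattern described above can be sketched roughly as follows. This is not the SDC* implementation: AsyncIngestSketch is a hypothetical stand-in that uses only the JDK, and the executeAsync parameter is assumed to wrap an already-prepared statement and send one bound insert per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch only: models "prepare once, bind and execute each row asynchronously".
// AsyncIngestSketch is hypothetical, not the Spring Data Cassandra or driver API.
final class AsyncIngestSketch {

    // executeAsync stands in for a call that binds one row to a cached
    // prepared statement, sends it, and returns a future for the result.
    static <R> List<R> ingest(List<Object[]> rows,
                              Function<Object[], CompletableFuture<R>> executeAsync) {
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (Object[] bindValues : rows) {
            // One small request per row instead of one large batch.
            pending.add(executeAsync.apply(bindValues));
        }
        // Wait for every write and collect the results in input order.
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> future : pending) {
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a"});
        rows.add(new Object[] {2, "b"});
        // Fake executor: pretends to write the row and reports its key.
        List<String> acks = ingest(rows,
            values -> CompletableFuture.supplyAsync(() -> "wrote " + values[0]));
        System.out.println(acks); // [wrote 1, wrote 2]
    }
}
```

Because each row travels as its own request, no single request grows with the number of rows, which is why this approach does not trip the batch size warning.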

Mo answered 28/8, 2014 at 14:28 Comment(3)

But does this mean we're ok with the warning, or can we make adjustments to the code to stay below the threshold? I don't think it's wise to just start increasing the threshold in the cassandra.yaml just to get rid of the message. – Muckraker 30/9, 2014 at 14:26

What's the best way to change the batch size? Just limit the number of items in the array to the call to template.insertAsynchronously()? – Muckraker 30/9, 2014 at 14:55

What's the appropriate equivalent method to use on a query? I'm seeing a substantial number of batch size warnings and slow performance using CassandraOperations.stream() w/ Cassandra 3.0.7 and SDC* 1.5.0.M1 – Rapids 1/10, 2016 at 15:6
