How to multi insert rows in cassandra
Asked Answered
S

6

21

What is the most efficient way of inserting multiple rows in cassandra column family. Is it possible to do this in a single call.

Right now my approach is to addinsert multiple column and then execute. There in a single call I am persisting one row. I am looking for strategy so that I can do a batch insert.

Signboard answered 26/7, 2013 at 15:22 Comment(2)
Which driver are you using? Are you using CQL? Which Cassandra version?Ammo
Cassandra version is1.2.x and I am using hector apis.Signboard
H
36

CQL contains a BEGIN BATCH...APPLY BATCH statement that allows you to group multiple inserts so that a developer can create and execute a series of requests (see http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0).

The following worked for me (Scala):

PreparedStatement ps = session.prepare(
"BEGIN BATCH" +    
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" +    
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" +    
"INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?);" +    
"APPLY BATCH" ); 

session.execute(ps.bind(uid, mid1, title1, body1, uid, mid2, title2, body2, uid, mid3, title3, body3));

If you don't know in advance which statements you want to execute, you can use the following syntax (Scala):

var statement: PreparedStatement = session.prepare("INSERT INTO people (name,age) VALUES (?,?)")
var boundStatement = new BoundStatement(statement)
val batchStmt = new BatchStatement()
batchStmt.add(boundStatement.bind("User A", "10"))
batchStmt.add(boundStatement.bind("User B", "12"))
session.execute(batchStmt)

Note: BatchStatement can only hold up to 65536 statements. I learned that the hard way. :-)

Hazelton answered 15/7, 2015 at 1:26 Comment(8)
@user853509 A "great answer" that helped you, and not even an upvote? Harsh.Trout
Good answer. Plus one from me.Trout
how much insert statements can i send at one time?Gretchengrete
In the Scala example, all vars should be vals. Object references don't change.Sharpedged
The latter version did not work for me. Had to replace each boundStatement.bind(...) with new BoundStatement(statement).bind(...)Tactual
I tested both approaches. First one is about 4 times faster on INSERTS for me despite the fact that I binded 1500 arguments.Petronius
The current version of the java driver does not work the way you describe here. batchStmt.add returns a new instance. If you keep adding to the same batchStmt, you will have an empty batch at the end of it. This is the crummy "Fluent" programming model which I think is horrible.Cordiecordier
Note: batch inserts are for atomic-insertions and not for performance. This could likely degrade performance compared with sending queries separately, especially when the modifications span multiple partitions: docs.datastax.com/en/cql-oss/3.x/cql/cql_using/useBatch.htmlAubin
A
6

PreparedStatement and binding values might be a better option. Below are a couple of good articles on uses and misuses of Batch:

Cassandra: Batch loading without the Batch keyword.

Using and misusing batches

Allot answered 23/9, 2015 at 16:31 Comment(0)
A
4

There is a batch insert operation in Cassandra. You can batch together inserts, even in different column families, to make insertion more efficient.

In Hector, you can use HFactory.createMutator then use the add methods on the returned Mutator to add operations to your batch. When ready, call execute().

If you're using CQL, then you group things into a batch by starting the batch with BEGIN BATCH and ending with APPLY BATCH.

Ammo answered 27/7, 2013 at 8:10 Comment(2)
Make sure you understand when this is a good idea and when it isn't. Batches don't typically improve performance and can even reduce it in the worst case. If you don't need atomic writes, make sure to use an unlogged batch.Club
Good point. I think in thrift batches nearly always helped (because of the massive overhead of thrift). In CQL, you should use async inserts and prepared statements rather than batches. Except for the case when your inserts are all to the same partition, in which case batches are more efficient.Ammo
K
1

you can add your multiple insert statements into a file and execute the file with 'cqlsh -f'.

You can also perform Batch insert with CQL into cassandra as described in below link: http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/cql_reference/batch_r.html

Kaufmann answered 26/7, 2013 at 20:8 Comment(1)
Can you please fix Datastax URL above? It returns 404 at the moment.Balkh
O
1

When trying to insert multiple rows. Database connection RTT could be the performance bottle neck. In that case, we generally need a way to avoid waiting for one INSERT to finish so that we can begin our next INSERT.Currently there are two ways as far as I know:

  • If data consistency matters, use LOGGED BATCH, but as this question said, BATCH may not have a performance boost in all the situation.
  • Otherwise, use a async api in the Cassandra client library, for example in python there is a execute_async method

Also, you can prepare the SQL statement before execute it. I haven't test the overall performance of a prepared statement vs plain insert. But I think if there are thousands INSERT or more you should get a performance boost.

Outroar answered 23/12, 2019 at 13:55 Comment(0)
P
0

Binding User Defined object to PreparedStatement and execute multiple insert query using BatchStatement in Cassandra

syntax Java

public class UserDefined { 
       private Integer id;
       private String name;
       getters and setters... 
}

UserDefined class of whose objects you want to insert in one go or in one batch

  PreparedStatement preparedStatement = session.prepare("INSERT INTO user_defined_table (id, name) VALUES (?, ?,)");

session -> holds connections to a Cassandra cluster.

  public void insertBatch(List<UserDefined > userDefinedList) {
  var q = new BatchStatement();
  userDefinedList.stream().forEach(userDefinedObject-> {
      var boundStatement = new BoundStatement(preparedStatement );
      q.add(boundStatement.bind(
                userDefinedObject.getAccountId(),
                userDefinedObject.getName()
    }); 
  session.execute(q)
   }

Method to bind list of objects and add batch insert queries and execute

Protest answered 11/7 at 14:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.