Difference between UPDATE and INSERT in Cassandra?
Asked Answered
M

4

50

What is the difference between UPDATE and INSERT when executing CQL against Cassandra?

It looks like there used to be no difference, but now the documentation says that INSERT does not support counters while UPDATE does.

Is there a "preferred" method to use? Or are there cases where one should be used over the other?

Thanks so much!

Mulcahy answered 13/5, 2013 at 22:32 Comment(3)
Is there any performance difference between INSERT vs UPDATE?Genarogendarme
@Genarogendarme I also want to know this. Do you know any knowledge about this?Sessoms
Sorry @Sessoms I didn't find any more info on this.Genarogendarme
H
22

Counter Columns in Cassandra couldn't be set to an arbitrary value: they can only be incremented or decremented by any arbitrary value.

For this reason, INSERT doesn't support Counter Column because you cannot "insert" a value into a Counter Column. You can only UPDATE them (increment or decrement) by some value. Here's how you would update a Counter column.

    UPDATE ... SET name1 = name1 + <value> 

You asked:

Is there a "preferred" method to use? Or are there cases where one should be used over the other?

Yes. If you are inserting values to the database, you can use INSERT. If the column doesn't exists, it will be created for you. Otherwise, INSERT's effect is similar to UPDATE. INSERT is useful when you don't have a pre-designed schema (Dynamic Column Family, i.e. insert anything, anytime). If you are designing the schema before hand (Static Column Family, similar to RDMS) and know each column, then you can use UPDATE.

Hewe answered 14/5, 2013 at 0:45 Comment(2)
Thanks so much, this really clarifies things!Mulcahy
doesn't it say that they are the same? i.e. "Unlike SQL, the semantics of INSERT and UPDATE are identical." datastax.com/docs/1.1/references/cql/INSERTGarlic
C
60

There is a subtle difference. Inserted records via INSERT remain if you set all non-key fields to null. Records inserted via UPDATE go away if you set all non-key fields to null.

Try this:

CREATE TABLE T (
  pk int,
  f1 int,
  PRIMARY KEY (pk)
);

INSERT INTO T (pk, f1) VALUES (1, 1);
UPDATE T SET f1=2 where pk=2;
SELECT * FROM T;

Returns:

 pk | f1
----+----
  1 |  1
  2 |  2

Now, update each row setting f1 to null.

UPDATE T SET f1 = null WHERE pk = 1;
UPDATE T SET f1 = null WHERE pk = 2;
SELECT * FROM T;

Note that row 1 remains, while row 2 is removed.

 pk | f1
----+------
  1 | null

If you look at these using Cassandra-cli, you will see a different in how the rows are added.

I'd sure like to know whether this is by design or a bug and see this behavior documented.

Calamanco answered 16/5, 2014 at 23:4 Comment(2)
Good catch! Have you gained more insight on it?Crisp
There was another issue i faced related insert and update difference. That is if use update to reduce the ttl of the columns then row with only primary key will be present with all non key columns showing null. Row will be there until original ttl expires.Biologist
H
22

Counter Columns in Cassandra couldn't be set to an arbitrary value: they can only be incremented or decremented by any arbitrary value.

For this reason, INSERT doesn't support Counter Column because you cannot "insert" a value into a Counter Column. You can only UPDATE them (increment or decrement) by some value. Here's how you would update a Counter column.

    UPDATE ... SET name1 = name1 + <value> 

You asked:

Is there a "preferred" method to use? Or are there cases where one should be used over the other?

Yes. If you are inserting values to the database, you can use INSERT. If the column doesn't exists, it will be created for you. Otherwise, INSERT's effect is similar to UPDATE. INSERT is useful when you don't have a pre-designed schema (Dynamic Column Family, i.e. insert anything, anytime). If you are designing the schema before hand (Static Column Family, similar to RDMS) and know each column, then you can use UPDATE.

Hewe answered 14/5, 2013 at 0:45 Comment(2)
Thanks so much, this really clarifies things!Mulcahy
doesn't it say that they are the same? i.e. "Unlike SQL, the semantics of INSERT and UPDATE are identical." datastax.com/docs/1.1/references/cql/INSERTGarlic
P
3

Another subtle difference (i'm starting to believe cql is a terrible interface to cassandra, full of subtleties and caveats due to using similar SQL syntax but slightly different semantics) is with setting TTLs on existing data. With UPDATE you cannot update the TTL of the keys, even if the new actual values are equal to the old values. The solution is to INSERT the new row instead, with the new TTL already set

Peake answered 17/8, 2017 at 14:18 Comment(1)
Did this change? docs.datastax.com/en/cql/3.3/cql/cql_using/… seems to indicate otherwise.Proper
U
0

Regarding the subtle difference highlighted by billbaird (I'm unable to comment on that post directly) where a row created by an update operation will be deleted if all non-key fields are null:

That is expected behavior and not a bug based on the bug report at https://issues.apache.org/jira/browse/CASSANDRA-11805 (which was closed as "Not A Problem")

I ran into this myself when using Spring Data for the first time. I was using the save(T entity) method of a repository, but no row was being created. it turned out Spring Data was using an UPDATE because it determined that the object wasn't 'new' (not sure that test for 'isNew' makes sense here), and I happened to be testing with entities that only had the key fields set.

For this Spring Data case, the Cassandra-specific repository interfaces do provide an insert method that appear to consistently use an INSERT if that behavior is desired instead (though Spring's documentation doesn't document these details sufficiently either).

Unvoiced answered 28/5, 2017 at 9:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.