Overwrite row in Cassandra with INSERT, will it cause a tombstone?
Writing data to Cassandra without causing it to create tombstones is vital in our case, due to the amount of data and the speed required. So far we have only written each row once and never needed to update it again, only to fetch the data.

Now there has been a case where we actually need to write data and then complete it with more data that becomes available after a while. It can be done in either of two ways (sketched below):

  1. overwriting all of the data in the row again using INSERT (all data is available by then), or

  2. performing an UPDATE with only the new data.
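
A minimal CQL sketch of the two options, using a hypothetical events table (the table and column names are illustrative assumptions, not from our schema):

    -- hypothetical schema for illustration
    CREATE TABLE events (
        id uuid PRIMARY KEY,
        payload text,
        status text
    );

    -- option 1: overwrite the whole row (INSERT is an upsert in Cassandra)
    INSERT INTO events (id, payload, status)
    VALUES (123e4567-e89b-12d3-a456-426655440000, 'full data', 'complete');

    -- option 2: write only the columns that changed
    UPDATE events
    SET status = 'complete'
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;

Neither statement creates a tombstone by itself, as long as no column is set to null and no TTL is used.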

What is the best way to do this, bearing in mind that speed matters and that not creating tombstones is important?

Leotaleotard answered 25/6, 2015 at 14:34 Comment(0)
Tombstones will only be created when deleting data or using TTL values.

Cassandra aligns very well with your described use case. Incrementally adding data will work with both INSERT and UPDATE statements. Cassandra will store the data in different locations when you add data over time for the same partition key. Periodically running compactions will merge the data for a single key again, to optimize access and free disk space. This happens based on the timestamps of the written values and does not create any new tombstones. You can learn more about how Cassandra stores data e.g. here.
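
For example, you can inspect the per-column write timestamps that drive this merging with CQL's writetime() function (reusing the hypothetical events table sketched in the question):

    -- each regular column carries its own write timestamp;
    -- reads and compaction keep the newest value per column
    SELECT payload, writetime(payload),
           status, writetime(status)
    FROM events
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;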

Syndetic answered 25/6, 2015 at 14:51 Comment(2)
A nice way to create tombstones is to update the same partition key around 1000 times, where the row has 2 or 3 collection-type columns, and each time overwrite one collection of 100 elements. – Robbyrobbyn
Not only DELETE or TTL causes tombstones. From the DataStax documentation: "Some operations that generate tombstones: using a CQL DELETE statement; expiring data with time-to-live (TTL); using internal operations, such as materialized views; INSERT or UPDATE operations with a null value; UPDATE operations with a collection column." – Barthold
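
As the comments point out, writes like the following generate tombstones even though nothing is explicitly deleted (same hypothetical events table, with an assumed set<text> column named tags):

    -- inserting an explicit null writes a tombstone for that column
    INSERT INTO events (id, payload, status)
    VALUES (123e4567-e89b-12d3-a456-426655440000, 'data', null);

    -- replacing a whole collection writes a range tombstone
    -- covering the previous contents of the collection
    -- (assumes: ALTER TABLE events ADD tags set<text>;)
    UPDATE events
    SET tags = {'a', 'b'}
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;

Appending instead (SET tags = tags + {'c'}) avoids the range tombstone that a full collection overwrite creates.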
It would be more efficient to do an update to add new or changed data. There is no need to rewrite the old data that isn't changing and it would be inefficient to make Cassandra rewrite it.

When you do an insert or update, Cassandra keeps a timestamp of the modification time for each column. When you do a read, Cassandra collects all the writes for that key from memory, from disk, and from other replicas, depending on the consistency setting. It then merges the column data so that the newest value is used for each column.

When data is compacted on disk, if there are separate updates for different columns of a row, those will be combined into a single row in the compacted data.

You don't need to worry about creating tombstones by doing an update unless you are using an update to set a TTL (Time To Live) value. In your application it sounds like you never delete data, so you will never have any tombstones.
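
For completeness, this is the kind of update that would create tombstones later (again using the hypothetical events table; TTL is in seconds):

    -- the written value expires after one hour and then counts
    -- as a tombstone until compaction purges it
    UPDATE events USING TTL 3600
    SET status = 'complete'
    WHERE id = 123e4567-e89b-12d3-a456-426655440000;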

Marigolde answered 25/6, 2015 at 17:1 Comment(0)
