Versioning in cassandra

I need to implement versioning using Cassandra.

This is my column family definition:

create table file_details(id text primary key, fname text, version int, mimetype text);

I have created a secondary index on the fname column.

Whenever I do an insert for the same 'fname', the version should be incremented. And when I retrieve a row by fname, it should return the row with the latest version.

Please suggest what approach I should take.

Molest answered 2/9, 2013 at 14:32 Comment(4)
Do you have a requirement for the version to increment by exactly 1 each time? If not, the max of the timestamps for fname and mimetype will be an always increasing number, so it can be used for versioning. - Pedaiah
Yes, I have a requirement to increase the version by exactly 1. Also, can you tell me what the query would be to get the max timestamps for fname and mimetype? - Molest
You can use select writetime(fname), writetime(mimetype) from file_details where id = 'id'; and find the max in your code. - Pedaiah
Thanks Richard for the quick response. Any idea what needs to be done if I have to increment by exactly 1 each time? - Molest

If it's not possible to relax the requirement of versions increasing by 1, one option is to use counters.

Create a table for the data:

create table file_details(id text primary key, fname text, mimetype text);

and a separate table for the version:

create table file_details_version(id text primary key, version counter);

This needs to be a separate table because a table that has counter columns can contain only counters outside its primary key; counter and non-counter data columns cannot be mixed.

Then for an update you can do:

insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
update file_details_version set version = version + 1 where id = 'id1';

Then a read from file_details will always return the latest, and you can find the latest version number from file_details_version.
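As a minimal sketch of that read path, assuming the two tables defined above and the example key 'id1', the data and the version would be fetched with two separate queries:

```cql
-- Latest data for the file: each insert overwrites the previous values for this id.
select fname, mimetype from file_details where id = 'id1';

-- Current version number, read separately from the counter table.
select version from file_details_version where id = 'id1';
```

The application has to combine the two results itself, since CQL has no joins.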

There are numerous problems with this though. You can't do atomic batches with counters, so the two updates are not atomic: some failure scenarios could lead to only the insert into file_details being persisted. Also, there is no read isolation, so if you read during an update you may get inconsistent data between the two tables. Finally, counter updates in Cassandra are not tolerant of failures, so if a failure happens during a counter update and the update is retried, you may double count, i.e. increment the version too much.

I think all solutions involving counters will hit these issues. You could avoid counters by generating a unique ID (e.g. a large random number) for each update and inserting that into a row in a separate table. The version would then be the number of IDs in the row. Now you can do atomic updates, and the counts would be tolerant to failures. However, the read time would be O(number of updates) and reads would still not be isolated.
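One way the unique-ID idea might look in CQL is sketched below. The table name file_details_updates and the update_id column are hypothetical, and a client-generated UUID stands in for the "large random number"; the literal shown is just an example value.

```cql
-- Hypothetical table: one clustering row per update of a given file id.
create table file_details_updates(
    id text,
    update_id uuid,
    primary key (id, update_id)
);

-- A logged batch applies both writes atomically (all or nothing).
-- Retrying it with the same update_id does not add a second row,
-- so a retry after a failure cannot inflate the version.
begin batch
    insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
    insert into file_details_updates(id, update_id)
        values ('id1', 5b6962dd-3f90-4c93-8f61-eabfa4a803e2);
apply batch;

-- The version is the number of update rows for this id;
-- the cost of this read grows with the number of updates.
select count(*) from file_details_updates where id = 'id1';
```

The batch gives atomicity but not isolation across the two tables, so a concurrent reader may still see the new data with the old count or the reverse.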

Pedaiah answered 2/9, 2013 at 15:32 Comment(1)
The two most important use cases of versioning are i) version control (reverting to any previous version) and ii) consuming what has changed. Your method cannot do either of those. - Clue