Versioning in Cassandra

If it's not possible to relax the requirement of versions increasing by 1, one option is to use counters.

Create a table for the data:

create table file_details(id text primary key, fname text, mimetype text);

and a separate table for the version:

create table file_details_version(id text primary key, version counter);

This needs to be a separate table because, apart from the primary key, a table's columns must be either all counters or all non-counters.

Then for an update you can do:

insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
update file_details_version set version = version + 1 where id = 'id1';

Then a read from file_details will always return the latest data, and you can find the latest version number in file_details_version.
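As a minimal sketch, the two reads could look like this, reusing the example id 'id1' from the statements above:

```cql
-- latest data for the file
select fname, mimetype from file_details where id = 'id1';

-- current version number (the counter value)
select version from file_details_version where id = 'id1';
```

These are two independent queries, so nothing ties the version you read to the data you read.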

There are numerous problems with this, though. Counter updates can't go in an atomic batch with regular writes, so the two statements are not atomic: some failure scenarios could lead to only the insert into file_details being persisted. Also, there is no read isolation, so if you read during an update you may get inconsistent data between the two tables. Finally, counter updates in Cassandra are not idempotent, so if a failure happens during a counter update and you retry it, you may double count, i.e. increment the version too much.

I think all solutions involving counters will hit these issues. You could avoid counters by generating a unique ID (e.g. a large random number) for each update and inserting that into a row in a separate table. The version would then be the number of IDs in the row. Both writes can now go in one atomic batch, and the count is tolerant of failures because retrying an insert with the same ID does not add a second entry. However, the read time would be O(number of updates) and reads would still not be isolated.
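A rough sketch of that alternative follows. The table name file_details_updates, the column name update_id and the UUID literal are made up for illustration, and a client-generated UUID stands in for the "large random number":

```cql
-- hypothetical table: one partition per file, one clustering row per update
create table file_details_updates(
    id text,
    update_id uuid,
    primary key (id, update_id)
);

-- the client generates update_id once and reuses it on every retry,
-- so a retried batch rewrites the same row instead of adding a new one
begin batch
    insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
    insert into file_details_updates(id, update_id)
        values ('id1', 123e4567-e89b-12d3-a456-426614174000);
apply batch;

-- the version is the number of update IDs stored for this file
select count(*) from file_details_updates where id = 'id1';
```

The ID should be generated on the client rather than with the server-side uuid() function, since a retry would otherwise get a fresh ID and be counted twice. The count query scans the whole partition, which is where the O(number of updates) read cost comes from.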
