Can you share your thoughts on how you would implement data versioning in Cassandra?
Suppose that I need to version records in a simple address book. (Address book records are stored as rows in a ColumnFamily.) I expect that the history:
- will be used infrequently
- will be used all at once to present it in a "time machine" fashion
- will hold no more than a few hundred versions per record
- won't expire
I'm considering the following approaches:
Convert the address book to a Super Column Family and store multiple versions of each address book record in one row, as super columns keyed by timestamp.
Create a new Super Column Family to store old records or changes to the records. Such a structure would look as follows:
    {
      'address book row key': {
        'time stamp1': {
          'first name': 'new name',
          'modified by': 'user id',
        },
        'time stamp2': {
          'first name': 'new name',
          'modified by': 'user id',
        },
      },
      'another address book row key': {
        'time stamp': {
          ....
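To make the "time machine" read concrete, here is a minimal in-memory sketch of that layout in Python. It uses plain dicts rather than a real Cassandra client, and it assumes the history stores only the changed columns per timestamp, so a full version is rebuilt by replaying changes over the original record. The names `history`, `record_change` and `replay` are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical stand-in for the history Super Column Family:
# row key -> timestamp (super column name) -> {column name: new value}
history = defaultdict(dict)


def record_change(row_key, timestamp, changes, modified_by):
    """Store the changed columns of one version under its timestamp."""
    entry = dict(changes)
    entry["modified by"] = modified_by
    history[row_key][timestamp] = entry


def replay(row_key, base_record):
    """Yield (timestamp, full record) for every version, oldest first.

    Assumption: base_record is the record as it was before the first change.
    """
    record = dict(base_record)
    for timestamp in sorted(history[row_key]):
        entry = history[row_key][timestamp]
        # Apply only the data columns; 'modified by' is audit metadata.
        record.update({k: v for k, v in entry.items() if k != "modified by"})
        yield timestamp, dict(record)
```

Since the whole history lives under one row key, presenting it all at once is a single row read followed by an in-order replay.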
Store versions as serialized (JSON) objects in a new ColumnFamily, representing each record's set of versions as a row and the individual versions as columns (modelled after Simple Document Versioning with CouchDB).
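A rough sketch of this last option, again with plain Python dicts instead of a real Cassandra client: each row holds one record's versions, with the timestamp as the column name and the JSON-serialized record as the column value. The names `versions_cf`, `save_version` and `load_history` are hypothetical, and the `sorted` call stands in for the column ordering Cassandra would apply within a row.

```python
import json

# Hypothetical stand-in for the versions ColumnFamily:
# row key -> {timestamp (column name): JSON string (column value)}
versions_cf = {}


def save_version(row_key, timestamp, record):
    """Serialize a full copy of the record and store it as one column."""
    row = versions_cf.setdefault(row_key, {})
    row[timestamp] = json.dumps(record, sort_keys=True)


def load_history(row_key):
    """Return every stored version of a record, oldest first."""
    row = versions_cf.get(row_key, {})
    return [(ts, json.loads(row[ts])) for ts in sorted(row)]
```

Each version here is a complete snapshot, so no replay is needed to display it, at the cost of storing unchanged fields again in every version.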