Database row snapshots/revisions

F

2

9

I am looking for a suitable process by which to keep revisions, or snapshots of rows (and their relations) in a database.

Take for example an eCommerce platform-

A customer creates an order. The order is associated with a billing address and a delivery address.
Said customer then changes the address in their address book in their profile.
The address should not change for the original order.

I have looked at a few concepts, one being duplicate tables, another being temporal databases, and the other being keeping a revision id and active flag.

Whilst I appreciate no one can really tell me the best/most suitable solution for my application as it's a matter open to opinion etc, I was hoping that someone might be able to demonstrate the advantages/disadvantages possibly by way of comparison. I have read lots of questions on SO, and a fair few articles on the various implementations, but none really compare each idea or indicate where they'd be best suited. Below I have outlined my understanding of each of the concepts.

Duplicate tables

Store the information in rows relevant to the data with which they need to be snapshot with. I.e. Keep an address in columns in the orders table of an online store.

Advantages

Data is segmented into clearly relevant tables, no requirement for joins etc.
No need to select only active rows as required in the concepts below.
Assuming rows are timestamped, most of the benefits of temporal database retained

Disadvantages

Duplication
- of schema (particularly problematic when multiple tables up a revision)
- of models when using an ORM.
- of data if a snapshot piece data hasn't changed and it's reused. I.e. if 10 orders are made, the address is stored 11 times (orders+current)
Extra code required to handle the inserts into relevant tables.

Temporal Databases/Active or Current row flag

Database rows which are "time-aware", i.e. their context is the time between two datetimes. Data can be joined where it's time context lies between that of the temporal table.

Advantages

No duplication of schema or models. Changes made in one place.
ORM model can handle creation of new row, marking as active etc seamlessly.
No replication of rows where no changes have been made. I.e. 10 orders to 1 address store the address once.

Disadvantages

Queries made more complex as joins/where clauses require selection of "active" row.
Tables become clogged up with historic data which is not selected/called upon regularly.

Storing just the changed column, temporal.

Have a table which keeps track of changes to all tables and note the row it relates to and when it is valid in terms of time.

Advantages

Optimised storage in terms of revisions as unchanged data not replicated.

Disadvantages

Queries far more complex to combine version of column with it's other data.

I have already looked at the following questions here on SO, and these other resources

Edit: The reason I haven't tagged this post with a particular DBMS as I'd like the concept to work with as many as possible ideally as the platform, at present is DBMS independent and the abstraction layer allows it to work with MySQL and MSSQL but will hopefully support others in the future.

Felipe answered 7/9, 2012 at 15:33 Comment(0)

F

1

I ended up using a temporal database, and the implementation of this resulted in the Temporal Model in FuelPHP.

I can now configure my models to treat rows as a time sensitive entity. Changes cause a new row to be created and the end time of the original row set accordingly.

This allows me to retrieve a row at a point in time.

Felipe answered 22/5, 2013 at 21:8 Comment(0)

F

0

There is another option (on Oracle at least) where you can just set the point in time and run whatever queries you like.

I believe it works using large amounts of flash recovery space but if you're only interested in tracking a few tables this might be overkill.

Frater answered 7/9, 2012 at 15:39 Comment(0)

Duplicate tables

Advantages

Disadvantages

Temporal Databases/Active or Current row flag

Advantages

Disadvantages

Storing just the changed column, temporal.

Advantages

Disadvantages

Recommended topics

Hot tags