I am looking for a suitable process by which to keep revisions, or snapshots of rows (and their relations) in a database.
Take for example an eCommerce platform-
- A customer creates an order. The order is associated with a billing address and a delivery address.
- Said customer then changes the address in their address book in their profile.
- The address should not change for the original order.
I have looked at a few concepts, one being duplicate tables, another being temporal databases, and the other being keeping a revision id and active flag.
Whilst I appreciate no one can really tell me the best/most suitable solution for my application as it's a matter open to opinion etc, I was hoping that someone might be able to demonstrate the advantages/disadvantages possibly by way of comparison. I have read lots of questions on SO, and a fair few articles on the various implementations, but none really compare each idea or indicate where they'd be best suited. Below I have outlined my understanding of each of the concepts.
Duplicate tables
Store the information in rows relevant to the data with which they need to be snapshot with. I.e. Keep an address in columns in the orders table of an online store.
Advantages
- Data is segmented into clearly relevant tables, no requirement for joins etc.
- No need to select only active rows as required in the concepts below.
- Assuming rows are timestamped, most of the benefits of temporal database retained
Disadvantages
- Duplication
- of schema (particularly problematic when multiple tables up a revision)
- of models when using an ORM.
- of data if a snapshot piece data hasn't changed and it's reused. I.e. if 10 orders are made, the address is stored 11 times (orders+current)
- Extra code required to handle the inserts into relevant tables.
Temporal Databases/Active or Current row flag
Database rows which are "time-aware", i.e. their context is the time between two datetimes. Data can be joined where it's time context lies between that of the temporal table.
Advantages
- No duplication of schema or models. Changes made in one place.
- ORM model can handle creation of new row, marking as active etc seamlessly.
- No replication of rows where no changes have been made. I.e. 10 orders to 1 address store the address once.
Disadvantages
- Queries made more complex as joins/where clauses require selection of "active" row.
- Tables become clogged up with historic data which is not selected/called upon regularly.
Storing just the changed column, temporal.
Have a table which keeps track of changes to all tables and note the row it relates to and when it is valid in terms of time.
Advantages
- Optimised storage in terms of revisions as unchanged data not replicated.
Disadvantages
- Queries far more complex to combine version of column with it's other data.
I have already looked at the following questions here on SO, and these other resources
Edit: The reason I haven't tagged this post with a particular DBMS as I'd like the concept to work with as many as possible ideally as the platform, at present is DBMS independent and the abstraction layer allows it to work with MySQL and MSSQL but will hopefully support others in the future.
- Database Design for Revisions?
- Relational Schema for Fowler's Temporal Expressions
- Database design for text revisions
- Storing Revisions of Relational Objects in an Efficient Way
- Keeping page changes history. A bit like SO does for revisions
- maintain history in a database
- http://en.wikipedia.org/wiki/Temporal_database
- http://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/