How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.

My first thought for implementing this was to simply do inserts into the tables instead of updates, and then, when I need to select the data, do a GROUP BY on some column and order by the timestamp DESC.

However, after thinking about it a bit more, I realized that this will break things: each insert (which is really just simulating an update to a single record) gets a different primary key, which breaks any linkage that uses the primary key to relate the record to other records in the DB.

As such, my next thought was to continue updating the main tables in the DB, but also insert a copy of the full record after each update into an "audit table", and then when I need to report on temporal data, query the audit table instead.

Can someone please give me some guidance or links on how to properly do this?
Thank you.

Omniscience answered 6/7, 2015 at 18:18 Comment(6)
A typical solution is to create a new audit table and have it written to automatically using a trigger (see the sketch after these comments). Methods include duplicating the complete row state, or just tracking which field was changed.Tocsin
Dr. Richard Snodgrass has a fantastic book on this topic, available at cs.arizona.edu/~rts/publications.html. Maybe more information than you hoped for, but still A-list material.Eugenieeugenio
Thank you both so much. That's what I was looking for. Also, Ollie, on the page you linked to there are several books listed. Do you know which one in particular I should look at? Thanks.Omniscience
Developing Time-Oriented Database Applications in SQL. I think that's the one.Aegrotat
A user mentions this approach may be hard to implement in MySQL at #8151564. Any thoughts on this? I am about to head down this path after using Oracle for this approach for 15 years.Disoblige
After debating how to implement this in MySQL with a co-worker for quite a while, we ended up creating temporal data for only part of the DB. We did this by doing an update and an insert on the relevant tables every time we needed to capture temporal data: we updated the standard table and then inserted a copy into a second logging table after each update. Most of the time we just reference the standard table, which is much quicker, and we only hit the log table when necessary. Probably not the best implementation, but it works for our use case.Omniscience
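
For reference, here is a minimal sketch of the trigger-based audit approach suggested in the first comment. The customers and customers_audit table names and columns are hypothetical, and a similar BEFORE DELETE trigger would be needed to capture deletions:

    -- Hypothetical base table.
    CREATE TABLE customers (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        name       VARCHAR(100) NOT NULL,
        email      VARCHAR(100) NOT NULL,
        updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                             ON UPDATE CURRENT_TIMESTAMP
    );

    -- Audit table: a full copy of each row as it looked before the change.
    CREATE TABLE customers_audit (
        audit_id   INT AUTO_INCREMENT PRIMARY KEY,
        id         INT NOT NULL,            -- original primary key, kept for joins
        name       VARCHAR(100) NOT NULL,
        email      VARCHAR(100) NOT NULL,
        changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy the old row state into the audit table before every update.
    DELIMITER $$
    CREATE TRIGGER customers_before_update
    BEFORE UPDATE ON customers
    FOR EACH ROW
    BEGIN
        INSERT INTO customers_audit (id, name, email)
        VALUES (OLD.id, OLD.name, OLD.email);
    END$$
    DELIMITER ;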

Suppose you want to make a given table R temporal (i.e., maintain its history).

One design is to leave the table R as it is and create a new table R_Hist with valid_start_time and valid_end_time columns. Valid time is the period during which the fact is true.

The CRUD operations can be given as:

INSERT

  • Insert into R
  • Insert into R_Hist with valid_start_time as the current time and valid_end_time as infinity

UPDATE

  • Update the row in R
  • Close the current history row: set valid_end_time to the current time for the “latest” tuple in R_Hist
  • Insert the new version into R_Hist with valid_start_time as the current time and valid_end_time as infinity

DELETE

  • Delete from R
  • Set valid_end_time to the current time for the “latest” tuple in R_Hist

SELECT

  • Select from R for ‘snapshot’ queries (implicitly the ‘latest’ state)
  • Select from R_Hist for temporal queries

Alternatively, you can design a separate history table for each attribute of R. With this design you capture attribute-level temporal data, as opposed to the entity-level history of the previous design. The CRUD operations are essentially the same.
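
A minimal sketch of the first (entity-level) design in MySQL, assuming a simple table R(id, name) and using the sentinel date '9999-12-31' to stand in for infinity:

    CREATE TABLE R (
        id   INT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    CREATE TABLE R_Hist (
        id               INT NOT NULL,
        name             VARCHAR(100) NOT NULL,
        valid_start_time DATETIME NOT NULL,
        valid_end_time   DATETIME NOT NULL,  -- '9999-12-31' stands in for infinity
        KEY (id, valid_end_time)
    );

    -- UPDATE: close the current history row, change R, then open a new history row.
    START TRANSACTION;

    UPDATE R_Hist
       SET valid_end_time = NOW()
     WHERE id = 1
       AND valid_end_time = '9999-12-31';

    UPDATE R
       SET name = 'new name'
     WHERE id = 1;

    INSERT INTO R_Hist (id, name, valid_start_time, valid_end_time)
    VALUES (1, 'new name', NOW(), '9999-12-31');

    COMMIT;

    -- Snapshot query: just read R.
    SELECT * FROM R WHERE id = 1;

    -- Temporal query: the state of row 1 as of a given point in time.
    SELECT * FROM R_Hist
     WHERE id = 1
       AND valid_start_time <= '2017-01-01'
       AND valid_end_time   >  '2017-01-01';

The INSERT and DELETE cases follow the same pattern: an insert adds an open-ended history row, and a delete only closes the latest history row.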

Moshemoshell answered 28/4, 2017 at 7:4 Comment(1)
This is the basic conclusion I came to as well. Having two tables is the best approach, I agree. Thanks.Omniscience

I added a Deleted column and a DeletedDate column. Deleted defaults to false and DeletedDate defaults to NULL.

Use a composite primary key on IDColumn, Deleted, and DeletedDate.

You can index on Deleted so queries for current rows stay fast.

You won't get duplicate primary keys on IDColumn alone, because the primary key also includes Deleted and DeletedDate.

Assumption: you won't write to the same record more than once per millisecond. If DeletedDate is not unique, you could hit a duplicate primary key error.

So for updates I do a transaction-style sequence: select the row, take the results, change the specific values, then insert the new version. Really it's an update that sets Deleted to true and DeletedDate to NOW(); you then have it spit out the row after the update and use that to get the primary key and/or any values not available to whatever API you built.
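
A rough sketch of this scheme in MySQL, with hypothetical table and column names. Note that MySQL does not allow NULL in primary key columns, so a far-future sentinel date stands in for "not deleted" here, and DeletedDate uses millisecond precision to match the assumption above:

    -- Hypothetical table following this scheme.
    CREATE TABLE items (
        IDColumn    INT NOT NULL,
        Name        VARCHAR(100) NOT NULL,
        Deleted     BOOLEAN NOT NULL DEFAULT FALSE,
        DeletedDate DATETIME(3) NOT NULL DEFAULT '9999-12-31 23:59:59.999',
        AddedDate   DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
        PRIMARY KEY (IDColumn, Deleted, DeletedDate),
        KEY idx_deleted (Deleted)            -- keeps "current rows" queries fast
    );

    -- An "update" retires the current row, then inserts the new version.
    START TRANSACTION;

    UPDATE items
       SET Deleted = TRUE,
           DeletedDate = NOW(3)
     WHERE IDColumn = 1
       AND Deleted = FALSE;

    INSERT INTO items (IDColumn, Name)
    VALUES (1, 'new value');

    COMMIT;

    -- Current state: only the non-deleted rows.
    SELECT * FROM items WHERE Deleted = FALSE;

    -- Full history of one record, newest first (the live row sorts first
    -- because its DeletedDate is the far-future sentinel).
    SELECT * FROM items WHERE IDColumn = 1 ORDER BY DeletedDate DESC;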

It's not as good as a true temporal table and takes some discipline, but it builds the history into one table that is easy to report on.

I may start updating the DeletedDate column and turn it into an Added/DeletedDate, in addition to the AddedDate, so I can sort records by one column (the added/deleted column), while always updating the AddedBy column and just setting it to the same value as the Added/DeletedDate for logging's sake.

Either way, you could just do a CASE expression (WHEN DeletedDate IS NOT NULL THEN DeletedDate ELSE AddedDate END AS addedDate) and ORDER BY addedDate DESC. So, yeah, this works.

Incursion answered 5/2, 2019 at 16:41 Comment(1)
Not a bad idea. I've started using deleted_on and deleted_by fields in almost all of my DB tables as well. Works really well. Thanks.Omniscience
