How to design a SQL db with undo-redo?

Asked 26/3, 2011 at 0:25 Answered 26/3, 2011 at 0:56

Solved mysql sql undo-redo delta audit-tables

I'm trying to figure out how to design my DB tables to allow Undo-Redo.

Pretend you have a tasks table with the following structure:

id <int>
title <varchar>
memo <string>
date_added <datetime>
date_due <datetime>

Now assume that over a few days and multiple log-ins that several edits have taken place; but a user wants to go back to one of the versions.

Would you have a separate table tracking the changes - or - would you try to keep the changes within the tasks table ("ghost" rows, for lack of a better term)?
Would you track all of the columns or just the ones that changed each time?

If it matters, I'm using MySQL. Also, if it matters, I'd like to be able to show the history (ala Photoshop) and allow a user to switch to any version.

Bonus question: Would you save the whole memo cell on a change or would you try to save the delta only? Reason I ask is because the memo cell could be large and only a single word or character might be changed each revision. Granted, saving the delta would require parsing, but if undos aren't expected very often, wouldn't it be better to save space rather than processing time?

Thank you for your help.

Borek answered 26/3, 2011 at 0:25 Comment(1)

Similar question here. Have a look at my answer – Finisterre 26/3, 2011 at 23:21

I would create a History table for your tasks table. Same structure as tasks + a new field named previousId. This would hold the previous change id, so you can go back an forth through different changes (undo/redo).

Why a new History table? For a simple reason: do not overload tasks table with things that it was not designed for.

As for space, in the History, instead of a Memo, use a binary format and zip the content of the text you want to store. Don't try to detect changes. You will run into a buggy code which will result in frustration and wasted time...

Optimization: Even better, you may keep only three columns in History table: 1. taskId (foreign key to tasks) 2. data - a binary field. Before saving in the History table, create an XML string holding only the fields that have changed. 3. previousId (will help maintain a queue of changes and allow navigation back and forth)

As for data field, create an XML string like this:

<task>
  <title>Title was changed</title>
  <date_added>2011-03-26 01:29:22<date_added>
</task>

This will basically tell you that this time you changed only the title and the date_added fields.

After the XML string is built, just zip it if you want and store it into History table's data field.

XML will also allow for flexibility. If you add / remove a field in tasks table, you don't need to update the History table, too. So this way the structure of the tasks table and History table are decoupled so you don't need to update two tables each time.

PS: don't forget to add some indexes to quickly navigate through the history table. Fields to be indexed: taskId and previousId as you will need fast queries against this table.

Hope this helps.

Polly answered 26/3, 2011 at 0:56 Comment(3)

By the way, compression will reduce your text size up to 5% of the original text. A common value is around 10%, but if you have common, repeating words you get even better compression. – Polly 30/3, 2011 at 20:33

this is clever, but I'm not sure I understand the "3 fields" idea. Surely each record in the history table must have its own ID field (autoincrement), a foreign link to the task record's ID, and a reference to the previous history record ID, if applicable (i.e. having the same task record ID)... or is there sthg I haven't understood? – Lemire 18/8, 2014 at 17:49

@mikerodent I think you got the idea, we need 2 keys: one to task table (taskId) and one to History table (previousId) plus the payload (data) that changed so that you may be able to navigate through the history based on taskId and previousId fields and also be able to access the data that changed and recover it if needed. – Polly 4/9, 2014 at 11:8

When I do similar types of things using SQL I always use a second table for revision history. This prevents your primary table from getting overly large with versions. The rationale is that retrieving the record that is current happens almost 100% of the time, viewing history and rolling back (undo) is very infrequent.

If you only have a single UNDO or history, then tracking in-table is probably fine.

Whether you want to save deltas or the entire cell depends on expected growth / usage. If you are comfortable creating the logic to manage deltas, that will save you space. If things don't really create new versions that often I wouldn't start with that, (applying YAGNI)

Pignut answered 26/3, 2011 at 0:36 Comment(0)

You might want to compress revisions in delta form but you should still have the current revision in full for quick retrieval.

However, older to newer deltas require lots of processing unless you have some non-delta to base on. Newer to older deltas require reprocessing every time something changes. So deltas usually do not get you many benefits but greater complexity.

Last I checked, which is some years ago, MediaWiki, the software behind Wikipedia, stored full texts and provided some means to compress older revisions with gzip to save space and a dedicated table archive for deleted revisions / pages.

Their website has an ER diagram of their database layout which you might find useful.

Metric answered 26/3, 2011 at 0:53 Comment(0)

Recommended topics

Hot tags