Database design for audit logging [closed]

Every time I need to design a new database I spend quite some time thinking about how I should set up the schema to keep an audit log of the changes.

Some questions have already been asked here about this, but I don't agree that there is a single best approach for all scenarios.

I have also stumbled upon an interesting article, Maintaining a Log of Database Changes, that tries to list the pros and cons of each approach. It's very well written and has interesting information, but it has made my decision even harder.

My question is: is there a reference I can use, maybe a book or something like a decision tree, that I can consult to decide which way to go based on some input variables, such as:

  • The maturity of the database schema
  • How the logs will be queried
  • The probability that records will need to be recreated
  • What's more important: write or read performance
  • Nature of the values that are being logged (string, numbers, blobs)
  • Storage space available

The approaches that I know are:

1. Add columns for created and modified date and user

Table example:

  • id
  • value_1
  • value_2
  • value_3
  • created_date
  • modified_date
  • created_by
  • modified_by

Major cons: We lose the history of modifications, and we can't roll back after a commit.

2. Insert only tables

Table example:

  • id
  • value_1
  • value_2
  • value_3
  • from
  • to
  • deleted (Boolean)
  • user

Major cons: How to keep foreign keys up to date? Huge amounts of space needed.

3. Create a Separate history table for each table

History table example:

  • id
  • value_1
  • value_2
  • value_3
  • value_4
  • user
  • deleted (Boolean)
  • timestamp

Major cons: All audited tables need to be duplicated. If the schema changes, all the logs will need to be migrated too.

4. Create a Consolidated history Table for All Tables

History table example:

  • table_name
  • field
  • user
  • new_value
  • deleted (Boolean)
  • timestamp

Major cons: Will I be able to recreate the records (roll back) easily if needed? The new_value column needs to be a huge string so it can support all the different column types.
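To make option 4 concrete, here is a minimal sketch of what such a consolidated table could look like. This is my own illustration, not a canonical design; note that I have added a row_pk column, since without some reference to the audited row a rollback would be impossible:

```sql
-- Sketch of a consolidated history table (option 4). All names are illustrative.
-- new_value is a wide text column so it can hold a string rendering of any
-- source column type (strings, numbers, dates, serialized blobs).
CREATE TABLE change_history (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    table_name  VARCHAR(64)     NOT NULL,
    field       VARCHAR(64)     NOT NULL,
    row_pk      VARCHAR(64)     NOT NULL,  -- PK of the audited row, stored as text
    user        VARCHAR(64)     NOT NULL,
    new_value   TEXT            NULL,
    deleted     TINYINT(1)      NOT NULL DEFAULT 0,
    changed_at  TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY ix_table_row (table_name, row_pk)
) ENGINE=InnoDB;
```

Rolling a row back then means replaying (or reversing) its change_history entries for the matching table_name and row_pk, which is exactly where the "huge string" casting problem bites.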

Parturition answered 6/1, 2010 at 18:27 Comment(4)
related: #9853203 – Protozoon
and what about using a history database instead of tables? – Craniometry
Maybe you could check the design of github.com/airblade/paper_trail – Seymourseys
Is it a bad idea to log all (required) queries executed as-is? – Rebatement

One method that is used by a few wiki platforms is to separate the identifying data and the content you're auditing. It adds complexity, but you end up with an audit trail of complete records, not just listings of fields that were edited that you then have to mash up to give the user an idea of what the old record looked like.

So for example, if you had a table called Opportunities to track sales deals, you would actually create two separate tables:

Opportunities
Opportunities_Content (or something like that)

The Opportunities table would have information you'd use to uniquely identify the record and would house the primary key you'd reference for your foreign key relationships. The Opportunities_Content table would hold all the fields your users can change and for which you'd like to keep an audit trail. Each record in the Content table would include its own PK and the modified-by and modified-date data. The Opportunities table would include a reference to the current version as well as information on when the main record was originally created and by whom.

Here's a simple example:

CREATE TABLE dbo.Page(
    ID int PRIMARY KEY,
    Name nvarchar(200) NOT NULL,
    CreatedByName nvarchar(100) NOT NULL,
    CurrentRevision int NOT NULL,
    CreatedDateTime datetime NOT NULL
);

And the contents:

CREATE TABLE dbo.PageContent(
    PageID int NOT NULL,
    Revision int NOT NULL,
    Title nvarchar(200) NOT NULL,
    [User] nvarchar(100) NOT NULL,  -- bracketed: USER is a reserved word in T-SQL
    LastModified datetime NOT NULL,
    Comment nvarchar(300) NULL,
    Content nvarchar(max) NOT NULL,
    Description nvarchar(200) NULL
);

I would probably make the PK of the contents table a multi-column key from PageID and Revision provided Revision was an identity type. You would use the Revision column as the FK. You then pull the consolidated record by JOINing like this:

SELECT * FROM Page
JOIN PageContent ON CurrentRevision = Revision AND ID = PageID

There might be some errors up there...this is off the top of my head. It should give you an idea of an alternative pattern, though.
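To make the versioning flow concrete, here is a hedged sketch (mine, not the answerer's; all values are made up) of how an edit would be recorded under this design: insert a new content row, then point the main record at it, inside one transaction so readers never see a dangling CurrentRevision.

```sql
-- Sketch: recording an edit as a new revision (hypothetical IDs and values).
BEGIN TRANSACTION;

INSERT INTO dbo.PageContent
    (PageID, Revision, Title, [User], LastModified, Comment, Content, Description)
VALUES
    (42, 7, N'New title', N'alice', GETDATE(), N'Fixed a typo', N'Body text', NULL);

-- Flip the pointer so the JOIN in the answer now returns revision 7.
UPDATE dbo.Page
SET CurrentRevision = 7
WHERE ID = 42;

COMMIT TRANSACTION;
```

Older revisions stay in PageContent untouched, so the full audit trail is a simple `SELECT ... WHERE PageID = 42 ORDER BY Revision`.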

Pandorapandour answered 7/1, 2010 at 12:55 Comment(11)
In terms of auditing this is a good approach, but for production it will take a lot of time: developing a separate audit table for each table in the database, and writing triggers for each table to capture changes and write them to the audit table. Furthermore, building a single audit report across all tables is a huge challenge, as each audit table has a different structure. – Vasiliu
If writing and maintaining scripts for each table is a concern for an organization that intends to manage an audited database, I would naturally recommend that they hire either an experienced DBA or a highly flexible and very experienced software engineer with adequate experience creating audited databases. – Corsetti
I realize this is a very old post but nonetheless, maybe somebody could point me to one of the mentioned wiki platforms that use the pattern described? Any references would be great. Thanks – Centerpiece
Unfortunately I think the ScrewTurn one is long dead. That was the original basis of the answer. – Pandorapandour
Is it correct that PageContent.PageID is an FK to Page.ID and Page.CurrentRevision is an FK to PageContent.Revision? Is this dependency really circular? – Pointtopoint
I have downvoted since it does not address the alternatives that are mentioned. It gives another option which is a very specific solution to a very specific use case. But I do see the merits of the suggested design. – Frivolity
I can think of very few fields that I could say with confidence won't change, so all of the "main" tables for each entity would end up just being id, revision_id; more of a junction table, really. This feels a bit smelly to me. What advantage does this have over approach 3 in the OP (history table per audited table)? – Grotesquery
Not sure I follow. The main tables are the contents tables, so I would imagine they'd be more likely to get column bloat rather than be too lean to be valuable. The content table is just like a standard table with no auditing. The version table (in this example the Page table) just lets you know which version of the record is current. – Pandorapandour
@Pointtopoint This is NOT a circular dependency, because CurrentRevision = Revision is not an FK relationship. CurrentRevision cannot be an FK when Revision is not the PK on the other side. It's a filter for a specific revision on the right side of the join, just like any other condition such as a date range. – Protozoon
Why does Page have a CreatedDateTime when PageContent already has a datetime? Is there any difference between CreatedDateTime and the LastModified of the content with a revision of 0? – Rasia
The page create date just represents when it was initially created. The content would let you infer the latest revision date. – Pandorapandour

If you are using SQL Server 2008, you should probably consider Change Data Capture. It is new in 2008 and could save you a considerable amount of work.
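For reference, enabling CDC takes roughly this shape (database, schema, and table names are placeholders; CDC requires an Enterprise-class edition and SQL Server Agent running to harvest the log):

```sql
-- Enable CDC at the database level, then per table.
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Opportunities',
    @role_name     = NULL;  -- NULL = no gating role required to read change data
```

SQL Server then populates a cdc.dbo_Opportunities_CT change table for you, queryable through generated functions, instead of you writing triggers by hand.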

Gallinule answered 6/1, 2010 at 18:47 Comment(4)
Here is the link to the SQL 2012 change tracking info: msdn.microsoft.com/en-us/library/bb933994.aspx. +1 for using built-in functionality; no point re-inventing the wheel. – Squamation
@Squamation Have you ever used it yourself? Indeed, it tracks everything... but being able to get useful information out of it is a whole other story. Can't use a tractor wheel for my bike. – Craniometry
This really would have been awesome. But if you've only got the Standard edition of SQL Server, like me, you're out of luck: "Change data capture is only available in the Enterprise, Developer, and Enterprise Evaluation editions". – Loisloise
Important to mention that Change Data Capture does not capture basic information like WHO made a change. You would have to create a trigger to capture that information. In my opinion, since you have to rely on triggers anyway, it makes more sense to write your own auditing triggers, which work in any SQL Server edition. – Oxus

I don't know of any reference, but I'm sure someone has written something.

However, if the purpose is simply to have a record of what happened—the most typical use of an audit log—then why not simply keep everything:

timestamp
username
ip_address
procedureName (if called from a stored procedure)
database
table
field
accesstype (insert, delete, modify)
oldvalue
newvalue

Presumably this is maintained by a trigger.
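As a rough sketch of such a trigger (table and column names are hypothetical; a real implementation would emit one INSERT per audited column, usually after comparing the OLD and NEW values), in MySQL it could look like:

```sql
-- Hypothetical trigger feeding the catch-all log for one column of one table.
CREATE TRIGGER trg_customer_email_update
AFTER UPDATE ON customer
FOR EACH ROW
    INSERT INTO change_log
        (ts, username, table_name, field, accesstype, oldvalue, newvalue)
    VALUES
        (NOW(), CURRENT_USER(), 'customer', 'email', 'modify',
         OLD.email, NEW.email);
```

The ip_address and procedureName columns from the list above are harder to fill from inside a trigger, which is the point of the answerer's own comment below.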

Matildamatilde answered 6/1, 2010 at 18:34 Comment(2)
I don't know of any way of getting that from within the database server, but of course it could be done from outside it readily enough. – Matildamatilde
It seems to me that this is the same design pattern as the 4th option shown in the original question. – Eyeshot

We’ll create a small example database for a blogging application. Two tables are required:

  • blog: stores a unique post ID, the title, content, and a deleted flag.
  • audit: stores a basic set of historical changes with a record ID, the blog post ID, the change type (NEW, EDIT or DELETE) and the date/time of that change.

The following SQL creates the blog table and indexes the deleted column:

CREATE TABLE `blog` (
    `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    `title` text,
    `content` text,
    `deleted` tinyint(1) unsigned NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`),
    KEY `ix_deleted` (`deleted`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT='Blog posts';

The following SQL creates the audit table. All columns are indexed and a foreign key is defined on audit.blog_id referencing blog.id. Therefore, when we physically DELETE a blog entry, its full audit history is also removed.

CREATE TABLE `audit` (
    `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    `blog_id` mediumint(8) unsigned NOT NULL,
    `changetype` enum('NEW','EDIT','DELETE') NOT NULL,
    `changetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    KEY `ix_blog_id` (`blog_id`),
    KEY `ix_changetype` (`changetype`),
    KEY `ix_changetime` (`changetime`),
    CONSTRAINT `FK_audit_blog_id` FOREIGN KEY (`blog_id`) REFERENCES `blog` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
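The answer stops at the schema. As a hedged sketch of how the audit table might be populated (the trigger names are mine, not part of the original answer), a pair of MySQL triggers would do it; changetime fills itself via its CURRENT_TIMESTAMP default:

```sql
-- Sketch: triggers that write audit rows when posts are created or edited.
DELIMITER //
CREATE TRIGGER blog_after_insert
AFTER INSERT ON `blog`
FOR EACH ROW
    INSERT INTO `audit` (`blog_id`, `changetype`) VALUES (NEW.id, 'NEW')//

CREATE TRIGGER blog_after_update
AFTER UPDATE ON `blog`
FOR EACH ROW
    INSERT INTO `audit` (`blog_id`, `changetype`)
    VALUES (NEW.id, IF(NEW.deleted = 1, 'DELETE', 'EDIT'))//
DELIMITER ;
```

Note this records only that a change happened, not the old values; soft-deleting via the deleted flag keeps the history, while a physical DELETE cascades it away.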
Barm answered 24/4, 2015 at 5:25 Comment(0)

I don't think there is anything like a decision tree, since some of the pros and cons (or the requirements) are not really quantifiable. How do you measure maturity, for instance?

So just line up your business requirements for audit logging. Try to predict how these requirements might change in the future, and derive your technical requirements from them. Now you can compare them against the pros and cons and choose the right/best option.

And rest assured: however you decide, there will always be someone who thinks you made the wrong decision. But you will have done your homework, and you can justify your decision.

Wedgwood answered 6/1, 2010 at 18:47 Comment(0)

I'm using the following structure:

id  int
user_id int
system_user_id  int
tenant_id   int
db_name varchar
model_name  varchar
model_primary_key   int
model_attributes    text
created_at  timestamp
ip  varchar
session_id  varchar
request_id  varchar
comments    text

Works well so far with ~362 million records, multi-tenant, multi-database.

model_attributes is the most important column: it records what changed, as a JSON string in key-value format.
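As an illustration of what a row might look like (the table name and all values are invented; storing an [old, new] pair per changed key is one common convention, similar to what the paper_trail gem mentioned in the question's comments does with object_changes):

```sql
-- Hypothetical example row: only the changed attributes are stored, as JSON.
INSERT INTO audit_log
    (user_id, tenant_id, db_name, model_name, model_primary_key,
     model_attributes, created_at, ip)
VALUES
    (17, 3, 'app_main', 'Invoice', 90210,
     '{"status": ["draft", "sent"], "total": [100.0, 120.5]}',
     NOW(), '203.0.113.7');
```

Because only the delta is stored, reconstructing a record at a point in time means replaying all rows for that model_name and model_primary_key in created_at order.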

Areopagus answered 13/4, 2021 at 20:47 Comment(0)
