Maintaining Referential Integrity - Good or Bad?
Asked Answered
F

9

37

We are planning on introducing simple Audit Trail in our database using triggers and separate history table for each table that requires auditing.

For example consider table StudentScore, it has few foreign keys (eg. StudentID, CourseID) linking it to corresponding parent tables (Student & Course).

Table StudentScore (
    StudentScoreID, -- PK
    StudentID ref Student(StudentID),  -- FK to Student
    CourseID ref Course(CourseID),   -- FK to Course
)

If StudentScore requires auditing, we are planning to create audit table StudentScoreHistory -

Table StudentScoreHistory (
    StudentScoreHistoryID, -- PK
    StudentScoreID,
    StudentID,
    CourseID,
    AuditActionCode,
    AuditDateTime,
    AuditActionUserID
)

If any row in StudentScore is modified we'll move old row to StudentScoreHistory.

One of the points raised during design discussion was to make StudentID and CourseID in StudentHistory table a FK, to maintain referential integrity. Argument made in favour of this was as we always mostly do a soft (logical Boolean flag) delete rather than hard delete, its good to maintain referential integrity to ensure we do not have any orphan ids in audit table.

Table StudentScoreHistory (
    StudentScoreHistoryID, -- PK
    StudentScoreID,
    StudentID ref Student(StudentID), -- FK to Student
    CourseID ref Course(CourseID), -- FK to Course
    AuditActionCode,
    AuditDateTime,
    AuditActionUserID
)

This seems to be a bit odd design to me. I do agree with @Jonathan Leffler's comment that audit record should not stop the deletion of parent data. Instead, if this is required, should be handled via foreign keys in main table and not in audit table. I want to get your opinion, to make sure I'm not missing some value in extending foreign keys to audit tables.

Now my question is: Is it a good design to have these foreign keys in History tables?

Any details on key arguments (e.x. performance, best practice, design flexibility etc) would be highly appreciated.

For benefit of anyone looking for specific purpose and our environment:

Purpose:

  1. Maintain critical data history
  2. Allow auditing of user activity with support to recreate scenario
  3. To limited extent allow roll-back of user activity

Environment:

  • Transactional database
  • Not every table requires auditing
  • Uses soft-delete to the extent possible, specifically for static/reference data
  • Few highly transactional tables do use hard deletes
Feeding answered 26/4, 2011 at 17:54 Comment(0)
T
31

When discussing auditing, I would go back to the purpose behind it. It isn't really a backup but rather a history of what has been. For example, for StudentScore, you would want to be sure not to lose the fact that the student originally had a 65% when they now have a 95%. This audit trail would allow you to walk back through the changes to see what happened and who did it. From this, you could identify what a particular user did to abuse the system. In some ways this could be a type of backup since you could roll back these changes to their previous states without rolling back entire tables.

With this in mind (if my assumptions about what you are using this for are correct), the only place you would want a FK/PK relationship is between the history table and its "live" counterpart. Your audit (history) table should not refer to any other table because it is no longer a part of that system. Instead, it is simply a record of what has happened in one table. Period. The only referential integrity you might want to consider is between the history table and the live table (thus the possible FK/PK relationship). If you allow records to be deleted from the live table, don't include the FK in the history table. Then the history table could include deleted records (which is what you want if you allow deletions).

Don't get confused with relational integrity in the main database with this history table. The history tables are all stand-alone. They only serve as a history of one table (not a set of tables).

The relating of two history tables together is possible and even more advanced relations between the live and history tables together (Students and Courses with both live and history, for example) so you can deal with even the possibility that a student was deleted (shudder) since the record would still be in the history table. The only issue here would be if you don't keep the history for a particular table, in which case you are choosing to lose that data (if you allow deletes).

Tattler answered 14/5, 2011 at 4:38 Comment(2)
Yes - this question ignores the distinction between relational data and a log file. No point applying RDB structures and requirements - but a log file has its own structures and requirements.Bavardage
Agree with your points, other than 'relating of two history tables is possible' - couldn't see how can we implement it. There might be multiple CourseID:15 in CourseHistory table - what would we be maintaining the FK agains. Also, any suggestions on what could be potential drawbacks (performance, etc) in case we do maintain the FK between StudentHistory & Course table. Thanks for your time!Feeding
G
6

I would suggest not extending foreign keys to audit tables. My recommendation is that the data in an audit be expanded to what the Foreign Key values are.

Instead of storing CourseID as "1", it will be "HTML4". That way if the Foreign Key value is removed, the audit table is still valid. This will also hold true if the Foreign Key value is changed from "HTML4" to "HTML5" anytime in the future. If you only stored the Foreign Key, then you would be telling the auditor that the previous students did "HTML5", which is not correct.

Another great benefit is the ability to ship off the audit trails to another server for data mining without any issues.

I have used the above setup for a while now and it works for me.

Gupta answered 16/5, 2011 at 12:0 Comment(5)
Course is more than just Name, it might have additional relevant fields, including FK (InstructorID) to other table. Not sure if expanding all the data would be feasible. I'll edit my question to include this.Feeding
@Feeding If you need to be able to go back to the reference table for other information, then you will have to store the Foreign Keys in the audit table. To make this work, you will have to implement soft deletes and ensure that reference data is not changed to invalidate the audit table i.e. Create a new record for the "HTML5" Course and not edit the "HTML4" one.Gupta
I would be storing the 'CourseID' in audit table, not sure if I understand why would I have to maintain the FK. I can always go back and look into the related tables audit in case I ever need to backtrack an action.Feeding
@Feeding If want to maintain your referential integrity, then I would suggest adding the Foreign Key constraint to the audit table. You could end up doing a lookup on the related table and the data is no longer there.Gupta
I like the idea of auditing all tables, so that we can always refer to the whole record that an ID points to, not only to its name or description. I probably store much more data than I'll ever need, but better safe than sorry, and afterall disk space is cheap. I have proposed a trigger-based auditing model here: codeproject.com/Articles/1112660/…Clachan
S
6

If you need to re-create the scenario, then I'd say yes you need the FKs, and having them I think would be an easier way to track through to the relevant related detail records. However, this makes deletions an issue, as well as info that may change in your primary key tables. In this case I'd say you don't want to delete records that have FKs in other tables, but rather use a soft delete as you've already indicated.

As far as info in the PK tables changing, caveat emptor. Setting up FKs would be a simple way to get some traceback ability, but it won't be perfect. There are trade-offs. To get an absolutely perfect history, you basically would need to create backup copies of all related records, any time an audit candidate record has something happen on it. You need to figure out the level of granularity that's appropriate and go with it, because a perfect record of events could be complicated to set up, and eat up a lot of space in the process.

Also, this may or may not be an option for you, but I would strongly consider a combination of tools like ApexSQL Audit + ApexSQL Log as opposed to a homegrown auditing solution. Based on your needs, those two tools combined with archiving your transaction logs periodically would cover what you need to do. The audit tool can store data in the same db or elsewhere, and the log tool can selectively recover data. Just a thought.

Sasha answered 17/5, 2011 at 18:17 Comment(0)
D
4

Your milage will obviously vary with the situation, but in my experience, keep referential integrity with the originating table's primary key and no more. This allows to avoid orphan IDs in the history able, while allowing fluid interactions with the related tables.

Suppose, for instance that you've something like this:

table scores (
 score_id,
 student_id ref students (student_id),
 course_id ref courses (course_id),
 score_date,
 score,
 pkey (score_id)
)

In that case, having an on delete cascade fkey referencing scores(score_id) on score_logs makes sense. It's the object; if it gets hard-deleted, might as well discard the history as well.

The foreign keys on student_id and course_id, by contrast, make less sense in my experience. They mean that you cannot do a (hard) delete on students and courses -- even when no live rows that reference them exist. This might be what you want to achieve, in which case ignore the tip. In my case, I find myself in need pruning users, comments, products, orders and so forth; foreign keys in the history logs make this inconvenient.

Also, note that there is a case where fkeys work against you. If you've an order line on an order, and the order line gets deleted, you still want the history on that order line. The correct pkey to use in this case is the order_id, not the order_line_id.

One last note, in case you end up opting to keep the fkeys: consider what they should be pointing to. With decoupled pieces of data (e.g. students and courses), it's reasonable to assume that the live row is fine. With strongly coupled pieces of data, however, (e.g. products and promos) what you'll really want is to be referencing both the fkey and its version.

Re the two previous points, you might find this related thread and answer interesting:

How do you create an audit trail for aggregate roots?

Denishadenison answered 14/5, 2011 at 14:8 Comment(0)
E
3

If your system is really focused on transaction processing then my answer may not apply well to you, but in the datawarehouse/BI world, this problem is often solved by using a "star schema". In this approach, you would denormalize the important indicative information from the linked tables along with your audit records. This could include the PK values of the parent tables (i.e. the FK values on your audited table). However, you wouldn't preserve the actual referential integrity contstraints themselves.

So for your example, your StudentScoreHistory table could retain its StudentID column, without the FK constraint, as well as maybe the StudentName (or whatever you think you might need from Student). This way you can go back to your audit trail to piece together what has happened and when without worrying about whether you are hard or soft deleting parent records. This has the further advantage (or disadvantage, depending on your perspective) of keeping track of the changable parent table attributes as they were when the child record was originally recorded. For example, it might be useful to know that Student 123456, who is now Mrs. Marriedlady used to be Miss Singlegirl when her biology degree was conferred.

Echeverria answered 27/4, 2011 at 1:4 Comment(1)
Yes - in my case it is transaction database, but interesting approach. I'll explore more about it. On your point of going back to piece together what happened, it should be simple in original example as well till the time we keep audit trail of all critical tables/entities. We can go back to StudentHistory table and look for what Mrs. Marriedlady was called at a given point in time.Feeding
U
3

Your live schema enforces relational integrity so you don't need foreign keys in the History schema. Or put it another way: the only reason to enforce foreign keys between tables in the History schema is if there is some mechanism for executing DML against the History schema other than populating it from changes in the live schema. In which case your History schema is pretty useless as an audit trail.

You raise the question of soft deletes, which confuses the issue. That would only be relevant if you're considering having foreign keys between the two schemas e.g. StudentScoreHistory references StudentScore. That can be a valid design, but again, it suggests you don't trust your audit mechanism. Personally I would prefer to have hard deletes in the live tables, and record the fact of deletion in the History table. Soft deletes are just another source of grief.

But anyway this is a different question. It is perfectly possible to have foreign keys between the live and history versions of each table e.g. StudentScoreHistory -> StudentScore without also enforcing the relational integrity within the History schema e.g, StudentScoreHistory -> StudentHistory.

Unscathed answered 27/4, 2011 at 6:31 Comment(1)
I agree my live schema should enforce relational integrity. I'm worrying more about utility of having a FK StudentScoreHistory -> Student and StudentScoreHistory -> Course - some how this (SSH -> S) smells of unclean design.Feeding
C
2

Being in the midsts of implementing very similar auditing system for the first time, I am currently facing that same concern. My opinion echoes that of BiggsTRC - your "live" table maintains the FK relation to the Course record and your history table only maintains relation to its "live" counterpart (StudentScore). This, I think, achieves not having orphans in the audit table.

Now, there is something else I did not see mentioned in the answers: in our current project, we saw value of maintaining a FK in the history table to the CourseHistory table, so that we know, what was the "state" of the Course record at time of the StudentScoreHistory audit entry. Of course that may or may not matter to you, depending on your system logic.

Our solution to your concern (in your answer to BiggsTRC), that you might have the same CourseId several times was to reference not the actual CourseId, but the PK column of CourseHistory table. We still don't have a firm decision how to accomplish this - whether we want to create audit entry of the Course record even if there was not a change, or try to introduce some logic to look-up the CourseHistory record that matches relevant Course state at time of StudentScoreHistory entry.

Clemenceau answered 19/5, 2011 at 4:53 Comment(0)
M
1

If you only plan to do soft deletes as you describe then i see no reason why you shouldn't use foreign keys.

Moreno answered 26/4, 2011 at 18:7 Comment(2)
Wouldn't it be making the design more rigid for no apparent benefit? Also, conceptually auditing should not be interfering with rest of the system. Without the foreign keys in audit table, we would be making one less assumption (soft deletes) about the system.Feeding
Don't think of your audit/history as separate to the system: it is part of the system. If you want to keep an accurate history then FKs in the history tables are useful, just as FKs in the live data tables are useful.Wisteria
F
0

I wouldn't create a second set of tables for the 'audited' rows, just integrate your auditing functions into your existing production schema. It sounds like your purpose isn't backup and restore as of a given date/disaster, but trace history of changes per user or student, and that's a function of your application. I think your additional fields are fine, they just don't need to be added to another set of tables.

One problem with backup and restore processes is schema changes. Schemas tend to change over time which means you may not be able to restore directly from a backup. If you keep your auditing functions built into your production schema, you don't have to worry about breaking anything when you need to support additional functions.

Favourable answered 17/5, 2011 at 14:55 Comment(5)
Whether or not auditing is a function of the application, or just extra peace of mind for the DBA/security admins, really depends on the application... It sounds like this is just extra database-side activity that any front-end application components have no visibility to. All that said, it's worth considering whether it should be a function of the application!Indophenol
@Tao, front end applications would have limited visibility to the audit tables - in form of audit trail reports etc. Though, it would definitely not be a function of application (if you want to look at auditing as a separate independent module).Feeding
@Favourable - we would be keeping the audit tables in the same db as main tables. Also, any schema changes to main table would be replicated to history tables as well. Do you mind elaborating on 'integrate your auditing functions into your existing production schema' - do you mean keep production and historical data in same tables? How do you maintain proper indexing and pks? Wouldn't it be an overhead on main application?Feeding
It could be overhead on the main application, if you're expecting frequent backups. Yes, the idea would be to keep the audit detail in the same tables as the main ones with marker fields to distinguish them, like you do for soft deletes. Indexing and PKs should be ok, unless you want to reuse a PK after it's been audited? In that case, your audit marker would need to be part of the PK.Favourable
If you need to restore audited data as if it was never removed, it's easier if you keep it all together. If it's ok to have IT perform some magic for the restore, or if the volume of audits noticeably affects performance, then, by all means, keep it separate. You can do whatever you want while the users aren't looking.Favourable

© 2022 - 2024 — McMap. All rights reserved.