audit table vs. Type 2 Slowly Changing Dimension
Asked Answered
C

1

9

In SQL Server 2008+, we'd like to enable tracking of historical changes to a "Customers" table in an operational database.

It's a new table and our app controls all writing to the database, so we don't need evil hacks like triggers. Instead we will build the change tracking into our business object layer, but we need to figure out the right database schema to use.

The number of rows will be under 100,000 and number of changes per record will average 1.5 per year.

There are at least two ways we've been looking at modelling this:

  1. As a Type 2 Slowly Changing Dimension table called CustomersHistory, with columns for EffectiveStartDate, EffectiveEndDate (set to NULL for the current version of the customer), and auditing columns like ChangeReason and ChangedByUsername. Then we'd build a Customers view over that table which is filtered to EffectiveEndDate=NULL. Most parts of our app would query using that view, and only parts that need to be history-aware would query the underlying table. For performance, we could materialize the view and/or add a filtered index on EffectiveEndDate=NULL.

  2. With a separate audit table. Every change to a Customer record writes once to the Customer table and again to a CustomerHistory audit table.

From a quick review of StackOverflow questions, #2 seems to be much more popular. But is this because most DB apps have to deal with legacy and rogue writers?

Given that we're starting from a blank slate, what are pros and cons of either approach? Which would you recommend?

Cripple answered 20/6, 2014 at 20:35 Comment(2)
It's an OLTP database not a separate data warehouse, but the table in question doesn't change very often.Cripple
I imagine that a common operation in the application will be showing a list of given customer's transactions. SCD 2 will make an additional join necessary every time - CustomersCurrentView WHERE Customer = 'John Doe' JOIN CustomersHistory JOIN Transactions. My suggestion is - if historical data isn't used often, keep in a set of separate audit table; consider SCD 2 only if history-aware components constitute an important piece of the application. +1 for a very interesting question!Consentaneous
T
3

In general, the issue with SCD Type- II is, if the average number of changes in the values of the attributes are very high, you end-up having a very fat dimension table. This growing dimension table joined with a huge fact table slows down the query performance gradually. It's like slow-poisoning.. Initially you don't see the impact. When you realize it, it's too late!

Now I understand that you will create a separate materialized view with EffectiveEndDate = NULL and that will be used in most of your joins. Additionally for you, the data volume is comparatively low (100,000). With average changes of only 1.5 per year, I don't think data volume / query performance etc. are going to be your problem in the near future.

In other words, your table is truly a slowly changing dimension (as opposed to a rapidly changing dimension - where your option #2 is a better fit). In your case, I will prefer option #1.

Tomahawk answered 28/6, 2014 at 4:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.