How do I reconstruct a historical view?

Asked 13/8, 2012 at 0:5 Answered 17/8, 2012 at 1:1

Solved sql sql-server sql-server-2008 cdc change-data-capture

I am currently exploring Change Data Capture as an option to store temporal databases. It is great because it stores only the deltas and seems like it may solve my problem. When I enabled CDC, a bunch of tables appeared under System Tables.

When querying cdc.dbo_MyTable, I am able to see all the changes that took place on the table. Now, how would I construct a historical view? For instance, if I wanted to see the state of my table as of a particular date, how would I go about doing that? Is that even possible?

It looks I need to take the log and start applying it over my original table but I was wondering if there is a built-in way of doing this. Any suggestions?

Some of the use cases I am looking at:

Know the state of the graph at a particular point in time
Given two graphs at different times, know the set of links that are different (this can probably be obtained using an EXCEPT clause after constructing the tables)

Sicanian answered 13/8, 2012 at 0:5 Comment(2)

what do you mean you want to compute the diff? example output? – Deduce 20/8, 2012 at 7:48

@Jaguar: Added two use cases. Thanks! – Sicanian 20/8, 2012 at 22:19

it's possible, but not with a built-in way i'm a afraid. You would have to reconstruct the timeline by hand.

Given that the change-tracking tables offer the tran_end_time, which is the time that the value of the property should be perceived as persisted, you would have to make a query that fetches all the distinct periods of table states, join on the tracked property changes and then pivot (to have a presentation in the same form as the table). Don't forget to union with the table state itself to obtain the values that have not been changed/tracked for completeness.

The final result, simplified, should look like

RN PK PropA   PropB   FromDate          ToDate
1  1  'Ver1'  'Ver1'  2012-01-01 09:00  2012-01-02 08:00
2  1  'Ver1'  'Ver2'  2012-01-02 08:00  2012-01-03 07:00
3  1  'Ver2'  'Ver2'  2012-01-03 07:00  *getdate()*
4  2  'Ver1'  'Ver1'  2012-01-01 05:00  2012-01-02 06:00
5  2  'Ver1'  'Ver2'  2012-01-02 06:00  2012-01-03 01:00
6  2  'Ver2'  'Ver2'  2012-01-03 01:00  *getdate()*

note that the getdate() is valid if the row wasn't deleted in which case it should be substituted with the deletion date

EDIT, for the 2 use cases. The first point is easily addressed it's a matter of constructing the temporal object graph and then filtering:

declare @pointInTime datetime = '20120102 10:00';
select * from Reconstructed_TG where FromDate <= @pointInTime and @pointInTime < ToDate

the second point, can be generated easily with the EXCEPT clause, as you point out. given the above query:

declare @pointInTimeA datetime = '20120102 10:00';
declare @pointInTimeB datetime = '20120103 01:00';
select * from Reconstructed_TG where FromDate <= @pointInTimeA and @pointInTimeA < ToDate
EXCEPT
select * from Reconstructed_TG where FromDate <= @pointInTimeB and @pointInTimeB < ToDate

yet the except clause only presents the rows that have at least one different column value; i don't know if that information is really meaningful to the human eye. Depending on your needs a query that works directly on the cdc data may be more appropriate.

Deduce answered 14/8, 2012 at 12:44 Comment(1)

+1 Thank you for your time and effort! I will explore this deeper and adapt it. Awarded the bounty! :) – Sicanian 21/8, 2012 at 23:21

You may want to check out Snapshots, which have been built in to SQL Server since 2005.

These will be most useful to you if you only need a few timepoints, but they can help you track all of the tables in a complex database.

These are deltas, so Compared to a full copy of a database, however, snapshots are highly space efficient. A snapshot requires only enough storage for the pages that change during its lifetime. Generally, snapshots are kept for a limited time, so their size is not a major concern.

Valladolid answered 14/8, 2012 at 16:53 Comment(1)

Snapshots can be useful if you know ahead of time the times at which you want to keep a copy of the data. If you're wondering "what was the state at 2pm yesterday", and you didn't take a snapshot at 2pm yesterday, then they're not going to help. – Elisa 15/8, 2012 at 6:41

I'm not sure about this, never done anything like that, but maybe you can add a column "changeset" to the table that can keep track of the changes you have on the table, every time there's a transaction get the max(changeset) and save the new cahnges with the next value... Or if you have a timestamp and want to know the status of your table at certain time do querys to filter changes previous to the date you want to check... (Not sure if I should write this is as an answer or a comment... I'm new here)

Anyway, hope it helps...

Aargau answered 17/8, 2012 at 1:1 Comment(0)

Recommended topics

Hot tags