How do I reconstruct a historical view?
Asked Answered
S

3

10

I am currently exploring Change Data Capture as an option to store temporal databases. It is great because it stores only the deltas and seems like it may solve my problem. When I enabled CDC, a bunch of tables appeared under System Tables.

When querying cdc.dbo_MyTable, I am able to see all the changes that took place on the table. Now, how would I construct a historical view? For instance, if I wanted to see the state of my table as of a particular date, how would I go about doing that? Is that even possible?

It looks I need to take the log and start applying it over my original table but I was wondering if there is a built-in way of doing this. Any suggestions?

Some of the use cases I am looking at:

  • Know the state of the graph at a particular point in time
  • Given two graphs at different times, know the set of links that are different (this can probably be obtained using an EXCEPT clause after constructing the tables)
Sicanian answered 13/8, 2012 at 0:5 Comment(2)
what do you mean you want to compute the diff? example output?Deduce
@Jaguar: Added two use cases. Thanks!Sicanian
D
6

it's possible, but not with a built-in way i'm a afraid. You would have to reconstruct the timeline by hand.

Given that the change-tracking tables offer the tran_end_time, which is the time that the value of the property should be perceived as persisted, you would have to make a query that fetches all the distinct periods of table states, join on the tracked property changes and then pivot (to have a presentation in the same form as the table). Don't forget to union with the table state itself to obtain the values that have not been changed/tracked for completeness.

The final result, simplified, should look like

RN PK PropA   PropB   FromDate          ToDate
1  1  'Ver1'  'Ver1'  2012-01-01 09:00  2012-01-02 08:00
2  1  'Ver1'  'Ver2'  2012-01-02 08:00  2012-01-03 07:00
3  1  'Ver2'  'Ver2'  2012-01-03 07:00  *getdate()*
4  2  'Ver1'  'Ver1'  2012-01-01 05:00  2012-01-02 06:00
5  2  'Ver1'  'Ver2'  2012-01-02 06:00  2012-01-03 01:00
6  2  'Ver2'  'Ver2'  2012-01-03 01:00  *getdate()*

note that the getdate() is valid if the row wasn't deleted in which case it should be substituted with the deletion date

EDIT, for the 2 use cases. The first point is easily addressed it's a matter of constructing the temporal object graph and then filtering:

declare @pointInTime datetime = '20120102 10:00';
select * from Reconstructed_TG where FromDate <= @pointInTime and @pointInTime < ToDate

the second point, can be generated easily with the EXCEPT clause, as you point out. given the above query:

declare @pointInTimeA datetime = '20120102 10:00';
declare @pointInTimeB datetime = '20120103 01:00';
select * from Reconstructed_TG where FromDate <= @pointInTimeA and @pointInTimeA < ToDate
EXCEPT
select * from Reconstructed_TG where FromDate <= @pointInTimeB and @pointInTimeB < ToDate

yet the except clause only presents the rows that have at least one different column value; i don't know if that information is really meaningful to the human eye. Depending on your needs a query that works directly on the cdc data may be more appropriate.

Deduce answered 14/8, 2012 at 12:44 Comment(1)
+1 Thank you for your time and effort! I will explore this deeper and adapt it. Awarded the bounty! :)Sicanian
V
0

You may want to check out Snapshots, which have been built in to SQL Server since 2005.

These will be most useful to you if you only need a few timepoints, but they can help you track all of the tables in a complex database.

These are deltas, so Compared to a full copy of a database, however, snapshots are highly space efficient. A snapshot requires only enough storage for the pages that change during its lifetime. Generally, snapshots are kept for a limited time, so their size is not a major concern.

Valladolid answered 14/8, 2012 at 16:53 Comment(1)
Snapshots can be useful if you know ahead of time the times at which you want to keep a copy of the data. If you're wondering "what was the state at 2pm yesterday", and you didn't take a snapshot at 2pm yesterday, then they're not going to help.Elisa
A
0

I'm not sure about this, never done anything like that, but maybe you can add a column "changeset" to the table that can keep track of the changes you have on the table, every time there's a transaction get the max(changeset) and save the new cahnges with the next value... Or if you have a timestamp and want to know the status of your table at certain time do querys to filter changes previous to the date you want to check... (Not sure if I should write this is as an answer or a comment... I'm new here)

Anyway, hope it helps...

Aargau answered 17/8, 2012 at 1:1 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.