Is there a concept of slowly changing FACT in data warehouse
Asked Answered
S

1

15

In data warehousing, we have the concept of slowly changing dimensions. I am just wondering why there is no jargon for 'slowly/rapidly changing FACTs' because the same Type1, Type 2 measures can be used to track changes in the FACT table.

Swearword answered 6/2, 2014 at 15:16 Comment(7)
I don`t think that this concept would have sense. Fact are always changing, are the result of some measurement, some kind of outcome. We do track automatically the changing of the facts. Not so in dimensions, that are (roughly speaking) the label that identify the facts. So a changing in meaning of a label could be a problem, so we could need to track these "changing dimension".Pouch
@Pouch What about corrections? Say updated order quantities and similarChristcross
Correction could be handled updating the fact, when it's a simple correction of an error (may be performance-heavy). Or adding a corrective factor. Preserve the correction histories is also possible (i see your point now) but I`ve never done it. I think another time dimension with the time of correction should be the way to go.Pouch
@Pouch In the case of an orders fact table with date and product dimension and sales_quantity, sales_amount measure. The source transaction table has a flag column indicates the status of an order (paid, cancelled, refund, etc) could change by time, an order of yesterday when we do ETL it was paid, and today it's cancelled. How could we handle this kind of data? I think this is a kind of fact that changes.Mender
Status could be its own dimension. The fact (order quantity and amount) will be identified at any given moment also by status.Pouch
@Pouch so you mean the same order(order_id, date, product) will appear in the order fact as multi rows, and each row indicates a different status? I could understand this. But in our case, there are no history records in the source database, which means one order will only have one row in the system. And only a flag column indicates the status, once the order was shipped, we will not know when the order was created any more.Mender
I am not sure to understand. You are not limited to what is in your source system at any given moment. You could have rows in the fact table that represent the statuses at different times. Or you could have different measures in your fact table (see "accumulating snapshot"). It depend on your design.Pouch
M
15

According to the DW gods there are 3 types of FACT tables

  • Transaction: your basic measurements with dim references. measurements not rolled up or summarized, lots of DIM relations
  • Periodic: Rolled up summaries of transaction fact tables over a defined period of time.
  • Accumulating Snapshot: measurements associated with a 2+ defined time periods

From these we have at least 2 options that will result in something pretty similar to a slowly changing fact table. It all depends on how your source system is set up.

Option 1: Transactional based Source System

If your source system tracks changes to measurements via a series of transactions (ie, initial value + change in value + change value etc) then each of these transactions ends up in the transactional fact. This is then used by the periodic fact table to reflect the 'as of period' measures.

For example, if your source system tracks money in and out of an account you would probably have a transaction fact table that pretty much mirrored the source money in/out table. You would also have a periodic fact table that would be updated every period (in this case month) to reflect the total value of the account for that period

The periodic fact table is your Slowly Changing Fact table.

Source               DW_Tansaction_Fact       DW_Periodic_Fact
---------------  ->  -------------------  ->  --------------------
Acnt1 Jan +10$       Acnt1 Jan +10$           Acnt1 Jan 10$
Acnt1 Feb -1 $       Acnt1 Feb -1 $           Acnt1 Feb  9$
Acnt1 Apr +2 $       Acnt1 Apr +2 $           Acnt1 Mar  9$
                                              Acnt1 Apr 11$

Option 2: CRUD/Overwriting Source System

Its more likely you have a source system that lets users directly update/replace the business measurements. At any point in time, according to the source system, there was and is only one value for each measure. You can make this transaction by some clever trickery in your ETL process but your only ever going to get a transaction window limited by your ETL schedule.

In this case you could go either with a Periodic Fact table OR an Accumulating fact table.

Lets stick with our account example, but instead of transactions the table just stores an amount value against each account. This is updated as required in the source system so that for Acnt1, in January it was 10$, February 9$ and April 11$

Sticking the the transaction and period fact tables we would end up with this data (As at end of April). Again, The periodic fact table is your Slowly Changing Fact table.

DW_Tansaction_Fact       DW_Periodic_Fact
 -------------------  ->  --------------------
Acnt1 11$                 Acnt1-Jan-10$
                          Acnt1-Feb-09$
                          Acnt1-Mar-09$
                          Acnt1-Apr-11$

But we could also go with with an Accumulating Fact table which could contain all month values for a given year.

DW_Accumlative_Fact_CrossTab
Year Acnt  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2001 Acnt1 10  9   9   11  -   -   -   -   -   -   -   -

Or a more type3-ish version

DW_Accumlative_Fact_CrossTab
Acnt    Year   YearStartVal   CurrentVal
Acnt1   2001   10             9

Kindof relevant

In my experience, this sort of question comes up when this common business scenario:

  1. There is a Core Business System with a DATABASE.
  2. Business Periodically Issues Reports that summaries values by time periods from Core Business System
  3. Core Business System allows retrospective updating of data - This is handled by overwriting values.
  4. Business demands to know why the January figures in the same report run in June no longer match the January figures from the report run in February.

Note that you are now dealing with FOUR sets of time (Initial period of report, measurement at date of initial period, current report period, measurement at current period) which will be hard enough for you to explain let alone your end users to understand.

Try to step back, explain to your end users which business measures change over time, listen to what results they want and build your facts accordingly. Note that you may end up with multiple fact tables for the same measure, that is OK and good.

Reference:

Multiple answered 5/7, 2016 at 5:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.