date/time dimension
Asked Answered
A

3

11

I am designing a data warehouse and I have a sticky issue with time. The grain I need is hourly (to calculate aggregate counts of events per hour) and I also have to accommodate a shift pattern that does not conveniently fit inside a 24 hour period (in fact it is possible that 'blue' shift wont cover the same time of day for several days).

With this in mind I am contemplating one of 3 approaches

  1. a single time dimension with 175K rows in it.
  2. a snowflake time dimension with 7300 rows in a calendar dimension and 175k rows in a time dimension
  3. separate dimensions so that the fact table has foreign keys for the event date and for the event time.

I am tending towards approach 3 as it allows the small calendar dimension to be referenced separately in joins, but I would appreciate any thoughts.

Anesthetist answered 8/2, 2011 at 7:27 Comment(4)
How are your figures derived: I would have thought that any calendar dimension would be 8766 or 8784 (depending on whether you are using 365.25 * 24 or 366 * 24); equally I do not understand your 175k rows for the time dimension - it does not arise naturally from any view of time I have looked at?Oleg
I was approximating at 365 days * 20 years = 7300 rows and then the 175k was approximately 24 hours * 7300 rows.Anesthetist
Sorry, if my question looks stupid, but... what is 'blue' shift? Or at least what's the issue with the probability of its not covering the same time of day for several days?Hillari
It is an arbitrary name given to a 'shift'. You called easily swap the word 'shift' for 'team'. Our teams have a complex schedule rotation so that no team is doing the same time of day everyday.Anesthetist
U
6

Yes, manufacturing shifts are tricky and do change over time, often one shift starts day before, etc.

Keep in mind that there are two calendars here. One is the standard calendar and the other is the production calendar -- the shift belongs to the production calendar. In general, a day in production calendar may last more (or less) than 24 hours.

For example:

Part produced on Monday, 2011-02-07 23:45 may look like

TimeOfProduction = '2011-02-07 23:45'
DateKey = 20110207
TimeKey = 2345
ProductionDateKey = 20110208 (the first shift of the next day started at 22:00)
ProductionTimeKey = 145 (1 hour and 45 minutes of the current production date)     
ShiftKey = 1
ShiftTimeKey = 145 (1 hour and 45 minutes of the current shift)

So, my suggestion is:

  1. Plain Date Dimension (one row per date)
  2. Plain Time Dimension (one row per minute for 24 hours = 1440 rows + see note below)
  3. Shift Dimension -- type 2 dimension with rw_ValidFrom, (rw_ValidTo) , rw_IsCurrent
  4. Role-play the DateKey into ProductionDateKey
  5. Role-play the TimeKey into a ProductionTimeKey and ShiftTimeKey.
  6. Keep the TimeOfProduction (datetime) in the fact table too.
  7. During the ETL process, apply the current shift logic to attach ProductionDateKey, ProductionTimeKey, ShiftKey, ShiftTimeKey to each row of the factPart table.

Note that you may need to add extra rows to the Time Dimension if a production day can last more than 24 hours. It usually can if a local time is used and there is a daylight savings time jump.

So, the star may look something like this

enter image description here

Ultrasonic answered 8/2, 2011 at 13:4 Comment(0)
O
2

My £0.02 for what it is worth:

Assuming that there is no additional issue arising from consideration of the shift (@Andriy M's question):

I would tend to discount option 2 unless there is a specific benefit (performance, simplification of a class of query, etc.) you can see from adopting it. You do not describe any such benefit, so it seems that you are adding complexity for its own sake.

My personal preference would be for option 1 - conceptually the simplest, the most direct, and the (IMO) best fit to data warehouse approaches.

Option 3 has the advantages you mention, but I have the nagging suspicion that it covers two alternatives: in both the calendar dimension is as you describe it, but the choices for the time dimension are 175k rows, or 24. I cannot at present provide arguments for either of these alternatives, only a gut feeling that there are two such choices. If the shift issue IS relevant here, it might influence the choice between these alternatives (if they are genuine alternatives).

If you wish to take option 2 further, the alternatives set out for option 3 are also relevant.

Oleg answered 8/2, 2011 at 10:45 Comment(2)
The main benefit I see of option 2 would be to have a manually maintainable complex calendar table which remains at the date level while having a simpler time dimension that has 24 rows per date (25 or 23 on those horrible equinox days). the payoff would be having to join-through the time dimension everytime you want calendar information. So perhaps there should be option 1.5 that uses a view over a calendar table and a time table to provide a consolidated date dimension.Anesthetist
If the shift pattern is an issue, there is a third choice for the time dimensions mentioned for option 3. This is to have n rows, where n is the number of hours it takes the shift pattern to completely cycle - e.g., from shift starts at 09:00 Monday to shift starts at 09:00 Monday. This is subject to the same caveats as given in my original answer.Oleg
G
1

I would choose option 3. - Separate dimensions. Benefits:

  • Simplicity - two relatively small tables - with Time dimension loaded only once as there's fixed number of minutes in a day.

  • Reuse - two separete dimensions are more likely to be shared with other fact tables that can have only Date or Time dimension

  • Easy partitioning by having separate attribute for Date dimension in a fact table

  • Extensibility - think of attributes you could add to Date and Time dimensions as your reporting needs grow. For a Date dimension this could be (to avoid extracting this information each time from date): year, quarter, month, day, week, date label (like "12th September 2011"), month name, weekday name, various indicators (holiday indicator, end of quarter, end of month, etc.). For a Time dimension (which could - for accuracy - contain each second of a day) this could be: hour, minute, second, day part label (like "morning", "evening"), working time indicator (seconds from 8:00:00 to 17:00:00), etc. But having it all in just one dimension would mean a lot of redundancy.

Shifts that are not aligned with day start / end look to me as a good candidate for a separate fact fable recording start and end timestamp for each shift - I mean (factless) fact table with the following foreign keys: id_date_start, id_time_start, id_date_end, id_time_end. Then you can "drill-across" from the events fact table to the shifts table to get aggregate results for each shift.

Edit: Or model shifts just as another dimension - that depends on the fact if for you shift is an important business process that you want to track independently with its attributes (but at the moment I can't think of any other attributes then Date & Time... Location, perhaps?) or if it's just a context of an event (and therefore should be just a dimension).

Gravitative answered 28/3, 2011 at 12:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.