I have a table that records a row for each time a score for a location has changed.
score_history:
- id int PK (uuid auto incrementing int)
- happened_at timestamp (when the score changed)
- location_id int FK (the location that the value is for)
- score float (the new score)
This was done with an eye on efficiency and being able to simply retrieve a list of changes for a given location and serves that purpose nicely.
I'm trying to output the data in a very redundant format to help load it into a rigid external system. The external system expects a row for each location * every date. The goal is to represent the last score value for each location for each date. So if the score changed 3 times in a given date only the score closest to midnight would be considered that locations closing score for the day. I imagine this is similar to the challenge of creating a close of business inventory level fact table.
I have a handy star schema style date dimension table which has a row for every date fully covering this sample period and well into the future.
That table looks like
dw_dim_date:
- date date PK
- a bunch of other columns like week number, is_us_holiday etc.
So, if I had only 3 records in the score_history table...
1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2
The desired output would be:
2019-01-01, 100, 5.0
2019-01-02, 100, 5.0
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0
2019-01-05, 100, 6.2
3 Requirements:
- One row per day per location even if there are no score records for that day.
- If there are score records for that day the last one before midnight should be the score value for the row. In the event of a tie the greater of the two should "win".
- If there are zero score records for that day the score should be the most recent previous score.
I've been chasing my tail through subqueries and window functions.
Because I'm hesitant to post something without something I tried I'll share this trainwreck which produces output but of no meaning...
SELECT dw_dim_date.date,
(SELECT score
FROM score_history
WHERE score_history.happened_at::DATE < dw_dim_date.date
OR score_history.happened_at::DATE = dw_dim_date.date
ORDER BY score_history.id desc limit 1) as last_score
FROM dw_dim_date
WHERE dw_dim_date.date > '2019-06-01'
Grateful for guidance or pointers to other questions to read.