Min/Max Date Values over Large Date Range depending on Value
Asked Answered
A

1

8

I'm querying a snapshot of customer data that contains the snapshot date, the customer ID and the 'value' of that customer on that day. I use the LAG function to return the previous days value to know if there is a drop/rise/complete loss/complete new value (from £0 to > £0).

The end game is to identify the min and max dates where the customer was at £0 value.

Originally I tried MIN(Date) and Max(Date) grouping by the Customer and Value. However if a customer dropped to £0 over different date ranges, it would bring back the max of the latest date range and the min of the earliest, instead of the ideal - bring back both ranges where it was £0.

I've tried using DENSE_RANK() to split each the values of the customer, but doing so just ranks all £0 values in the same rank.

Here is some sample code to show you the data I'm working with and how i've tried to split it:

DROP TABLE IF EXISTS #SnapshotTable
CREATE TABLE #SnapshotTable
(
    Row_ID INT IDENTITY(1,1)
    ,SnapshotDate DATE
    ,SnapshotDateKey INT
    ,CustomerId INT
    ,Value DECIMAL(18,2)
)
INSERT INTO #SnapshotTable (SnapshotDate, SnapshotDateKey, CustomerId, Value)
SELECT '2019-01-01', 20190101, 1, 0.00
UNION SELECT '2019-01-02', 20190102, 1, 0.00
UNION SELECT '2019-01-03', 20190103, 1, 5.00
UNION SELECT '2019-01-04', 20190104, 1, 5.00
UNION SELECT '2019-01-05', 20190105, 1, 3.00
UNION SELECT '2019-01-06', 20190106, 1, 3.00
UNION SELECT '2019-01-07', 20190107, 1, 0.00
UNION SELECT '2019-01-08', 20190108, 1, 0.00
UNION SELECT '2019-01-09', 20190109, 1, 10.00
UNION SELECT '2019-01-10', 20190110, 1, 0.00

SELECT * FROM #SnapshotTable

-- Code that doesn't work correctly
SELECT
    CustomerId
    ,Value
    ,MinDate = MIN(SnapshotDateKey)
    ,MaxDate = MAX(SnapshotDateKey)
FROM #SnapshotTable
GROUP BY
    CustomerId
    ,Value

-- Attempted with dense rank
ALTER TABLE #SnapshotTable
ADD DenseRankTest INT NULL
GO
-- Update with Dense Rank
UPDATE TGT
SET 
    TGT.DenseRankTest = SRC.NewRank
FROM #SnapshotTable TGT
INNER JOIN (SELECT
                Row_ID
                ,NewRank = DENSE_RANK() OVER (PARTITION BY CustomerId ORDER BY Value ASC)
            FROM #SnapshotTable

            ) AS SRC
    ON SRC.Row_ID = TGT.Row_ID 

SELECT * FROM #SnapshotTable

Now I can see that the dense_rank() function is kind of functioning how I want it to but honestly I've been looking at this for a while now and I cannot get my head around how to do it correctly.

Can somebody please advise on what I need to do?

I'm expecting to see:

SELECT [StartDateKey] = 20190101, [EndDateKey] = 20190102, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190103, [EndDateKey] = 20190104, [CustomerId] = 1, [Value] = 5
UNION SELECT [StartDateKey] = 20190105, [EndDateKey] = 20190106, [CustomerId] = 1, [Value] = 3
UNION SELECT [StartDateKey] = 20190107, [EndDateKey] = 20190108, [CustomerId] = 1, [Value] = 0
UNION SELECT [StartDateKey] = 20190109, [EndDateKey] = 20190109, [CustomerId] = 1, [Value] = 10
UNION SELECT [StartDateKey] = 20190120, [EndDateKey] = 20190110, [CustomerId] = 1, [Value] = 0

Edit: For those who stumble across this, with the help of the people here I've found this as a good read for understanding the issue/solving the issue.

Awe answered 10/7, 2019 at 13:38 Comment(5)
Excellent job posting data and details. Quite impressive for your first post and something that so many people can't figure out. This sounds like a gaps and islands problem. There are quite a few examples around SO about ways to tackle this.Cherie
There is one thing confusing about your question. You state several times you want to find rows where the value is 0. But in your desired result, you also group two consecutive rows where the value is 5. This is contradictory to the description.Fulbright
@TabAlleman the desired output makes that a lot more clear. They want the min and max dates for each group of Values. At least that is how I interpret this.Cherie
@TabAlleman Apologies, I should I have clearer. You're right on that point, that's the desired output however I was interesting in seeing the intermediate output that I will then filter WHERE Value = 0. Thank you.Awe
@SeanLange I've never heard of "Gaps and Islands" before so I honestly wasn't sure on what to search Google/Stack! Thanks for the help, I'll get reading.Awe
O
2

This is a gaps-and-islands problem. But the accepted answer on the purported duplicate is simply not the best way to approach this. And the more highly voted answer is still overly complicated.

A much simpler method is:

select customerid, value, min(SnapshotDateKey), max(SnapshotDateKey)
from (select st.*,
             row_number() over (partition by customerid, value order by snapshotdate) as seqnum
      from snapshottable st
     ) st
group by dateadd(day, -seqnum, snapshotdate), customerid, value
order by min(SnapshotDateKey);

Here is a db<>fiddle.

Outskirts answered 10/7, 2019 at 14:16 Comment(1)
Thank you! This is perfect, I'm going to have a read up on Gaps and Islands in case I can help/come across this in the future. Nice work using dateadd, I hadn't thought of that. Thank you again!Awe

© 2022 - 2024 — McMap. All rights reserved.