How to identify the first gap in multiple start and end date ranges for each distinct member in T-SQL
Asked Answered
G

3

6

I have been working on the below but getting no results and the deadline is fast approaching. Also, there are over a million rows as the below. Appreciate your help on the below.

Objective: Group results by MEMBER and build Continuous Coverage Ranges for each Member by combining individual Date Ranges which either overlap or run consecutive to each other with no breaks between the Start & End day of the range.

I have data in the below format:

MemberCode  -----   ClaimID   -----       StartDate   -----       EndDate
00001   -----       012345   -----       2010-01-15   -----       2010-01-20
00001   -----       012350   -----       2010-01-19   -----       2010-01-22
00001   -----       012352   -----       2010-01-20   -----       2010-01-25
00001   -----       012355   -----       2010-01-26   -----       2010-01-30
00002   -----       012357   -----       2010-01-20   -----       2010-01-25
00002   -----       012359   -----       2010-01-30   -----       2010-02-05
00002   -----       012360   -----       2010-02-04   -----       2010-02-15
00003   -----       012365   -----       2010-02-15   -----       2010-02-30

...

In the above the member (00001) is a valid member as there is a continuous date range from 2010-01-15 to 2010-01-30 (with no gaps). Please note that the Claim ID 012355 for this member starts immediately next to the End Date of Claim ID 012352. This is still valid as it forms a continuous range.

However, the member (00002) should be an Invalid member as there is a gap of 5 days between Enddate of Claim ID 012357 and Start Day for Claim ID 012359

What I am trying to do is get a list of ONLY those members who have claims for every single day of the continuous date range (for each member) with no gaps between the MIN(Start-date) and Max(End Date) for each Distinct member. Members who have gaps are discarded.

Thanks in advance.

UPDATE:

I have reached until here. Note: FILLED_DT = Start Date & PresCoverEndDT = End Date

SELECT PresCoverEndDT, FILLED_DT 

FROM 

(

    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN

    FROM Temp_Claims_PRIOR_STEP_5 T1

    WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

            WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT 

            AND T1.MBR_KEY = T2.MBR_KEY )

) T1

    JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN

        FROM Temp_Claims_PRIOR_STEP_5 T1

        WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

             WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2

     ON T1.RN - 1 = T2.RN

WHERE   PresCoverEndDT < FILLED_DT 

The above code seems to have error as I am getting only one row and that too it is incorrect. My desired output is only 1 column as below:

Valid_Member_Code

00001

00007

00009

... etc.,

Grapple answered 29/8, 2012 at 12:27 Comment(9)
Have a look here: simple-talk.com/sql/t-sql-programming/…Maggs
possible duplicate of islands and gaps tsqlStricker
similar topic discussed here: https://mcmap.net/q/1778493/-merging-unused-timeslots/…Bugbee
@Chris Gessler - Sorry if the following question is naive. However, my requirement was for each distinct member. How do I use the steps in the link to find the date ranges for each distinct member? Thanks.Grapple
@Grapple - if you want to find invalid MemberCodes, you can self join and look for an t1.EndDate > t2.StartDate. To find all the date gaps, you'll have to partition your results. Are you using SQL Server by chance?Maggs
@Chris - I am using SQL server 2008 R2.Grapple
@Grapple - I smell a CTE + ROW_NUMBER OVER PARTITION solution coming... I can't do the work myself right now, but if I have time later and nobody has come up with a solution, I'll see what I can do.Maggs
Thanks Chris. I have reached until the below (updating the original post).Grapple
Well, the code I had posted above is wrong, I suppose. Not getting the desired output.Grapple
F
5

Try this: http://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

See query progression here: http://www.sqlfiddle.com/#!3/c3365/20


How it works, compare the current end date to its next start date and check the date gap:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |

Then check if a member has same count of claims with no gaps to its total claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

Output:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

Finally, filter them, members with no gaps in their claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

Output:

| MEMBERCODE |
--------------
|          1 |

Do note that you don't need to do COUNT(*) > 1 to detect members with 2 or more claims. Instead of using LEFT JOIN, we uses JOIN, this will automatically discard members who have yet to have a second claim. Here's the version(longer) if you opt to use LEFT JOIN instead(same output as above):

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only

Here's how see data of above query prior to filtering:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

EDIT on requirement clarification:

On your clarification, you wanted to include members who have yet to have second claim too, do this instead: http://sqlfiddle.com/#!3/c3365/22

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

Output:

| MEMBERCODE |
--------------
|          1 |
|          3 |

The technique is to count the member's nextstartdate, if they have no next start date date(i.e. count(nextstartdate) = 0) then they are single claims only and valid too, then just attach this OR condition:

or count(nextstartdate) = 0; 

Actually, the condition below will suffice too, I wanted to make the query more self-documenting though, hence I recommend counting on member's nextstartdate. Here's an alternative condition for counting members who have yet to have a second claim:

or count(*) = 1;

Btw, we also have to change the comparison from this:

sum(case when gap <= 1 then 1 end) = count(*)

to this(as we are using LEFT JOIN now):

sum(case when gap <= 1 then 1 end) = count(gap)
Fibro answered 29/8, 2012 at 13:19 Comment(4)
Thanks a lot Michael! The updated query returned 37052 valid members from a total of 1.67 million claims. The query progression and your explanation of the steps were very educative :-)Grapple
Hi Michael, since I also need members who have yet to have a second claim (in other words, members with single claims are valid in my case), I used LEFT JOIN instead of JOIN based on your note at the end, above. However, I am getting 0 results. Wondering if I missed something here?Grapple
@Grapple Answer amended based on your clarified requirement ツFibro
Thanks Michael :-) I was trying to figure out why I was getting 0. The modified code did give me the desired output when I randomly checked the members. Also, thanks for the excellent detailed explanation and the query progression!Grapple
P
1

Try this, it partitions rows by MemberCode and gives them ordinal numbers. Then it compares rows with subsequent num value, if difference between end date of a row and start date of a next row is greater than one day, it's an invalid member:

DECLARE @t TABLE (MemberCode  VARCHAR(100), ClaimID   
    INT,StartDate   DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001'   ,       012345   ,        '2010-01-15'   ,       '2010-01-20')
,('00001'   ,       012350   ,       '2010-01-19'   ,       '2010-01-22')
,('00001'   ,       012352   ,       '2010-01-20'   ,       '2010-01-25')
,('00001'   ,       012355   ,       '2010-01-26'   ,       '2010-01-30')
,('00002'   ,       012357   ,       '2010-01-20'   ,       '2010-01-25')
,('00002'   ,       012359   ,       '2010-01-30'   ,       '2010-02-05')
,('00002'   ,       012360   ,       '2010-02-04'   ,       '2010-02-15')
,('00003'   ,       012365   ,       '2010-02-15'   ,       '2010-02-28')
,('00004'   ,       012366   ,       '2010-03-18'   ,       '2010-03-23')
,('00005'   ,       012367   ,       '2010-03-19'   ,       '2010-03-25')
,('00006'   ,       012368   ,       '2010-03-20'   ,       '2010-03-21')

;WITH tbl AS (

    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate) 
                AS num
    FROM    @t
), invalid AS (

    SELECT  tbl.MemberCode
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1
    AND     tbl.MemberCode = _tbl.MemberCode
    WHERE   DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1  
)

SELECT  MemberCode
FROM    tbl
EXCEPT
SELECT  MemberCode
FROM    invalid
Peninsula answered 29/8, 2012 at 13:26 Comment(7)
Thanks Ivan, I have executed the query on the same table that I executed Michael's code above. However, the results varied drastically. While your code returns 328,256 claims (for 112,077 DISTINCT members), Michael's code returns 37,052 valid members.Grapple
Thanks for trying it out, that's an interesting result. If you have any example data where query failed to exclude invalid members please post it, I'm curious to see where it fails.Benco
@Grapple I believe I found one difference between my and Michael's query, my query doesn't exclude certain rows because I believe they are valid, those are rows that have only single occurrence for a given MemberCode, I added couple of them to the sample input data; 0004, 0005, 0006. Do you consider such rows as valid or invalid?Benco
yes, a member code with only a single claim is a valid member.Grapple
In such case this query will return these members, that may be the reason why it returns more members.Benco
Hi Ivan, thanks for the reply. I think (just wondering about this one though) one more reason for the difference might be that Michael has used "b.rn = a.rn + 1" and you have used "_tbl.num - 1". I think since both of your queries Partition by member code and Order by StartDate, one code using +1 and the other code using -1 would surely give differential results. Right?Grapple
Basically, they both perform self-join, I wrote it like tbl.num = _tbl.num - 1. It would be the same if I had written _tbl.num = tbl.num + 1 which is equivalent to b.rn = a.rn + 1.Benco
J
0

I think your query gives back false negatives because it only checks the time interval between consecutive rows. In my opinion, it is possible that the gap is compensated by one of the previous lines. Let me give an example:

Row l: 2010-01-01 | 2010-01-31
Row 2: 2010-01-10 | 2010-01-15
Row 3: 2010-01-20 | 2010-01-25

Your code will report a gap between row 2 and row 3, while it is being filled by row 1. Your code will not detect this. You should use the MAX(EndDate) of all previous rows in the DATEDIFF function.

DECLARE @t TABLE (PersonID  VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001'   ,       '2010-01-01'   ,       '2010-01-17')
INSERT @t VALUES('00001'   ,       '2010-01-19'   ,       '2010-01-22')
INSERT @t VALUES('00001'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00001'   ,       '2010-01-26'   ,       '2010-01-31')
INSERT @t VALUES('00002'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-05')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-15')
INSERT @t VALUES('00003'   ,       '2010-02-15'   ,       '2010-02-28')
INSERT @t VALUES('00004'   ,       '2010-03-18'   ,       '2010-03-23')
INSERT @t VALUES('00005'   ,       '2010-03-19'   ,       '2010-03-25')
INSERT @t VALUES('00006'   ,       '2010-01-01'   ,       '2010-04-20')
INSERT @t VALUES('00006'   ,       '2010-01-20'   ,       '2010-01-21')
INSERT @t VALUES('00006'   ,       '2010-01-25'   ,       '2010-01-26')

;WITH tbl AS (
    SELECT  
        *, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
    FROM    @t
), invalid AS (
    SELECT  tbl.PersonID 
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
    WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1  
)

SELECT  PersonID
FROM    tbl
EXCEPT
SELECT  PersonID
FROM    invalid
Jaunita answered 17/9, 2013 at 15:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.