T-SQL - Track occurrences over time

I have some data which has ValidFrom and ValidTo dates associated with it. In simple terms:

MembershipId | ValidFromDate | ValidToDate
==========================================
0001         | 1997-01-01    | 2006-05-09
0002         | 1997-01-01    | 2017-05-12
0003         | 2005-06-02    | 2009-02-07

There is a non-clustered index on this table which includes the two dates as key values.

I also have a Date dimension table which covers every date from 1900 to 2999.

I'm trying to figure out how I can select a range of dates from the Date dimension table (let's say 2016-01-01 to 2016-12-31) and then identify, for each date, how many memberships were valid on that date.

The code below does the job, but the performance isn't great. Does anyone have recommendations for a better way to go about this?

SELECT 
   d.DateKey
  ,(SELECT COUNT(*) FROM Memberships AS m
    WHERE d.DateKey between m.ValidFromDateKey and m.ValidToDateKey
    ) AS MembershipCount

FROM       
   DIM.[Date] AS d

WHERE
   d.CalendarYear = 2016

Thanks in advance for any suggestions!

Emission asked 17/5, 2017 at 15:11

The logic in your SQL is mostly correct; you have just implemented it in a way that doesn't suit how SQL likes to do things. Start with your Dates table, as you have already done, but rather than running a sub-select for each row, change your logic to a join and you are there:

select d.DateKey
      ,count(m.MembershipID) as MembershipCount
from DIM.[Date] as d
    left join Memberships as m
        on(d.DateKey between m.ValidFromDateKey and m.ValidToDateKey)
where d.CalendarYear = 2016
group by d.DateKey
order by d.DateKey;
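To see the two forms side by side, here is a minimal sketch that loads the three sample rows into an in-memory SQLite database and checks that the question's correlated subquery and the join above return the same counts. Assumptions: SQLite stands in for SQL Server, `DimDate` is a hypothetical stand-in for `DIM.[Date]`, and date keys are ISO `YYYY-MM-DD` strings (which compare correctly as text) rather than whatever type the real `DateKey` uses. Note why the join counts `m.MembershipId` rather than `*`: with a LEFT JOIN, a date with no valid membership still produces one row whose `MembershipId` is NULL, so `COUNT(m.MembershipId)` reports 0 for it.

```python
import sqlite3
from datetime import date, timedelta


def build_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the question's tables (SQLite, not SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Memberships "
                 "(MembershipId TEXT, ValidFromDateKey TEXT, ValidToDateKey TEXT)")
    conn.executemany("INSERT INTO Memberships VALUES (?, ?, ?)", [
        ("0001", "1997-01-01", "2006-05-09"),
        ("0002", "1997-01-01", "2017-05-12"),
        ("0003", "2005-06-02", "2009-02-07"),
    ])
    # Assumed date dimension: one row per day, ISO date string as the key.
    conn.execute("CREATE TABLE DimDate (DateKey TEXT PRIMARY KEY, CalendarYear INTEGER)")
    start, end = date(2005, 1, 1), date(2018, 12, 31)
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    conn.executemany("INSERT INTO DimDate VALUES (?, ?)",
                     ((d.isoformat(), d.year) for d in days))
    return conn


# The question's approach: one correlated COUNT per date row.
SUBQUERY_SQL = """
SELECT d.DateKey,
       (SELECT COUNT(*) FROM Memberships AS m
        WHERE d.DateKey BETWEEN m.ValidFromDateKey AND m.ValidToDateKey) AS MembershipCount
FROM DimDate AS d
WHERE d.CalendarYear = ?
ORDER BY d.DateKey
"""

# The answer's approach: a single range join, then GROUP BY the date.
JOIN_SQL = """
SELECT d.DateKey, COUNT(m.MembershipId) AS MembershipCount
FROM DimDate AS d
LEFT JOIN Memberships AS m
  ON d.DateKey BETWEEN m.ValidFromDateKey AND m.ValidToDateKey
WHERE d.CalendarYear = ?
GROUP BY d.DateKey
ORDER BY d.DateKey
"""

if __name__ == "__main__":
    conn = build_db()
    joined = conn.execute(JOIN_SQL, (2006,)).fetchall()
    assert joined == conn.execute(SUBQUERY_SQL, (2006,)).fetchall()
    counts = dict(joined)
    print(counts["2006-05-09"], counts["2006-05-10"])  # 3 2
```

SQLite's query plans say nothing about SQL Server's; the sketch only shows that the rewrite preserves the results, not how fast it runs.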

One thing to be careful about is deciding which memberships should be counted on each day. For example, if the date is 2006-05-09, should MembershipID 0001 be included, given that it ends that day? (BETWEEN is inclusive at both ends, so the query above does include it.)

The question is essentially: are you counting the memberships that were active at any point during the day, or only those that were active at a particular time, such as the start or the end of the day?

Then repeat this thought process for your ValidFromDate values.
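The choice comes down to which comparison operators you use at each end of the range. The sketch below is illustrative only: it runs three hypothetical predicates against the sample rows in an in-memory SQLite database (an assumption standing in for SQL Server, with ISO date strings as keys) and shows how they disagree exactly on the boundary days, 2006-05-09 (the last day of 0001) and 2005-06-02 (the first day of 0003). If both dates are meant to be inclusive, BETWEEN is the right test.

```python
import sqlite3

# Hypothetical predicates for "was this membership valid on day ?".
INCLUSIVE = "? BETWEEN ValidFromDateKey AND ValidToDateKey"        # both ends count (BETWEEN)
END_EXCLUSIVE = "? >= ValidFromDateKey AND ? < ValidToDateKey"     # last day not counted
START_EXCLUSIVE = "? > ValidFromDateKey AND ? <= ValidToDateKey"   # first day not counted


def build_db() -> sqlite3.Connection:
    """In-memory SQLite copy of the question's sample rows (ISO date strings)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Memberships "
                 "(MembershipId TEXT, ValidFromDateKey TEXT, ValidToDateKey TEXT)")
    conn.executemany("INSERT INTO Memberships VALUES (?, ?, ?)", [
        ("0001", "1997-01-01", "2006-05-09"),
        ("0002", "1997-01-01", "2017-05-12"),
        ("0003", "2005-06-02", "2009-02-07"),
    ])
    return conn


def count_on(conn: sqlite3.Connection, day: str, predicate: str) -> int:
    """Count memberships matching `predicate` on `day`; `day` fills every '?'."""
    sql = f"SELECT COUNT(*) FROM Memberships WHERE {predicate}"
    return conn.execute(sql, (day,) * predicate.count("?")).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    for day in ("2006-05-09", "2005-06-02"):  # 0001's last day, 0003's first day
        print(day, [count_on(conn, day, p)
                    for p in (INCLUSIVE, END_EXCLUSIVE, START_EXCLUSIVE)])
```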

Carnot answered 17/5, 2017 at 15:17
Emission: Fantastic! That works perfectly. My query took more than 30 seconds before I aborted it, but this one runs the full year in under 1 second. Thanks for the comment regarding the dates too. I just need to know whether the membership was valid at any point of each day, and the ValidTo/From dates are inclusive, so your query is spot on.
Valley: Wow! If this gives you the performance you need, then it is a good way to go. There are other approaches when the non-equijoin takes too long.
Carnot: @Emission SQL works on sets of data and is therefore very good at joining sets of data together. Tables are just sets of data. When you put another select statement within your main select, it is (at least logically) evaluated for every row that is returned, rather than once and then joined. For further reading, look up "set-based thinking".
Carnot: @GordonLinoff Do you have examples of the other approaches? It would be good to read through them.
Carnot: @GordonLinoff I was doing something similar to this earlier and remembered your mention of other methods here. Could you provide a link to something that expands on these?
Valley: @Carnot . . . You should ask a question.
Carnot: @GordonLinoff Done
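The comments don't spell out the other approaches, so what follows is one commonly used alternative, not necessarily the one meant above. It turns each membership into two events: +1 on its ValidFromDate and -1 on the day after its ValidToDate (so both ends stay inclusive). A running total of those deltas over the dates then gives the number of valid memberships per day, replacing the range join with an equality join plus a windowed SUM (in SQL Server 2012 and later, `SUM(...) OVER (ORDER BY ...)`). The sketch uses SQLite 3.25+ (needed for window functions) as a stand-in; the table names, ISO text date keys and the `:start`/`:end` parameters are assumptions, and T-SQL would need its own date arithmetic (for example `DATEADD`) in place of SQLite's `date(..., '+1 day')`. Events before the requested range are folded onto its first day so that memberships which started earlier are still counted.

```python
import sqlite3
from datetime import date, timedelta


def build_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in: sample memberships plus a DimDate table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Memberships "
                 "(MembershipId TEXT, ValidFromDateKey TEXT, ValidToDateKey TEXT)")
    conn.executemany("INSERT INTO Memberships VALUES (?, ?, ?)", [
        ("0001", "1997-01-01", "2006-05-09"),
        ("0002", "1997-01-01", "2017-05-12"),
        ("0003", "2005-06-02", "2009-02-07"),
    ])
    conn.execute("CREATE TABLE DimDate (DateKey TEXT PRIMARY KEY)")
    first, last = date(2000, 1, 1), date(2018, 12, 31)
    conn.executemany("INSERT INTO DimDate VALUES (?)",
                     (((first + timedelta(days=i)).isoformat(),)
                      for i in range((last - first).days + 1)))
    return conn


# +1 on ValidFrom, -1 on the day after ValidTo, then a running total per date.
RUNNING_SUM_SQL = """
WITH Events AS (
    SELECT ValidFromDateKey AS EventDate, 1 AS Delta FROM Memberships
    UNION ALL
    SELECT date(ValidToDateKey, '+1 day'), -1 FROM Memberships
),
Clamped AS (
    -- events before the range land on its first day; events after it are irrelevant
    SELECT CASE WHEN EventDate < :start THEN :start ELSE EventDate END AS DeltaDate,
           Delta
    FROM Events
    WHERE EventDate <= :end
),
DailyDelta AS (
    SELECT d.DateKey, COALESCE(SUM(c.Delta), 0) AS Delta
    FROM DimDate AS d
    LEFT JOIN Clamped AS c ON c.DeltaDate = d.DateKey
    WHERE d.DateKey BETWEEN :start AND :end
    GROUP BY d.DateKey
)
SELECT DateKey,
       SUM(Delta) OVER (ORDER BY DateKey ROWS UNBOUNDED PRECEDING) AS MembershipCount
FROM DailyDelta
ORDER BY DateKey
"""

if __name__ == "__main__":
    conn = build_db()
    rows = conn.execute(RUNNING_SUM_SQL,
                        {"start": "2016-01-01", "end": "2016-12-31"}).fetchall()
    print(rows[0], rows[-1], len(rows))
```

Whether this beats the non-equijoin depends on the data volume and indexes, so it is worth measuring on the real tables rather than assuming.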

© 2022 - 2024 — McMap. All rights reserved.