Aggregate for each day over time series, without using non-equijoin logic
Asked Answered
I

6

17

Initial Question

Given the following dataset paired with a dates table:

MembershipId | ValidFromDate | ValidToDate
==========================================
0001         | 1997-01-01    | 2006-05-09
0002         | 1997-01-01    | 2017-05-12
0003         | 2005-06-02    | 2009-02-07

How many Memberships were open on any given day or timeseries of days?

Initial Answer

Following this question being asked here, this answer provided the necessary functionality:

select d.[Date]
      ,count(m.MembershipID) as MembershipCount
from DIM.[Date] as d
    left join Memberships as m
        on(d.[Date] between m.ValidFromDateKey and m.ValidToDateKey)
where d.CalendarYear = 2016
group by d.[Date]
order by d.[Date];

though a commenter remarked that There are other approaches when the non-equijoin takes too long.

Followup

As such, what would the equijoin only logic look like to replicate the output of the query above?


Progress So Far

From the answers provided so far I have come up with the below, which outperforms on the hardware I am working with across 3.2 million Membership records:

declare @s date = '20160101';
declare @e date = getdate();

with s as
(
    select d.[Date] as d
        ,count(s.MembershipID) as s
    from dbo.Dates as d
        join dbo.Memberships as s
            on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
)
,e as
(
    select d.[Date] as d
        ,count(e.MembershipID) as e
    from dbo.Dates as d
        join dbo.Memberships as e
            on d.[Date] = e.ValidToDateKey
    group by d.[Date]
),c as
(
    select isnull(s.d,e.d) as d
            ,sum(isnull(s.s,0) - isnull(e.e,0)) over (order by isnull(s.d,e.d)) as c
    from s
        full join e
            on s.d = e.d
)
select d.[Date]
    ,c.c
from dbo.Dates as d
    left join c
        on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
;

Following on from that, to split this aggregate into constituent groups per day I have the following, which is also performing well:

declare @s date = '20160101';
declare @e date = getdate();

with s as
(
    select d.[Date] as d
        ,s.MembershipGrouping as g
        ,count(s.MembershipID) as s
    from dbo.Dates as d
        join dbo.Memberships as s
            on d.[Date] = s.ValidFromDateKey
    group by d.[Date]
            ,s.MembershipGrouping
)
,e as
(
    select d.[Date] as d
        ,e..MembershipGrouping as g
        ,count(e.MembershipID) as e
    from dbo.Dates as d
        join dbo.Memberships as e
            on d.[Date] = e.ValidToDateKey
    group by d.[Date]
            ,e.MembershipGrouping
),c as
(
    select isnull(s.d,e.d) as d
            ,isnull(s.g,e.g) as g
            ,sum(isnull(s.s,0) - isnull(e.e,0)) over (partition by isnull(s.g,e.g) order by isnull(s.d,e.d)) as c
    from s
        full join e
            on s.d = e.d
                and s.g = e.g
)
select d.[Date]
    ,c.g
    ,c.c
from dbo.Dates as d
    left join c
        on d.[Date] = c.d
where d.[Date] between @s and @e
order by d.[Date]
        ,c.g
;

Can anyone improve on the above?

Imperialism answered 27/3, 2018 at 9:7 Comment(5)
The way you are doing it is not the only way. I'm playing around with a couple other ways to do this.Excaudate
@AlanBurstein Found anything of note?Imperialism
The counts returned by your query are not correct. Run the query, then run a simpler form that just counts from the membership table for a single date, such as '2016-01-01'Demona
@PittsburghDBA Can you please elaborate on which query you are referring to and not correct how? I've not seen any discrepancy in my testing though I'm more than happy to be shown any mistake I've made.Imperialism
@PittsburghDBA you are right, last day of membership is not counted with curent query. A dateadd(d,1,ValidToDateKey) is needed to avoid this problem. Check my answer.Frumentaceous
B
6

On the assumption your date dimension contains all dates contained in all membership periods you can use something like the following.

The join is an equi join so can use hash join or merge join not just nested loops (which will execute the inside sub tree once for each outer row).

Assuming index on (ValidToDate) include(ValidFromDate) or reverse this can use a single seek against Memberships and a single scan of the date dimension. The below has an elapsed time of less than a second for me to return the results for a year against a table with 3.2 million members and general active membership of 1.4 million (script)

DECLARE @StartDate DATE = '2016-01-01',
        @EndDate   DATE = '2016-12-31';

WITH MD
     AS (SELECT Date,
                SUM(Adj) AS MemberDelta
         FROM   Memberships
                CROSS APPLY (VALUES ( ValidFromDate, +1),
                                    --Membership count decremented day after the ValidToDate
                                    (DATEADD(DAY, 1, ValidToDate), -1) ) V(Date, Adj)
         WHERE
          --Members already expired before the time range of interest can be ignored
          ValidToDate >= @StartDate
          AND
          --Members whose membership starts after the time range of interest can be ignored
          ValidFromDate <= @EndDate
         GROUP  BY Date),
     MC
     AS (SELECT DD.DateKey,
                SUM(MemberDelta) OVER (ORDER BY DD.DateKey ROWS UNBOUNDED PRECEDING) AS CountOfNonIgnoredMembers
         FROM   DIM_DATE DD
                LEFT JOIN MD
                  ON MD.Date = DD.DateKey)
SELECT DateKey,
       CountOfNonIgnoredMembers AS MembershipCount
FROM   MC
WHERE  DateKey BETWEEN @StartDate AND @EndDate 
ORDER BY DateKey

Demo (uses extended period as the calendar year of 2016 isn't very interesting with the example data)

enter image description here

Banderilla answered 7/4, 2018 at 9:18 Comment(5)
I thought that something along the lines of Packing Intervals by Itzik Ben-Gan could be applied here. Where you have +1 for start of the interval and -1 for interval end. And then tally them up. Itzik's article has a nice illustration (figure 2). I didn't have a chance to look into it. You beat me to it.Truditrudie
@VladimirBaranov - yep. pretty sure I saw this idea in one of Itzik's books first. I think in the context of open calls in a call centreBanderilla
This looks very promising, thank you! I will run some tests first thing on Monday and report back.Imperialism
I like this simply for avoiding the Nested Loop and enabling HASH JOIN. Nice work.Demona
Definitely the best solution so far! I doubt we'll get anything better but in the spirit of fairness I'll leave the bounty open until later today.Imperialism
T
13

If most of your membership validity intervals are longer than few days, have a look at an answer by Martin Smith. That approach is likely to be faster.


When you take calendar table (DIM.[Date]) and left join it with Memberships, you may end up scanning the Memberships table for each date of the range. Even if there is an index on (ValidFromDate, ValidToDate), it may not be super useful.

It is easy to turn it around. Scan the Memberships table only once and for each membership find those dates that are valid using CROSS APPLY.

Sample data

DECLARE @T TABLE (MembershipId int, ValidFromDate date, ValidToDate date);

INSERT INTO @T VALUES
(1, '1997-01-01', '2006-05-09'),
(2, '1997-01-01', '2017-05-12'),
(3, '2005-06-02', '2009-02-07');

DECLARE @RangeFrom date = '2006-01-01';
DECLARE @RangeTo   date = '2006-12-31';

Query 1

SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >= Memberships.ValidFromDate
            AND dbo.Calendar.dt <= Memberships.ValidToDate
            AND dbo.Calendar.dt >= @RangeFrom
            AND dbo.Calendar.dt <= @RangeTo
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE);

OPTION(RECOMPILE) is not really needed, I include it in all queries when I compare execution plans to be sure that I'm getting the latest plan when I play with the queries.

When I looked at the plan of this query I saw that the seek in the Calendar.dt table was using only ValidFromDate and ValidToDate, the @RangeFrom and @RangeTo were pushed to the residue predicate. It is not ideal. The optimiser is not smart enough to calculate maximum of two dates (ValidFromDate and @RangeFrom) and use that date as a starting point of the seek.

seek 1

It is easy to help the optimiser:

Query 2

SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >= 
                CASE WHEN Memberships.ValidFromDate > @RangeFrom 
                THEN Memberships.ValidFromDate 
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <= 
                CASE WHEN Memberships.ValidToDate < @RangeTo 
                THEN Memberships.ValidToDate 
                ELSE @RangeTo END
    ) AS CA
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;

In this query the seek is optimal and doesn't read dates that may be discarded later.

seek 2

Finally, you may not need to scan the whole Memberships table. We need only those rows where the given range of dates intersects with the valid range of the membership.

Query 3

SELECT
    CA.dt
    ,COUNT(*) AS MembershipCount
FROM
    @T AS Memberships
    CROSS APPLY
    (
        SELECT dbo.Calendar.dt
        FROM dbo.Calendar
        WHERE
            dbo.Calendar.dt >= 
                CASE WHEN Memberships.ValidFromDate > @RangeFrom 
                THEN Memberships.ValidFromDate 
                ELSE @RangeFrom END
            AND dbo.Calendar.dt <= 
                CASE WHEN Memberships.ValidToDate < @RangeTo 
                THEN Memberships.ValidToDate 
                ELSE @RangeTo END
    ) AS CA
WHERE
    Memberships.ValidToDate >= @RangeFrom
    AND Memberships.ValidFromDate <= @RangeTo
GROUP BY
    CA.dt
ORDER BY
    CA.dt
OPTION(RECOMPILE)
;

Two intervals [a1;a2] and [b1;b2] intersect when

a2 >= b1 and a1 <= b2

These queries assume that Calendar table has an index on dt.

You should try and see what indexes are better for the Memberships table. For the last query, if the table is rather large, most likely two separate indexes on ValidFromDate and on ValidToDate would be better than one index on (ValidFromDate, ValidToDate).

You should try different queries and measure their performance on the real hardware with real data. Performance may depend on the data distribution, how many memberships there are, what are their valid dates, how wide or narrow is the given range, etc.

I recommend to use a great tool called SQL Sentry Plan Explorer to analyse and compare execution plans. It is free. It shows a lot of useful stats, such as execution time and number of reads for each query. The screenshots above are from this tool.

Truditrudie answered 4/4, 2018 at 8:17 Comment(8)
Whilst perhaps more efficient that the original answer, this doesn't utilise equi-join logic and actually still performs fairly poorly on my server regardless of what I do with the indexing. I have updated the question with a solution I have come to that is performing better, though am keen to see if you have any further insights or suggestions?Imperialism
@iamdave, 1. I haven't looked in detail in your long query yet (I'll try over the weekend), but make sure to explicitly write SUM(...) OVER (... ORDER BY ... BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). If ROW is not specified, it is assumed to be RANGE and it is not as efficient as ROW. (all this applies if the query logic is still correct with ROW, of course).Truditrudie
@iamdave, 2. Now you shared some stats about your data (3.6M rows in Memberships). Good. What is the usual duration of the membership? Is it few days, a year, a decade? How many memberships usually intersect with the given date range? (most of them, very few, most are before the given range, most are after). Knowing details about the data distribution may help to build a more efficient query.Truditrudie
@iamdave, 3. What is the data type of the fields ValidFromDateKey, ValidToDateKey? Is it date, datetime2, something else? What is the data type of dbo.Dates.[date] and dbo.Dates.DateKey? Why there are two fields there? Does your table dbo.Dates have one row per date? What is the range of dates in this table? 4. Are there a lot of memberships that start or end on the same date? (like 1st of Jan) OR these start/end dates are spread evenly across the year.Truditrudie
that ROWS/RANGE performance difference depends on version. It is fixed in recent versions (mentioned in this session sqlbits.com/Sessions/Event17/Window_Functions)Banderilla
This is discussed right at the end of the session (last couple of minutes). The fix needs the batch mode window aggregate so still this is true for row mode.Banderilla
@MartinSmith, thanks for the link, I've just watched it.Truditrudie
@VladimirBaranov The time periods are all over the place; some long, some short, sometimes more on one date and others less so, with no real start/end pattern or bias. The dates are in Date columns and the difference in column names is a relic of code obfuscation. Will update now.Imperialism
B
6

On the assumption your date dimension contains all dates contained in all membership periods you can use something like the following.

The join is an equi join so can use hash join or merge join not just nested loops (which will execute the inside sub tree once for each outer row).

Assuming index on (ValidToDate) include(ValidFromDate) or reverse this can use a single seek against Memberships and a single scan of the date dimension. The below has an elapsed time of less than a second for me to return the results for a year against a table with 3.2 million members and general active membership of 1.4 million (script)

DECLARE @StartDate DATE = '2016-01-01',
        @EndDate   DATE = '2016-12-31';

WITH MD
     AS (SELECT Date,
                SUM(Adj) AS MemberDelta
         FROM   Memberships
                CROSS APPLY (VALUES ( ValidFromDate, +1),
                                    --Membership count decremented day after the ValidToDate
                                    (DATEADD(DAY, 1, ValidToDate), -1) ) V(Date, Adj)
         WHERE
          --Members already expired before the time range of interest can be ignored
          ValidToDate >= @StartDate
          AND
          --Members whose membership starts after the time range of interest can be ignored
          ValidFromDate <= @EndDate
         GROUP  BY Date),
     MC
     AS (SELECT DD.DateKey,
                SUM(MemberDelta) OVER (ORDER BY DD.DateKey ROWS UNBOUNDED PRECEDING) AS CountOfNonIgnoredMembers
         FROM   DIM_DATE DD
                LEFT JOIN MD
                  ON MD.Date = DD.DateKey)
SELECT DateKey,
       CountOfNonIgnoredMembers AS MembershipCount
FROM   MC
WHERE  DateKey BETWEEN @StartDate AND @EndDate 
ORDER BY DateKey

Demo (uses extended period as the calendar year of 2016 isn't very interesting with the example data)

enter image description here

Banderilla answered 7/4, 2018 at 9:18 Comment(5)
I thought that something along the lines of Packing Intervals by Itzik Ben-Gan could be applied here. Where you have +1 for start of the interval and -1 for interval end. And then tally them up. Itzik's article has a nice illustration (figure 2). I didn't have a chance to look into it. You beat me to it.Truditrudie
@VladimirBaranov - yep. pretty sure I saw this idea in one of Itzik's books first. I think in the context of open calls in a call centreBanderilla
This looks very promising, thank you! I will run some tests first thing on Monday and report back.Imperialism
I like this simply for avoiding the Nested Loop and enabling HASH JOIN. Nice work.Demona
Definitely the best solution so far! I doubt we'll get anything better but in the spirit of fairness I'll leave the bounty open until later today.Imperialism
D
2

One approach is to first use an INNER JOIN to find the set of matches and COUNT() to project MemberCount GROUPed BY DateKey, then UNION ALL with the same set of dates, with a 0 on that projection for the count of members for each date. The last step is to SUM() the MemberCount of this union, and GROUP BY DateKey. As requested, this avoids LEFT JOIN and NOT EXISTS. As another member pointed out, this is not an equi-join, because we need to use a range, but I think it does what you intend.

This will serve up 1 year's worth of data with around 100k logical reads. On an ordinary laptop with a spinning disk, from cold cache, it serves 1 month in under a second (with correct counts).

Here is an example that creates 3.3 million rows of random duration. The query at the bottom returns one month's worth of data.

--Stay quiet for a moment
SET NOCOUNT ON
SET STATISTICS IO OFF
SET STATISTICS TIME OFF

--Clean up if re-running
DROP TABLE IF EXISTS DIM_DATE
DROP TABLE IF EXISTS FACT_MEMBER

--Date dimension
CREATE TABLE DIM_DATE
  (
  DateKey DATE NOT NULL 
  )

--Membership fact
CREATE TABLE FACT_MEMBER
  (
  MembershipId INT NOT NULL
  , ValidFromDateKey DATE NOT NULL
  , ValidToDateKey DATE NOT NULL
  )

--Populate Date dimension from 2001 through end of 2018
DECLARE @startDate DATE = '2001-01-01'
DECLARE @endDate DATE = '2018-12-31'
;WITH CTE_DATE AS
(
SELECT @startDate AS DateKey
UNION ALL
SELECT
       DATEADD(DAY, 1, DateKey)
FROM
       CTE_DATE AS D
WHERE
       D.DateKey < @endDate
)
INSERT INTO
  DIM_DATE
  (
  DateKey
  )
SELECT
  D.DateKey
FROM
  CTE_DATE AS D
OPTION (MAXRECURSION 32767)

--Populate Membership fact with members having a random membership length from 1 to 36 months 
;WITH CTE_DATE AS
(
SELECT @startDate AS DateKey
UNION ALL
SELECT
       DATEADD(DAY, 1, DateKey)
FROM
       CTE_DATE AS D
WHERE
       D.DateKey < @endDate
)
,CTE_MEMBER AS
(
SELECT 1 AS MembershipId
UNION ALL
SELECT MembershipId + 1 FROM CTE_MEMBER WHERE MembershipId < 500
)
,
CTE_MEMBERSHIP
AS
(
SELECT
  ROW_NUMBER() OVER (ORDER BY NEWID()) AS MembershipId
  , D.DateKey AS ValidFromDateKey
FROM
  CTE_DATE AS D
  CROSS JOIN CTE_MEMBER AS M
)
INSERT INTO
    FACT_MEMBER
    (
    MembershipId
    , ValidFromDateKey
    , ValidToDateKey
    )
SELECT
    M.MembershipId
    , M.ValidFromDateKey
      , DATEADD(MONTH, FLOOR(RAND(CHECKSUM(NEWID())) * (36-1)+1), M.ValidFromDateKey) AS ValidToDateKey
FROM
    CTE_MEMBERSHIP AS M
OPTION (MAXRECURSION 32767)

--Add clustered Primary Key to Date dimension
ALTER TABLE DIM_DATE ADD CONSTRAINT PK_DATE PRIMARY KEY CLUSTERED
    (
    DateKey ASC
    )

--Index
--(Optimize in your spare time)
DROP INDEX IF EXISTS SK_FACT_MEMBER ON FACT_MEMBER
CREATE CLUSTERED INDEX SK_FACT_MEMBER ON FACT_MEMBER
    (
    ValidFromDateKey ASC
    , ValidToDateKey ASC
    , MembershipId ASC
    )


RETURN

--Start test
--Emit stats
SET STATISTICS IO ON
SET STATISTICS TIME ON

--Establish range of dates
DECLARE
  @rangeStartDate DATE = '2010-01-01'
  , @rangeEndDate DATE = '2010-01-31'

--UNION the count of members for a specific date range with the "zero" set for the same range, and SUM() the counts
;WITH CTE_MEMBER
AS
(
SELECT
    D.DateKey
    , COUNT(*) AS MembershipCount
FROM
    DIM_DATE AS D
    INNER JOIN FACT_MEMBER AS M ON
        M.ValidFromDateKey <= @rangeEndDate
        AND M.ValidToDateKey >= @rangeStartDate
        AND D.DateKey BETWEEN M.ValidFromDateKey AND M.ValidToDateKey
WHERE
    D.DateKey BETWEEN @rangeStartDate AND @rangeEndDate
GROUP BY
    D.DateKey

UNION ALL

SELECT
    D.DateKey
    , 0 AS MembershipCount
FROM
    DIM_DATE AS D
WHERE
    D.DateKey BETWEEN @rangeStartDate AND @rangeEndDate
)
SELECT
    M.DateKey
    , SUM(M.MembershipCount) AS MembershipCount
FROM
    CTE_MEMBER AS M
GROUP BY
    M.DateKey
ORDER BY
    M.DateKey ASC
OPTION (RECOMPILE, MAXDOP 1)
Demona answered 6/4, 2018 at 12:32 Comment(3)
Curious as to whether you tested this on more than 3 memberships? On my 3.2m row table this performs over 10m logical reads and takes nearly a minute to run for the one month, compared to the answer in the Progress So Far section of my question which performs less than 35k logical reads and completes in 3 seconds for the whole year.Imperialism
Gonna assume you have a much better server than I have over here, as this updated version is running for a lot longer than I care to wait for...Imperialism
Here's my latest version. Let me know.Demona
L
1

Here's how I'd solve this problem with equijoin:

--data generation
declare @Membership table (MembershipId varchar(10), ValidFromDate date, ValidToDate date)
insert into @Membership values
('0001', '1997-01-01', '2006-05-09'),
('0002', '1997-01-01', '2017-05-12'),
('0003', '2005-06-02', '2009-02-07')

declare @startDate date, @endDate date
select @startDate =  MIN(ValidFromDate), @endDate = max(ValidToDate) from @Membership
--in order to use equijoin I need all days between min date and max date from Membership table (both columns)
;with cte as (
    select @startDate [date]
    union all
    select DATEADD(day, 1, [date]) from cte
    where [date] < @endDate
)
--in this query, we will assign value to each day:
--one, if project started on that day
--minus one, if project ended on that day
--then, it's enough to (cumulative) sum all this values to get how many projects were ongoing on particular day
select [date],
       sum(case when [DATE] = ValidFromDate then 1 else 0 end +
            case when [DATE] = ValidToDate then -1 else 0 end)
            over (order by [date] rows between unbounded preceding and current row)
from cte [c]
left join @Membership [m]
on [c].[date] = [m].ValidFromDate  or [c].[date] = [m].ValidToDate
option (maxrecursion 0)

Here's another solution:

--data generation
declare @Membership table (MembershipId varchar(10), ValidFromDate date, ValidToDate date)
insert into @Membership values
('0001', '1997-01-01', '2006-05-09'),
('0002', '1997-01-01', '2017-05-12'),
('0003', '2005-06-02', '2009-02-07')

;with cte as (
    select CAST('2016-01-01' as date) [date]
    union all
    select DATEADD(day, 1, [date]) from cte
    where [date] < '2016-12-31'
)

select [date],
       (select COUNT(*) from @Membership where ValidFromDate < [date]) - 
       (select COUNT(*) from @Membership where ValidToDate < [date]) [ongoing]
from cte
option (maxrecursion 0)
Lyre answered 4/4, 2018 at 7:23 Comment(0)
F
1

Pay attention, I think @PittsburghDBA is right when it says that current query return wrong result.
The last day of membership is not counted and so final sum is lower than it should be.
I have corrected it in this version.

This should improve a bit your actual progress:

declare @s date = '20160101';
declare @e date = getdate();

with 
x as (
    select d, sum(c) c
    from (
        select ValidFromDateKey d, count(MembershipID) c
        from Memberships
        group by ValidFromDateKey 

        union all

        -- dateadd needed to count last day of membership too!!
        select dateadd(dd, 1, ValidToDateKey) d, -count(MembershipID) c
        from Memberships
        group by ValidToDateKey 
    )x
    group by d
),
c as
(
    select d, sum(x.c) over (order by d) as c
    from x
)
select d.day, c cnt
from calendar d
left join c on d.day = c.d
where d.day between @s and @e
order by d.day;
Frumentaceous answered 9/4, 2018 at 11:16 Comment(0)
A
-1

First of all, your query yields '1' as MembershipCount even if no active membership exists for the given date.

You should return SUM(CASE WHEN m.MembershipID IS NOT NULL THEN 1 ELSE 0 END) AS MembershipCount.

For optimal performance create an index on Memberships(ValidFromDateKey, ValidToDateKey, MembershipId) and another on DIM.[Date](CalendarYear, DateKey).

With that done, the optimal query shall be:

DECLARE @CalendarYear INT = 2000

SELECT dim.DateKey, SUM(CASE WHEN con.MembershipID IS NOT NULL THEN 1 ELSE 0 END) AS MembershipCount
FROM
    DIM.[Date] dim
        LEFT OUTER JOIN (
            SELECT ValidFromDateKey, ValidToDateKey, MembershipID
            FROM Memberships
            WHERE
                    ValidFromDateKey <= CONVERT(DATETIME, CONVERT(VARCHAR, @CalendarYear) + '1231')
                AND ValidToDateKey   >= CONVERT(DATETIME, CONVERT(VARCHAR, @CalendarYear) + '0101')
        ) con
        ON dim.DateKey BETWEEN con.ValidFromDateKey AND con.ValidToDateKey
WHERE dim.CalendarYear = @CalendarYear
GROUP BY dim.DateKey
ORDER BY dim.DateKey

Now, for your last question, what would be the equijoin equivalent query.

There is NO WAY you can rewrite this as a non-equijoin!

Equijoin doesn't imply using join sintax. Equijoin implies using an equals predicate, whatever the sintax.

Your query yields a range comparison, hence equals doesn't apply: a between or similar is required.

Aerotherapeutics answered 6/4, 2018 at 9:20 Comment(3)
your query yields '1' as MembershipCount even if no active membership exists for the given date. This is incorrect. null values are ignored in aggregate operations so the count of 1 null values is actually 0.Imperialism
There is NO WAY you can rewrite this as a non-equijoin! Whilst this is technically true in that there must be between logic somewhere, the question refers to using only equi-join logic in combining the list of dates to the table being aggregated, which as you can see from the other answers already on this question is possible.Imperialism
Basically, it sounds like the OP wants to do this without using LEFT JOIN or NOT EXISTS. See my answer.Demona

© 2022 - 2024 — McMap. All rights reserved.