SQL: Group By on Consecutive Records
Asked Answered
V

5

13

A slightly tricky SQL question (we are running SQL server 2000).

I have the following table, StoreCount -

WeekEndDate    StoreCount
2010-07-25     359
2010-07-18     359
2010-07-11     358
2010-07-04     358
2010-06-27     358
2010-06-20     358
2010-06-13     358
2010-06-06     359
2010-05-30     360
2010-05-23     360
2010-05-16     360

I want to turn this into the following output -

StartDate    EndDate       StoreCount
2010-07-18   2010-07-25    359
2010-06-13   2010-07-11    358
2010-06-06   2010-06-06    359
2010-05-16   2010-05-30    360

As you can see, I'm wanting to group the store counts, by only as they run in sequence together.

Viola answered 4/8, 2010 at 23:34 Comment(18)
For #359, how do you know that the date in June is stand alone, that it doesn't connect with the next date in July (18th)?Manure
Because the records are in descending date orderViola
@CraigS: Not good enough - without a rule, we can't isolate the record.Manure
I don't understand your comment OMG. The store count records are in descending date order. Records need to be grouped - but only if they are consecutive. What sort of a "rule" are you after?Viola
If processed in order, there no information to tell us why to separate the record for June 6th - you have #358 where the dates cross the month boundary, but #359's record for June 6th stands alone. By your logic, I can guarantee that June 6th will have an enddate value of July 18th...Manure
OMG - the store count went up and down. There were 359 stores on the 6th June, then they closed 1 and there were 358 on the 13th June. Then they opened another one, and there were 359 again on the 18th July.Viola
@CraigS: "as they run in sequence together." <-- perhaps you can add that sequence field in the table, or show that sequence ID in your question. Database don't have built-in ordering; especially if it is MVCC type (though SQL 2000 don't have MVCC yet. but don't rely on its physical ordering of rows, you will have problem when you upgrade to 2005 or higher), the physical ordering of rows gets re-shuffled frequently if the rows are updatedCultrate
it would have been better to have a normalized databaseSmilacaceous
Hi SoftwareGeek - this is a real life example. Often times in software dev, you don't have the option of redesigning the database etc, you just have to work with what's there.Viola
@CraigS - i have seen many such real world cases over the past decade, again it's due to poor design. Hope it works out for you.Smilacaceous
@Smilacaceous - out of interest, what table design would you have recommended?Viola
@CraigS - for a start, how about keeping the startdate & enddate as 2 separate columns?Smilacaceous
@CraigS - this could possibly be resolved using PIVOT, any takers?Smilacaceous
@Smilacaceous - what would a start and end date give you in the original table? There is one date (a week end date, as it happens), and a store count associated with it. You could calculate the start date using dateadd -6 days, if you wanted, but I don't see what that would give you.Viola
@CraigS - ok, how about this? Select min(weekenddate), max(weekenddate), storecount from StoreCount group by storecount ?Smilacaceous
Hi Software Geek - that will not give me the result I'm after, as I am interested in grouping by consecutive weeks only. Check out Patrick's solution below for a query that gives the correct result.Viola
@OMGPonies Seems it was good enough for Patrick below ;)Anuran
@JakaJančar: This question is almost 2 years old. What value does your comment add?Manure
P
12

Here's a kick at the can, only it may have syntax not available in SS2k. It was actually written on Oracle as I don't have that version of SS around anymore. The only catch might be the the select of a select...(it's been a while since I've used SS2k, so it's hard to remember what features weren't available back then.)

select min(weekenddate) as start_date, end_date, storecount
from (
select s1.weekenddate
     , (select max(weekenddate)
          from store_stats s2
         where s2.storecount = s1.storecount
           and not exists (select null
                             from store_stats s3
                            where s3.weekenddate < s2.weekenddate
                              and s3.weekenddate > s1.weekenddate
                              and s3.storecount <> s1.storecount)
       ) as end_date
     , s1.storecount
from store_stats s1
) result
group by end_date, storecount
order by 1 desc


START_DATE END_DATE   STORECOUNT
---------- ---------- ----------
2010-07-18 2010-07-25        359
2010-06-13 2010-07-11        358
2010-06-06 2010-06-06        359
2010-05-16 2010-05-30        360
Porbeagle answered 5/8, 2010 at 0:57 Comment(6)
Patrick - I think this is good except you need to add an alias for the results of the FROM query. Line above group by end_date, storecount, change from ) to ) Result. One qualification for me; I don't have SQL Server 2000 either.Bennington
Brilliant Patrick - as bobs pointed out, all I needed to add was the alias, and it worked perfectly. Well done.Viola
Gets me every time. Oracle doesn't care about those aliases; SQL Server does. Thanks for the fix.Porbeagle
Patrick - I know this is quite old now, but you say your solution is designed for SQL 2000... would you do it a different way using the latest versions?Chrissy
@StephenHolt - sorry I don't check this site for messages that often. Wow, nothing like looking back at something I did 5 years ago. ;) I did put together a lag/lead version of this query (in Oracle 12c, I don't have SQL Server handy) to see how that would compare and on a 20,000 row sample; it ran 4x faster than the original (14s vs 57s). Should I edit my original or post as a new one?Porbeagle
great solution ... I was in search of something like this. thanks :)Gati
I
1

Use cursor. I don't know how to do it in sql2k by using query.

DECLARE @w datetime
DECLARE @s int
DECLARE @prev_s int
DECLARE @start_w datetime
DECLARE @end_w datetime

CREATE TABLE #zz(start datetime, [end] datetime, StoreCount int)

DECLARE a_cursor CURSOR
FOR SELECT WeekEndDate, StoreCount FROM Line ORDER BY WeekEndDate DESC, StoreCount
OPEN a_cursor
FETCH NEXT FROM a_cursor INTO @w, @s
WHILE @@FETCH_STATUS = 0
BEGIN

    IF @end_w IS NULL 
    BEGIN

        SET @end_w = @w
        SET @start_w = @w     
        SET @prev_s = @s
    END 
    ELSE IF @prev_s <> @s
    BEGIN
       INSERT INTO #zz values(@start_w,  @end_w, @prev_s)

       SET @end_w = @w  
       SET @start_w = @w  
       SET @prev_s = @s 
    END ELSE
        SET @start_w = @w 

    FETCH NEXT FROM a_cursor INTO @w, @s
END
-- add last one
INSERT INTO #zz values(@start_w, @end_w, @prev_s)

CLOSE a_cursor
DEALLOCATE a_cursor

SELECT * FROM #zz ORDER BY 1 DESC
DROP TABLE #zz
Inness answered 5/8, 2010 at 0:41 Comment(1)
See my comments on the question - #359 won't have correct values.Manure
K
1

I'm not sure how to explain this, but it seems to give the desired result for the small dataset given. In essence, it detects the points in the series where the values change.

I haven't looked at the query plan, might be painful.

Tried on a Sybase server, so syntax should be compatible with SQL Server 2K.

SELECT  x.StartDate
        , MIN( y.EndDate ) AS EndDate
        , x.StoreCount
FROM
( SELECT
        wed1.WeekEndDate AS StartDate
        , wed1.StoreCount
FROM
        wed wed1
LEFT JOIN
        wed wed2
ON      wed1.WeekEndDate = DATEADD( DAY, 7, wed2.WeekEndDate )
WHERE
        wed1.StoreCount != ISNULL( wed2.StoreCount, wed1.StoreCount - 1 )
) x,
( SELECT
        wed1.WeekEndDate AS EndDate
FROM
        wed wed1
LEFT JOIN
        wed wed2
ON      wed1.WeekEndDate = DATEADD( DAY, -7, wed2.WeekEndDate )
WHERE
        wed1.StoreCount != ISNULL( wed2.StoreCount, wed1.StoreCount - 1 )
) y
WHERE
        y.EndDate >= x.StartDate
GROUP BY
        x.StartDate
HAVING
        x.StartDate = MIN( x.StartDate )
ORDER BY
        1 DESC

StartDate    EndDate      StoreCount
------------ ------------ -----------
 Jul 18 2010  Jul 25 2010         359
 Jun 13 2010  Jul 11 2010         358
 Jun  6 2010  Jun  6 2010         359
 May 16 2010  May 30 2010         360
Kubiak answered 5/8, 2010 at 0:55 Comment(0)
D
1

Ok, here's my go at it.

DECLARE @curDate DATETIME = (SELECT MIN(WeekEndDate) FROM Table_1);
    DECLARE @curCount INT = (SELECT StoreCount FROM Table_1 WHERE WeekEndDate = @curDate);
    DECLARE @sDate DATETIME = GETDATE()
    DECLARE @eDate DATETIME = 0


    WHILE @eDate < (SELECT MAX(WeekEndDate) FROM Table_1)
    BEGIN
        SELECT @sDate = (SELECT WeekEndDate AS StartDate FROM Table_1 WHERE WeekEndDate = @curDate) -- SELECT START DATE

        -- NOW GET THE END DATE IN THIS GROUP
        DECLARE @d1 DATETIME = @curDate
        DECLARE @d2 DATETIME = @curDate
        DECLARE @loop INT = 1
        WHILE @loop = 1
            BEGIN
                IF ((SELECT StoreCount FROM Table_1 WHERE WeekEndDate = @d1) <> @curCount OR @d1 = (SELECT MAX(WeekEndDate) FROM Table_1)) BEGIN
                    SELECT @eDate = (SELECT TOP(1) WeekEndDate FROM Table_1 WHERE StoreCount = @curCount AND WeekEndDate = @d2 ORDER BY WeekEndDate DESC)
                    SELECT @loop = 0 END
                ELSE BEGIN
                    SELECT @d2 = @d1 
                    SELECT @d1 = (SELECT TOP(1) WeekEndDate FROM Table_1 WHERE WeekEndDate > @d1 ORDER BY WeekEndDate) END
            END


        SELECT @sDate AS StartDate, @eDate AS EndDate, @curCount AS StoreCount   -- DO QHATEVER YOU NEED TO DO WITH THE RECORDS HERE

        SELECT TOP(1) @curDate = WeekEndDate, @curCount = StoreCount
        FROM Table_1
        WHERE WeekEndDate > @eDate
        GROUP BY WeekEndDate, StoreCount
        ORDER BY WeekEndDate ASC

    END
Dodecahedron answered 5/8, 2010 at 1:0 Comment(0)
M
-1

try this simple solution:

create table x  (weekEndDate char(10), storeCount int);
insert into x values
('2010-07-25',359),
('2010-07-18',359),
('2010-07-11',358),
('2010-07-04',358),
('2010-06-27',358),
('2010-06-20',358),
('2010-06-13',358),
('2010-06-06',359),
('2010-05-30',360),
('2010-05-23',360),
('2010-05-16',360);
select min(weekenddate) as startdate, max(weekenddate) as enddate, min(storecount) as storecount 
from 
(select weekenddate, storecount, concat(row_number() over (order by weekenddate) -row_number() over (partition by storecount order by weekenddate),'|',storecount) as groupkey from x) w
group by groupkey order by startdate desc;

sqlfiddle

Metalline answered 10/12, 2015 at 1:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.