How to group continuous ranges using MySQL
Asked Answered
I

3

7

I have a table that contains categories, dates and rates. Each category can have different rates for different dates, one category can have only one rate at a given date.

Id        CatId    Date        Rate 
------  ------   ------------   ---------
000001      12   2009-07-07     1
000002      12   2009-07-08     1
000003      12   2009-07-09     1
000004      12   2009-07-10     2
000005      12   2009-07-15     1
000006      12   2009-07-16     1
000007      13   2009-07-08     1
000008      13   2009-07-09     1
000009      14   2009-07-07     2
000010      14   2009-07-08     1
000010      14   2009-07-10     1

Unique index (catid, Date, Rate) I would like for each category to group all continuous dates ranges and keep only the begin and the end of the range. For the previous example, we would have:

CatId    Begin          End            Rate 
------   ------------   ------------   ---------
12        2009-07-07    2009-07-09     1
12        2009-07-10    2009-07-10     2
12        2009-07-15    2009-07-16     1  
13        2009-07-08    2009-07-09     1  
14        2009-07-07    2009-07-07     2
14        2009-07-08    2009-07-08     1
14        2009-07-10    2009-07-10     1

I found a similar solution in the forum which did not exactly give the result

WITH    q AS
        (
        SELECT  *,
                ROW_NUMBER() OVER (PARTITION BY CatId, Rate ORDER BY [Date]) AS rnd,
                ROW_NUMBER() OVER (PARTITION BY CatId ORDER BY [Date]) AS rn
        FROM    my_table
        )
SELECT  CatId AS catidd, MIN([Date]) as beginn, MAX([Date])as endd, Rate
FROM    q
GROUP BY  CatId, rnd - rn, Rate

SEE SQL FIDDLE How can I do the same thing in mysql? Please help!

Interscholastic answered 5/9, 2012 at 6:59 Comment(2)
Why does your example show for (CatId,Rate)=(14,1) a resulting range from 2009-07-08 to 2009-07-10 when there is no 2009-07-09 in the underlying table? c.f. (CatId,Rate)=(12,1), which produces two resulting ranges due to its discontinuity.Ensanguine
Thanks eggyal, now it's correctedInterscholastic
E
6

MySQL doesn't support analytic functions, but you can emulate such behaviour with user-defined variables:

SELECT   CatID, Begin, MAX(Date) AS End, Rate
FROM (
  SELECT   my_table.*,
           @f:=CONVERT(
             IF(@c<=>CatId AND @r<=>Rate AND DATEDIFF(Date, @d)=1, @f, Date), DATE
           ) AS Begin,
           @c:=CatId, @d:=Date, @r:=Rate
  FROM     my_table JOIN (SELECT @c:=NULL) AS init
  ORDER BY CatId, Rate, Date
) AS t
GROUP BY CatID, Begin, Rate

See it on sqlfiddle.

Ensanguine answered 5/9, 2012 at 7:46 Comment(4)
@vanabel: It is MySQL's NULL-safe equal to operator.Ensanguine
@Ensanguine Nice answer, upvote. Can you please explain in more detail the trick you used to set the value of variable @f (joining with (SELECT @c:=NULL))?Harbour
@MiljenMikic: in this case, the IF() will always take the false branch on the first record (due to CatID being non-null and therefore different from the NULL-initialised value of @c). Consequently, the initial value of @f is ignored/overwritten before being read—hence initialisation is unnecessary.Ensanguine
This SQL's first running in a session returns all 11 rows, and returns 7 rows after that. This happens on both MySQL 5.6 and 5.7. The above sqlfiddle always returns 7 rows as expected. Anyone know why?Bacon
B
4
SELECT catid,min(ddate),max(ddate),rate
FROM (
    SELECT
        Catid,
        Ddate,  
        rate,
        @rn := CASE WHEN (@prev <> rate 
           or DATEDIFF(ddate, @prev_date)>1) THEN @rn+1 ELSE @rn END AS rn,
        @prev := rate,
        @prev_id := catid ,
        @prev_date :=ddate
    FROM (
        SELECT CatID,Ddate,rate 
        FROM rankdate
        ORDER BY CatID, Ddate ) AS a , 
        (SELECT @prev := -1, @rn := 0, @prev_id:=0 ,@prev_date:=-1) AS vars      

) T1 group by catid,rn

Note: The line (SELECT @prev := -1, @rn := 0, @prev_id:=0 ,@prev_date:=-1) AS vars is not necessary in Mysql Workspace, but it is in the PHP mysql_query function.

SQL FIDDLE HERE

Beefburger answered 5/9, 2012 at 7:18 Comment(4)
if we remove the record with ID = '000004' your query return (begin:2009-07-07, end:2009-07-16, rate:1) that's not correct because there is a gap, it should return (begin:2009-07-07, end:2009-07-09, rate:1) and (begin:2009-07-15, end:2009-07-16, rate:1). SQL FIDDLE HEREInterscholastic
@BoussahelBachir, i have edited the answer. in this case, need to include the condition of datediff to cater the situation you mention.Beefburger
You don't seem to be testing @prev_id anywhere... what happens if two contiguous dates with the same Rate have a different CatId?Ensanguine
Due to i have group by catid, so it will group by catid follow by rn somehow.Thus i do not use @prev_id at last. Initially , i did think of to include @prev_id in the condition thou.Beefburger
A
0

I know I am late, still posting a solution that worked for me. Had the same issue, here's how I got it

Found a good solution using variables

SELECT  MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, COUNT(*) cnt
FROM    (
    SELECT  @r := @r + (@state != state OR @state IS NULL) AS gn,
            @state := state AS sn,
            s.id, s.date, s.state
    FROM    (
            SELECT  @r := 0,
                    @state := NULL
            ) vars,
            t_range s
    ORDER BY
            date, state
    ) q
GROUP BY gn

More details at : https://explainextended.com/2009/07/24/mysql-grouping-continuous-ranges/

Ascidian answered 26/5, 2016 at 12:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.