SQL Find max no of consecutive months over a period of last 12 Months
Asked Answered
D

2

6

I am trying to write a query in sql where I need to find the max no. of consecutive months over a period of last 12 months excluding June and July.

so for example I have an initial table as follows

+---------+--------------+-----------+------------+
|      id | Payment      |  amount   |    Date    |
+---------+--------------+-----------+------------+
|       1 | CJ1          |     70000 | 11/3/2020  |
|       1 | 1B4          |  36314000 | 12/1/2020  |
|       1 | I21          | 119439000 | 1/12/2021  |
|       1 | 0QO          |   9362100 | 2/2/2021   |
|       1 | 1G0          | 140431000 | 2/23/2021  |
|       1 | 1G           |   9362100 | 3/2/2021   |
|       1 | g5d          |   9362100 | 4/6/2021   |
|       1 | rt5s         |  13182500 | 4/13/2021  |
|       1 | fgs5         |     48598 | 5/18/2021  |
|       1 | sd8          |     42155 | 5/25/2021  |
|       1 | wqe8         |  47822355 | 7/20/2021  |
|       1 | cbg8         |   4589721 | 7/27/2021  |
|       1 | jlk8         |   4589721 | 8/3/2021   |
|       1 | cxn9         |   4589721 | 10/5/2021  |
|       1 | qwe          |  45897210 | 11/9/2021  |
|       1 | mmm          |  45897210 | 12/16/2021 |
+---------+--------------+-----------+------------+

I have written below query:

SELECT customer_number, year, month,
payment_month - lag(payment_month) OVER(partition by customer_number ORDER BY year, month) as previous_month_indicator,
FROM 
(
    SELECT DISTINCT Month(date) as month, Year(date) as year, CUSTOMER_NUMBER  
    FROM Table1
    WHERE Month(date) not in (6,7)
    and TO_DATE(date,'yyyy-MM-dd') >= DATE_SUB('2021-12-31', 425)
    and customer_number = 1
) As C

and I get this output

+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
|               1 | 2020 |    11 | null                     |
|               1 | 2020 |    12 | 1                        |
|               1 | 2021 |     1 | -11                      |
|               1 | 2021 |     2 | 1                        |
|               1 | 2021 |     3 | 1                        |
|               1 | 2021 |     4 | 1                        |
|               1 | 2021 |     5 | 1                        |
|               1 | 2021 |     8 | 3                        |
|               1 | 2021 |    10 | 2                        |
|               1 | 2021 |    11 | 1                        |
+-----------------+------+-------+--------------------------+

What I want is to get a view like this Expected output


+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
|               1 | 2020 |    11 |                        1 |
|               1 | 2020 |    12 |                        1 |
|               1 | 2021 |     1 |                        1 |
|               1 | 2021 |     2 |                        1 |
|               1 | 2021 |     3 |                        1 |
|               1 | 2021 |     4 |                        1 |
|               1 | 2021 |     5 |                        1 |
|               1 | 2021 |     8 |                        1 |
|               1 | 2021 |     9 |                        0 |
|               1 | 2021 |    10 |                        1 |
|               1 | 2021 |    11 |                        1 |
+-----------------+------+-------+--------------------------+

As June/July does not matter, after May, August should be considered as consecutive month, and since in September there was no record it appears as 0 and breaks the consecutive months chain.

My final desired output is to get the max no of consecutive months in which transactions were made which in above case is 8 from Nov-2020 to Aug-2021

Final Desired Output:

+-----------------+-------------------------+
| customer_number | Max_consecutive_months |
+-----------------+-------------------------+
|               1 |                       8 |
+-----------------+-------------------------+
Dealfish answered 10/1, 2022 at 1:51 Comment(3)
its not clear what are the rules for filling the previous_month_indicator. Tak a scenario where the 8th month had transactions, then what would be the expectation of the final output. Max_consecutive_monthsOloroso
@GeorgeJoseph, I thought, I made it clear that I am taking last 12 months excluding june/july so 8th month (aug) in my case will be the 6th month or occurs right after May. so, final moths list will be Nov, Dec-2020, Jan-2021, Feb, Mar, Apr, May, Aug and so on... I hope it make sense that for given data max_consecutive_months will be 8Dealfish
Why should I tag my DBMSSewellel
Y
1

CTEs can break this down a little easier. In the code below, the payment_streak CTE is the key bit; the start_of_streak field is first marking rows that count as the start of a streak, and then taking the maximum over all previous rows (to find the start of this streak).

The last SELECT is only comparing these two dates, computing how many months are between them (excluding June/July), and then finding the best streak per customer.

WITH payments_in_context AS (
  SELECT customer_number,
    date,
    lag(date) OVER (PARTITION BY customer_number ORDER BY date) AS prev_date
  FROM Table1
  WHERE EXTRACT(month FROM date) NOT IN (6,7)
),
payment_streak AS (
  SELECT 
    customer_number,
    date,
    max(
    CASE WHEN (prev_date IS NULL)
           OR (EXTRACT(month FROM date) <> 8
                 AND (date - prev_date >= 62 
                    OR MOD(12 + EXTRACT(month FROM date) - EXTRACT(month FROM prev_date),12)) > 1))
           OR (EXTRACT(month FROM date) = 8
                 AND (date - prev_date >= 123
                    OR EXTRACT(month FROM prev_date) NOT IN (5,8)))
         THEN date END
    ) OVER (PARTITION BY customer_number ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    as start_of_streak
    FROM payments_in_context
)
SELECT customer_number,
  max( 1 +
    10*(EXTRACT(year FROM date) - EXTRACT(year FROM start_of_streak))
      + (EXTRACT(month FROM date) - EXTRACT(month FROM start_of_streak))
      + CASE WHEN (EXTRACT(month FROM date) > 7 AND EXTRACT(month FROM start_of_streak) < 6)
             THEN -2
             WHEN (EXTRACT(month FROM date) < 6 AND EXTRACT(month FROM start_of_streak) > 7)
             THEN 2
             ELSE 0 END
     ) AS max_consecutive_months
FROM payment_streak
GROUP BY 1;
Yuki answered 10/1, 2022 at 3:42 Comment(5)
I tried your code, it works fine for almost all the data I have, but in one case where streak ends at May it does not return the correct no. of consecutive months. Can you help in this?Dealfish
the major streak which I want to capture for this particular case starts at 2021-12-04 and ends at 2021-05-28 the the final max_consecutive_months I get for this case is 4 which should be 6. Any suggestions how can I solve it?Dealfish
oh yes, my bad, it's 2020-12-04 and ends at 2021-05-28Dealfish
but main issue exists in last select as per this formula if there is no Aug the streak stops at May and max value at that point in 3.Dealfish
Thanks, I was missing a case in the last case statement. Full years behave like a 10-month streak. In the case where the start is before the summer gap and the end is after, you need to subtract 2. In the case where these are reversed, you need to add two.Yuki
W
1

You can use a recursive cte to generate all the dates in the twelve month timespan for each customer id, and then find the maximum number of consecutive dates excluding June and July in that interval:

with recursive cte(id, m, c) as (
   select cust_id, min(date), 1 from payments group by cust_id
   union all
   select c.id, c.m + interval 1 month, c.c+1 from cte c where c.c <= 12
),
dts(id, m, f) as (
   select c.id, c.m, c.c = 1 or exists 
       (select 1 from payments p where p.cust_id = c.id and extract(month from p.date) = extract(month from (c.m - interval 1 month)) and extract(year from p.date) = extract(year from (c.m - interval 1 month))) 
   from cte c where extract(month from c.m) not in (6,7)
),
result(id, f, c) as (
  select d.id, d.f, (select sum(d.id = d1.id and d1.m < d.m and d1.f = 0)+1 from dts d1) 
  from dts d where d.f != 0
)
select r1.id, max(r1.s)-1 from (select r.id, r.c, sum(r.f) s from result r group by r.id, r.c) r1 group by r1.id
Weighted answered 10/1, 2022 at 4:14 Comment(1)
Sorry, for not mentioning earlier but I am using spark sql and there is no recursive cte in it, do you know any work around that?Dealfish

© 2022 - 2024 — McMap. All rights reserved.