Postgres Consecutive Days, gaps and islands, Tabibitosan
Asked Answered
R

2

4

SQL FIDDLE

I have the following database table:

date name
2014-08-10 bob
2014-08-10 sue
2014-08-11 bob
2014-08-11 mike
2014-08-12 bob
2014-08-12 mike
2014-08-05 bob
2014-08-06 bob
SELECT t.Name,COUNT(*) as frequency
FROM (
    SELECT Name,Date,
            row_number() OVER (
            ORDER BY Date
            ) - row_number() OVER (
            PARTITION BY Name ORDER BY Date
            ) + 1 seq
    FROM orders
    ) t
GROUP BY Name,seq;

Tried running the Tabibitosan method of finding gaps and islands produces the below table which is incorrect. The name "mike" should actually have a count of 2 since the 11th and 12th days are consecutive. How do I fix this?

name frequency
mike 1
bob 3
bob 2
mike 1
sue 1

Correct expected output below:

name frequency
bob 3
bob 2
mike 2
sue 1
Roundish answered 14/5, 2021 at 4:44 Comment(4)
Can you also add the exact expected result? – Gladdie
@JimJones just updated my question with what the expected output should be – Roundish
BTW, It's Tabibitosan, and not Tababitosan πŸ˜‰ – Ilailaire
This will not work because date is not unique. So being ordered by date rows 2014-08-11 bob and 2014-08-11 mike will get different row_number() indroducing a shift for next entries of bob and mike, but the same when partitioned by name, which ruins the methods idea. – Qualifier
S
4

You are using the wrong logic. Basically, you want dates that are sequential, so you want to subtract the sequence from the date:

SELECT t.Name, COUNT(*) as frequency
FROM (SELECT o.*,
             row_number() OVER (PARTITION BY Name ORDER BY Date) as seqnum
      FROM orders o
     ) t
GROUP BY Name, date - seqnum * interval '1 day';

Here is a db<>fiddle.

Sunroom answered 14/5, 2021 at 11:33 Comment(0)
B
1

Gaps and Islands problem for Date datatype Demo1 in Postgresql:

Run this:

drop table if exists foobar; 
CREATE TABLE foobar( tick text, date_val date ); 
insert into foobar values('XYZ', '2021-01-03');  --island 1 has width 2
insert into foobar values('XYZ', '2021-01-04');  --island 1
insert into foobar values('XYZ', '2021-05-09');  --island 2 has width 3
insert into foobar values('XYZ', '2021-05-10');  --island 2 
insert into foobar values('XYZ', '2021-05-11');  --island 2
insert into foobar values('XYZ', '2021-07-07');  --island 3 has width 4
insert into foobar values('XYZ', '2021-07-08');  --island 3
insert into foobar values('XYZ', '2021-07-09');  --island 3
insert into foobar values('XYZ', '2021-07-10');  --island 3 
insert into foobar values('XYZ', '2022-10-10');  --island 4 has width 1

select * from foobar; 

select tick, island_width, min_val, max_val, 
       min_val - lag(max_val) over (order by max_val) 
       as gap_width from  
( 
  select tick, count(*) as island_width, 
         min(date_val) min_val, max(date_val) max_val 
  from ( 
    select t.*, 
    row_number() over ( partition by tick order by date_val ) as seqnum 
    from foobar t where tick = 'XYZ' 
    ) t 
  group by tick, date_val - seqnum * interval '1 day' 
) t2 order by max_val desc

Which Prints gaps and islands of the date col thustly:

β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
β”‚ tick β”‚  date_val  β”‚ 
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€  
β”‚ XYZ  β”‚ 2021-01-03 β”‚ 
β”‚ XYZ  β”‚ 2021-01-04 β”‚ 
β”‚ XYZ  β”‚ 2021-05-09 β”‚ 
β”‚ XYZ  β”‚ 2021-05-10 β”‚ 
β”‚ XYZ  β”‚ 2021-05-11 β”‚ 
β”‚ XYZ  β”‚ 2021-07-07 β”‚  
β”‚ XYZ  β”‚ 2021-07-08 β”‚  
β”‚ XYZ  β”‚ 2021-07-09 β”‚  
β”‚ XYZ  β”‚ 2021-07-10 β”‚  
β”‚ XYZ  β”‚ 2022-10-10 β”‚  
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
β”‚ tick β”‚ island_width β”‚  min_val   β”‚  max_val   β”‚ gap_width β”‚ 
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ 
β”‚ XYZ  β”‚            1 β”‚ 2022-10-10 β”‚ 2022-10-10 β”‚       457 β”‚ 
β”‚ XYZ  β”‚            4 β”‚ 2021-07-07 β”‚ 2021-07-10 β”‚        57 β”‚ 
β”‚ XYZ  β”‚            3 β”‚ 2021-05-09 β”‚ 2021-05-11 β”‚       125 β”‚ 
β”‚ XYZ  β”‚            2 β”‚ 2021-01-03 β”‚ 2021-01-04 β”‚         Β€ β”‚ 
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 

The column island_width gives the width of continuous data. The gap_width gives you the width of missing data.

Gaps and Islands problem for Integer datatype Demo2 in Postgresql:

Run This:

drop table if exists foobar; 
CREATE TABLE foobar( tick text, the_value int); 
insert into foobar values('XYZ', -5);  --island 1 has width 2 
insert into foobar values('XYZ', -4);  --end island 1 
insert into foobar values('XYZ', 0);   --island 2 has width 3 
insert into foobar values('XYZ', 1);   --island 2 
insert into foobar values('XYZ', 2);   --end island 2 
insert into foobar values('XYZ', 34);  --island 3 has width 4 
insert into foobar values('XYZ', 35);  --island 3 
insert into foobar values('XYZ', 36);  --island 3 
insert into foobar values('XYZ', 37);  --island 3 
insert into foobar values('XYZ', 85);  --island 4 has width 1 
  
select * from foobar; 
  
select tick, island_width, min_val, max_val, 
       min_val - lag(max_val) over (order by max_val) 
       as gap_width from 
( 
  select tick, count(*) as island_width, 
         min(the_value) min_val, max(the_value) max_val 
  from ( 
    select t.*, 
    row_number() over ( partition by tick order by the_value) as seqnum 
    from foobar t where tick = 'XYZ' 
    ) t 
  group by tick, the_value - seqnum * 1 
) t2 order by max_val desc

Which Prints gaps and islands of the integer col thustly:

β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
β”‚ tick β”‚ the_value β”‚ 
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ 
β”‚ XYZ  β”‚        -5 β”‚ 
β”‚ XYZ  β”‚        -4 β”‚ 
β”‚ XYZ  β”‚         0 β”‚ 
β”‚ XYZ  β”‚         1 β”‚ 
β”‚ XYZ  β”‚         2 β”‚ 
β”‚ XYZ  β”‚        34 β”‚ 
β”‚ XYZ  β”‚        35 β”‚ 
β”‚ XYZ  β”‚        36 β”‚ 
β”‚ XYZ  β”‚        37 β”‚ 
β”‚ XYZ  β”‚        85 β”‚ 
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” 
β”‚ tick β”‚ island_width β”‚ min_val β”‚ max_val β”‚ gap_width β”‚ 
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ 
β”‚ XYZ  β”‚            1 β”‚      85 β”‚      85 β”‚        48 β”‚ 
β”‚ XYZ  β”‚            4 β”‚      34 β”‚      37 β”‚        32 β”‚ 
β”‚ XYZ  β”‚            3 β”‚       0 β”‚       2 β”‚         4 β”‚ 
β”‚ XYZ  β”‚            2 β”‚      -5 β”‚      -4 β”‚         Β€ β”‚ 
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The column island_width gives the width of continuous data. The gap_width gives you the width of missing data. This allows you to say: "show me the gaps between islands" or "show me the island sizes and start/stop location"

Beak answered 4/4, 2022 at 14:58 Comment(0)

© 2022 - 2024 β€” McMap. All rights reserved.