Perl: makes array of item IDs shortened

H

3

5

I have a large set of numbers, basically 1001 .. 150000 for a database using MySQL

There are a ton of gaps in the IDs on the database, so not all IDs exist. It can go from 100000 - 10500, then the next number will be 10675, and so forth.

I want to make the IDs shortened, such as 1001..3000, x, x, x, 55000..101000, etc.

I'm sure it's simple.

SELECT id FROM table_name WHERE data = x

give me above info.

I used

select group_concat(id) from items where id>1000

to get all ids in a comma-separated list. How do I shrink this to be more clean? Basically to add ".." to a series of sequential numbers.

I am using Perl, but I'm just not sure of the syntax to make it work.

Header answered 6/8, 2024 at 19:6 Comment(4)

Don't do this in SQL, do it in a real programming language. I don't think there's any easy way in SQL to find the gaps and merge all the consecutive numbers. – Miry 6/8, 2024 at 19:19

Provide a sample, CREATE TABLE + INSERT INTO with short data (5-7 rows, 2-3 consecutive values groups, including 1-value group) and show desired output for this data. – Racing 6/8, 2024 at 19:22

More specifically, use a language that has arrays. – Miry 6/8, 2024 at 19:22

If you had 1, 2, 3, 5, 7, 8, 9, 10 you want to output 1..3, 5, 7..10? – Spillage 6/8, 2024 at 19:35

S

1

here is a short query for you

SELECT GROUP_CONCAT(
  CONCAT( IF(l=h, l, CONCAT(l,"..",h)))
) GAP
FROM ( 
  SELECT MIN(t1.id) l,MAX(t1.id) h, MAX(IF(t2.id IS NULL,@grp:=@grp+1,@grp)) AS grp
  FROM id_table t1
  LEFT JOIN id_table t2 ON t2.id = t1.id +1
  CROSS JOIN ( SELECT @grp := 0 ) AS INIT
  GROUP BY @grp
) as r;

result

100..103,110,120..121,200..203,400,500

sample

dbfiddle

Southernly answered 6/8, 2024 at 21:24 Comment(0)

S

7

This is a variation on the "gaps and islands" problem.

First, match up each ID with its previous ID.

select 
  id, 
  lag(id) over(order by id) as prev_id 
from test

If we have 1, 2, 3, 5, 7, 8, 9, 10, 12 this produces...

id	prev_id
1	null
2	1
3	2
5	3
7	5
8	7
9	8
10	9
12	10

Now we can find the start of each group by finding the rows where id - prev_id <> 1, or if prev_id is null. That's 1, 5, 7, and 12 above. The end is the prev_id of the next matching row. If there is no next row, the end is the highest ID.

with lag_ids as (
  select 
    id, 
    lag(id) over(order by id) as prev_id
  from test
)
select 
  id as start, 
  coalesce(
    lead(prev_id) over(order by id),
    (select max(id) from test)
  ) as end
from lag_ids 
where prev_id is null 
   or id - prev_id <> 1

start	end
1	3
5	5
7	10
12	12

We can then format this using a case statement to identify groups of one ID.

with lag_ids as (
  select 
    id, 
    lag(id) over(order by id) as prev_id 
  from test
),
start_end as (
  select 
    id as start, 
    coalesce(
      lead(prev_id) over(order by id),
      (select max(id) from test)
    ) as end
  from lag_ids 
  where prev_id is null 
     or id - prev_id <> 1
)
select
  case
  when start = end then start
  when start < end then concat(start, '..', end)
  else 'error'
  end as id_range
from start_end
order by start

id_range
1..3
5
7..10
12

You can then group concat the case to get 1..3,5,7..10,12.

Demonstration.

Spillage answered 6/8, 2024 at 20:8 Comment(0)

L

2

Perl's Set::IntSpan module does this:

$ perl -MSet::IntSpan -E 'say Set::IntSpan->new(\@ARGV)->run_list' 1 6 2 7 3 8
1-3,6-8

Liuka answered 7/8, 2024 at 11:51 Comment(0)

S

1

here is a short query for you

SELECT GROUP_CONCAT(
  CONCAT( IF(l=h, l, CONCAT(l,"..",h)))
) GAP
FROM ( 
  SELECT MIN(t1.id) l,MAX(t1.id) h, MAX(IF(t2.id IS NULL,@grp:=@grp+1,@grp)) AS grp
  FROM id_table t1
  LEFT JOIN id_table t2 ON t2.id = t1.id +1
  CROSS JOIN ( SELECT @grp := 0 ) AS INIT
  GROUP BY @grp
) as r;

result

100..103,110,120..121,200..203,400,500

sample

dbfiddle

Southernly answered 6/8, 2024 at 21:24 Comment(0)

Recommended topics

Hot tags