Identify groups of consecutive numbers in a list
Asked Answered
L

20

137

I'd like to identify groups of consecutive numbers in a list, so that:

myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

Returns:

[(2,5), (12,17), 20]

And was wondering what the best way to do this was (particularly if there's something inbuilt into Python).

Edit: Note I originally forgot to mention that individual numbers should be returned as individual numbers, not ranges.

Lynellelynett answered 28/1, 2010 at 11:57 Comment(2)
Is that return value a string?Labio
Ideally would prefer something that uses a separate type for ranges vs standalone numbers.Lynellelynett
C
78

more_itertools.consecutive_groups was added in version 4.0.

Demo

import more_itertools as mit


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]

Code

Applying this tool, we make a generator function that finds ranges of consecutive numbers.

def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]

The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).

Note: more_itertools is a third-party package installable via pip install more_itertools.

Convergent answered 4/12, 2017 at 21:57 Comment(1)
Alternatively, using more_itertools.groupby_transform: [v for k,v in more_itertools.groupby_transform(enumerate(iterable), keyfunc=lambda p: p[1]-p[0], valuefunc=operator.itemgetter(1), reducefunc=to_interval)] with to_interval = lambda g: (sublst[0], sublst[-1]) if len(sublst := list(g)) > 1 else sublst[0]Distant
N
146

EDIT 2: To answer the OP new requirement

ranges = []
for key, group in groupby(enumerate(data), lambda (index, item): index - item):
    group = map(itemgetter(1), group)
    if len(group) > 1:
        ranges.append(xrange(group[0], group[-1]))
    else:
        ranges.append(group[0])

Output:

[xrange(2, 5), xrange(12, 17), 20]

You can replace xrange with range or any other custom class.


Python docs have a very neat recipe for this:

from operator import itemgetter
from itertools import groupby
data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    print(map(itemgetter(1), g))

Output:

[2, 3, 4, 5]
[12, 13, 14, 15, 16, 17]

If you want to get the exact same output, you can do this:

ranges = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    group = map(itemgetter(1), g)
    ranges.append((group[0], group[-1]))

output:

[(2, 5), (12, 17)]

EDIT: The example is already explained in the documentation but maybe I should explain it more:

The key to the solution is differencing with a range so that consecutive numbers all appear in same group.

If the data was: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17] Then groupby(enumerate(data), lambda (i,x):i-x) is equivalent of the following:

groupby(
    [(0, 2), (1, 3), (2, 4), (3, 5), (4, 12),
    (5, 13), (6, 14), (7, 15), (8, 16), (9, 17)],
    lambda (i,x):i-x
)

The lambda function subtracts the element index from the element value. So when you apply the lambda on each item. You'll get the following keys for groupby:

[-2, -2, -2, -2, -8, -8, -8, -8, -8, -8]

groupby groups elements by equal key value, so the first 4 elements will be grouped together and so forth.

I hope this makes it more readable.

python 3 version may be helpful for beginners

import the libraries required first

from itertools import groupby
from operator import itemgetter

ranges =[]

for k,g in groupby(enumerate(data),lambda x:x[0]-x[1]):
    group = (map(itemgetter(1),g))
    group = list(map(int,group))
    ranges.append((group[0],group[-1]))
Nimocks answered 28/1, 2010 at 12:26 Comment(6)
almost works in py3k, except it requires lambda x:x[0]-x[1].Dragoon
Could you use please use multi-character variable names? For someone not familiar with map() or groupby(), the meanings of k g, i and x are not clear.Lynellelynett
This was copied from the Python documentations with the same variable names. I changed the names now.Nimocks
Thanks for the improved variable names and handling non-ranged numbers. This is readable, your explanations are great and I've marked this as the preferred answer.Lynellelynett
You'll need to increment the 2nd number in xrange/range because it is non-inclusive. In other words, [2,3,4,5] == xrange(2,6), not xrange(2,5). It may be worth defining a new inclusive range data type.Succinate
Python 3 throws a syntax error on the first example. Here's the first 2 lines updated to work on python 3: for key, group in groupby(enumerate(data), lambda i: i[0] - i[1]): group = list(map(itemgetter(1), group))Timorous
C
78

more_itertools.consecutive_groups was added in version 4.0.

Demo

import more_itertools as mit


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
[list(group) for group in mit.consecutive_groups(iterable)]
# [[2, 3, 4, 5], [12, 13, 14, 15, 16, 17], [20]]

Code

Applying this tool, we make a generator function that finds ranges of consecutive numbers.

def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]


iterable = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
list(find_ranges(iterable))
# [(2, 5), (12, 17), 20]

The source implementation emulates a classic recipe (as demonstrated by @Nadia Alramli).

Note: more_itertools is a third-party package installable via pip install more_itertools.

Convergent answered 4/12, 2017 at 21:57 Comment(1)
Alternatively, using more_itertools.groupby_transform: [v for k,v in more_itertools.groupby_transform(enumerate(iterable), keyfunc=lambda p: p[1]-p[0], valuefunc=operator.itemgetter(1), reducefunc=to_interval)] with to_interval = lambda g: (sublst[0], sublst[-1]) if len(sublst := list(g)) > 1 else sublst[0]Distant
P
25

The "naive" solution which I find somewhat readable atleast.

x = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 22, 25, 26, 28, 51, 52, 57]

def group(L):
    first = last = L[0]
    for n in L[1:]:
        if n - 1 == last: # Part of the group, bump the end
            last = n
        else: # Not part of the group, yield current group and start a new
            yield first, last
            first = last = n
    yield first, last # Yield the last group


>>>print list(group(x))
[(2, 5), (12, 17), (22, 22), (25, 26), (28, 28), (51, 52), (57, 57)]
Poulin answered 28/1, 2010 at 13:21 Comment(3)
I like this answer a lot because it's terse yet readable. However numbers that are outside of ranges should be printed as single digits, not tuples (as I will format the output and have different formatting requirements for individual numbers versus ranges of numbers.Lynellelynett
The other answer looked beautiful and intelligent, but this one is more understandable to me and allowed a beginner like me to expand it according to my needs.Melosa
Could use a list comprehension to print the non-range tuples as single digits: print([i if i[0] != i[1] else i[0] for i in group(x)])Snakebird
D
14

Assuming your list is sorted:

>>> from itertools import groupby
>>> def ranges(lst):
    pos = (j - i for i, j in enumerate(lst))
    t = 0
    for i, els in groupby(pos):
        l = len(list(els))
        el = lst[t]
        t += l
        yield range(el, el+l)


>>> lst = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
>>> list(ranges(lst))
[range(2, 6), range(12, 18)]
Dragoon answered 28/1, 2010 at 12:21 Comment(1)
[j - i for i, j in enumerate(lst)] is clever :-)Austroasiatic
F
11

Here it is something that should work, without any import needed:

def myfunc(lst):
    ret = []
    a = b = lst[0]                           # a and b are range's bounds

    for el in lst[1:]:
        if el == b+1: 
            b = el                           # range grows
        else:                                # range ended
            ret.append(a if a==b else (a,b)) # is a single or a range?
            a = b = el                       # let's start again with a single
    ret.append(a if a==b else (a,b))         # corner case for last single/range
    return ret
Faithless answered 28/1, 2010 at 14:47 Comment(0)
H
8

Please note that the code using groupby doesn't work as given in Python 3 so use this.

for k, g in groupby(enumerate(data), lambda x:x[0]-x[1]):
    group = list(map(itemgetter(1), g))
    ranges.append((group[0], group[-1]))
Harri answered 19/5, 2015 at 21:29 Comment(0)
H
4

This is my method in which I tried to prioritize readability. Note that it returns a tuple of the same values if there is only one value in a group. That can be fixed easily in the second snippet I'll post.

def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""

    values = sorted(values)
    first = last = values[0]

    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            yield first, last

            first = index  # update first only if in a new group

        last = index  # update last on every iteration

    yield first, last  # this is needed to yield the last set of numbers

Here is the result of a test:

values = [0, 5, 6, 7, 12, 13, 21, 22, 23, 24, 25, 26, 30, 44, 45, 50]
result = list(group(values))
print(result)

result = [(0, 0), (5, 7), (12, 13), (21, 26), (30, 30), (44, 45), (50, 50)]

If you want to return only a single value in the case of a single value in a group, just add a conditional check to the yields:

def group(values):
    """return the first and last value of each continuous set in a list of sorted values"""

    values = sorted(values)

    first = last = values[0]

    for index in values[1:]:
        if index - last > 1:  # triggered if in a new group
            if first == last:
                yield first

            else:
                yield first, last

            first = index  # update first only if in a new group

        last = index  # update last on every iteration

    if first == last:
        yield first

    else:
        yield first, last

result = [0, (5, 7), (12, 13), (21, 26), 30, (44, 45), 50]

Hulton answered 25/10, 2021 at 16:44 Comment(0)
L
3

This doesn't use a standard function - it just iiterates over the input, but it should work:

def myfunc(l):
    r = []
    p = q = None
    for x in l + [-1]:
        if x - 1 == q:
            q += 1
        else:
            if p:
               if q > p:
                   r.append('%s-%s' % (p, q))
               else:
                   r.append(str(p))
            p = q = x
    return '(%s)' % ', '.join(r)

Note that it requires that the input contains only positive numbers in ascending order. You should validate the input, but this code is omitted for clarity.

Labio answered 28/1, 2010 at 12:5 Comment(0)
S
3
import numpy as np

myarray = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
sequences = np.split(myarray, np.array(np.where(np.diff(myarray) > 1)[0]) + 1)
l = []
for s in sequences:
    if len(s) > 1:
        l.append((np.min(s), np.max(s)))
    else:
        l.append(s[0])
print(l)

Output:

[(2, 5), (12, 17), 20]
Servitude answered 6/10, 2017 at 13:21 Comment(0)
T
3

I think this way is simpler than any of the answers I've seen here (Edit: fixed based on comment from Pleastry):

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]

starts = [x for x in data if x-1 not in data and x+1 in data]
ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]
singles = [x for x in data if x-1 not in data and x+1 not in data]
list(zip(starts, ends)) + singles

Output:

[(2, 5), (12, 17), 20]

Edited:

As @dawg notes, this is O(n**2). One option to improve performance would be to convert the original list to a set (and also the starts list to a set) i.e.

data = [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
data_as_set = set(data)

starts = [x for x in data_as_set if x-1 not in data_as_set and x+1 in data_as_set]
startset = set(starts)
ends = [x for x in data_as_set if x-1 in data_as_set and x+1 not in data_as_set and x not in startset]
singles = [x for x in data_as_set if x-1 not in data_as_set and x+1 not in data_as_set]

print(list(zip(starts, ends)) + singles) 
Trichromatic answered 23/2, 2021 at 14:10 Comment(5)
This actually fails if you replace 12 with 10 in data array. The correct solution would be: starts = [x for x in data if x-1 not in data and x+1 in data] and ends = [x for x in data if x-1 in data and x+1 not in data and x not in starts]Dyal
Thanks @Dyal - I have edited with your fixTrichromatic
This is one of the most elegant solutions that I came across.Disseminate
This has On**2 complexity.Headrail
@Headrail Well, yes, though wasn't something OP mentioned. I have edited a suggested improvement. Was there a reason you only picked my answer to comment on complexity?Trichromatic
E
2

Using groupby and count from itertools gives us a short solution. The idea is that, in an increasing sequence, the difference between the index and the value will remain the same.

In order to keep track of the index, we can use an itertools.count, which makes the code cleaner as using enumerate:

from itertools import groupby, count

def intervals(data):
    out = []
    counter = count()

    for key, group in groupby(data, key = lambda x: x-next(counter)):
        block = list(group)
        out.append([block[0], block[-1]])
    return out

Some sample output:

print(intervals([0, 1, 3, 4, 6]))
# [[0, 1], [3, 4], [6, 6]]

print(intervals([2, 3, 4, 5]))
# [[2, 5]]
Europium answered 10/6, 2019 at 11:31 Comment(0)
L
1

Here's the answer I came up with. I'm writing the code for other people to understand, so I'm fairly verbose with variable names and comments.

First a quick helper function:

def getpreviousitem(mylist,myitem):
    '''Given a list and an item, return previous item in list'''
    for position, item in enumerate(mylist):
        if item == myitem:
            # First item has no previous item
            if position == 0:
                return None
            # Return previous item    
            return mylist[position-1] 

And then the actual code:

def getranges(cpulist):
    '''Given a sorted list of numbers, return a list of ranges'''
    rangelist = []
    inrange = False
    for item in cpulist:
        previousitem = getpreviousitem(cpulist,item)
        if previousitem == item - 1:
            # We're in a range
            if inrange == True:
                # It's an existing range - change the end to the current item
                newrange[1] = item
            else:    
                # We've found a new range.
                newrange = [item-1,item]
            # Update to show we are now in a range    
            inrange = True    
        else:   
            # We were in a range but now it just ended
            if inrange == True:
                # Save the old range
                rangelist.append(newrange)
            # Update to show we're no longer in a range    
            inrange = False 
    # Add the final range found to our list
    if inrange == True:
        rangelist.append(newrange)
    return rangelist

Example run:

getranges([2, 3, 4, 5, 12, 13, 14, 15, 16, 17])

returns:

[[2, 5], [12, 17]]
Lynellelynett answered 28/1, 2010 at 13:37 Comment(3)
>>> getranges([2, 12, 13]) Outputs: [[12, 13]]. Was that intentional?Dragoon
Yep, I need to fix for individual numbers (per most of the answers on the page). Working on it now.Lynellelynett
Actually I prefer Nadia's answer, groupby() seems like the standard function I wanted.Lynellelynett
F
1

Using numpy + comprehension lists:
With numpy diff function, consequent input vector entries that their difference is not equal to one can be identified. The start and end of the input vector need to be considered.

import numpy as np
data = np.array([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20])

d = [i for i, df in enumerate(np.diff(data)) if df!= 1] 
d = np.hstack([-1, d, len(data)-1])  # add first and last elements 
d = np.vstack([d[:-1]+1, d[1:]]).T

print(data[d])

Output:

 [[ 2  5]   
  [12 17]   
  [20 20]]

Note: The request that individual numbers should be treated differently, (returned as individual, not ranges) was omitted. This can be reached by further post-processing the results. Usually this will make things more complex without gaining any benefit.

Fraternize answered 1/5, 2018 at 11:46 Comment(0)
K
1

One-liner in Python 2.7 if interested:

x = [2, 3, 6, 7, 8, 14, 15, 19, 20, 21]

d = iter(x[:1] + sum(([i1, i2] for i1, i2 in zip(x, x[1:] + x[:1]) if i2 != i1+1), []))

print zip(d, d)

>>> [(2, 3), (6, 8), (14, 15), (19, 21)]
Kanishakanji answered 6/3, 2022 at 23:2 Comment(4)
Might be a few years too late for a Python 2.7 answer.Lynellelynett
Some people are actually stuck at Python 2.7. In my case, the softwares I work with leave me no other choice. This answer was supposed to help people in the same situation.Kanishakanji
Works for me in 3.8 with some minor changesRheotaxis
just print(zip(d, d)) for python 3Disseminate
D
1

One more prettiness:

from itertools import groupby

def myfunc(lst):
    for k, g in groupby(enumerate(lst), key=lambda x: x[1]-x[0]):
        first = last = next(g)[1]
        for _, last in g: pass
        yield first if first==last else (first, last)

>>> list(myfunc([2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]))
[(2, 5), (12, 17), 20]
Damnatory answered 29/6, 2023 at 12:9 Comment(0)
C
0

A short solution that works without additional imports. It accepts any iterable, sorts unsorted inputs, and removes duplicate items:

def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))

Example:

>>> ranges([2, 3, 4, 7, 8, 9, 15])
[(2, 4), (7, 9), (15, 15)]

>>> ranges([-1, 0, 1, 2, 3, 12, 13, 15, 100])
[(-1, 3), (12, 13), (15, 15), (100, 100)]

>>> ranges(range(100))
[(0, 99)]

>>> ranges([0])
[(0, 0)]

>>> ranges([])
[]

This is the same as @dansalmo's solution which I found amazing, albeit a bit hard to read and apply (as it's not given as a function).

Note that it could easily be modified to spit out "traditional" open ranges [start, end), by e.g. altering the return statement:

    return [(s, e+1) for s, e in zip(edges, edges)]

I copied this answer over from another question that was marked as a duplicate of this one with the intention to make it easier findable (after I just now searched again for this topic, finding only the question here at first and not being satisfied with the answers given).

Cheerleader answered 17/5, 2018 at 2:41 Comment(0)
D
0

The versions by Mark Byers, Andrea Ambu, SilentGhost, Nadia Alramli, and truppo are simple and fast. The 'truppo' version encouraged me to write a version that retains the same nimble behavior while handling step sizes other than 1 (and lists as singletons elements that don't extend more than 1 step with a given step size). It is given here.

>>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
[(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]
Discover answered 9/12, 2019 at 19:54 Comment(0)
W
0

Not the best approach , but here is my 2 cents

def getConsecutiveValues2(arr): 
    x = ""
    final = []
    end = 0
    start = 0
    for i in range(1,len(arr)) :
        if arr[i] - arr[i-1] == 1 :
            end = i
        else :
            print(start,end)
            final.append(arr[start:end+1])
            start = i
        if i == len(arr) - 1 :
            final.append(arr[start:end+1])
    return final

x = [1,2,3,5,6,8,9,10,11,12]
print(getConsecutiveValues2(x))

>> [[1, 2, 3], [5, 6], [8, 9, 10, 11]]
Woodpile answered 15/12, 2020 at 17:29 Comment(0)
V
0

This implementation works for regular or irregular steps

I needed to achieve the same thing but with the slight difference where steps can be irregular. this is my implementation

def ranges(l):
    if not len(l):
        return range(0,0)
    elif len(l)==1:
        return range(l[0],l[0]+1)
    # get steps
    sl    = sorted(l)
    steps = [i-j for i,j in zip(sl[1:],sl[:-1])]
    # get unique steps indexes range
    groups = [[0,0,steps[0]],]
    for i,s in enumerate(steps):
        if s==groups[-1][-1]:
            groups[-1][1] = i+1
        else:
            groups.append( [i+1,i+1,s] )
            g2 = groups[-2]
            if g2[0]==g2[1]:
                if sl[i+1]-sl[i]==s:
                    _=groups.pop(-2)
                    groups[-1][0] = i
    # create list of ranges 
    return [range(sl[i],sl[j]+s,s) if s!=0 else [sl[i]]*(j+1-i) for i,j,s in groups]

Here's an example

from timeit import timeit

# for regular ranges
l = list(range(1000000))
ranges(l)
>>> [range(0, 1000000)]
l = list(range(10)) + list(range(20,25)) + [1,2,3]
ranges(l)
>>> [range(0, 2), range(1, 3), range(2, 4), range(3, 10), range(20, 25)]
sorted(l);[list(i) for i in ranges(l)]
>>> [0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 20, 21, 22, 23, 24]
>>> [[0, 1], [1, 2], [2, 3], [3, 4, 5, 6, 7, 8, 9], [20, 21, 22, 23, 24]]

# for irregular steps list
l = [1, 3, 5, 7, 10, 11, 12, 100, 200, 300, 400, 60, 99, 4000,4001]
ranges(l)
>>> [range(1, 9, 2), range(10, 13), range(60, 138, 39), range(100, 500, 100), range(4000, 4002)]

## Speed test
timeit("ranges(l)","from __main__ import ranges,l", number=1000)/1000
>>> 9.303160999934334e-06
Vidar answered 3/4, 2021 at 15:4 Comment(0)
L
0

Yet another solution if you expect your input to be a set:

def group_years(years):
  consecutive_years = []
  for year in years:
    close = {y for y in years if abs(y - year) == 1}
    for group in consecutive_years:
      if len(close.intersection(group)):
        group |= close
        break
    else:
      consecutive_years.append({year, *close})
  
  return consecutive_years

Example:

group_years({2016, 2017, 2019, 2020, 2022})
Out[54]: [{2016, 2017}, {2019, 2020}, {2022}]
Lap answered 14/7, 2022 at 7:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.