How do I split a list into equally-sized chunks?

How do I split a list of arbitrary length into equal sized chunks?


See also: How to iterate over a list in chunks.
To chunk strings, see Split string every nth character?.

Congenial answered 23/11, 2008 at 12:15
This question has a pretty official answer from Python core developer Raymond Hettinger, which refers to the official docs: stackoverflow.com/a/74120449 - Fill

Here's a generator that yields evenly-sized chunks:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

For Python 2, using xrange instead of range:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

For Python 2:

[lst[i:i + n] for i in xrange(0, len(lst), n)]
Oates answered 23/11, 2008 at 12:33
Your chunks method should be added to stdlib imho - Guillermo
@Guillermo It's already in stdlib, it's called itertools.islice(iterator, chunk_size). - Standridge
@Standridge islice does something different: it produces one slice of the iterator. - Oates
@NedBatchelder islice() needs a bit of boilerplate to setup a generator out of an iterator, but look how simple this solution is. - Standridge
Chunking has been added as itertools.batched(iterable, chunk_size) now in Python 3.12, see more here. - Yasukoyataghan

Something super simple:

def chunks(xs, n):
    n = max(1, n)
    return (xs[i:i+n] for i in range(0, len(xs), n))

For Python 2, use xrange() instead of range().
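The n = max(1, n) clamp matters because range() rejects a step of zero, so without it chunks(xs, 0) would raise ValueError as soon as the generator expression is created. A small Python 3 check of that behaviour (the sample inputs are arbitrary):

```python
def chunks(xs, n):
    n = max(1, n)  # clamp so range() never receives a zero or negative step
    return (xs[i:i + n] for i in range(0, len(xs), n))

print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
print(list(chunks([1, 2, 3], 0)))        # n is clamped to 1: [[1], [2], [3]]
print(list(chunks([], 3)))               # []

try:
    range(0, 3, 0)  # what the generator would hit without the clamp
except ValueError as exc:
    print("range() refuses step 0:", exc)
```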

Guam answered 17/11, 2009 at 20:17
Using short circuiting, len(l) or 1 to deal with empty lists. - Bosson
Slow! Prefer itertools.islice() instead. - Standridge

I know this is kind of old but nobody yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)

Result:

[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
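Unlike the fixed-size slicing answers, array_split keeps the chunk lengths within one of each other. The NumPy docs state the rule: for an array of length l split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n. A pure-Python sketch of that same rule for plain lists, which avoids the NumPy dependency and its dtype conversion (split_evenly is a name made up for this example):

```python
def split_evenly(seq, n):
    """Split seq into n contiguous parts whose lengths differ by at most one.

    Follows the size rule documented for numpy.array_split: the first
    len(seq) % n parts get one extra element.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    q, r = divmod(len(seq), n)
    parts = []
    start = 0
    for i in range(n):
        size = q + 1 if i < r else q
        parts.append(seq[start:start + size])
        start += size
    return parts

print(split_evenly(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(split_evenly([1, 2], 3))           # [[1], [2], []]
```

As with array_split, asking for more parts than there are items yields empty parts at the end.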
Wesley answered 5/6, 2013 at 8:54
This allows you to set the total number of chunks, not the number of elements per chunk. - Integration
This method changes the type of the elements: [ ['a', 1] , ['b', 2] ] with chunk one may become [ ['a', '1'] , ['b', '2'] ]. If the type of the first element is str, then all elements become numpy.str_ ... - Roveover
It also breaks the laziness of the iterable and needs O[2x] memory. - Standridge
@Integration That problem can be solved using np.split(lst, np.arange(chunk_size, len(lst), chunk_size)), although that requires even more memory and time. - Yttrium
The benefit of this solution is that all arrays will be at most different by 1 in size. The accepted answer could have the last chunk a lot shorter. From the docs: "for an array of length l that should be split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n." This approach calculates how many elements would be extra in the last chunk (l % n), and then increases (l % n) arrays by 1 to compensate for that. That's rather neat, and probably some answer here already coded it like a generator. - Cobbie

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
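The shared-iterator trick can be checked directly: multiplying a one-element list copies the reference, not the iterator, so zip_longest pulls consecutive items into each tuple. A Python 3 sketch that restates the current grouper so the snippet runs on its own:

```python
from itertools import zip_longest

def grouper(n, iterable, padvalue=None):
    # One iterator object, referenced n times in the argument list.
    args = [iter(iterable)] * n
    # Each "column" zip_longest reads from advances that same iterator,
    # so every output tuple holds n consecutive items.
    return zip_longest(*args, fillvalue=padvalue)

it = iter("abcdefg")
refs = [it] * 3
print(all(r is it for r in refs))  # True: three references to one iterator

print(list(grouper(3, "abcdefg", "x")))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```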

Python ≥3.12

itertools.batched is available.
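On interpreters older than 3.12, one common pattern is to import batched when it exists and fall back to an islice loop otherwise. The fallback below is a sketch modelled on the batched recipe from the itertools documentation, not the C implementation; like the real function it yields tuples and rejects n < 1:

```python
from itertools import islice

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    def batched(iterable, n):
        # Fallback for older Pythons (3.8+ for the := operator):
        # yield tuples of up to n items until the input runs out.
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch

print(list(batched("abcdefg", 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g',)]
```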

Dysentery answered 23/11, 2008 at 15:48

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval is _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers stop early if the input itself contains a run of pad values long enough to fill a whole chunk, because that chunk is equal to the sentinel. Here's a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval is _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
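The same two-argument iter(callable, sentinel) idiom is commonly used to read a binary file in fixed-size blocks: the callable is invoked until it returns the sentinel. In this sketch io.BytesIO stands in for a real file, and the block size of 4 is arbitrary:

```python
from functools import partial
from io import BytesIO

stream = BytesIO(b"abcdefghij")  # stands in for a file opened in binary mode

# stream.read(4) is called repeatedly until it returns b"" (end of data),
# which is the sentinel that stops the iteration.
blocks = list(iter(partial(stream.read, 4), b""))
print(blocks)  # [b'abcd', b'efgh', b'ij']
```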
Gallows answered 26/2, 2014 at 15:2
One-liner version: ``` from itertools import islice from functools import partial seq = [1,2,3,4,5,6,7] size = 3 result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ())) assert result == [(1, 2, 3), (4, 5, 6), (7,)] ``` - Lamonicalamont

Don't reinvent the wheel.

UPDATE: A complete solution is found in Python 3.12+ itertools.batched.

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

itertools.batched++

list(it.batched(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]

Details

The following non-native approaches were suggested prior to Python 3.12:

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(iterable, n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

(or DIY, if you want)

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

References

+ A third-party library that implements itertools recipes and more: pip install more_itertools

++ Included in the Python Standard Library since 3.12. batched is similar to more_itertools.chunked, but yields tuples rather than lists.

Sambar answered 26/8, 2018 at 1:40

Here is a generator that works on arbitrary iterables:

import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
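Because split_seq only relies on iter() and islice(), it also accepts inputs that can't be sliced or measured with len(), such as a generator, and it produces chunks on demand. A Python 3 sketch (the sample generator and the infinite itertools.count() input are illustrative choices):

```python
import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

squares = (x * x for x in range(7))  # a generator: no len(), no slicing
print(list(split_seq(squares, 3)))   # [[0, 1, 4], [9, 16, 25], [36]]

# Chunks are produced lazily, so even an infinite iterator works
# as long as only a finite number of chunks is requested.
first_two = list(itertools.islice(split_seq(itertools.count(), 4), 2))
print(first_two)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```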
Ianiana answered 23/11, 2008 at 12:41

Simple yet elegant:

L = list(range(1, 1000))
print([L[x:x+10] for x in range(0, len(L), 10)])

or if you prefer:

def chunks(L, n):
    return [L[x:x+n] for x in range(0, len(L), n)]

chunks(L, 10)
Fanya answered 12/7, 2010 at 7:58

How do you split a list into evenly sized chunks?

"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:

>>> import statistics
>>> statistics.variance([5,5,5,5,1]) 
3.2
>>> statistics.variance([5,4,4,4,4]) 
0.19999999999999998

A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.
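The low-variance lengths can be computed up front: divmod gives the base size and how many baskets take one extra item. A small sketch (balanced_sizes is a name made up for this example); for 21 items in 5 baskets it reproduces the [5, 4, 4, 4, 4] split, and for 65 items in 10 baskets it gives five baskets of 7 and five of 6:

```python
import statistics

def balanced_sizes(n_items, n_baskets):
    # Every basket gets q items; the first r baskets get one extra.
    q, r = divmod(n_items, n_baskets)
    return [q + 1] * r + [q] * (n_baskets - r)

sizes = balanced_sizes(21, 5)
print(sizes)                       # [5, 4, 4, 4, 4]
print(sum(sizes))                  # 21
print(statistics.variance(sizes))  # about 0.2, versus 3.2 for [5, 5, 5, 5, 1]
print(balanced_sizes(65, 10))      # [7, 7, 7, 7, 7, 6, 6, 6, 6, 6]
```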

Critique of other answers here

When I originally wrote this answer, none of the other answers were evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and have a higher than necessary variance of lengths.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

Others, like list(grouper(3, range(7))) and chunk(range(7), 3), both return [(0, 1, 2), (3, 4, 5), (6, None, None)]. The Nones are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can't we divide these better?

Cycle Solution

A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:

from itertools import cycle
items = range(10, 75)
number_of_baskets = 10

Now we need our lists into which to populate the elements:

baskets = [[] for _ in range(number_of_baskets)]

Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, is exactly what we want:

for element, basket in zip(items, cycle(baskets)):
    basket.append(element)

Here's the result:

>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
 [11, 21, 31, 41, 51, 61, 71],
 [12, 22, 32, 42, 52, 62, 72],
 [13, 23, 33, 43, 53, 63, 73],
 [14, 24, 34, 44, 54, 64, 74],
 [15, 25, 35, 45, 55, 65],
 [16, 26, 36, 46, 56, 66],
 [17, 27, 37, 47, 57, 67],
 [18, 28, 38, 48, 58, 68],
 [19, 29, 39, 49, 59, 69]]

To productionize this solution, we write a function, and provide the type annotations:

from itertools import cycle
from typing import List, Any

def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    baskets = [[] for _ in range(min(maxbaskets, len(items)))]
    for item, basket in zip(items, cycle(baskets)):
        basket.append(item)
    return baskets

In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.

Slices

Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:

start = 0
stop = None
step = number_of_baskets

first_basket = items[start:stop:step]

This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.

In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:

from typing import List, Any

def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
    n_baskets = min(maxbaskets, len(items))
    return [items[i::n_baskets] for i in range(n_baskets)]

And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.

I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.

from itertools import islice
from typing import Any, Iterator, List

def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Iterator[Iterator[Any]]:
    n_baskets = min(maxbaskets, len(items))
    for i in range(n_baskets):
        yield islice(items, i, None, n_baskets)

View results with:

from pprint import pprint

items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])

Updated prior solutions

Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in range(maxbaskets)]
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return list(filter(None, baskets))  # drop empty baskets

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in range(baskets):
        yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
    

And finally, since all of the above functions distribute elements round-robin rather than keeping them in the contiguous order they were given, here is one that keeps each basket contiguous:

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing an iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in range(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in range(length)]

Output

To test them out:

print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.

Fusty answered 13/2, 2014 at 23:7

def chunk(input, size):
    return map(None, *([iter(input)] * size))
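map(None, ...) only works in Python 2, where it pads the final short group with None, like izip_longest. A Python 3 sketch with the same padding behaviour uses itertools.zip_longest; unlike the map(lambda *x: x, ...) variant from the comments, it keeps the tail. The parameter is renamed from input to iterable here to avoid shadowing the built-in:

```python
from itertools import zip_longest

def chunk(iterable, size):
    # Same iterator repeated `size` times; zip_longest pads the last tuple with None.
    return list(zip_longest(*([iter(iterable)] * size)))

print(chunk(range(7), 3))  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```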
Exum answered 26/6, 2010 at 19:10
Doesn't work in Python 3.8, is that for 2.x? - Jacquerie
For Python 3.x: return map(lambda *x: x, *([iter(input)] * size)). Yet it drops the tail of the list if it cannot be divided into equal chunks. - Jacquerie

If you know list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
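The author doesn't show the "more beautiful" form; one plausible reading (an assumption) is the shared-iterator zip idiom used elsewhere on this page. Plain zip silently drops an incomplete final chunk, which is why the whole-chunks precondition matters. A sketch (iter_whole_chunks is a hypothetical name):

```python
def iter_whole_chunks(sequence, chunk_size):
    # zip stops at the shortest argument, so a short final chunk is discarded.
    it = iter(sequence)
    return (list(chunk) for chunk in zip(*[it] * chunk_size))

print(list(iter_whole_chunks(range(6), 3)))  # [[0, 1, 2], [3, 4, 5]]
print(list(iter_whole_chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5]]  (6 is dropped)
```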

Unfetter answered 23/11, 2008 at 12:40

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuple for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.

Pytlik answered 12/3, 2015 at 12:36

Here's the one liner:

[AA[i:i+SS] for i in range(len(AA))[::SS]]

Details. AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

To expand the ranges in py3 do

(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
Vanderhoek answered 16/12, 2015 at 21:42

With Assignment Expressions in Python 3.8 it becomes quite nice:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

This works on an arbitrary iterable, not just a list.

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

UPDATE

Starting with Python 3.12, itertools.batched provides the same behaviour, except that it yields tuples rather than lists.

Chlorite answered 10/12, 2019 at 11:59

If you had a chunk size of 3 for example, you could do:

zip(*[iterable[i::3] for i in range(3)]) 

source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is a fixed number I can type, e.g. 3, and would never change.
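The same offset-slicing idea works for any chunk size n. Because plain zip stops at the shortest slice, a short final group is lost; swapping in itertools.zip_longest, as one of the comments suggests, keeps it with padding. A sketch (sliced_groups and the None default padding are choices made here, not part of the recipe):

```python
from itertools import zip_longest

def sliced_groups(seq, n, fillvalue=None):
    # seq[i::n] holds every n-th item starting at offset i; zipping the n
    # offset slices back together rebuilds consecutive groups of n.
    return list(zip_longest(*[seq[i::n] for i in range(n)], fillvalue=fillvalue))

print(sliced_groups(list(range(9)), 3))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(sliced_groups(list(range(7)), 3))  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```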

Projector answered 19/4, 2011 at 5:27
This doesn't work if len(iterable)%3 != 0. The last (short) group of numbers won't be returned. - Airbrush
@Airbrush There is zip_longest from itertools: docs.python.org/3/library/itertools.html#itertools.zip_longest - Coan
See this other Stack Overflow question for a detailed explanation of this technique. - Upside

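The toolz library has the partition function for this:

from toolz import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

Note that partition drops a final incomplete tuple unless a pad value is given; toolz.partition_all keeps the short tail instead.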
Shorten answered 20/11, 2013 at 20:55

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

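def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            first = next(batchiter)
        except StopIteration:
            return  # Python 3.7+ turns a StopIteration escaping a generator into RuntimeError
        # Each chunk must be consumed before the next one is requested; the timing
        # loop below doesn't consume them, so it only advances one item per batch.
        yield chain([first], batchiter)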


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper('abcdefg', 3, 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
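These figures come from single time.time() measurements, so treat them as rough; the slice variant is also quadratic, since it rebuilds the remaining list on every pass. A sketch of a fairer comparison with timeit, which first checks that the approaches return the same chunks. The input size, repeat count and function names are arbitrary, the grouper variant strips its padding so outputs are comparable, and no timings are claimed here; run it on your own machine:

```python
import timeit
from itertools import islice, zip_longest

def chunks_slice(lst, n):
    # Plain slicing, as in the "chunks" variant above.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

def chunks_islice(lst, n):
    # islice-based; works on any iterable.
    it = iter(lst)
    while chunk := list(islice(it, n)):
        yield chunk

def chunks_grouper(lst, n):
    # zip_longest padded with a private sentinel, which is then stripped
    # so the output matches the other two variants.
    pad = object()
    for group in zip_longest(*[iter(lst)] * n, fillvalue=pad):
        yield [x for x in group if x is not pad]

arr = list(range(10_000))
batch_size = 7

reference = list(chunks_slice(arr, batch_size))
for fn in (chunks_islice, chunks_grouper):
    assert list(fn(arr, batch_size)) == reference, fn.__name__

for fn in (chunks_slice, chunks_islice, chunks_grouper):
    # list() consumes every chunk, so the lazy variants do the full amount of work.
    seconds = timeit.timeit(lambda: list(fn(arr, batch_size)), number=20)
    print(f"{fn.__name__:15s} {seconds:.4f}s")
```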
Cowpea answered 7/1, 2018 at 8:58

You may also use get_chunks function of utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

pip install utilspie

Disclaimer: I am the creator of utilspie library.

Pillion answered 27/1, 2017 at 23:12

I like the Python doc's version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:

  • it is not very explicit
  • I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

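from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        chunk = tuple(islice(iterable, n))
        if not chunk:
            # Python 3.7+ no longer lets a StopIteration (the old
            # `or iterable.next()` trick) end a generator, so return explicitly.
            return
        yield chunk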

UPDATE: A lazy chunks version:

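from itertools import chain, islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return  # end cleanly instead of leaking StopIteration (PEP 479)
        yield chain([first], islice(iterable, n - 1))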
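One caveat with the lazy version: every chunk shares the underlying iterator, so each one has to be consumed before the next is requested. A short demonstration, restating the lazy function under the name lazy_chunks (renamed here only so it stands alone):

```python
from itertools import chain, islice

def lazy_chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return  # a bare StopIteration must not escape a generator (PEP 479)
        yield chain([first], islice(iterable, n - 1))

# Consuming each chunk as it arrives gives the expected grouping.
print([list(c) for c in lazy_chunks(3, range(7))])  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the chunks first and reading them afterwards does not:
# each chunk only kept its first item before the shared iterator moved on.
held = list(lazy_chunks(3, range(7)))
print([list(c) for c in held])  # [[0], [1], [2], [3], [4], [5], [6]]
```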
Presence answered 9/10, 2013 at 6:17

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(split_list(a_list, 3))

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
Apodosis answered 2/7, 2015 at 7:32

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
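In Python 3, xrange is gone and map returns a lazy iterator, so the one-liner above needs small changes. Following the comments' advice to prefer def over an assigned lambda, a Python 3 sketch:

```python
def chunk(ulist, step):
    # map() is lazy in Python 3, so wrap it in list() to get the chunks eagerly.
    return list(map(lambda i: ulist[i:i + step], range(0, len(ulist), step)))

print(chunk(list(range(1, 100)), 10)[-1])  # [91, 92, 93, 94, 95, 96, 97, 98, 99]
```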
Christianna answered 23/11, 2008 at 12:51
Please, use "def chunk" instead of "chunk = lambda". It works the same. One line. Same features. MUCH easier to the n00bz to read and understand. - Charlet
The function object resulting from def chunk instead of chunk=lambda has .__name__ attribute 'chunk' instead of '<lambda>'. The specific name is more useful in tracebacks. - Horned

Another more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists
    whose length equals chunkSize (the last one may be shorter).

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
Erastatus answered 28/2, 2015 at 20:5

At this point, I think we need a recursive generator, just in case...

In Python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In Python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
Unreasoning answered 3/11, 2015 at 23:10

Without calling len() which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

from itertools import islice, repeat, takewhile

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR, in Python 2 only (itertools.imap and the .next method don't exist in Python 3):

from itertools import imap

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next, ())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool, imap(tuple, continuous_slices))
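In Python 3 the sentinel variant translates almost directly: map is already lazy, as imap was, and the bound method is spelled __next__. A sketch:

```python
from itertools import islice, repeat

def chunks_gen_sentinel(n, seq):
    # Every islice(it, 0, n) draws the next n items from the same iterator.
    continuous_slices = map(islice, repeat(iter(seq)), repeat(0), repeat(n))
    # Stop when a chunk comes back empty, i.e. equal to the sentinel ().
    return iter(map(tuple, continuous_slices).__next__, ())

print(list(chunks_gen_sentinel(3, range(8))))  # [(0, 1, 2), (3, 4, 5), (6, 7)]
```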
Honeyed answered 16/2, 2010 at 5:49
There is no reason to avoid len() on large lists; it's a constant-time operation. - Meyerbeer

def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for piece in split_seq(seq, 3):
    print(piece)
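The reason this split is balanced: len(seq[i::num_pieces]) is exactly the size piece i should get, because the first len(seq) % num_pieces offsets each have one extra element. A Python 3 check with the usage data above:

```python
def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        # seq[i::num_pieces] has ceil((len(seq) - i) / num_pieces) items,
        # which is the balanced size for piece i.
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([len(seq[i::3]) for i in range(3)])  # [4, 3, 3]
print(list(split_seq(seq, 3)))             # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```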
Conventicle answered 24/11, 2008 at 16:56 Comment(0)
S
11

See this reference

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

(Python 3)

Savour answered 18/2, 2013 at 13:31
Nice, but drops elements at the end if the size does not match whole numbers of chunks, e. g. zip(*[iter(range(7))]*3) only returns [(0, 1, 2), (3, 4, 5)] and forgets the 6 from the input. - Tied
See this other Stack Overflow question for a detailed explanation of this technique. - Upside

def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print(g2)
# <generator object chunks at 0x0337B9B8>
print(list(g2))
# [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]
Battat answered 13/2, 2012 at 4:50

Since everybody here is talking about iterators: boltons has a method for exactly that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

If you don't need to save memory, you can use the older approach and build the full list of chunks up front with iterutils.chunked.

Adjudge answered 3/11, 2016 at 19:10

Consider using matplotlib.cbook.pieces

for example:

import numpy as np
import matplotlib.cbook as cbook

segments = cbook.pieces(np.arange(20), 3)
for s in segments:
    print(s)
Fitch answered 3/5, 2011 at 16:27

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in range((len(a) + CHUNK - 1) // CHUNK)]
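(len(a) + CHUNK - 1) // CHUNK is integer ceiling division: it counts a partial final chunk as a whole one. In Python 3 the operator has to be //, since / always returns a float. A quick check against math.ceil with the same sample data:

```python
import math

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4

n_chunks = (len(a) + CHUNK - 1) // CHUNK  # ceil(9 / 4) == 3
assert n_chunks == math.ceil(len(a) / CHUNK)

print([a[i * CHUNK:(i + 1) * CHUNK] for i in range(n_chunks)])
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```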
Threedecker answered 15/7, 2015 at 23:27
Can you explain your answer more, please? - Taciturnity
Working from backwards: (len(a) + CHUNK -1) / CHUNK Gives you the number of chunks that you will end up with. Then, for each chunk at index i, we are generating a sub-array of the original array like this: a[ i * CHUNK : (i + 1) * CHUNK ] where, i * CHUNK is the index of the first element to put into the subarray, and, (i + 1) * CHUNK is 1 past the last element to put into the subarray. This solution uses list comprehension, so it might be faster for large arrays. - Threedecker

The recipes in the itertools documentation provide two ways to do this depending on how you want to handle a final odd-sized lot (keep it, pad it with a fillvalue, ignore it, or raise an exception):

from itertools import islice, zip_longest

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = tuple(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
Nubia answered 19/10, 2022 at 5:49 Comment(1)
The grouper() source is available in the docs in this form since 3.10; the batched() source is only in 3.11, but it has become part of the standard library as itertools.batched() since 3.12. Yaay! 😊Fill
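Building on that comment, one way to use itertools.batched when it is available and fall back to a copy of the documented recipe on older Pythons might look like this (a sketch; the fallback itself needs Python 3.8+ for the `:=` operator):

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback modelled on the itertools docs recipe.

        Yields tuples of length n; the last batch may be shorter.
        """
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch


print(list(batched("ABCDEFG", 3)))  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```

Either way, the caller sees the same name and the same tuple-producing behavior.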
Q
5
>>> def f(x, n, acc=[]): return f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>> 

If you are into brackets - I picked up a book on Erlang :)

Quita answered 3/11, 2009 at 16:45 Comment(0)
P
5

I realise this question is old (stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions and only uses slicing:

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print(list(chunk))
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...
Platon answered 27/8, 2012 at 22:58 Comment(0)
C
5

Use list comprehensions:

l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print([tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]])
Caprice answered 27/2, 2015 at 2:33 Comment(0)
D
5

You could use numpy's array_split function e.g., np.array_split(np.array(data), 20) to split into 20 nearly equal size chunks.

To require chunks of exactly equal size, use np.split, which raises a ValueError if the array can't be divided into equal sections.
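If numpy isn't available, the sizing rule documented for np.array_split (for an array of length l split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n) can be sketched in plain Python. array_split_like is a hypothetical name; it works on any sliceable sequence and returns lists rather than arrays:

```python
def array_split_like(seq, sections):
    """Split seq into `sections` contiguous parts whose sizes differ by at most 1.

    Follows the sizing rule documented for numpy.array_split: the first
    len(seq) % sections parts get one extra element.
    """
    if sections < 1:
        raise ValueError("sections must be >= 1")
    base, extra = divmod(len(seq), sections)
    parts, start = [], 0
    for i in range(sections):
        size = base + (1 if i < extra else 0)
        parts.append(seq[start:start + size])
        start += size
    return parts


print(array_split_like(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Like np.array_split, it produces empty pieces when there are more sections than items.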

Dialectical answered 20/11, 2016 at 4:32 Comment(0)
I
5

One more solution

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print(chunk)
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 
Infecund answered 17/4, 2017 at 15:38 Comment(0)
S
5

I don't think I saw this option, so just to add another one :)) :

def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i+chunk_size]
        i += chunk_size
Soidisant answered 3/11, 2017 at 12:38 Comment(0)
A
5

The Python pydash package could be a good choice.

from pydash.arrays import chunk
ids = ['22', '89', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '1']
chunk_ids = chunk(ids,5)
print(chunk_ids)
# output: [['22', '89', '2', '3', '4'], ['5', '6', '7', '8', '9'], ['10', '11', '1']]

For more, check out the pydash chunk documentation.

Afterglow answered 9/7, 2019 at 14:4 Comment(1)
neat! and this is what actually sits under the hood of pydash.arrays.chunk: chunks = int(ceil(len(array) / float(size))) return [array[i * size:(i + 1) * size] for i in range(chunks)]Sheepskin
P
4
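def chunk(lst):
    out = []
    # Find the smallest divisor x >= 2 of len(lst); each chunk gets len(lst) // x items
    for x in range(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) // x
            break
    # Note: this empties lst in place
    while lst:
        out.append([lst.pop(0) for x in range(factor)])
    return out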
Phillie answered 26/11, 2008 at 7:24 Comment(0)
H
4

Letting r be the chunk size and L be the initial list, you can do the following (note that this drops any trailing items that don't fill a whole chunk):

chunkL = [[i for i in L[r*k:r*(k+1)]] for k in range(len(L)//r)]
Hamby answered 9/12, 2014 at 3:54 Comment(0)
A
4

As per this answer, the top-voted answer leaves a 'runt' at the end. Here's my solution to really get about as evenly-sized chunks as you can, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:

from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur

Demonstration:

>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Compare to the top-voted chunks answer:

>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
Amish answered 6/8, 2016 at 20:44 Comment(2)
This solution seems to fail in some situations: - when n > len(l) - for l = [0,1,2,3,4] and n=3 it returns [[0], [1], [2]] instead of [[0,1], [2,3], [4]]Signorino
@DragonTux: Ah I wrote the function for Python 3 - it gives [[0, 1], [2], [3, 4]]. I added the future import so it works in Python 2 as wellAmish
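The same nearest-boundary idea can be sketched with integer arithmetic only, which sidesteps both the Python 2 division issue and floating-point rounding. This is a variant, not the answer's code: n_even_chunks_int is a hypothetical name, and it rounds exact halves up, whereas Python 3's round() rounds them to the nearest even number, so in exact-half cases a boundary may land one position later:

```python
def n_even_chunks_int(l, n):
    """Yield n chunks of l; boundary i is i * len(l) / n rounded to the nearest integer."""
    if n < 1:
        raise ValueError("n must be >= 1")
    length = len(l)
    last = 0
    for i in range(1, n + 1):
        # floor(i * length / n + 1/2), computed exactly with integers
        cur = (2 * i * length + n) // (2 * n)
        yield l[last:cur]
        last = cur


print(list(n_even_chunks_int([0, 1, 2, 3, 4], 3)))  # [[0, 1], [2], [3, 4]]
```

For the list from the comment above it gives the same result as the Python 3 version of the answer, and for range(100) with n=9 or n=11 it reproduces the chunk lengths in the demonstration.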
L
4

I have one solution below which does work, but more important than that solution are a few comments on other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

The appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

 []

(which is what most of the itertools-based solutions here return). This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data which reversed the appropriate order of blocks of 5, i.e., the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements, not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like

i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
        swapped = it
    else:
        out += list(it)
        out += list(swapped)
    i = i + 1

This will produce mysteriously wrong results if you sneakily assume that sub-iterators are always fully used in order. It gets even worse if you want to interleave elements from the chunks.

Second, a decent number of the suggested solutions implicitly rely on the iterable having a deterministic order (some, e.g. a set, don't), and while some of the solutions using islice may be OK, it worries me.

Third, the itertools grouper approach works, but the recipe relies on the order in which zip_longest (or zip) calls next on its arguments. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object, it relies on this behavior. (For zip() at least, the documentation does guarantee this left-to-right evaluation order and mentions the zip(*[iter(s)]*n) idiom explicitly.)
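A tiny demonstration of the mechanism described in that paragraph (plain Python 3; the variable names are just for illustration): the list passed to zip() holds n references to one iterator, so each output tuple receives n consecutive items.

```python
lst = list(range(7))
n = 3

it = iter(lst)
args = [it] * n  # n references to the *same* iterator object
assert all(a is it for a in args)

# Each zip() step calls next() on args[0], args[1], args[2] in turn,
# so one tuple receives n consecutive items.
groups = list(zip(*args))
print(groups)  # [(0, 1, 2), (3, 4, 5)]

# zip() stops at the first exhausted argument: 6 was pulled from the
# iterator and then discarded, so the incomplete final group is lost.
print(list(it))  # []
```

This is also why the recipe switches to zip_longest when the final partial group should be kept.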

Finally, while the solution below can be improved if you make the assumption criticized above (that sub-iterators are accessed in order and fully consumed), without this assumption one MUST implicitly (via the call chain) or explicitly (via deques or another data structure) store the elements for each sub-iterator somewhere. So don't bother wasting time (as I did) assuming one could get around this with some clever trick.

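import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while True:
            # Each page gets its own deque, so pages can be consumed in any order
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        # Yield the final partial page, but not an empty one
        if deq:
            yield (i for i in deq)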
Leadwort answered 11/1, 2017 at 9:18 Comment(0)
W
4

An abstraction would be

l = [1,2,3,4,5,6,7,8,9]
n = 3
outList = []
for i in range(n, len(l) + n, n):
    outList.append(l[i-n:i])

print(outList)

This will print:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Word answered 29/6, 2020 at 17:54 Comment(0)
A
3

I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it's implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on the slice notation, so any arbitrary iterator can be used.

import iterlib

print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]
Adowa answered 3/3, 2014 at 4:30 Comment(0)
C
3

The answer above (by koffein) has a little problem: the list is always split into an equal number of splits, not an equal number of items per partition. This is my version. The ceiling division (len(l) + chs - 1) // chs takes into account that the number of items may not be exactly divisible by the partition size, so the last partition may only be partially filled.

# Given 'l' is your list

chs = 12 # Your chunksize
partitioned = [l[i*chs:(i*chs)+chs] for i in range((len(l) + chs - 1) // chs)]
Corker answered 17/4, 2015 at 18:48 Comment(0)
S
3

At this point, I think we need the obligatory anonymous-recursive function.

Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
Strontian answered 4/11, 2015 at 9:12 Comment(0)
S
3

Here's an idea using itertools.groupby:

import itertools

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))

This returns a generator of generators. If you want a list of lists, just replace the last line with

    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]

Example returning list of lists:

>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)

Sandwich answered 8/3, 2017 at 17:3 Comment(0)
O
3

No magic, but simple and correct:

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values
Ornithischian answered 30/10, 2017 at 8:13 Comment(0)
I
3

Simply using zip() on n references to the same iterator (which produces the full-size tuples) and then returning the remaining elements of lst (those that cannot make a "whole" sublist) should do the trick.

def chunkify(lst, n):
    for tup in zip(*[iter(lst)]*n):
        yield tup
    rest = tuple(lst[len(lst)//n*n: ])
    if rest:
        yield rest

list(chunkify(range(7), 3)) # [(0, 1, 2), (3, 4, 5), (6,)]

Since Python 3.12, the itertools module in the standard library provides batched(), which performs the very same operation. For example,

from itertools import batched
list(batched(range(7), 3))  # [(0, 1, 2), (3, 4, 5), (6,)]

Both of these methods are at least as memory efficient as any function in other answers on this page that does the same operation (the peak memory usage is the size of a batch), and they are also the fastest ways to do it. The following is a table of runtimes for chunking a list of 1,000,000 elements (the first column is for chunk size 3 and the second for chunk size 910).1

    Chunk size         3      910
Functions
cottontail        20.1ms    7.5ms
it_batched        22.1ms    8.3ms
NedBatchelder     72.8ms    8.4ms
nirvana_msu      140.4ms   18.8ms
pylang1          173.7ms   19.0ms
senderle         184.6ms   15.7ms

A one-liner version (Python >=3.8):

list(map(list, zip(*[iter(lst)]*n))) + ([rest] if (rest:=lst[len(lst)//n*n : ]) else [])

1 Code used to produce the table. Only the functions below were considered, because the functions defined in @NedBatchelder, @oremj, @RianRizvi, @Mars and @atzz's answers are the same, as are those in @MarkusJarderot, @nirvana_msu and @RaymondHettinger's answers, so only one from each group was selected. Tested on Python 3.12.0.

from timeit import repeat

setup = """
import itertools
import more_itertools as mit


def cottontail(lst, n):
    for tup in zip(*[iter(lst)]*n): tup
    rest = tuple(lst[len(lst)//n*n: ])
    if rest: rest

def it_batched(it, n):
    for x in itertools.batched(it, n): x

def NedBatchelder(lst, n):
    for i in range(0, len(lst), n): lst[i:i + n]

def pylang1(iterable, n):
    for x in mit.chunked(iterable, n): x

def senderle(it, size):
    it = iter(it)
    for x in iter(lambda: tuple(itertools.islice(it, size)), ()): x

def nirvana_msu(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        item

lst = list(range(1_000_000))
"""

out = {}
for f in ("NedBatchelder", "pylang1", "senderle", 
          "nirvana_msu", "cottontail", "it_batched"):
    for k in (3, 910):
        tm = min(repeat(f"{f}(lst, {k})", setup, number=100))
        out.setdefault(f, {})[k] = tm*10
out = dict(sorted(out.items(), key=lambda xy: xy[1][3]))

print('    Chunk size         3      910\nFunctions')
for func, val in out.items():
    print("{:<15}  {:>5.1f}ms  {:>5.1f}ms".format(func, val[3], val[910]))
Illuminism answered 13/7, 2022 at 3:38 Comment(2)
Downvoted because this returns a list and so is very inefficient on large inputs where you would like to process the result one by one. See nirvana-msu’s answer for an implementation that works on any iterable and return a generator with very simple code (= easier to read, understand, and debug).Balfour
@Balfour my main argument to post the answer in the first place was based on runtime performance, so memory was never a concern. Besides, the generator version of my answer uses similar amount of peak memory as nirvana-msu's solution. Then again, now that Python 3.12 is out, all of these answers are obsolete and don't matter now.Illuminism
C
2

Like @AaronHall I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size>=N. Thus, the orphans which are created in most of the above should be redistributed to other groups.

This can be done using:

def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists,  pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in range(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]

(from "Splitting a list into N parts of approximately equal length"), by simply calling it as nChunks(l, len(l) // N), where N is the desired chunk size.
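The desired-size interpretation above can also be sketched without floating-point rounding: take len(l) // N chunks and hand the leftover items out one per chunk from the front, so every chunk gets at least N items and chunk lengths differ by at most one. chunks_at_least is a hypothetical name, not part of the answer:

```python
def chunks_at_least(lst, size):
    """Split lst into len(lst) // size chunks of at least `size` items each.

    Leftover items are handed out one per chunk, starting from the first,
    so chunk lengths differ by at most one. If len(lst) < size, a single
    (short) chunk is returned.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    count = max(1, len(lst) // size)
    base, extra = divmod(len(lst), count)
    out, start = [], 0
    for i in range(count):
        step = base + (1 if i < extra else 0)
        out.append(lst[start:start + step])
        start += step
    return out


print(chunks_at_least(list(range(11)), 3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
```

Because count = len(lst) // size, the base chunk length len(lst) // count can never drop below size, so no runt is produced.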

Camaraderie answered 3/9, 2014 at 17:43 Comment(0)
E
2

This works in v2/v3, is inlineable, generator-based and uses only the standard library:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
Eating answered 6/7, 2017 at 22:24 Comment(1)
Just do a (list(x) for x in split_groups('abcdefghij', 4)), then iterate through them: as opposed to many examples here this would work with groups of any size.Eating
Z
2

A simple solution

The OP has requested "equal sized chunks". I understand "equal sized" as "balanced" sizes: we are looking for groups of items of approximately the same size when equal sizes are not possible (e.g., 23/5).

Inputs here are:

  • the list of items: input_list (list of 23 numbers, for instance)
  • the number of groups to split those items: n_groups (5, for instance)

Input:

input_list = list(range(23))
n_groups = 5

Groups of contiguous elements:

approx_sizes = len(input_list)/n_groups 

groups_cont = [input_list[int(i*approx_sizes):int((i+1)*approx_sizes)] 
               for i in range(n_groups)]

Groups of "every-Nth" elements:

groups_leap = [input_list[i::n_groups] 
               for i in range(n_groups)]

Results

print(len(input_list))

print('Contiguous elements lists:')
print(groups_cont)

print('Leap every "N" items lists:')
print(groups_leap)

Will output:

23

Contiguous elements lists:
[[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16, 17], [18, 19, 20, 21, 22]]

Leap every "N" items lists:
[[0, 5, 10, 15, 20], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18], [4, 9, 14, 19]]
Zeitler answered 6/4, 2021 at 11:6 Comment(0)
S
1
  • Works with any iterable
  • Inner data is generator object (not a list)
  • One liner
In [259]: get_in_chunks = lambda itr,n: ( (v for _,v in g) for _,g in itertools.groupby(enumerate(itr),lambda pair: pair[0]//n))

In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]
Seline answered 13/9, 2013 at 19:11 Comment(1)
g = get_in_chunks(range(30),7); i0=next(g);i1=next(g);list(i1);list(i0); Last evaluation is empty. Hidden requirement about accessing all the sublists in order seems really bad here to me because the goal with these kinds of utils is often to shuffle data around in various ways.Leadwort
Y
1

I have come up with the following solution, which doesn't create a temporary list object and should work with any iterable. Please note that this version is for Python 2.x:

def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
   print list(it)

Output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15] 
[]

As you can see, if len(iterable) % size == 0 then there is an additional empty iterator object at the end. But I do not think that is a big problem.

Yoo answered 18/9, 2015 at 17:54 Comment(0)
T
1

Since I had to do something like this, here's my solution given a generator and a batch size:

def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in range(0, n):
            elems.append(next(g))
        return elems
    except StopIteration:
        return elems
Threatt answered 16/10, 2015 at 22:9 Comment(0)
E
1

This question reminds me of the Raku (formerly Perl 6) .comb(n) method. It breaks up strings into n-sized chunks. (There's more to it than that, but I'll leave out the details.)

It's easy enough to implement a similar function in Python 3 as a lambda expression:

comb = lambda s,n: (s[i:i+n] for i in range(0,len(s),n))

Then you can call it like this:

some_list = list(range(0, 20))  # creates a list of 20 elements
generator = comb(some_list, 4)  # creates a generator that will generate lists of 4 elements
for sublist in generator:
    print(sublist)  # prints a sublist of four elements, as it's generated

Of course, you don't have to assign the generator to a variable; you can just loop over it directly like this:

for sublist in comb(some_list, 4):
    print(sublist)  # prints a sublist of four elements, as it's generated

As a bonus, this comb() function also operates on strings:

list( comb('catdogant', 3) )  # returns ['cat', 'dog', 'ant']
Eiger answered 15/7, 2019 at 15:27 Comment(0)
M
1

A generic chunker for any iterable, which gives the user a choice of how to handle a partial chunk at the end.

Tested on Python 3.

chunker.py

from enum import Enum

class PartialChunkOptions(Enum):
    INCLUDE = 0
    EXCLUDE = 1
    PAD = 2
    ERROR = 3

class PartialChunkException(Exception):
    pass

def chunker(iterable, n, on_partial=PartialChunkOptions.INCLUDE, pad=None):
    """
    A chunker yielding n-element lists from an iterable, with various options
    about what to do about a partial chunk at the end.

    on_partial=PartialChunkOptions.INCLUDE (the default):
                     include the partial chunk as a short (<n) element list

    on_partial=PartialChunkOptions.EXCLUDE
                     do not include the partial chunk

    on_partial=PartialChunkOptions.PAD
                     pad to an n-element list 
                     (also pass pad=<pad_value>, default None)

    on_partial=PartialChunkOptions.ERROR
                     raise a PartialChunkException if a partial chunk is encountered
    """

    on_partial = PartialChunkOptions(on_partial)        

    iterator = iter(iterable)
    while True:
        vals = []
        for i in range(n):
            try:
                vals.append(next(iterator))
            except StopIteration:
                if vals:
                    if on_partial == PartialChunkOptions.INCLUDE:
                        yield vals
                    elif on_partial == PartialChunkOptions.EXCLUDE:
                        pass
                    elif on_partial == PartialChunkOptions.PAD:
                        yield vals + [pad] * (n - len(vals))
                    elif on_partial == PartialChunkOptions.ERROR:
                        raise PartialChunkException
                    return
                return
        yield vals

test.py

import chunker

chunk_size = 3

for it in (range(100, 107),
          range(100, 109)):

    print("\nITERABLE TO CHUNK: {}".format(it))
    print("CHUNK SIZE: {}".format(chunk_size))

    for option in chunker.PartialChunkOptions.__members__.values():
        print("\noption {} used".format(option))
        try:
            for chunk in chunker.chunker(it, chunk_size, on_partial=option):
                print(chunk)
        except chunker.PartialChunkException:
            print("PartialChunkException was raised")
    print("")

output of test.py


ITERABLE TO CHUNK: range(100, 107)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, None, None]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
PartialChunkException was raised


ITERABLE TO CHUNK: range(100, 109)
CHUNK SIZE: 3

option PartialChunkOptions.INCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.EXCLUDE used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.PAD used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

option PartialChunkOptions.ERROR used
[100, 101, 102]
[103, 104, 105]
[106, 107, 108]

Microcline answered 5/6, 2020 at 13:34 Comment(0)
B
1

To split a list into equally-sized chunks, we can use a loop to iterate through the list and use slice notation to extract a portion of the list at each iteration.

def chunkify(lst, size):
    """Split a list into equally-sized chunks."""
    chunks = []
    for i in range(0, len(lst), size):
        chunks.append(lst[i:i+size])
    return chunks

Here, lst is the list you want to split and size is the size of each chunk. The range() function generates the start index of each chunk (0, size, 2*size, ...), and the slice lst[i:i+size] extracts the portion of the list from index i up to, but not including, index i+size.

Blinkers answered 27/3, 2023 at 11:14 Comment(1)
What does this improve on over existing answers?Dedifferentiation
V
0

I dislike the idea of splitting elements by chunk size; e.g. a script can divide 101 into 3 chunks as [50, 50, 1]. For my needs I had to split proportionally, keeping the order the same. First I wrote my own script, which works fine and is very simple. But later I saw this answer, whose script is better than mine; I recommend it. Here's my script:

def proportional_dividing(N, n):
    """
    N - length of array (bigger number)
    n - number of chunks (smaller number)
    output - arr containing n chunk sizes that add up to N, divided as evenly as possible
    """
    arr = []
    if N == 0:
        return arr
    elif n == 0:
        arr.append(N)
        return arr
    r = N // n
    for i in range(n-1):
        arr.append(r)
    arr.append(N-r*(n-1))

    last_n = arr[-1]
    # last number always will be r <= last_n < 2*r
    # when last_n == r it's ok, but when last_n > r ...
    if last_n > r:
        # ... and if difference too big (bigger than 1), then
        if abs(r-last_n) > 1:
            #[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7] # N=29, n=12
            # we need to give unnecessary numbers to first elements back
            diff = last_n - r
            for k in range(diff):
                arr[k] += 1
            arr[-1] = r
            # and we receive [3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
    return arr

def split_items(items, chunks):
    arr = proportional_dividing(len(items), chunks)
    splitted = []
    for chunk_size in arr:
        splitted.append(items[:chunk_size])
        items = items[chunk_size:]
    print(splitted)
    return splitted

items = [1,2,3,4,5,6,7,8,9,10,11]
chunks = 3
split_items(items, chunks)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm'], 3)
split_items(['a','b','c','d','e','f','g','h','i','g','k','l', 'm', 'n'], 3)
split_items(range(100), 4)
split_items(range(99), 4)
split_items(range(101), 4)

and output:

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'g', 'k', 'l', 'm']]
[['a', 'b', 'c', 'd', 'e'], ['f', 'g', 'h', 'i', 'g'], ['k', 'l', 'm', 'n']]
[range(0, 25), range(25, 50), range(50, 75), range(75, 100)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 99)]
[range(0, 25), range(25, 50), range(50, 75), range(75, 101)]
Veolaver answered 23/3, 2018 at 18:27 Comment(0)
C
0
def main():
    print(chunkify([1,2,3,4,5,6],2))

def chunkify(lst, n):
    chunks = []
    for i in range(0, len(lst), n):
        chunks.append(lst[i:i+n])
    return chunks

main()

I think that it's simple and can give you a chunk of an array.

Cheep answered 14/4, 2020 at 16:18 Comment(0)
B
0

I've created these two fancy one-liners, which are efficient and lazy; both input and output are iterables, and they don't depend on any module:

The first one-liner is totally lazy, meaning that it returns an iterator producing iterators (i.e. each chunk produced is an iterator over that chunk's elements). This version is good when chunks are very large, or when elements are produced slowly one by one and should become available immediately as they are produced:


chunk_iters = lambda it, n: ((e for i, g in enumerate(((f,), cit)) for j, e in zip(range((1, n - 1)[i]), g)) for cit in (iter(it),) for f in cit)

The second one-liner returns an iterator that produces lists. Each list is produced as soon as the elements of a whole chunk become available through the input iterator, or when the very last element of the last chunk is reached. This version should be used if input elements are produced fast or are all available immediately; otherwise, the lazier first one-liner should be used.


chunk_lists = lambda it, n: (l for l in ([],) for i, g in enumerate((it, ((),))) for e in g for l in (l[:len(l) % n] + [e][:1 - i],) if (len(l) % n == 0) != i)

I also provide a multi-line version of the first chunk_iters one-liner, which returns an iterator producing other iterators (each going through one chunk's elements):


def chunk_iters(it, n):
    cit = iter(it)
    def one_chunk(f):
        yield f
        for i, e in zip(range(n - 1), cit):
            yield e
    for f in cit:
        yield one_chunk(f)
Belldas answered 24/9, 2020 at 7:1 Comment(0)
P
0

Let's say the list is lst

import math

# lst is the list, size is the chunk size
ln = len(lst)  # length of the list

for num in range(math.ceil(ln / size)):
    start, end = num*size, min((num+1)*size, ln)
    print(lst[start:end])
Poll answered 31/1, 2022 at 14:53 Comment(0)
C
0

You may use more_itertools.chunked_even along with math.ceil. It's likely the easiest approach to reason about.

from math import ceil
import more_itertools as mit
from pprint import pprint

pprint([*mit.chunked_even(range(19), ceil(19 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18]]

pprint([*mit.chunked_even(range(20), ceil(20 / 5))])
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19]]

pprint([*mit.chunked_even(range(21), ceil(21 / 5))])
# [[0, 1, 2, 3, 4],
# [5, 6, 7, 8],
# [9, 10, 11, 12],
# [13, 14, 15, 16],
# [17, 18, 19, 20]]

pprint([*mit.chunked_even(range(3), ceil(3 / 5))])
# [[0], [1], [2]]


Cental answered 7/8, 2022 at 14:19 Comment(1)
It was already shown in a 2018 answer https://mcmap.net/q/21696/-how-do-i-split-a-list-into-equally-sized-chunksHoi
A
0

Here's a short and readable answer (different from all previous answers):

  • No packages
  • Works when list not evenly divisible into n chunks
  • Easily changeable into generator
import math
 
def chunk(lst, n):
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i: min(i+chunk_size, len(lst))] for i in range(0, len(lst), chunk_size)]

Examples:

chunk(lst=list(range(9)), n=3) gives [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

chunk(lst=list(range(10)), n=3) gives [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

chunk(lst=list(range(11)), n=3) gives [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]

Amenra answered 16/5, 2023 at 21:19 Comment(0)
S
0

With Python 3.12, this is now natively supported with itertools.batched

from itertools import batched

flattened_data = ['roses', 'red', 'violets', 'blue', 'sugar']
unflattened = list(batched(flattened_data, 2))
assert unflattened == [('roses', 'red'), ('violets', 'blue'), ('sugar',)]

This is completely lazy: the iterator is only ever consumed enough to fill the current chunk.

Spearhead answered 28/9, 2023 at 21:32 Comment(0)
G
-1

Lazy loading version

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[range(10, 20),
 range(20, 30),
 range(30, 40),
 range(40, 50),
 range(50, 60),
 range(60, 70),
 range(70, 75)]

Compare this implementation's result with the example output of the accepted answer.

Many of the above functions assume that the length of the whole iterable is known up front, or at least is cheap to calculate.

For some streamed objects that would mean loading the full data into memory first (e.g. to download the whole file) to get the length information.

If, however, you don't know the full size yet, you can use this code instead:

def chunks(iterable, size):
    """
    Yield successive chunks from iterable, being `size` long.

    https://mcmap.net/q/21696/-how-do-i-split-a-list-into-equally-sized-chunks
    :param iterable: The object you want to split into pieces.
    :param size: The size each of the resulting pieces should have.
    """
    i = 0
    while True:
        sliced = iterable[i:i + size]
        if len(sliced) == 0:
            # to suppress stuff like `range(max, max)`.
            break
        # end if
        yield sliced
        if len(sliced) < size:
            # our slice is not the full length, so we must have passed the end of the iterator
            break
        # end if
        i += size  # so we start the next chunk at the right place.
    # end while
# end def

This works because slicing returns fewer elements, or none at all, once you pass the end of a sequence:

"abc"[0:2] == 'ab'
"abc"[2:4] == 'c'
"abc"[4:6] == ''

We then check the length of each slice: if it is shorter than requested, we know we have reached the end and can stop iterating.

That way, each chunk is only produced when it is requested.
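To see both stopping conditions in action, here is a condensed, standalone restatement of the loop above (sliced_chunks is a hypothetical name). The empty-slice check matters when the length is an exact multiple of size, and a plain generator fails as soon as iteration starts because it cannot be sliced:

```python
def sliced_chunks(seq, size):
    """Condensed restatement of the slice-until-short loop; seq must support slicing."""
    i = 0
    while True:
        piece = seq[i:i + size]
        if len(piece) == 0:
            # length was an exact multiple of size: the previous chunk was the last
            break
        yield piece
        if len(piece) < size:
            # a short slice means we ran past the end
            break
        i += size
```

For a range input, each chunk is itself a range object, so nothing is materialized until you iterate over it.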

Gerdi answered 20/4, 2019 at 18:28 Comment(0)
C
-1

An old-school approach that does not require itertools but still works with arbitrary generators (it buffers items in a list and yields the buffer every n items):

def chunks(g, n):
  """divide a generator 'g' into small chunks
  Yields:
    a chunk that has 'n' or less items
  """
  n = max(1, n)
  buff = []
  for item in g:
    buff.append(item)
    if len(buff) == n:
      yield buff
      buff = []
  if buff:
    yield buff
Cletacleti answered 3/12, 2019 at 18:34 Comment(0)
E
-1

This task can easily be done using the generator in the accepted answer. I'm adding a class implementation that also implements __len__, which may be useful to somebody. I needed to show progress (with tqdm), so the iterable had to report the number of chunks.

class ChunksIterator(object):
    def __init__(self, data, n):
        self._data = data
        self._l = len(data)
        self._n = n

    def __iter__(self):
        for i in range(0, self._l, self._n):
            yield self._data[i:i + self._n]

    def __len__(self):
        rem = 1 if self._l % self._n != 0 else 0
        return self._l // self._n + rem

Usage:

it = ChunksIterator([1,2,3,4,5,6,7,8,9], 2)
print(len(it))
for i in it:
  print(i)
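The __len__ above can also be written with divmod(), which returns the quotient and remainder in one call. A sketch of the same class with only that method changed (its results should match the original):

```python
class ChunksIterator:
    """Iterable over n-sized chunks of data; len() reports the number of chunks."""

    def __init__(self, data, n):
        self._data = data
        self._l = len(data)
        self._n = n

    def __iter__(self):
        for i in range(0, self._l, self._n):
            yield self._data[i:i + self._n]

    def __len__(self):
        full, rem = divmod(self._l, self._n)
        # a non-zero remainder means one extra, shorter chunk at the end
        return full + (1 if rem else 0)
```

Because __iter__ is a generator method, each new for loop over the same object starts again from the first chunk.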
Escargot answered 16/12, 2021 at 15:29 Comment(1)
your __len__ method might be nicer if you used divmod()Defoliate
C
-1
def devideChunks(x, n):
    """Return a new list of consecutive slices of x, each n long (the last may be shorter)."""
    newList = []
    for i in range(0, len(x), n):
        newList.append(x[i:i + n])

    return newList
Chokedamp answered 17/5, 2023 at 18:9 Comment(2)
While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions applySpathe
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Booboo
L
-2

One-liner version of senderle's answer:

from itertools import islice
from functools import partial

seq = [1,2,3,4,5,6,7]
size = 3
result = list(iter(partial(lambda it: tuple(islice(it, size)), iter(seq)), ()))
assert result == [(1, 2, 3), (4, 5, 6), (7,)]
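The one-liner relies on the two-argument form iter(callable, sentinel), which calls callable repeatedly until it returns sentinel. Unrolled into named steps (the function name unrolled_chunks is hypothetical), it reads:

```python
from itertools import islice


def unrolled_chunks(seq, size):
    """Same behaviour as the iter()/partial one-liner, spelled out step by step."""
    it = iter(seq)  # one shared iterator, so each islice resumes where the last stopped

    def take_chunk():
        # Pull up to `size` items; this returns () once `it` is exhausted.
        return tuple(islice(it, size))

    # iter(callable, sentinel) keeps calling take_chunk() until it returns ().
    return list(iter(take_chunk, ()))
```

The shared iterator is the key design choice: calling islice on the original list each time would restart from the first element and never reach the sentinel.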
Lamonicalamont answered 2/1, 2022 at 11:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.