How does zip(*[iter(s)]*n) work in Python?
R

9

126
s = [1,2,3,4,5,6,7,8,9]
n = 3

list(zip(*[iter(s)]*n)) # returns [(1,2,3),(4,5,6),(7,8,9)]

How does zip(*[iter(s)]*n) work? What would it look like if it was written with more verbose code?


This is a technique used for splitting a list into chunks of equal size - see that question for a general overview of the problem.

Rutabaga answered 9/2, 2010 at 23:7 Comment(3)
also take a look here where how it works is also explained: #2202961Landgrabber
if answers here aren't enough, I blogged it here: telliott99.blogspot.com/2010/01/…Ipoh
Although very intriguing, this technique must go against the core "readability" value of Python!Cart
F
130

iter() returns an iterator over a sequence. [x] * n produces a list of length n in which every element is the same object x. *arg unpacks a sequence into separate arguments for a function call. So you're passing the same iterator 3 times to zip(), and zip() pulls the next item from that one iterator each time it fills a slot in a tuple.

x = iter([1,2,3,4,5,6,7,8,9])
print(list(zip(x, x, x)))
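
For anyone who wants the mechanics spelled out, here is a rough sketch of what zip() is doing with the shared iterator, written as an explicit loop. The helper name chunks_via_shared_iterator is made up for illustration, and the real zip() is implemented in C, so treat this as an approximation of its behaviour rather than its source.

```python
def chunks_via_shared_iterator(s, n):
    """Approximate list(zip(*[iter(s)] * n)) with an explicit loop."""
    if n < 1:
        return []                 # zip() with no arguments yields nothing
    it = iter(s)                  # one iterator, shared by all n "slots"
    result = []
    while True:
        chunk = []
        for _ in range(n):        # zip() asks each argument for one item...
            try:
                chunk.append(next(it))   # ...but every argument is `it`
            except StopIteration:
                return result     # an incomplete chunk is dropped, as zip() does
        result.append(tuple(chunk))


print(chunks_via_shared_iterator([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
```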
Fiester answered 9/2, 2010 at 23:15 Comment(1)
Good to know: when an iterator yields (= returns) an item, you can imagine this item as "consumed". So the next time the iterator is called, it yields the next "unconsumed" item.Linchpin
I
53

The other great answers and comments explain well the roles of argument unpacking and zip().

As Ignacio and ujukatzel say, you pass to zip() three references to the same iterator and zip() makes 3-tuples of the integers—in order—from each reference to the iterator:

1,2,3,4,5,6,7,8,9  1,2,3,4,5,6,7,8,9  1,2,3,4,5,6,7,8,9
^                    ^                    ^            
      ^                    ^                    ^
            ^                    ^                    ^

And since you ask for a more verbose code sample:

chunk_size = 3
L = [1,2,3,4,5,6,7,8,9]

# iterate over L in steps of 3
for start in range(0,len(L),chunk_size): # xrange() in 2.x; range() in 3.x
    end = start + chunk_size
    print(L[start:end]) # three-item chunks

Following the values of start and end:

[0:3) #[1,2,3]
[3:6) #[4,5,6]
[6:9) #[7,8,9]

FWIW, in Python 2 you can get the same result with map() with an initial argument of None (this no longer works in Python 3, where map() requires a callable):

>>> map(None,*[iter(s)]*3)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
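
In Python 3, map() does not accept None as the function. A rough equivalent, assuming you are happy for a short final chunk to be padded with None (which is what Python 2's map(None, ...) did), is itertools.zip_longest with the same shared-iterator trick. The 10-element input here is just an illustrative choice to make the padding visible.

```python
from itertools import zip_longest

s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Same shared iterator passed three times; the last chunk is padded with None
print(list(zip_longest(*[iter(s)] * 3)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, None, None)]
```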

For more on zip() and map(): http://muffinresearch.co.uk/archives/2007/10/16/python-transposing-lists-with-map-and-zip/

Ineluctable answered 10/2, 2010 at 1:32 Comment(0)
M
35

I think one thing that's missed in all the answers (probably obvious to those familiar with iterators) but not so obvious to others is -

Since we have the same iterator, each item it yields is consumed, and zip can only draw on the remaining elements. Compare what happens if we simply use the list itself rather than an iterator, e.g.:

l = list(range(9))
list(zip(*([l]*3))) # note: not an iter here, the lists are not emptied as we iterate
# output 
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6), (7, 7, 7), (8, 8, 8)]

With an iterator, each value is consumed as it is read and only the remaining values stay available, so once zip has taken 0 the next item available is 1, then 2, and so on. A very subtle thing, but quite clever!
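
A small sketch of why the two cases differ, assuming the standard iterator protocol: zip() calls iter() on each of its arguments. A list hands back a fresh iterator every time, while an iterator hands back itself.

```python
s = [0, 1, 2, 3, 4, 5]

# A list is iterable but not an iterator: each iter() call gives a fresh,
# independent iterator that starts from the beginning.
a, b = iter(s), iter(s)
print(a is b)              # False
print(next(a), next(b))    # 0 0

# An iterator returns itself from iter(), so zip() ends up holding the same
# object for every argument, and each next() call advances the shared state.
it = iter(s)
print(iter(it) is it)      # True
print(next(it), next(it))  # 0 1
```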

Martingale answered 21/5, 2015 at 17:29 Comment(1)
+1, You saved me! I can't believe that other answers skipped this vital detail assuming everybody knows this. Can you give any reference to a documentation which includes this information?Sherrylsherurd
A
10

iter(s) returns an iterator for s.

[iter(s)]*n makes a list of n times the same iterator for s.

So zip(*[iter(s)]*n) takes one item from each of the n iterators in the list, in order. Since all of them are the same object, this simply groups the list into chunks of n.

Abessive answered 9/2, 2010 at 23:23 Comment(2)
Not 'n iterators of the same list', but 'n times the same iterator object'. Different iterator objects don't share state, even when they are of the same list.Sension
Thanks, corrected. Indeed that was what I was "thinking", but wrote something else.Abessive
S
7

One word of advice on using zip this way: it will truncate your list if its length is not evenly divisible by n. To work around this you could use itertools.izip_longest (itertools.zip_longest in Python 3) if you can accept fill values, or you could use something like this:

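def n_split(iterable, n):
    num_extra = len(iterable) % n
    zipped = list(zip(*[iter(iterable)] * n))  # list() so this also works on Python 3
    # append the leftover items, if any, as a final shorter chunk
    return zipped if not num_extra else zipped + [iterable[-num_extra:]]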

Usage:

for ints in n_split(range(1,12), 3):
    print(', '.join(str(i) for i in ints))

Prints:

1, 2, 3
4, 5, 6
7, 8, 9
10, 11
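
Note that n_split needs len() and slicing, so it only works on sequences. Below is a sketch of a variant that should work for any iterable (generators, files and so on) by pulling n items at a time with itertools.islice. The name n_split_iter is hypothetical, and n is assumed to be at least 1.

```python
from itertools import islice


def n_split_iter(iterable, n):
    """Yield tuples of up to n items; the last one may be shorter."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))   # take the next n items, or fewer at the end
        if not chunk:                  # nothing left: stop
            return
        yield chunk


for ints in n_split_iter(range(1, 12), 3):
    print(', '.join(str(i) for i in ints))
```

This prints the same four lines as above. On Python 3.12 and later, itertools.batched(iterable, n) covers the same ground.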
Siderite answered 31/1, 2013 at 16:34 Comment(1)
This is already documented in itertools recipes: docs.python.org/2/library/itertools.html#recipes grouper . No need to reinvent the wheelStockholder
J
7

Unwinding layers of "cleverness", you may find this equivalent spelling easier to follow:

x = iter(s)
for a, b, c in zip(*([x] * n)):
    print(a, b, c)

which is, in turn, equivalent to the even less-clever:

x = iter(s)
for a, b, c in zip(x, x, x):
    print(a, b, c)

Now it should start to become clear. There is only a single iterator object, x. On each iteration, zip(), under the covers, calls next(x) 3 times, once for each iterator object passed to it. But it's the same iterator object here each time. So it delivers the first 3 next(x) results, and leaves the shared iterator object waiting to deliver its 4th result next. Lather, rinse, repeat.

BTW, I suspect you're parsing *([iter(x)]*n) incorrectly in your head. The trailing *n happens first, and then the prefix * is applied to the n-element list *n created. f(*iterable) is a shortcut for calling f() with a variable number of arguments, one for each object iterable delivers.
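
To make the evaluation order concrete, here is the expression split into its two steps. The variable names repeated and pairs are just for illustration.

```python
s = [1, 2, 3, 4, 5, 6]
n = 2

# Step 1: list multiplication builds an n-element list of the SAME iterator
repeated = [iter(s)] * n
print(len(repeated), repeated[0] is repeated[1])   # 2 True

# Step 2: the prefix * unpacks that list into n positional arguments
pairs = zip(*repeated)
print(list(pairs))   # [(1, 2), (3, 4), (5, 6)]
```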

Joceline answered 26/2, 2022 at 3:44 Comment(0)
L
3

I needed to break down each partial step to really internalize how it is working. My notes from the REPL:

>>> # refresher on using list multiples to repeat item
>>> lst = list(range(15))
>>> lst
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> # lst id value
>>> id(lst)
139755081359872
>>> [id(x) for x in [lst]*3]
[139755081359872, 139755081359872, 139755081359872]

# replacing lst with an iterator of lst
# It's the same iterator three times
>>> [id(x) for x in [iter(lst)]*3 ]
[139755085005296, 139755085005296, 139755085005296]
# without starred expression zip would only see single n-item list.
>>> print([iter(lst)]*3)
[<list_iterator object at 0x7f1b440837c0>, <list_iterator object at 0x7f1b440837c0>, <list_iterator object at 0x7f1b440837c0>]
# Must use starred expression to expand n arguments
>>> print(*[iter(lst)]*3)
<list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0>

# by repeating the same iterator, n-times,
# each pass of zip will call the same iterator.__next__() n times
# this is equivalent to manually calling __next__() until complete
>>> iter_lst = iter(lst)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(0, 1, 2)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(3, 4, 5)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(6, 7, 8)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(9, 10, 11)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(12, 13, 14)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# all together now!
# continuing with same iterator multiple times in list
>>> print(*[iter(lst)]*3)
<list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0>
>>> zip(*[iter(lst)]*3)
<zip object at 0x7f1b43f14e00>
>>> list(zip(*[iter(lst)]*3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]

# NOTE: must use list multiples. Explicit listing creates 3 unique iterators
>>> [iter(lst)]*3 == [iter(lst), iter(lst), iter(lst)]
False
>>> list(zip(*[iter(lst), iter(lst), iter(lst)]))
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), ....    
Loath answered 22/7, 2021 at 19:17 Comment(0)
B
1

It is probably easier to see what is happening in the Python interpreter or IPython with n = 2:

In [35]: [iter("ABCDEFGH")]*2
Out[35]: [<iterator at 0x6be4128>, <iterator at 0x6be4128>]

So we have a list of two references to the same iterator object. Remember that calling iter on an object returns an iterator object, and in this scenario it is the same iterator twice because list multiplication with *2 repeats the reference rather than copying the object. Iterators can also be consumed only once.

Further, zip takes any number of iterables (sequences are iterables) and creates a tuple from the i-th element of each of its inputs. Since both iterators are identical in our case, zip advances the same iterator twice for each 2-element tuple of output.

In [41]: help(zip)
Help on built-in function zip in module __builtin__:

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.

The unpacking (*) operator passes the two list elements to zip as separate arguments. zip then runs until the shared iterator is exhausted, which in this case is when there is not enough input left to create a 2-element tuple.

This can be extended to any value of n and zip(*[iter(s)]*n) works as described.
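
As a quick check of that stopping behaviour, here is a sketch with a 7-character string (my own choice of input) and n = 2. The leftover item is not only left out of the result, it is also consumed from the iterator.

```python
it = iter("ABCDEFG")

print(list(zip(*[it] * 2)))
# [('A', 'B'), ('C', 'D'), ('E', 'F')]

# zip() already pulled 'G' for the first slot before finding that the
# second slot had nothing left, so the iterator is now empty.
print(list(it))
# []
```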

Belated answered 22/7, 2018 at 6:26 Comment(2)
Sorry for being slow. But could you explain the "the same iterator twice due to the *2 python syntactic sugar. Iterators also run only once." part please? If so, how come the result is not [("A", "A")....]? Thanks.Brittenybrittingham
@BowenLiu * is just a convenience for repeating an object. Try it with scalars and then with lists. Also try print(*zip(*[iter("ABCDEFG")]*2)) vs print(*zip(*[iter("ABCDEFG"), iter("ABCDEFG")])). Then start tearing the two down into smaller steps to see what the actual iterator objects in the two statements are.Belated
F
0
x = [1,2,3,4,5,6,7,8,9]
zip(*[iter(x)] * 3)

is the same as:

x = [1,2,3,4,5,6,7,8,9]
iter_var = iter(x)
zip(iter_var,iter_var,iter_var)

Each time zip() gets the next value in iter_var it moves to the next value of x. Try running next(iter_var) to see how this works.
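
For example, a minimal sketch of that experiment: advance iter_var by hand three times, then let zip() take over from wherever the iterator has got to.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
iter_var = iter(x)

# Each next() call consumes one item from the iterator
print(next(iter_var))   # 1
print(next(iter_var))   # 2
print(next(iter_var))   # 3

# zip() continues from the current position; it does not restart at 1
print(list(zip(iter_var, iter_var, iter_var)))
# [(4, 5, 6), (7, 8, 9)]
```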

Freshen answered 18/12, 2022 at 2:24 Comment(0)
