How to unzip an iterator?
4

16

Given a list of pairs xys, the Python idiom to unzip it into two lists is:

xs, ys = zip(*xys)

If xys is an iterator, how can I unzip it into two iterators, without storing everything in memory?
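To make the problem concrete, here is a minimal sketch of why the list idiom does not help with a lazy source. The `pairs` generator and its `log` argument are hypothetical names introduced only for this illustration; the generator records every item it produces, which shows that `zip(*xys)` drains the whole iterator before it returns anything.

```python
def pairs(n, log):
    """Hypothetical lazy source: yields (i, i * i) and records each i it produces."""
    for i in range(n):
        log.append(i)
        yield i, i * i


produced = []
xs, ys = zip(*pairs(5, produced))

# By the time zip returns, the generator has already been exhausted and
# xs and ys are fully built tuples, not lazy iterators.
print(produced)
print(xs)
print(ys)
```

Argument unpacking with `*` has to collect every pair before `zip` is even called, so this approach cannot work for an infinite or very large iterator.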

Wizardly answered 12/6, 2015 at 14:3 Comment(4)
"Given a list of pairs xys" So something like [(1,2), (2,3), (3,4), ...]? You don't need to zip that; it's already a list (or iterator) of tuples. - Discontinuance
I want to convert xys into two separate iterators, xs = [1,2,3,...] and ys = [2,3,4,...]. - Worldbeater
So xys is something like [[1,2,3], [2,3,4]]? Then itertools.izip is what you want. That's a pair of lists, not a list of pairs. - Discontinuance
This doesn't exactly answer your question, but it might be what you actually want: itertools.starmap. - Lumberyard
13

Suppose you have some iterable of pairs:

a = zip(range(10), range(10))

If I'm correctly interpreting what you are asking for, you could generate independent iterators for the firsts and seconds using itertools.tee:

import itertools

xs, ys = itertools.tee(a)  # two independent iterators over the same pairs
xs, ys = (x[0] for x in xs), (y[1] for y in ys)  # project out firsts and seconds

Note that tee has to buffer every pair that one iterator has consumed and the other has not reached yet, so memory use grows with how far apart the two iterators drift.
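A small sketch of that buffering behaviour, assuming a hypothetical `pairs` generator that records what has been pulled from it (the `pulled` list is only there for illustration):

```python
import itertools


def pairs(n, pulled):
    """Hypothetical source: yields (i, -i) and records each i it hands out."""
    for i in range(n):
        pulled.append(i)
        yield i, -i


pulled = []
left, right = itertools.tee(pairs(4, pulled))
xs = (p[0] for p in left)
ys = (p[1] for p in right)

# Advance xs three steps while ys stays untouched.
first_three = [next(xs) for _ in range(3)]
pulled_so_far = list(pulled)  # the source has only been asked for three pairs

# ys still sees every pair from the start: tee kept the three pairs
# that xs already passed, then pulls the last one from the source.
ys_values = list(ys)
last_x = next(xs)
```

Here the source is only advanced as far as the faster iterator needs, and the slower one replays the buffered pairs later.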

Ivatts answered 12/6, 2015 at 14:11 Comment(0)
8

If you want to consume one iterator independently from the other, there's no way to avoid pulling stuff into memory, since one of the iterators will progress while the other does not (and hence has to buffer).

Something like this allows you to iterate over both the 'left items' and the 'right items' of the pairs:

import itertools
import operator

# xys is the iterator of pairs from the question
it1, it2 = itertools.tee(xys)
xs = map(operator.itemgetter(0), it1)
ys = map(operator.itemgetter(1), it2)

print(next(xs))
print(next(ys))

...but keep in mind that if you consume only one iterator, tee will buffer the items the other one has not seen yet until you start consuming that one too.

(Btw, assuming Python 3. In Python 2 you need to use itertools.imap(), not map().)
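As a rough illustration of the opposite case, the sketch below consumes both sides in lockstep, so neither iterator gets ahead of the other by more than one item. The infinite `xys` generator is an assumption made up for this example; Python 3 is assumed as well.

```python
import itertools
import operator

# Hypothetical infinite source of pairs (n, n squared).
xys = ((n, n * n) for n in itertools.count())

it1, it2 = itertools.tee(xys)
xs = map(operator.itemgetter(0), it1)
ys = map(operator.itemgetter(1), it2)

# zip advances xs and ys alternately, so the two stay in step;
# islice stops after five pairs so the loop terminates.
first_five = list(itertools.islice(zip(xs, ys), 5))
print(first_five)
```

Because the source is infinite, anything that tried to materialise it (such as `zip(*xys)` or `list(xys)`) would never finish, whereas the tee-based version only ever looks at the items actually requested.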

Scherzando answered 12/6, 2015 at 14:9 Comment(1)
Indeed the docs have a warning on this: "This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()." - Barbary
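For the case that warning describes, where one side is going to be consumed completely before the other starts, a plain list is the simpler choice. A minimal sketch, with `unzip_eager` being a hypothetical helper name and a finite input assumed:

```python
def unzip_eager(pairs):
    """Hypothetical helper: materialise the pairs once, then split them."""
    items = list(pairs)  # stores everything, so only suitable for finite input
    xs = [p[0] for p in items]
    ys = [p[1] for p in items]
    return xs, ys


xs, ys = unzip_eager(zip(range(3), "abc"))
print(xs)
print(ys)
```

This gives up laziness entirely, so it does not answer the original question; it is only the fallback for when tee would end up buffering everything anyway.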
0

The full answer is located here. Long story short: we can modify the Python recipe for the itertools.tee function as follows:

from collections import deque


def unzip(iterable):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

and then use it like this:

>>> from itertools import count
>>> zipped = zip(count(), count())
>>> xs, ys = unzip(zipped)
>>> next(xs)
0
Odine answered 21/12, 2018 at 18:7 Comment(0)
0

The following function essentially performs the opposite of the zip function.

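import itertools


def unzip(iterable, n=2):
    # One independent copy of the source per output column.
    iter_copies = itertools.tee(iterable, n)
    def _gen(i):
        # Yield the i-th field of every tuple read from the i-th copy.
        for x in iter_copies[i]:
            yield x[i]
    indv_iters = []
    for i in range(n):
        indv_iters.append(_gen(i))
    return tuple(indv_iters)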

Here is a more compact version:

def unzip(iterable, n=2):
    iter_copies = itertools.tee(iterable, n)
    # The immediately invoked lambda binds the current i for each generator.
    indv_iters = [(lambda i: (x[i] for x in iter_copies[i]))(i) for i in range(n)]
    return tuple(indv_iters)
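The immediately invoked lambda in the compact version can look redundant, so here is a small sketch of what appears to go wrong without it. The `rows` data and the variable names are made up for this illustration. In a generator expression only the outermost iterable is evaluated right away; the `x[i]` part runs later, when the generator is consumed, and by then `i` holds its final value.

```python
import itertools

rows = [(1, 10), (2, 20), (3, 30)]

# Without the lambda: each generator reads from the right tee copy,
# but x[i] is evaluated lazily, after the loop has finished and i == 1.
copies = itertools.tee(rows, 2)
naive = [(x[i] for x in copies[i]) for i in range(2)]
naive_columns = [list(col) for col in naive]

# With the lambda: i is passed as an argument, so each generator
# keeps its own value of i.
copies = itertools.tee(rows, 2)
bound = [(lambda i: (x[i] for x in copies[i]))(i) for i in range(2)]
bound_columns = [list(col) for col in bound]

print(naive_columns)
print(bound_columns)
```

In the naive variant both generators end up yielding the second field, which is why the answer wraps the generator expression in a function call.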

You can verify it works with the following code:

def sample_generator():
    for i, j, k in zip(range(10), range(10, 20), range(20, 30)):
        yield i, j, k

i_iter, j_iter, k_iter = unzip(sample_generator(), 3)

for i, j, k, reference in zip(i_iter, j_iter, k_iter, range(10)):
    assert i == reference
    assert j == reference + 10
    assert k == reference + 20
Subtonic answered 11/1, 2024 at 5:51 Comment(0)
