Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.

For example:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Is there a builtin function that does that?

Norbert answered 21/8, 2008 at 4:29 Comment(5)
Great answers below, but also look at numpy's transposeLasalle
See this nice answer to do the same with generators instead of list : how-to-unzip-an-iteratorErine
why is zip called a transpose?Franciskus
@CharlieParker because it is analogous to a matrix transpose in mathematics. If originally the data in each nested sequence is seen as a "row" of a matrix, its values will end up within a "column" of the corresponding matrix represented by the output.Kneedeep
Not an actual inverse, but in some cases d=dict(original) followed by d.keys() and d.values() might be convenient.Christianize
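The `dict` idea from the last comment can be sketched as follows. It is only a convenience and assumes the first items are hashable and unique: duplicate keys collapse, so the result is not a true inverse of `zip`. The variable names are chosen here for illustration.

```python
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

d = dict(original)
keys = list(d.keys())      # insertion order is preserved in Python 3.7+
values = list(d.values())
print(keys)    # ['a', 'b', 'c', 'd']
print(values)  # [1, 2, 3, 4]

# Caveat: a repeated first item keeps only its last value.
with_duplicate = dict([('a', 1), ('a', 2)])
print(with_duplicate)  # {'a': 2}
```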
K
913

In 2.x, zip is its own inverse! Provided you use the special * operator.

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

This is equivalent to calling zip with each element of the list as a separate argument:

zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))

except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.

In 3.x, zip returns a lazy iterator, but this is trivially converted:

>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
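
A minimal sketch of the round trip in Python 3, assuming the `original` list from the question. The names `letters`, `numbers`, and `rezipped` are chosen here for illustration only.

```python
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

# Unpacking with * passes each pair to zip as a separate argument,
# so zip groups the first items together and the second items together.
letters, numbers = zip(*original)

# Zipping the two results again restores the original pairs.
rezipped = list(zip(letters, numbers))

print(letters)               # ('a', 'b', 'c', 'd')
print(numbers)               # (1, 2, 3, 4)
print(rezipped == original)  # True
```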
Kenzi answered 21/8, 2008 at 4:36 Comment(14)
Oh, if only it were so simple. Unzipping zip([], []) this way does not get you [], []. It gets you []. If only...Sisson
@Sisson zip(*zip([list1], [list2])) gives you ([list1, list2]).Interstratify
@cdhagmann: zip([list1], [list2]) is never what you want, though. That just gives you [(list1, list2)].Sisson
@Sisson I was using [list1] to mean any list named list1 and not as a list with a list with only one list as an entry. So given list1 = [1,2,3,4] and list2 = [1,2,3,4] then zip(*zip(list1, list2)) gives you ([1,2,3,4],[1,2,3,4])Interstratify
@cdhagmann you get [(1, 2, 3, 4), (1, 2, 3, 4)] from your commands.Gibbons
This does not work in Python3. See: #24591114Noranorah
zip does not preserve elements in longer iterables, hence padding is requiredLandpoor
tuple(map(list, zip(*original))) to get precisely the mentioned result.Keverne
@Noranorah This is incorrect. zip works exactly the same in Python 3 except that it returns an iterator instead of a list. In order to get the same output as above you just need to wrap the zip call in a list: list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])) will output [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]Befuddle
notice: you can meet memory and performance issues with very long lists.Tahoe
It's probably a discussion for another thread, but what should you use if not lists? Should I be more concerned with quantifying "very long", and choosing or changing structures if they seem close, or preemptively using different structures for data that has the potential to scale?Rexferd
@JohnP: lists are fine. But if you try to realize the full result all at once (by listifying the result of zip), you might use a lot of memory (because all the tuples must be created at once). If you can just iterate over the result of zip without listifying, you'll save a lot of memory. The only other concern is if the input has many elements; the cost there is that it must unpack them all as arguments, and zip will need to create and store iterators for all of them. This is only a real problem with very long lists (think hundreds of thousands of elements or more).Thermocouple
This works in Python 3.9. Pastebin example here. I have to applaud Patrick's cleverness.Bricker
In 2.x, using zip with * still doesn't quite "invert zip" on multiple arguments; it inverts another use of zip with * to unpack a single argument. zip inherently maps many arguments to one output which packs the transposed results together - as it must, since Python functions only return one value. The * unpacks, to match that packing. (Of course, in 3.x, zip gives an iterator instead.)Kneedeep
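
The empty-input case raised in the comments can be handled explicitly. The sketch below is one possible approach and not part of the answer: `unzip_fixed` and its `width` parameter are hypothetical names, and it assumes the caller knows how many columns to expect.

```python
def unzip_fixed(pairs, width=2):
    """Transpose `pairs` into `width` lists, even when `pairs` is empty."""
    columns = [list(col) for col in zip(*pairs)]
    # zip(*[]) yields nothing, so fall back to `width` empty lists.
    if not columns:
        columns = [[] for _ in range(width)]
    return tuple(columns)


print(list(zip(*zip([], []))))            # []
print(unzip_fixed(zip([], [])))           # ([], [])
print(unzip_fixed([('a', 1), ('b', 2)]))  # (['a', 'b'], [1, 2])
```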
I
29

You could also do

result = ([ a for a,b in original ], [ b for a,b in original ])

It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.

(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)

If generators instead of actual lists are ok, this would do that:

result = (( a for a,b in original ), ( b for a,b in original ))

The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
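
A small sketch of the trade-off described above, assuming Python 3. It illustrates that the generator versions do no work until consumed. The `misshapen` list is made-up data, added here to show that the comprehension approach fails loudly on a tuple of the wrong length, while `zip(*...)` silently truncates.

```python
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

# Generator expressions: nothing is read from `original` yet.
firsts = (a for a, b in original)
seconds = (b for a, b in original)
print(next(firsts))   # a
print(list(seconds))  # [1, 2, 3, 4]

# Made-up malformed input: the last tuple has only one item.
misshapen = [('a', 1), ('b', 2), ('c',)]

truncated = list(zip(*misshapen))
print(truncated)  # [('a', 'b', 'c')] -- the second column is silently dropped

raised = False
try:
    result = ([a for a, b in misshapen], [b for a, b in misshapen])
except ValueError as exc:
    raised = True
    print("unpacking failed:", exc)
```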

Insulator answered 24/8, 2008 at 17:7 Comment(7)
"Especially if Python makes good on not expanding the list comprehensions unless needed." mmm... normally, list comprehensions are expanded immediately - or do I get something wrong?Hegira
@glglgl: No,you're probably right. I was just hoping some future version might start doing the right thing. (It's not impossible to change, the side-effect semantics that need changes are probably already discouraged.)Insulator
What you hope to get is a generator expresion - which exists already.Hegira
No, what I hope to get is the perennial favourite "a sufficiently smarter compiler" (or interpreter in this case). I don't think there's anything sensible that would be broken by analysing the bejeebus out of the code and doing something wildly different. (like making a lazy collection) Python has never promised this feature, and will most likely never have it, but I can see that dream in the design.Insulator
This does not 'scale better' than the zip(*x) version. zip(*x) only requires one pass through the loop, and does not use up stack elements.Diann
Whether it "scales better" or not depends of the lifecycle of the original data compared to the transposed data. This answer is only better than using zip if the use-case is that the transposed data is used and discarded immediately, while the original lists stay in memory for much longer.Bally
This answer provides much better error reporting when the data is misshapen.Pedaias
J
24

I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs like so:

def unzip(iterable):
    return zip(*iterable)

I find unzip more readable.

Judyjudye answered 1/3, 2014 at 15:0 Comment(0)
W
21

If you have lists that are not the same length, you may not want to use zip as per Patrick's answer. This works:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

But with different length lists, zip truncates each item to the length of the shortest list:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]

In Python 2, you can use map with None as the function to fill the missing items with None:

>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]

zip() is marginally faster though.
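
In Python 3, `map(None, ...)` does not work; `itertools.zip_longest`, mentioned in the comments below, gives the same padded result. A minimal sketch, assuming `None` is an acceptable fill value (the name `ragged` is illustrative):

```python
from itertools import zip_longest

ragged = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e',)]

# zip stops at the shortest tuple, so the numbers are lost entirely.
truncated = list(zip(*ragged))
print(truncated)  # [('a', 'b', 'c', 'd', 'e')]

# zip_longest pads the short tuple with fillvalue (None by default).
padded = list(zip_longest(*ragged))
print(padded)     # [('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
```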

Wandie answered 2/1, 2011 at 12:14 Comment(3)
You could also use izip_longestVacillation
Known as zip_longest for python3 users.Shetrit
@GrijeshChauhan I know this is really old, but it's a weird built in feature: docs.python.org/2/library/functions.html#map "If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list."Dopp
H
18

To get a tuple of lists, as in the question:

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

To unpack the two lists into separate variables:

list1, list2 = [list(tup) for tup in zip(*original)]
Hydropic answered 5/3, 2016 at 11:8 Comment(1)
I think this is the most accurate answer because, as the question asks, it actually returns a pair of lists (rather than a list of tuples).Cordey
Z
7

Naive approach

def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users

works fine for a finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables, which can be illustrated as

| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |

where

  • n in ℕ,
  • a_ij corresponds to j-th element of i-th iterable,

and after applying transpose_finite_iterable we get

| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |

Python example of such case where a_ij == j, n == 2

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)

But we can't use transpose_finite_iterable again to get back to the structure of the original iterable, because result is an infinite iterable of finite iterables (tuples in our case):

>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
  File "...", line 1, in ...
  File "...", line 2, in transpose_finite_iterable
MemoryError

So how can we deal with this case?

... and here comes the deque

If we take a look at the docs for the itertools.tee function, there is a Python recipe that, with some modification, can help in our case

from collections import deque


def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

let's check

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1

Synthesis

Now we can define a general function for working with iterables of iterables, where one level is finite and the other is potentially infinite, using the functools.singledispatch decorator like

from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)

which can be considered its own inverse (mathematicians call this kind of function an "involution") in the class of binary operators over finite non-empty iterables.


As a bonus of singledispatching we can handle numpy arrays like

import numpy as np
...
transpose.register(np.ndarray, np.transpose)

and then use it like

>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])

Note

Since transpose returns iterators, anyone who wants a tuple of lists like in the OP can get one by additionally applying the built-in map function like

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Advertisement

I've added a generalized solution to the lz package, starting from version 0.5.0, which can be used like

>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]

P.S.

There is no solution (at least no obvious one) for handling a potentially infinite iterable of potentially infinite iterables, but this case is less common.

Zurek answered 21/12, 2018 at 12:46 Comment(0)
F
4

It's just another way to do it, but it helped me a lot, so I'm writing it here:

Having this data structure:

X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)

Resulting in:

In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

The more pythonic way to unzip it and go back to the original is this one in my opinion:

x,y=zip(*XY)

But this returns tuples, so if you need lists you can use:

x,y=(list(x),list(y))
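
One caveat when running the snippet above on Python 3 (an observation added here, not part of the original answer): `zip(X, Y)` returns a one-shot iterator, so `XY` can be unpacked only once unless it is first materialized with `list`. A minimal sketch:

```python
X = [1, 2, 3, 4]
Y = ['a', 'b', 'c', 'd']

XY = zip(X, Y)            # a lazy iterator in Python 3
x, y = zip(*XY)           # consumes XY
print(x, y)               # (1, 2, 3, 4) ('a', 'b', 'c', 'd')
leftover = list(XY)
print(leftover)           # [] -- XY is now exhausted

# Materialize the pairs first if they are needed more than once.
XY = list(zip(X, Y))
x, y = (list(col) for col in zip(*XY))
print(x, y)               # [1, 2, 3, 4] ['a', 'b', 'c', 'd']
print(XY)                 # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```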
Fraternal answered 26/1, 2016 at 10:45 Comment(0)
P
4

Consider using more_itertools.unzip:

>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
Pedaias answered 2/1, 2019 at 21:30 Comment(0)
L
3

None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:

res1 = list(zip(*original))              # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original)))  # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.

For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
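
A brief sketch of that last point, assuming Python 3: dropping the outer `tuple` call leaves a lazy `map` object that builds each list only when it is requested. Note that `zip(*original)` still has to unpack all of `original` as arguments up front. The names below are illustrative.

```python
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

lazy_columns = map(list, zip(*original))  # no lists built yet

first_column = next(lazy_columns)
second_column = next(lazy_columns)
print(first_column)   # ['a', 'b', 'c', 'd']
print(second_column)  # [1, 2, 3, 4]
```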

Ligialignaloes answered 23/8, 2018 at 17:36 Comment(1)
this should be the top answer. it's frustrating to see the other ones that are currently considered 'top'Bacciferous
R
2

While numpy arrays and pandas may be preferable, this function imitates the behavior of zip(*args) when called as unzip(args).

Allows for generators, like the result from zip in Python 3, to be passed as args as it iterates through values.

def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type  items: iterable

    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type  cls: callable, optional

    :param ocls: Outer container factory. Callable that accepts an iterable
        and returns a container holding the inner containers (see ``cls``).
        Defaults to ``tuple``.
    :type  ocls: callable, optional

    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)

    try:
        i = next(items)
    except StopIteration:
        return ocls()

    unzipped = ocls(cls([v]) for v in i)

    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)

    return unzipped

To use list containers, simply run unzip(zipped), as

unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])

To use deques, or any other container sporting append, pass a factory function.

from collections import deque

unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]

(Decorate cls and/or ocls to micromanage container initialization, as briefly shown in the final comparison above.)

Ruffle answered 8/5, 2020 at 18:39 Comment(0)
H
1

Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.

Here's a function that will actually give you the inverse of zip.

def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])

    """

    unzipped = ()
    if len(zipped) == 0:
        return unzipped

    dim = len(zipped[0])

    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )

    return unzipped
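
A possible variation, offered as a sketch rather than a replacement: building the outer tuple in one step from a generator expression avoids re-creating the tuple on every iteration. The name `unzip_once` is hypothetical; like the function above, it assumes `zipped` is an indexable sequence of equal-length tuples.

```python
def unzip_once(zipped):
    """Return a tuple of lists, one list per position in the tuples."""
    if len(zipped) == 0:
        return ()
    dim = len(zipped[0])
    # Build one list per column and collect them into a tuple in one step.
    return tuple([tup[i] for tup in zipped] for i in range(dim))


zipped = list(zip([1, 2, 3], [4, 5, 6]))
print(unzip_once(zipped))  # ([1, 2, 3], [4, 5, 6])
print(unzip_once([]))      # ()
```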
Hookworm answered 11/6, 2018 at 13:35 Comment(1)
Continually recreating tuples doesn't seem that efficient to me but you could extend this approach using deques which could preallocate memory.Cassowary
C
1

While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it significantly faster to create the sequences directly.

A generic approach would be something like this:

from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque() for _ in range(width)]  # one independent deque per column
for element in seq:
    for s, item in zip(output, element):
        s.append(item)

But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.

And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
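
One pitfall worth checking when building per-column containers this way (an observation added here, assuming standard Python semantics): multiplying a one-element list repeats references to the same object, so every column would share a single deque. A list comprehension creates independent containers instead. The sample `seq` below is made-up data.

```python
from collections import deque

seq = ((1, 'a'), (2, 'b'), (3, 'c'))  # made-up sample data
width = len(seq[0])

shared = [deque()] * width            # the same deque, referenced twice
print(shared[0] is shared[1])         # True

output = [deque() for _ in range(width)]  # one independent deque per column
for element in seq:
    for column, item in zip(output, element):
        column.append(item)

print(output)  # [deque([1, 2, 3]), deque(['a', 'b', 'c'])]
```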

Cassowary answered 26/9, 2018 at 14:8 Comment(0)
C
0

Just to summarize:

# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)

# forward
zipped = zip(a, b)  # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]

# reverse
a_, b_ = zip(*zipped)

# verify
assert a == a_
assert b == b_
Christianize answered 21/8, 2008 at 4:29 Comment(0)
B
-1

Here's a simple one-line answer that produces the desired output:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
Bacciferous answered 7/3, 2022 at 18:28 Comment(0)
