Unzip an array of tuples in Julia
A

8

16

Suppose I have an array of tuples:

arr = [(1,2), (3,4), (5,6)]

In Python I can do list(zip(*arr)) == [(1, 3, 5), (2, 4, 6)]

What is the equivalent of this in Julia?

Aman answered 1/4, 2016 at 23:39 Comment(3)
rather zip(arr...) |> collect: need to splat arr within zip, like in the "proof of correctness".Infusive
@Infusive is right. Those pesky splat ... slipped away at top of first comment. For correctness' sake, will re-comment the comment.Albuminous
zip(arr...) |> collect should do it. And one should ponder the following at least once: collect(zip(zip(arr...)...)) == arr which is true generally.Albuminous
A
8

For larger arrays use @ivirshup's solution below.

For smaller arrays, you can use zip and splatting.

You can achieve the same thing in Julia with the zip() function (docs here). zip() takes each collection as a separate argument, so you have to use the splatting operator ... to pass the elements of arr as individual arguments. zip() returns a lazy iterator, so call collect() on it if you want the result as an array.

Here are these functions in action:

arr = [(1,2), (3,4), (5,6)]

# without splatting
collect(zip((1,2), (3,4), (5,6)))

# Output is a vector of tuples:
> [(1,3,5), (2,4,6)]

# same result with splatting
collect(zip(arr...))
> [(1,3,5), (2,4,6)]
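As a small sketch of how the splatted result can be consumed (the variable names firsts, seconds, cols and roundtrip are only illustrative), the zip iterator can be destructured directly, and zipping twice gives back the original array, as noted in the comments on the question. This is only suitable for short arrays, since splatting gets slow as arr grows:

```julia
arr = [(1, 2), (3, 4), (5, 6)]

# Destructure the lazy zip iterator directly; each piece is a tuple.
firsts, seconds = zip(arr...)

# Materialise the result as a vector of tuples.
cols = collect(zip(arr...))

# Zipping the unzipped result again restores the original array.
roundtrip = collect(zip(cols...))

println(firsts)
println(seconds)
println(roundtrip == arr)
```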
Ania answered 2/4, 2016 at 8:9 Comment(2)
Note that for large arrays this is very slow! See github.com/JuliaLang/julia/issues/13930#issuecomment-155142306Allhallowmas
For arrays as small as 1000 elements this will cause a stack overflow and crash Julia. So like .. don't use this, unless you know that the size of arr will never go above, say, 50.Gertrudegertrudis
P
12

As an alternative to splatting (since that's pretty slow), you could do something like:

unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))

This is pretty quick.

julia> using BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> @benchmark unzip(a)
BenchmarkTools.Trial: 
  memory estimate:  156.45 KiB
  allocs estimate:  6
  --------------
  minimum time:     25.260 μs (0.00% GC)
  median time:      31.997 μs (0.00% GC)
  mean time:        48.429 μs (25.03% GC)
  maximum time:     36.130 ms (98.67% GC)
  --------------
  samples:          10000
  evals/sample:     1

By comparison, I have yet to see this complete:

@time collect(zip(a...))
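To make the mechanics of the one-liner above more concrete, here is a minimal sketch on a tiny array (the sample data and the names idxs, col1 and cols are made up for illustration). For a concrete tuple element type, fieldnames returns the positions 1, 2, ..., and getfield is broadcast over the array once per position:

```julia
a = [(1, "x"), (2, "y"), (3, "z")]

# For a concrete tuple type the "field names" are just the positions.
idxs = fieldnames(eltype(a))

# Broadcasting getfield over the array extracts one position from every tuple.
col1 = getfield.(a, 1)

# The one-liner from this answer: one broadcast per tuple position.
unzip(a) = map(x -> getfield.(a, x), fieldnames(eltype(a)))
cols = unzip(a)

println(idxs)
println(cols)
```

Because map is applied to a tuple of positions, the result is a tuple of vectors, each with its own concrete element type.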
Paragraph answered 6/12, 2018 at 6:27 Comment(2)
I can confirm that the unzip function in this answer is more than 10x faster than splatting into zip and also uses much less memory. I'm using a Mac running Julia 1.1.Quirinal
To unzip an array of arrays of equal length, unzip(a) = [getindex.(a, i) for i in 1:length(a[1])] works pretty wellJusticiary
S
3

There is also the Unzip.jl package:

julia> using Unzip

julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])

which seems to work a bit faster than the selected answer:

julia> using Unzip, BenchmarkTools

julia> a = collect(zip(1:10000, 10000:-1:1));

julia> unzip_ivirshup(a) = map(x->getfield.(a, x), fieldnames(eltype(a))) ;

julia> @btime unzip_ivirshup($a);
  18.439 μs (4 allocations: 156.41 KiB)

julia> @btime unzip($a); # unzip from Unzip.jl is faster
  12.798 μs (4 allocations: 156.41 KiB)

julia> unzip(a) == unzip_ivirshup(a) # check output is the same
true
Spinoff answered 5/5, 2021 at 1:15 Comment(0)
O
3

I will add a solution based on the following simple macro

"""
    @unzip xs, ys, ... = us

will expand the assignment into the following code
    xs, ys, ... = map(x -> x[1], us), map(x -> x[2], us), ...
"""
macro unzip(args)
    args.head != :(=) && error("Expression needs to be of form `xs, ys, ... = us`")
    lhs, rhs = args.args
    items = isa(lhs, Symbol) ? [lhs] : lhs.args
    rhs_items = [:(map(x -> x[$i], $rhs)) for i in 1:length(items)]
    rhs_expand = Expr(:tuple, rhs_items...)
    esc(Expr(:(=), lhs, rhs_expand))
end

Since it's just a syntactic expansion, there shouldn't be any performance or type-instability issues. Compared to other solutions based on fieldnames, this has the advantage of also working when the array element type is abstract. For example, while

julia> unzip_get_field(a) = map(x->getfield.(a, x), fieldnames(eltype(a)));
julia> unzip_get_field(Any[("a", 3), ("b", 4)])
ERROR: ArgumentError: type does not have a definite number of fields

the macro version still works:

julia> @unzip xs, ys = Any[("a", 3), ("b",4)]
(["a", "b"], [3, 4])
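For reference, this is roughly what the macro call above expands to according to its docstring, written out by hand as a sketch (the variable us is just a placeholder name). It illustrates why the abstract element type is not a problem: map picks the result element type from the values it actually sees.

```julia
us = Any[("a", 3), ("b", 4)]

# Hand-written equivalent of `@unzip xs, ys = us`, following the docstring.
xs, ys = map(x -> x[1], us), map(x -> x[2], us)

println(xs)
println(ys)
println(eltype(xs))
```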
Oligocene answered 16/6, 2021 at 18:48 Comment(0)
G
2

In Julia, use the splat operator ...:

for r in zip(arr...)
    println(r)
end
Georas answered 1/4, 2016 at 23:47 Comment(0)
I
2

Following up on @ivirshup's answer, I would like to add a version that is still an iterator:

unzip(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))

which keeps the result unevaluated until used. It even gives a (very slight) speed improvement when comparing

@benchmark a1, b1 = unzip(a)
BenchmarkTools.Trial: 
  memory estimate:  156.52 KiB
  allocs estimate:  8
  --------------
  minimum time:     33.185 μs (0.00% GC)
  median time:      76.581 μs (0.00% GC)
  mean time:        83.808 μs (18.35% GC)
  maximum time:     7.679 ms (97.82% GC)
  --------------
  samples:          10000
  evals/sample:     1

vs.

BenchmarkTools.Trial: 
  memory estimate:  156.52 KiB
  allocs estimate:  8
  --------------
  minimum time:     33.914 μs (0.00% GC)
  median time:      39.020 μs (0.00% GC)
  mean time:        64.788 μs (16.52% GC)
  maximum time:     7.853 ms (98.18% GC)
  --------------
  samples:          10000
  evals/sample:     1
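A short sketch of how the lazy version behaves, using a small input and the hypothetical name unzip_lazy to avoid clashing with the other definitions on this page. The call itself only builds a generator; the per-position vectors are computed when the generator is iterated, for example by destructuring:

```julia
# Same definition as above, under an illustrative name.
unzip_lazy(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))

a = collect(zip(1:5, 5:-1:1))

# Nothing is extracted yet; g is a Base.Generator.
g = unzip_lazy(a)

# Destructuring iterates the generator and materialises each column.
a1, b1 = g

println(g isa Base.Generator)
println(a1)
println(b1)
```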
Invest answered 26/2, 2020 at 10:13 Comment(0)
L
0

There is a minimal package intended to simplify the broadcast-unzip use case: UnzipLoops.jl.

Instead of writing this explicitly:

function g(X, Y)
    out = f.(X, Y)
    return map(x->x[1], out), map(x->x[2], out)
end

the package provides this as an efficient shorthand:

@assert g(X, Y) == broadcast_unzip(f, X, Y)

See the UnzipLoops.jl package documentation for more details and alternatives. Relative to Unzip.jl, I think this package is more recent, has a more restricted API, and aims for more performance?
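To see what the explicit version does without installing anything, here is a sketch using only Base. The function f and the inputs X and Y are made-up examples, and broadcast_unzip itself is not called here:

```julia
# Hypothetical element-wise function that returns a tuple of two values.
f(x, y) = (x + y, x * y)

# Explicit broadcast followed by an unzip, as in the snippet above.
function g(X, Y)
    out = f.(X, Y)   # a vector of 2-tuples
    return map(x -> x[1], out), map(x -> x[2], out)
end

X = [1, 2, 3]
Y = [10, 20, 30]
sums, prods = g(X, Y)

println(sums)
println(prods)
```

Note that this allocates the intermediate array of tuples before splitting it, which is the overhead the package shorthand is described as avoiding.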

There is also a long-standing issue on the Julia repo about adding a Base.unzip(): https://github.com/JuliaLang/julia/issues/13942.

Lysis answered 11/3 at 14:40 Comment(0)
L
0

Inspired by the macro of @MrVPlusOne, here is a variant that is type-strict:

@generated function unzip(a::A) where {T <: Tuple, A <: AbstractArray{T}}
    isconcretetype(T) || error("Tuple types vary")
    N = length(T.parameters)
    ith_subarray(a, i) = map(x -> x[i], a)
    elems = [Expr(:call, ith_subarray, :a, i) for i in 1:N]
    return Expr(:tuple, elems...)
end

Use it like this:

@assert unzip([(1,"A"),(2,"B")]) == ([1,2], ["A","B"])
unzip([(1,"A"),(2,3)]) # Error: Tuple types vary
unzip([(1,2),(3,)])    # Error: Tuple types vary
Lysis answered 16/3 at 17:23 Comment(0)
