Suppose I have an array of tuples:
arr = [(1,2), (3,4), (5,6)]
With Python I can do list(zip(*arr)) == [(1, 3, 5), (2, 4, 6)]
What is the equivalent of this in Julia?
For larger arrays use @ivirshup's solution below.
For smaller arrays, you can use zip and splatting.
You can achieve the same thing in Julia by using the zip() function (docs here). zip() expects each tuple as a separate argument, so you have to use the splatting operator ... to supply your arguments. Also, in Julia you have to use the collect() function to turn the resulting iterator into an array (if you want to).
Here are these functions in action:
arr = [(1,2), (3,4), (5,6)]
# without splatting
collect(zip((1,2), (3,4), (5,6)))
# Output is a vector of tuples:
> [(1, 3, 5), (2, 4, 6)]
# same result with splatting
collect(zip(arr...))
> [(1, 3, 5), (2, 4, 6)]
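As a small sketch of what the splatted call returns (the variable names here are only illustrative): zip(arr...) iterates over tuples, so collect gives a vector of tuples rather than a tuple of vectors. Collecting each tuple is one way to get plain arrays, and zipping the result again gives back the original array.

```julia
arr = [(1, 2), (3, 4), (5, 6)]

# zip(arr...) is the same call as zip((1, 2), (3, 4), (5, 6))
columns = collect(zip(arr...))                      # vector of tuples

# Illustrative extra step: turn each tuple into a Vector
column_vectors = [collect(t) for t in zip(arr...)]  # vector of vectors

# Zipping the transposed result again restores the original array
roundtrip = collect(zip(columns...))
```

Because every element of arr becomes a separate argument, this approach is best kept to short arrays.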
As an alternative to splatting (since that's pretty slow), you could do something like:
unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
This is pretty quick.
julia> using BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> @benchmark unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.45 KiB
allocs estimate: 6
--------------
minimum time: 25.260 μs (0.00% GC)
median time: 31.997 μs (0.00% GC)
mean time: 48.429 μs (25.03% GC)
maximum time: 36.130 ms (98.67% GC)
--------------
samples: 10000
evals/sample: 1
By comparison, I have yet to see this complete:
@time collect(zip(a...))
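To make the one-liner above easier to follow, here is a step-by-step sketch on a tiny array. It assumes the element type is a concrete tuple type; the intermediate variable names are only for illustration.

```julia
a = [(1, "a"), (2, "b"), (3, "c")]

# For a concrete tuple type, the field names are simply the positions
positions = fieldnames(eltype(a))

# Broadcasting getfield over the array pulls one position out of every tuple
firsts = getfield.(a, 1)
seconds = getfield.(a, 2)

# Mapping over the tuple of positions returns a tuple of vectors
unzip(a) = map(x -> getfield.(a, x), fieldnames(eltype(a)))
result = unzip(a)
```

The number of broadcasts equals the tuple length, not the array length, which is why this avoids the cost of splatting a long array.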
The unzip function in this answer is more than 10x faster than splatting into zip and also uses much less memory. I'm using a Mac running Julia 1.1. – Quirinal
unzip(a) = [getindex.(a, i) for i in 1:length(a[1])] works pretty well – Justiciary
There is also the Unzip.jl package:
julia> using Unzip
julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])
which seems to work a bit faster than the selected answer:
julia> using Unzip, BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> unzip_ivirshup(a) = map(x->getfield.(a, x), fieldnames(eltype(a))) ;
julia> @btime unzip_ivirshup($a);
18.439 μs (4 allocations: 156.41 KiB)
julia> @btime unzip($a); # unzip from Unzip.jl is faster
12.798 μs (4 allocations: 156.41 KiB)
julia> unzip(a) == unzip_ivirshup(a) # check output is the same
true
I will add a solution based on the following simple macro
"""
    @unzip xs, ys, ... = us

will expand the assignment into the following code

    xs, ys, ... = map(x -> x[1], us), map(x -> x[2], us), ...
"""
macro unzip(args)
    args.head != :(=) && error("Expression needs to be of form `xs, ys, ... = us`")
    lhs, rhs = args.args
    items = isa(lhs, Symbol) ? [lhs] : lhs.args
    rhs_items = [:(map(x -> x[$i], $rhs)) for i in 1:length(items)]
    rhs_expand = Expr(:tuple, rhs_items...)
    esc(Expr(:(=), lhs, rhs_expand))
end
Since it's just a syntactic expansion, there shouldn't be any performance or type-instability issues. Compared to other solutions based on fieldnames, this has the advantage of also working when the array element type is abstract. For example, while
julia> unzip_get_field(a) = map(x->getfield.(a, x), fieldnames(eltype(a)));
julia> unzip_get_field(Any[("a", 3), ("b", 4)])
ERROR: ArgumentError: type does not have a definite number of fields
the macro version still works:
julia> @unzip xs, ys = Any[("a", 3), ("b",4)]
(["a", "b"], [3, 4])
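For reference, this is roughly what the expansion looks like when written out by hand for two outputs. It is a sketch that does not need the macro itself, and the variable name us is only illustrative.

```julia
us = Any[("a", 3), ("b", 4)]

# Hand-written equivalent of `@unzip xs, ys = us`:
# one map call per name on the left-hand side
xs, ys = map(x -> x[1], us), map(x -> x[2], us)
```

Note that the macro pastes the right-hand side into every map call, so an expensive expression there would be evaluated once per output; binding it to a variable first avoids that.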
Following up on @ivirshup's answer, I would like to add a version that is still an iterator:
unzip(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))
which keeps the result unevaluated until used. It even gives a (very slight) speed improvement when comparing
@benchmark a1, b1 = unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.185 μs (0.00% GC)
median time: 76.581 μs (0.00% GC)
mean time: 83.808 μs (18.35% GC)
maximum time: 7.679 ms (97.82% GC)
--------------
samples: 10000
evals/sample: 1
vs.
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.914 μs (0.00% GC)
median time: 39.020 μs (0.00% GC)
mean time: 64.788 μs (16.52% GC)
maximum time: 7.853 ms (98.18% GC)
--------------
samples: 10000
evals/sample: 1
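A short usage sketch of the lazy version follows. The name unzip_lazy is hypothetical, chosen only to avoid clashing with the other unzip definitions on this page.

```julia
# Hypothetical name for the generator-based version shown above
unzip_lazy(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))

a = [(1, 2), (3, 4), (5, 6)]
g = unzip_lazy(a)               # a Base.Generator; no column is built yet

# Destructuring iterates the generator, building one column per name
a1, b1 = g

# Alternatively, materialize every column as a vector of vectors
cols = collect(unzip_lazy(a))
```

Each column is only computed when the generator is iterated, so asking for just the first column skips the work for the others.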
There is a minimal package intended to simplify the broadcast-unzip use case: UnzipLoops.jl.
Instead of writing this explicitly:
function g(X, Y)
    out = f.(X, Y)
    return map(x->x[1], out), map(x->x[2], out)
end
the package provides this as an efficient shorthand:
@assert g(X, Y) == broadcast_unzip(f, X, Y)
See the UnzipLoops.jl package documentation for more details and alternatives. Relative to Unzip.jl, I think this package is more recent, has a more restricted API, and aims for more performance?
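To make the explicit pattern concrete, here is a runnable sketch in plain Julia that does not use the package. The function f below is a hypothetical example of an element-wise function returning a 2-tuple.

```julia
# Hypothetical element-wise function returning a 2-tuple
f(x, y) = (x + y, x * y)

function g(X, Y)
    out = f.(X, Y)      # broadcast first: a vector of 2-tuples
    # then split the tuples into one vector per position
    return map(x -> x[1], out), map(x -> x[2], out)
end

sums, prods = g([1, 2, 3], [4, 5, 6])
```

The intermediate vector of tuples, out, is the allocation that the package's shorthand is meant to avoid.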
There is also a long-standing issue on the Julia repo about adding a Base.unzip(): https://github.com/JuliaLang/julia/issues/13942.
Inspired by the macro of @MrVPlusOne, here is a variant that is type-strict:
@generated function unzip(a::A) where {T <: Tuple, A <: AbstractArray{T}}
    isconcretetype(T) || error("Tuple types vary")
    N = length(T.parameters)
    ith_subarray(a, i) = map(x -> x[i], a)
    elems = [Expr(:call, ith_subarray, :a, i) for i in 1:N]
    return Expr(:tuple, elems...)
end
Use it like this:
@assert unzip([(1,"A"),(2,"B")]) == ([1,2], ["A","B"])
unzip([(1,"A"),(2,3)]) # Error: Tuple types vary
unzip([(1,2),(3,)]) # Error: Tuple types vary
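The two failing calls can be understood by looking at the element types Julia infers for those array literals. The following sketch only inspects the types and does not depend on the generated function; the variable names are illustrative.

```julia
ok     = eltype([(1, "A"), (2, "B")])   # every tuple has the same concrete type
mixed  = eltype([(1, "A"), (2, 3)])     # second position differs between tuples
ragged = eltype([(1, 2), (3,)])         # tuples of different lengths

# Only the first element type is concrete, so only it passes the check
checks = (isconcretetype(ok), isconcretetype(mixed), isconcretetype(ragged))
```

A concrete tuple type has a fixed number of parameters, which is what the generated function relies on when it computes N.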
zip(arr...) |> collect: need to splat arr within zip, like in the "proof of correctness". – Infusive
... slipped away at top of first comment. For correctness' sake, will re-comment the comment. – Albuminous
zip(arr...) |> collect should do it. And one should ponder the following at least once: collect(zip(zip(arr...)...)) == arr, which is true generally. – Albuminous