How to construct matrix from row/column vectors in Julia
Asked Answered
C

4

5

Given a function/expression that yields a single column, how to build a matrix from those columns in Julia? I tried the following (simplified example):

column(j) = [j, j, j]      # for example
my_matrix = Float64[column(j)[i] for i in 1:3, j in 1:4]

==> 3x4 Array{Float64,2}:
 1.0  2.0  3.0  4.0
 1.0  2.0  3.0  4.0
 1.0  2.0  3.0  4.0

The result is correct, but this is not what I want because the column-vector-expression is evaluated unnecessarily (once for every row).

I also tried this alternative approach:

[column(j) for j in 1:4]

==> 4-element Array{Array{Float64,1},1}:
 [1.0,1.0,1.0]
 [2.0,2.0,2.0]
 [3.0,3.0,3.0]
 [4.0,4.0,4.0]

But I found no way to convert or reshape this into the form I want (matrix with two dimensions instead of array of arrays).

How to achieve this in Julia?

Chasseur answered 27/4, 2015 at 13:42 Comment(0)
G
4

Since Julia 1.9.0 stack can be used to combine a collection of arrays (or other iterable objects) of equal size into one larger array, by arranging them along one or more new dimensions so it can be used to construct a matrix from row/column vectors (returned from function/expression). For versions before 1.9.0 it can be activated with using Compat.

column(j) = [j, j, j]

stack(column(j) for j=1:4)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

What is fast and memory efficient.

using BenchmarkTools
column(j) = [j, j, j]

@btime stack(column(j) for j=1:4)
#  193.315 ns (5 allocations: 480 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime begin  #@Benoît Legat
    m = Matrix{Int}(undef, 3, 4)
    for j in 1:4
        m[:, j] = column(j)
    end
    m
end
#  201.942 ns (5 allocations: 480 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reduce(hcat, [column(j) for j=1:4])  #@Przemyslaw Szufel
#  249.676 ns (6 allocations: 560 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reshape(collect(Iterators.flatten([column(j) for j in 1:4])), 3, 4)  #@Benoît Legat
#  392.325 ns (10 allocations: 976 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime Int[column(j)[i] for i in 1:3, j in 1:4]  #@radioflash
#  440.211 ns (13 allocations: 1.09 KiB)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime hcat([column(j) for j=1:4]...)  #@spencerlyon2
#  644.676 ns (10 allocations: 784 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

In case you have already a vector.

c = [[j,j,j] for j in 1:4]
#4-element Vector{Vector{Int64}}:
# [1, 1, 1]
# [2, 2, 2]
# [3, 3, 3]
# [4, 4, 4]

stack(c)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

using BenchmarkTools

@btime stack($c)
#  63.204 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reduce(hcat, $c)  #@Przemyslaw Szufel
#  68.758 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime begin  #@Benoît Legat
    m = Matrix{Int}(undef, 3, 4)
    for j in 1:4
        m[:, j] = $c[j]
    end
    m
end
#  75.586 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reshape(collect(Iterators.flatten($c)), 3, 4)  #@Benoît Legat
#  210.752 ns (5 allocations: 576 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime hcat($c...)  #@spencerlyon2
#  453.213 ns (5 allocations: 384 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4
Guildroy answered 13/9, 2022 at 7:5 Comment(1)
Thanks, this new function is exactly what I wanted! Benchmarks are also very interesting since basically anything with more than 5 allocations (4 columns and 1 result) is "bad". Personally, I find the rather good performance of reduce to be quite suprising, too (not sure how well these examples generalise to bigger data; SArray Columns and MMatrix result also gave a huge performance boost in my tests, but potentially only because the vectors are so small)Chasseur
N
9

Have you tried hcat?:

julia> column(j) = [j, j ,j]
column (generic function with 1 method)

julia> my_matrix  = hcat([column(j) for j=1:4]...)
3x4 Array{Int64,2}:
 1  2  3  4
 1  2  3  4
 1  2  3  4

See hcat in Julia docs

Nealson answered 27/4, 2015 at 14:23 Comment(3)
This is perfect, thanks! I knew about the *cat functions, but I didn't know about the array => unpacked tuple conversion via ellipsis. Advice for other Julia newbies: At least consider looking at the latest version of the docs (even if using an older version of Julia), there are some drastic general improvements (e.g. THIS).Chasseur
Great, glad it works for you. Yes, the manual changes quite rapidly so looking at the latest version is often a good idea.Nealson
What if the second dimension is large? I get segfaults or stackoverflows when j runs to values larger than 2 millions, or so.Mailbox
O
4

But I found no way to convert or reshape this into the form I want (matrix with two dimensions instead of array of arrays).

You can obtain the matrix by reshaping the flatten'ed version of the vector of columns c:

julia> column(j) = [j, j, j]
column (generic function with 1 method)

julia> c = [column(j) for j in 1:4]
4-element Vector{Vector{Int64}}:
 [1, 1, 1]
 [2, 2, 2]
 [3, 3, 3]
 [4, 4, 4]

julia> v = collect(Iterators.flatten(c))
12-element Vector{Int64}:
 1
 1
 1
 2
 2
 2
 3
 3
 3
 4
 4
 4

julia> m = reshape(v, 3, 4)
3×4 Matrix{Int64}:
 1  2  3  4
 1  2  3  4
 1  2  3  4

The alternative hcat(c...) suggested by the accepted answer works as well but as pointed out by a comment, this will not work for a large number of columns. Indeed, if c is a vector of n columns then hcat(c...) will compile a method with n arguments. This first means that if your code does this for different number of columns then you will need to compile a new method every time. Secondly, compiling a method with too many arguments will make the stack memory overflow.

For this reason, when you don't need a one-line solution (e.g. like the one combining flatten and reshape above), it seems to me that most Julia users do:

m = Matrix{Int}(undef, 3, 4)
for j in 1:4
    m[:, j] = column(j)
end

EDIT @Przemyslaw suggested a better solution: https://mcmap.net/q/1865522/-how-to-construct-matrix-from-row-column-vectors-in-julia

Ockeghem answered 16/8, 2021 at 20:30 Comment(2)
Your multiline solution does exactly what I want. It's a pity that there is nothing terse with the exact same semantics (i.e. no redundant assignments to matrix and no unnecessary evaluations of the column expression). I decided against accepting your answer because the original already solved my problem... Would give you bounty but it seems I can't.Chasseur
Thanks for replying (even 6 years later). I agree that it's a pity there is no one line for that. I see the ... in many packages so I wanted to write an alternative solution here to try to spread the word so that people realize the issue with that approach.Bookplate
E
4

Looking at answer @Benoit it is actually faster to use reduce(hcat, c):

julia> c = [[j,j,j] for j in 1:4]
4-element Vector{Vector{Int64}}:
 [1, 1, 1]
 [2, 2, 2]
 [3, 3, 3]
 [4, 4, 4]


julia> @btime reshape(collect(Iterators.flatten($c)), 3, 4)
  232.791 ns (6 allocations: 448 bytes)
3×4 Matrix{Int64}:
 1  2  3  4
 1  2  3  4
 1  2  3  4

julia> @btime reduce(hcat, $c)
  70.825 ns (1 allocation: 176 bytes)
3×4 Matrix{Int64}:
 1  2  3  4
 1  2  3  4
 1  2  3  4
Empoison answered 29/8, 2021 at 19:6 Comment(2)
Indeed, I didn't know but your solution is actually what should be done. It is redirected to _typed_hcat and then _typed_hcat is a specialized implementation of exactly what we want.Bookplate
Somehow reduce is faster in surprising many use cases so it is always worth trying. Still your point about three dots ... is supper important because it is just overused here and there degrading the performance of many libraries :-)Empoison
G
4

Since Julia 1.9.0 stack can be used to combine a collection of arrays (or other iterable objects) of equal size into one larger array, by arranging them along one or more new dimensions so it can be used to construct a matrix from row/column vectors (returned from function/expression). For versions before 1.9.0 it can be activated with using Compat.

column(j) = [j, j, j]

stack(column(j) for j=1:4)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

What is fast and memory efficient.

using BenchmarkTools
column(j) = [j, j, j]

@btime stack(column(j) for j=1:4)
#  193.315 ns (5 allocations: 480 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime begin  #@Benoît Legat
    m = Matrix{Int}(undef, 3, 4)
    for j in 1:4
        m[:, j] = column(j)
    end
    m
end
#  201.942 ns (5 allocations: 480 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reduce(hcat, [column(j) for j=1:4])  #@Przemyslaw Szufel
#  249.676 ns (6 allocations: 560 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reshape(collect(Iterators.flatten([column(j) for j in 1:4])), 3, 4)  #@Benoît Legat
#  392.325 ns (10 allocations: 976 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime Int[column(j)[i] for i in 1:3, j in 1:4]  #@radioflash
#  440.211 ns (13 allocations: 1.09 KiB)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime hcat([column(j) for j=1:4]...)  #@spencerlyon2
#  644.676 ns (10 allocations: 784 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

In case you have already a vector.

c = [[j,j,j] for j in 1:4]
#4-element Vector{Vector{Int64}}:
# [1, 1, 1]
# [2, 2, 2]
# [3, 3, 3]
# [4, 4, 4]

stack(c)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

using BenchmarkTools

@btime stack($c)
#  63.204 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reduce(hcat, $c)  #@Przemyslaw Szufel
#  68.758 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime begin  #@Benoît Legat
    m = Matrix{Int}(undef, 3, 4)
    for j in 1:4
        m[:, j] = $c[j]
    end
    m
end
#  75.586 ns (1 allocation: 160 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime reshape(collect(Iterators.flatten($c)), 3, 4)  #@Benoît Legat
#  210.752 ns (5 allocations: 576 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4

@btime hcat($c...)  #@spencerlyon2
#  453.213 ns (5 allocations: 384 bytes)
#3×4 Matrix{Int64}:
# 1  2  3  4
# 1  2  3  4
# 1  2  3  4
Guildroy answered 13/9, 2022 at 7:5 Comment(1)
Thanks, this new function is exactly what I wanted! Benchmarks are also very interesting since basically anything with more than 5 allocations (4 columns and 1 result) is "bad". Personally, I find the rather good performance of reduce to be quite suprising, too (not sure how well these examples generalise to bigger data; SArray Columns and MMatrix result also gave a huge performance boost in my tests, but potentially only because the vectors are so small)Chasseur

© 2022 - 2024 — McMap. All rights reserved.