Fast way to convert upper triangular matrix into symmetric matrix

M

4

9

I have an upper-triangular matrix of np.float64 values, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 0.,  5.,  6.,  7.],
       [ 0.,  0.,  8.,  9.],
       [ 0.,  0.,  0., 10.]])

I would like to convert this into the corresponding symmetric matrix, like this:

array([[ 1.,  2.,  3.,  4.],
       [ 2.,  5.,  6.,  7.],
       [ 3.,  6.,  8.,  9.],
       [ 4.,  7.,  9., 10.]])

The conversion can be done in place, or as a new matrix. I would like it to be as fast as possible. How can I do this quickly?

Marino answered 5/11, 2019 at 19:42 Comment(2)

What is the usual problem size? Do you have lists of 2d arrays eg.(6x6) or a much simpler 3d-array (10_000x6x6)? – Mabelmabelle 6/11, 2019 at 14:37

In my case I'm currently processing a 4x4 matrix but also interested in cases up to 10x10 or so. – Marino 6/11, 2019 at 16:42

M

4

This is the fastest routine I've found so far that doesn't use Cython or a JIT like Numba. It takes about 1.6 μs on my machine to process a 4x4 array (average time over a list of 100K 4x4 arrays):

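import numpy as np

inds_cache = {}  # boolean lower-triangle masks, keyed by matrix size

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    try:
        inds = inds_cache[n]
    except KeyError:
        # the builtin bool works across NumPy versions; np.bool is unavailable in some 1.x releases
        inds = np.tri(n, k=-1, dtype=bool)
        inds_cache[n] = inds
    ut[inds] = ut.T[inds]  # copy the strict upper triangle into the lower triangle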

Here are some other things I've tried that are not as fast:

The above code, but without the cache. Takes about 8.3 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    inds = np.tri(n, k=-1, dtype=bool)
    ut[inds] = ut.T[inds]

A plain Python nested loop. Takes about 2.5 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]

Floating point addition using np.triu. Takes about 11.9 μs per 4x4 array:

def upper_triangular_to_symmetric(ut):
    ut += np.triu(ut, k=1).T

Numba version of Python nested loop. This was the fastest thing I found (about 0.4 μs per 4x4 array), and was what I ended up using in production, at least until I started running into issues with Numba and had to revert back to a pure Python version:

import numba

@numba.njit()
def upper_triangular_to_symmetric(ut):
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]

Cython version of Python nested loop. I'm new to Cython so this may not be fully optimized. Since Cython adds operational overhead, I'm interested in hearing both Cython and pure-Numpy answers. Takes about 0.6 μs per 4x4 array:

cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def upper_triangular_to_symmetric(np.ndarray[np.float64_t, ndim=2] ut):
    cdef int n, r, c
    n = ut.shape[0]
    for r in range(1, n):
        for c in range(r):
            ut[r, c] = ut[c, r]

Marino answered 5/11, 2019 at 19:42 Comment(3)

What about doing ut += ut.T; ut.flat[::ut.shape[0]+1] *= 0.5? – Kape 5/11, 2019 at 20:8

@MarkDickinson that is also slow (~5.7 μs). The issue is that you're doing floating point operations (adds and multiplies), which are much slower than just copying data around. – Marino 5/11, 2019 at 20:20

@KerrickStaley I'm not sure it's the fp ops. Try and time ut+ut.T alone. It's pretty fast. At this operand size it's mostly Python overheads that slow things down. Btw. I've updated my answer. – Greengrocery 5/11, 2019 at 22:1
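
The add-then-halve idea from the first comment, and the suggestion to time `ut + ut.T` on its own, can be tried with a small sketch like the one below. The function name `symmetrize_add` and the run count are assumptions for illustration; timings vary by machine, so none are quoted here.

```python
import numpy as np
from timeit import timeit

def symmetrize_add(ut):
    """In-place variant from the comments: add the transpose, then halve the diagonal."""
    n = ut.shape[0]
    ut += ut.T                # off-diagonal entries become symmetric, diagonal is doubled
    ut.flat[::n + 1] *= 0.5   # every (n+1)-th flat element is a diagonal entry
    return ut

upper = np.array([[1., 2., 3., 4.],
                  [0., 5., 6., 7.],
                  [0., 0., 8., 9.],
                  [0., 0., 0., 10.]])

# Work on a copy: calling this twice on the same array would double the off-diagonals.
result = symmetrize_add(upper.copy())

runs = 20_000
t_add_only = timeit(lambda: upper + upper.T, number=runs) / runs
t_full = timeit(lambda: symmetrize_add(upper.copy()), number=runs) / runs
print(f"ut + ut.T alone:       {t_add_only * 1e6:.2f} us per call")
print(f"add + halve (on copy): {t_full * 1e6:.2f} us per call")
```

Note that the second timing includes the cost of `upper.copy()`, so the two numbers are only a rough guide to where the time goes.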

G

6

np.where seems quite fast in the out-of-place, no-cache scenario:

np.where(ut,ut,ut.T)

On my laptop:

from timeit import timeit

timeit(lambda:np.where(ut,ut,ut.T))
# 1.909718865994364 (seconds for the default 1,000,000 calls, i.e. about 1.9 μs per call)
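
To make the behaviour of `np.where(ut, ut, ut.T)` concrete, here is a small sketch. It picks `ut` wherever `ut` is non-zero and the mirrored entry otherwise, so it relies on the lower triangle being exactly zero. The arrays `ut` and `dirty` below are made-up examples.

```python
import numpy as np

# Upper-triangular input that also has a genuine zero above the diagonal.
ut = np.array([[1., 0., 3.],
               [0., 5., 6.],
               [0., 0., 8.]])

# Non-zero entries are kept; zero entries are replaced by the mirrored entry.
sym = np.where(ut, ut, ut.T)
# sym is [[1, 0, 3], [0, 5, 6], [3, 6, 8]]: the zero at (0, 1) stays zero
# because its mirror (1, 0) is zero as well.

# Caveat: a non-zero value left in the lower triangle is kept, not overwritten.
dirty = ut.copy()
dirty[2, 0] = -1.0
not_sym = np.where(dirty, dirty, dirty.T)
print(sym)
print(not_sym[2, 0], not_sym[0, 2])  # -1.0 and 3.0, so the result is not symmetric
```

If the lower triangle may contain leftovers, a mask- or index-based copy is the safer choice.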

If you have pythran installed you can speed this up 3 times with near zero effort. But note that as far as I know pythran (currently) only understands contiguous arrays.

file <upp2sym.py>, compile with pythran -O3 upp2sym.py

import numpy as np

#pythran export upp2sym(float[:,:])

def upp2sym(a):
    return np.where(a,a,a.T)

Timing:

from upp2sym import *

timeit(lambda:upp2sym(ut))
# 0.5760842661838979

This is almost as fast as looping:

#pythran export upp2sym_loop(float[:,:])

def upp2sym_loop(a):
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i,i] = a[i,i]
        for j in range(i):
            out[i,j] = out[j,i] = a[j,i]
    return out

Timing:

timeit(lambda:upp2sym_loop(ut))
# 0.4794591029640287

We can also do it inplace:

#pythran export upp2sym_inplace(float[:,:])

def upp2sym_inplace(a):
    for i in range(len(a)):
        for j in range(i):
            a[i,j] = a[j,i]

Timing:

timeit(lambda:upp2sym_inplace(ut))
# 0.28711927914991975

Greengrocery answered 5/11, 2019 at 20:23 Comment(2)

This is pretty good, 1.8 μs for a 4x4 array on my machine. Still a little slower than my fastest code, but appreciably simpler. – Marino 5/11, 2019 at 20:26

(Note that the above comment is for the plain Python implementation of np.where(ut,ut,ut.T)) – Marino 5/11, 2019 at 22:30

M

2

You are mainly measuring function call overhead on such tiny problems

Another way to do that would be to use Numba. Let's start with an implementation for only one (4x4) array.

Only one 4x4 array

import numpy as np
import numba as nb

@nb.njit()
def sym(A):
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[j,i]=A[i,j]
    return A


A=np.array([[ 1.,  2.,  3.,  4.],
       [ 0.,  5.,  6.,  7.],
       [ 0.,  0.,  8.,  9.],
       [ 0.,  0.,  0., 10.]])

%timeit sym(A)
#277 ns ± 5.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Larger example

@nb.njit(parallel=False)
def sym_3d(A):
    for i in nb.prange(A.shape[0]):
        for j in range(A.shape[1]):
            for k in range(A.shape[2]):
                A[i,k,j]=A[i,j,k]
    return A

A=np.random.rand(1_000_000,4,4)

%timeit sym_3d(A)
#13.8 ms ± 49.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
#13.8 ns per 4x4 submatrix

Mabelmabelle answered 6/11, 2019 at 15:12 Comment(5)

Nice! sym gets around 0.5 µs / array on my machine. You can make it sub-0.4 µs by only processing the indices that you need to, instead of every index in the array (so apply numba.njit() to the "plain Python nested loop" version from my code). – Marino 6/11, 2019 at 16:44

@KerrickStaley 0.5 µs looks extremely slow (2x slower than my measurement). How do you get these timings? I also don't really see any difference with the method you proposed. Even if I comment out all the useful code (returning directly in the second line), it takes 248 ns vs. 277 ns. – Mabelmabelle 6/11, 2019 at 16:52

I have a list of 1 million 4x4 matrices and I'm timing the for loop "for ut in inputs: upper_triangular_to_symmetric(ut)" in a Jupyter notebook and then dividing by 1 million. When I change the implementation of upper_triangular_to_symmetric to a no-op, I get 0.1 µs, so it's clearly not all function overhead. – Marino 6/11, 2019 at 17:6

@KerrickStaley If you have a list of 4x4 arrays, you can convert it to a 3D array and work in a vectorized manner with the mask-based solution. Should be straightforward. – Knew 6/11, 2019 at 19:48

@Knew that's not actually how my code works in production, that's just the synthetic benchmark I'm using to compare different approaches. – Marino 7/11, 2019 at 1:55
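
For readers whose data does fit in one 3D array, the vectorized approach suggested in the comments could look roughly like this. It is a sketch under the assumption that the input is an `(m, n, n)` stack of upper-triangular matrices; the name `symmetrize_batch` is hypothetical.

```python
import numpy as np

def symmetrize_batch(stack):
    """Symmetrize an (m, n, n) stack of upper-triangular matrices in place."""
    n = stack.shape[-1]
    # Row/column index pairs strictly below the diagonal.
    rows, cols = np.tril_indices(n, k=-1)
    # The indexed right-hand side is a copy, so the assignment cannot alias.
    stack[:, rows, cols] = stack[:, cols, rows]
    return stack

rng = np.random.default_rng(0)
# np.triu acts on the last two axes, giving a stack of upper-triangular matrices.
original = np.triu(rng.random((1000, 4, 4)))
batch = symmetrize_batch(original.copy())
print(np.array_equal(batch, np.swapaxes(batch, 1, 2)))
```

The index arrays depend only on `n`, so they could be cached in the same way as the boolean mask in the single-matrix version.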

H

0

import numpy as np

matrix = upper_triangular_matrix = np.array([[1.,  2.,  3.,  4.],
                     [0.,  5.,  6.,  7.],
                     [0.,  0.,  8.,  9.],
                     [0.,  0.,  0., 10.]])
print(matrix)
'''
[[ 1.  2.  3.  4.]
 [ 0.  5.  6.  7.]
 [ 0.  0.  8.  9.]
 [ 0.  0.  0. 10.]]
'''
'''
Adding the transpose mirrors the upper triangular part into the lower triangular part.
The result is already symmetric, but every diagonal element is doubled.
'''
symmetric_matrix = matrix  + matrix.T 
print(symmetric_matrix)

'''
[[ 2.  2.  3.  4.]
 [ 2. 10.  6.  7.]
 [ 3.  6. 16.  9.]
 [ 4.  7.  9. 20.]]
'''
# restore the diagonal from the original matrix
np.fill_diagonal(symmetric_matrix,np.diag(matrix))
print(symmetric_matrix)
'''
[[ 1.  2.  3.  4.]
 [ 2.  5.  6.  7.]
 [ 3.  6.  8.  9.]
 [ 4.  7.  9. 10.]]
'''
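
A related out-of-place variant avoids the diagonal fix-up by adding only the strict upper triangle, transposed. This is a sketch with a made-up function name, `symmetrize_triu`, not part of the answer above:

```python
import numpy as np

def symmetrize_triu(m):
    """Return a new symmetric matrix built from the upper triangle of m."""
    # np.triu(m) keeps the diagonal and the upper triangle;
    # np.triu(m, k=1).T mirrors the strict upper triangle (no diagonal) below it.
    return np.triu(m) + np.triu(m, k=1).T

matrix = np.array([[1., 2., 3., 4.],
                   [0., 5., 6., 7.],
                   [0., 0., 8., 9.],
                   [0., 0., 0., 10.]])
symmetric = symmetrize_triu(matrix)
print(symmetric)
```

Because both terms are built with `np.triu`, anything stored below the diagonal of the input is ignored, at the cost of a few temporary arrays.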

Hemiplegia answered 25/7, 2024 at 9:28 Comment(0)
