Calculating the outer product for a sequence of numpy ndarrays [duplicate]

I have a list of 3D points p stored in an ndarray with shape (N, 3). I want to compute the outer product of each point with itself:

import numpy as np

N = int(1e4)
p = np.random.random((N, 3))
result = np.zeros((N, 3, 3))
for i in range(N):
    result[i, :, :] = np.outer(p[i, :], p[i, :])

Is there a way to compute these outer products without any Python-level loops? The problem is that np.outer does not support anything like an axis argument.

Goldplate answered 4/12, 2017 at 0:38

You can use broadcasting:

p[..., None] * p[:, None, :]

This syntax inserts a new axis at the end of the first term (making it shape (N, 3, 1)) and in the middle of the second term (making it (N, 1, 3)). The two are then broadcast against each other and yield an (N, 3, 3) result.
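
A quick sketch of the shapes involved (a small example; the names a and b are just for illustration):

import numpy as np

p = np.random.random((4, 3))
a = p[..., None]    # shape (4, 3, 1): new trailing axis
b = p[:, None, :]   # shape (4, 1, 3): new middle axis
result = a * b      # broadcasts to shape (4, 3, 3)

# each result[i] is the outer product of p[i] with itself
assert np.allclose(result[0], np.outer(p[0], p[0]))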

Gaylord answered 4/12, 2017 at 0:59 Comment(6)
Wow, elegant solution. Thanks! – Goldplate
Can you help me figure out why my solution works? It's even faster, but I don't understand the concept of broadcasting in einsum. – Schultz
The dots tell einsum you are only interested in the rightmost axes and want the rest left alone. i and j refer to the last axes; that they are different tells einsum to create separate axes for them in the output. In short: nothing or ellipsis -> normal broadcasting; same letter in both arguments -> reduction along this axis; letter occurs in only one term -> separate axis in output. – Gaylord
@PaulPanzer Thank you. Is there any way to not use the dots? There should be, because the number of dimensions is fixed, right? – Schultz
@Sebastian You can use 'ij,ik->ijk' instead (see the sketch after these comments). What do you mean by the number of dimensions being fixed? – Gaylord
I meant that both inputs, p and p, have exactly two dimensions, whereas the ellipsis notation is meant for applying einsum to arrays which do not always have the same number of dimensions. For example, my function works with arrays of one dimension (where it is identical to np.outer) and with three dimensions (for which I have no comparison). – Schultz
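
To illustrate the exchange above, a minimal check (a sketch; the array sizes are arbitrary):

import numpy as np

p = np.random.random((5, 3))
implicit = np.einsum('...i,...j', p, p)   # ellipsis broadcasts the leading axis
explicit = np.einsum('ij,ik->ijk', p, p)  # i is the batch axis, spelled out
assert np.allclose(implicit, explicit)

# with a 1-D input the ellipsis matches nothing, and the result is np.outer
v = np.random.random(3)
assert np.allclose(np.einsum('...i,...j', v, v), np.outer(v, v))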

A much better solution than my previous one is to use np.einsum:

np.einsum('...i,...j', p, p)

which is even faster than the broadcasting approach:

In [ ]: %timeit p[..., None] * p[:, None, :]
514 µs ± 4.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [ ]: %timeit np.einsum('...i,...j', p, p)
169 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As for how it works, I'm not quite sure; I just messed around with einsum until I got the answer I wanted:

In [ ]: np.all(np.einsum('...i,...j', p, p) == p[..., None] * p[:, None, :])
Out[ ]: True
Schultz answered 4/12, 2017 at 1:27 Comment(6)
It seems that whenever you ask "How can I do <some numpy array operation>?", there's always an answer involving einsum that is faster than any other approach. I can never understand the documentation well enough, but as far as I'm concerned it's a magic box that solves every array operation and manipulation routine, provided someone else tells you how on SO. – Chirography
@AlexanderReynolds I have no clue how it works either. All I know is that if you're only adding or multiplying parts of arrays, einsum can do things other numpy functions can't. – Schultz
It's worth digging into that einsum documentation, especially if you have any prior knowledge of Einstein summation notation, although often Divakar will come along with something even faster using @ or np.dot. – Scissile
The basic idea of Einstein notation is to multiply over indices that are repeated and sum over indices that are dropped from the output. So, for example, 'ij,jk->ij' multiplies along the j dimension and then sums over the k dimension, while 'ij,jk->ik' both multiplies and sums over the j dimension; see the sketch after these comments. – Scissile
@DanielF "although often Divakar will come along with something even faster using @ or np.dot" ... or tensordot. – Gaylord
@PaulPanzer Or just, y'know, broadcasting. – Scissile
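
A small demonstration of the notation described in these comments (a sketch with arbitrary example arrays):

import numpy as np

A = np.random.random((2, 3))
B = np.random.random((3, 4))

# 'ij,jk->ik': multiply along the shared j axis and sum it away (a matrix product)
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)

# 'ij,jk->ijk': multiply along j but keep the axis, so nothing is summed
C = np.einsum('ij,jk->ijk', A, B)
assert C.shape == (2, 3, 4)
assert np.allclose(C.sum(axis=1), A @ B)  # summing over j afterwards recovers A @ B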

You could at least use np.apply_along_axis:

result = np.apply_along_axis(lambda point: np.outer(point, point), 1, p)

Surprisingly, however, this is in fact slower than your method:

In [ ]: %%timeit N = int(1e4); p = np.random.random((N, 3))
   ...: result = np.apply_along_axis(lambda point: np.outer(point, point), 1, p)
61.5 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [ ]: %%timeit N = int(1e4); p = np.random.random((N, 3))
   ...: result = np.zeros((N, 3, 3))
   ...: for i in range(N):
   ...:     result[i, :, :] = np.outer(p[i, :], p[i, :])
46 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Schultz answered 4/12, 2017 at 0:53 Comment(3)
That just hides the Python-level loop. – Showker
Essentially, yeah. In fact, it's even slower. See my edit. – Schultz
Yes, I can confirm with independent timing tests. I did the same test using a list comprehension and the result is the same. These loops are being done at the Python level. – Goldplate
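
For reference, the list-comprehension variant mentioned above might look like this (a sketch; it still calls np.outer once per point at the Python level):

import numpy as np

N = int(1e4)
p = np.random.random((N, 3))
result = np.array([np.outer(point, point) for point in p])  # loop hidden in the comprehension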
