Calculating the outer product for a sequence of numpy ndarrays [duplicate]

I have a list of 3D points p stored in an ndarray with shape (N, 3). I want to compute the outer product of each point with itself:

import numpy as np

N = int(1e4)
p = np.random.random((N, 3))
result = np.zeros((N, 3, 3))
for i in range(N):
    result[i, :, :] = np.outer(p[i, :], p[i, :])

Is there a way to compute these outer products without any Python-level loops? The problem is that np.outer does not support anything like an axis argument.

Goldplate answered 4/12, 2017 at 0:38

You can use broadcasting:

p[..., None] * p[:, None, :]

This syntax inserts a new axis at the end of the first term (making it shape (N, 3, 1)) and in the middle of the second term (making it (N, 1, 3)). The two are then broadcast against each other and yield an (N, 3, 3) result.
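
A quick sketch of the shapes involved (a small example; the names a and b are just for illustration):

import numpy as np

p = np.random.random((4, 3))
a = p[..., None]    # shape (4, 3, 1): new trailing axis
b = p[:, None, :]   # shape (4, 1, 3): new middle axis
result = a * b      # broadcasts to shape (4, 3, 3)

# each result[i] is the outer product of p[i] with itself
assert np.allclose(result[0], np.outer(p[0], p[0]))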

Gaylord answered 4/12, 2017 at 0:59 Comment(6)
Wow, elegant solution. Thanks! – Goldplate
Can you help me figure out why my solution works? It's even faster, but I don't understand the concept of broadcasting in einsum. – Schultz
The dots tell einsum you are only interested in the rightmost axes and want the rest left alone. i and j refer to the last axes; that they are different tells einsum to create separate axes for them in the output. In short: nothing or ellipsis -> normal broadcasting; same letter in both arguments -> reduction along this axis; letter occurs in only one term -> separate axis in output. – Gaylord
@PaulPanzer Thank you. Is there any way to not use the dots? There should be, because the number of dimensions is fixed, right? – Schultz
@Sebastian You can use 'ij,ik->ijk' instead (see the sketch after these comments). What do you mean by the number of dimensions being fixed? – Gaylord
I meant that both inputs, p and p, have exactly two dimensions, whereas the ellipsis notation is meant for applying einsum to arrays which do not always have the same number of dimensions. For example, my function works with arrays of one dimension (where it is identical to np.outer) and with three dimensions (for which I have no comparison). – Schultz
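
To illustrate the exchange above, a minimal check (a sketch; the array sizes are arbitrary):

import numpy as np

p = np.random.random((5, 3))
implicit = np.einsum('...i,...j', p, p)   # ellipsis broadcasts the leading axis
explicit = np.einsum('ij,ik->ijk', p, p)  # i is the batch axis, spelled out
assert np.allclose(implicit, explicit)

# with a 1-D input the ellipsis matches nothing, and the result is np.outer
v = np.random.random(3)
assert np.allclose(np.einsum('...i,...j', v, v), np.outer(v, v))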

A much better solution than my previous one is to use np.einsum:

np.einsum('...i,...j', p, p)

which is even faster than the broadcasting approach:

In [ ]: %timeit p[..., None] * p[:, None, :]
514 µs ± 4.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [ ]: %timeit np.einsum('...i,...j', p, p)
169 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As for how it works, I'm not quite sure; I just messed around with einsum until I got the answer I wanted:

In [ ]: np.all(np.einsum('...i,...j', p, p) == p[..., None] * p[:, None, :])
Out[ ]: True
Schultz answered 4/12, 2017 at 1:27 Comment(6)
It seems that whenever you ask "How can I do <some numpy array operation>?", there's always an answer involving einsum that is faster than any other approach. I can never understand the documentation well enough, but as far as I'm concerned it's a magic box that solves every array operation and manipulation routine, provided someone else tells you how on SO. – Chirography
@AlexanderReynolds I have no clue how it works either. All I know is that if you're only adding or multiplying parts of arrays, einsum can do things other numpy functions can't. – Schultz
It's worth digging into that einsum documentation, especially if you have any prior knowledge of Einstein summation notation, although often Divakar will come along with something even faster using @ or np.dot. – Scissile
The basic idea of Einstein notation is to multiply over indices that are repeated and sum over indices that are dropped from the output. So, for example, 'ij,jk->ij' multiplies along the j dimension and then sums over the k dimension, while 'ij,jk->ik' both multiplies and sums over the j dimension; see the sketch after these comments. – Scissile
@DanielF "although often Divakar will come along with something even faster using @ or np.dot" ... or tensordot. – Gaylord
@PaulPanzer Or just, y'know, broadcasting. – Scissile
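
A small demonstration of the notation described in these comments (a sketch with arbitrary example arrays):

import numpy as np

A = np.random.random((2, 3))
B = np.random.random((3, 4))

# 'ij,jk->ik': multiply along the shared j axis and sum it away (a matrix product)
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)

# 'ij,jk->ijk': multiply along j but keep the axis, so nothing is summed
C = np.einsum('ij,jk->ijk', A, B)
assert C.shape == (2, 3, 4)
assert np.allclose(C.sum(axis=1), A @ B)  # summing over j afterwards recovers A @ B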

You could at least use np.apply_along_axis:

result = np.apply_along_axis(lambda point: np.outer(point, point), 1, p)

Surprisingly, however, this is in fact slower than your method:

In [ ]: %%timeit N = int(1e4); p = np.random.random((N, 3))
   ...: result = np.apply_along_axis(lambda point: np.outer(point, point), 1, p)
61.5 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [ ]: %%timeit N = int(1e4); p = np.random.random((N, 3))
   ...: result = np.zeros((N, 3, 3))
   ...: for i in range(N):
   ...:     result[i, :, :] = np.outer(p[i, :], p[i, :])
46 ms ± 709 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Schultz answered 4/12, 2017 at 0:53 Comment(3)
That just hides the Python-level loop. – Showker
Essentially, yeah. In fact, it's even slower. See my edit. – Schultz
Yes, I can confirm with independent timing tests. I did the same test using a list comprehension and the result is the same. These loops are being done at the Python level. – Goldplate
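
For reference, the list-comprehension variant mentioned above might look like this (a sketch; it still calls np.outer once per point at the Python level):

import numpy as np

N = int(1e4)
p = np.random.random((N, 3))
result = np.array([np.outer(point, point) for point in p])  # loop hidden in the comprehension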
