Is there a better way to broadcast arrays?
Asked Answered
B

4

6

I want to broadcast an array b to the shape it would take if it were in an arithmetic operation with another array a.

For example, if a.shape = (3,3) and b was a scalar, I want to get an array whose shape is (3,3) and is filled with the scalar.

One way to do this is like this:

>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = 1 + a*0
>>> b
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

Although this works practically, I can't help but feel it looks a bit weird, and wouldn't be obvious to someone else looking at the code what I was trying to do.

Is there any more elegant way to do this? I've looked at the documentation for np.broadcast, but it's orders of magnitude slower.

In [1]: a = np.arange(10000).reshape((100,100))

In [2]: %timeit 1 + a*0
10000 loops, best of 3: 31.9 us per loop

In [3]: %timeit bc = np.broadcast(a,1);np.fromiter((v for u, v in bc),float).reshape(bc.shape)
100 loops, best of 3: 5.2 ms per loop

In [4]: 5.2e-3/32e-6
Out[4]: 162.5
Barnstorm answered 24/7, 2012 at 0:57 Comment(0)
P
7

If you just want to fill an array with a scalar, fill is probably the best choice. But it sounds like you want something more generalized. Rather than using broadcast you can use broadcast_arrays to get the result that (I think) you want.

>>> a = numpy.arange(9).reshape(3, 3)
>>> numpy.broadcast_arrays(a, 1)[1]
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

This generalizes to any two broadcastable shapes:

>>> numpy.broadcast_arrays(a, [1, 2, 3])[1]
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

It's not quite as fast as your ufunc-based method, but it's still on the same order of magnitude:

>>> %timeit 1 + a * 0
10000 loops, best of 3: 23.2 us per loop
>>> %timeit numpy.broadcast_arrays(a, 1)[1]
10000 loops, best of 3: 52.3 us per loop

But scalars, fill is still the clear front-runner:

>>> %timeit b = numpy.empty_like(a, dtype='i8'); b.fill(1)
100000 loops, best of 3: 6.59 us per loop

Finally, further testing shows that the fastest approach -- in at least some cases -- is to multiply by ones:

>>> %timeit numpy.broadcast_arrays(a, numpy.arange(100))[1]
10000 loops, best of 3: 53.4 us per loop
>>> %timeit (1 + a * 0) * numpy.arange(100)
10000 loops, best of 3: 45.9 us per loop
>>> %timeit b = numpy.ones_like(a, dtype='i8'); b * numpy.arange(100)
10000 loops, best of 3: 28.9 us per loop
Phyllis answered 24/7, 2012 at 2:48 Comment(2)
Perfect! This is exactly what I was looking for. I'm surprised I didn't see it; it's right next to broadcast in the docs.Barnstorm
In case you're curious, the reason I'm interested in this is that the function scipy.ndimage.map_coordinates does not automatically broadcast the input coordinates, so I have to do it manually.Barnstorm
B
2

fill sounds like the simplest way:

>>> a = np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a.fill(10)
>>> a
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])

EDIT: As @EOL points out, you don't need arange if you want to create a new array, np.empty((100,100)) (or whatever shape) is better for this.

Timings:

In [3]: a = np.arange(10000).reshape((100,100))
In [4]: %timeit 1 + a*0
100000 loops, best of 3: 19.9 us per loop

In [5]: a = np.arange(10000).reshape((100,100))
In [6]: %timeit a.fill(1)
100000 loops, best of 3: 3.73 us per loop
Berkowitz answered 24/7, 2012 at 2:45 Comment(6)
There is no reason to use arange(): this wastes time for nothing, as an array has to be created and filled with numbers that will only get erased.Hurst
@EOL, I was just taking the example in the question to create an array. It's irrelevant for this question (I was assuming the the array was already there, waiting to be filled.)Berkowitz
Yeah, arange is irrelevant to the timing going on here. I hope the downvoter removes the down vote; it's completely inappropriate IMO.Mitten
Timing is not the issue I have. My point is that a in this answer is the b in the original question (it is the final result). In the question b does not exist: it has to be created. Creating it is part of the answer, and this should not involve arange().Hurst
@EOL it feels a bit odd to be downvoted considering I was the first one to mention fill() and I've also provided timings, but losing 2 points isn't a big deal... I barely copied a = np.arange(9).reshape((3,3)) from the question because I didn't want to make assumptions regarding where it came from in the wider context of the OP. I tend only to downvote when the answer is clearly wrong, misleading or of very low quality, but I guess we all have different criteria for voting.Berkowitz
@Bruno: fill() is my favorite solution (it's fast and to the point), and you were indeed the first to mention it. However, I consider that writing a = arange(…); a.fill(…) is misleading: it can be taken as an example by newcomers, and using this code is bad practice (it does more work than necessary). However, since you edited your post, I removed my downvote. :)Hurst
H
2

The fastest and cleanest solution I know is:

b_arr = numpy.empty(a.shape)  # Empty array
b_arr.fill(b)  # Filling with one value
Hurst answered 24/7, 2012 at 2:49 Comment(0)
H
1

If you just need to broadcast a scalar to some arbitrary shape, you can do something like this:

a = b*np.ones(shape=(3,3))

Edit: np.tile is more general. You can use it to duplicate any scalar/vector in any number of dimensions:

b = 1
N = 100
a = np.tile(b, reps=(N, N))
Hypesthesia answered 24/7, 2012 at 2:7 Comment(1)
b*np.ones() involves multiplications whereas we only need to copy values. fill() is both more appropriate and faster.Hurst

© 2022 - 2024 — McMap. All rights reserved.