Let's say I have an array like this:
import numpy as np
base_array = np.array([-13, -9, -11, -3, -3, -4, 2, 2,
2, 5, 7, 7, 8, 7, 12, 11])
Suppose I want to know: "how many elements in base_array
are greater than 4?" This can be done simply by exploiting broadcasting:
np.sum(4 < base_array)
For which the answer is 7
. Now, suppose instead of comparing to a single value, I want to do this over an array. In other words, for each value c
in the comparison_array
, find out how many elements of base_array
are greater than c
. If I do this the naive way, it obviously fails because it doesn't know how to broadcast it properly:
comparison_array = np.arange(-13, 13)
comparison_result = np.sum(comparison_array < base_array)
Output:
Traceback (most recent call last):
File "<pyshell#87>", line 1, in <module>
np.sum(comparison_array < base_array)
ValueError: operands could not be broadcast together with shapes (26,) (16,)
If I could somehow have each element of comparison_array
get broadcast to base_array
's shape, that would solve this. But I don't know how to do such an "element-wise broadcasting".
Now, I do know I how to implement this for both cases using list comprehension:
first = sum([4 < i for i in base_array])
second = [sum([c < i for i in base_array])
for c in comparison_array]
print(first)
print(second)
Output:
7
[15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10, 7, 7, 7, 6, 6, 3, 2, 2, 2, 1, 0]
But as we all know, this will be orders of magnitude slower than a correctly-vectorized numpy
implementation on larger arrays. So, how should I do this in numpy
so that it's fast? Ideally this solution should extend to any kind of operation where broadcasting works, not just greater-than or less-than in this example.
comparison_array
, like what the list-comprehension gave. – Superficies