Given a numpy array of threshold values, what is the most efficient way to produce an array of the counts of another array meeting these values? Assume the threshold array is small and sorted, and the array of values to be counted is large-ish and unsorted.

Example: for each element of valueLevels, count the elements of values greater than or equal to it:
import numpy as np
n = int(1e5) # size of example
# example levels: the sequence 0, 1, 2.5, 5, 7.5, 10, 25, ... 50000, 75000
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x * 10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)
np.random.seed(123)
values = np.random.uniform(low=0, high=1e5, size=n)
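As a quick sanity check (my own, not part of the question), the construction above yields 21 strictly increasing thresholds, one 0 plus 1, 2.5, 5, 7.5 times each power of ten up to 10^4:

```python
import numpy as np

# Rebuild the example thresholds and confirm their size and ordering.
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x * 10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)
print(valueLevels.size)                  # 21 thresholds
print(np.all(np.diff(valueLevels) > 0))  # strictly increasing
```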
So far I have tried the list comprehension approach:

- np.array([sum(values>=x) for x in valueLevels]) was unacceptably slow
- np.array([len(values[values>=x]) for x in valueLevels]) was an improvement
- sorting values did speed up the comprehension (in the example, from ~7 ms to ~0.5 ms), but the cost of the sort (~8 ms) exceeded the savings for one-time use
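For reference, a sketch reproducing the first two attempts and checking they agree (with a smaller n than the question, since the built-in sum() iterates in Python and is slow at 1e5 elements):

```python
import numpy as np

np.random.seed(123)
values = np.random.uniform(low=0, high=1e5, size=int(1e4))
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x * 10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)

# Attempt 1: Python-level sum over a boolean array (slow: loops in Python).
c1 = np.array([sum(values >= x) for x in valueLevels])
# Attempt 2: boolean mask then len (faster: the masking is vectorized).
c2 = np.array([len(values[values >= x]) for x in valueLevels])

print(np.array_equal(c1, c2))  # both give the same counts
```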
The best I have right now is a comprehension of this form:
%%timeit
np.array([np.count_nonzero(values>=x) for x in valueLevels])
# 1000 loops, best of 3: 1.26 ms per loop
which is acceptable for my purposes. But out of curiosity, what I would like to know is:
- If list comprehension is the way to go, can it be sped up? Or,
- Are other approaches faster? (I have a vague sense that this could be done by broadcasting the values array over the thresholds array, but I can't figure out how to get the dimensions right for np.broadcast_arrays().)
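On the broadcasting idea, one sketch (my guess at what was intended, not a claim that it is faster): np.broadcast_arrays() isn't needed explicitly; inserting axes lets the comparison broadcast on its own, at the cost of an (m, n) boolean temporary for m thresholds and n values:

```python
import numpy as np

np.random.seed(123)
values = np.random.uniform(low=0, high=1e5, size=int(1e4))
valueLevels = np.concatenate(
    [np.array([0.]),
     np.concatenate([[x * 10**y for x in [1., 2.5, 5., 7.5]]
                     for y in range(5)])]
)

# Shape (m, 1) thresholds against shape (n,) values broadcasts to an
# (m, n) boolean array; summing along axis 1 gives the count per threshold.
counts = (values[np.newaxis, :] >= valueLevels[:, np.newaxis]).sum(axis=1)

# Same result as the per-threshold count_nonzero comprehension.
expected = np.array([np.count_nonzero(values >= x) for x in valueLevels])
print(np.array_equal(counts, expected))
```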