I have a numpy array containing 10^8 floats and want to count how many of them are >= a given threshold. Speed is crucial because the operation has to be done on large numbers of such arrays. The contestants so far are
np.sum(myarray >= thresh)
np.size(np.where(np.reshape(myarray,-1) >= thresh))
The answers at Count all values in a matrix greater than a value suggest that np.where() would be faster, but I've found inconsistent timing results. What I mean by this is for some realizations and Boolean conditions np.size(np.where(cond)) is faster than np.sum(cond), but for some it is slower.
Specifically, if a large fraction of entries fulfil the condition then np.sum(cond) is significantly faster but if a small fraction (maybe less than a tenth) do then np.size(np.where(cond)) wins.
The question breaks down into 2 parts:
- Any other suggestions?
- Does it make sense that the time taken by np.size(np.where(cond)) increases with the number of entries for which cond is true?