Fastest way to count array values above a threshold in numpy
I have a numpy array containing 10^8 floats and want to count how many of them are >= a given threshold. Speed is crucial because the operation has to be performed on a large number of such arrays. The contestants so far are

np.sum(myarray >= thresh)

np.size(np.where(np.reshape(myarray,-1) >= thresh))

The answers at Count all values in a matrix greater than a value suggest that np.where() would be faster, but my timing results are inconsistent: for some arrays and Boolean conditions np.size(np.where(cond)) is faster than np.sum(cond), while for others it is slower.

Specifically, np.sum(cond) is significantly faster when a large fraction of entries fulfil the condition, but np.size(np.where(cond)) wins when only a small fraction (roughly less than a tenth) do.

The question breaks down into two parts:

  • Any other suggestions?
  • Does it make sense that the time taken by np.size(np.where(cond)) increases with the number of entries for which cond is true?
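For reference, here is a minimal sanity-check sketch of the contenders (plus np.count_nonzero, mentioned in the comments); a smaller array stands in for the 10^8-element one, and the seed and sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
myarray = rng.random(10**6)  # stand-in for the real 10^8-element array
thresh = 0.5

# The two contenders from the question, plus np.count_nonzero:
a = int(np.sum(myarray >= thresh))
b = np.size(np.where(np.reshape(myarray, -1) >= thresh))
c = np.count_nonzero(myarray >= thresh)

# All three compute the same count, only at different speeds.
assert a == b == c
```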
Epiphytotic answered 26/1, 2014 at 18:41 Comment(2)
numexpr or numba could probably speed things up by avoiding the creation of an intermediate array. – Overly
There is also np.count_nonzero, which is much faster than a boolean sum on newer numpy versions. – Hiss
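In the same spirit as the comment about intermediate arrays, a pure-numpy option is to count in chunks, so only a small boolean buffer is ever live at once; count_above_chunked and the chunk size below are hypothetical, not from the thread:

```python
import numpy as np

def count_above_chunked(arr, thresh, chunk=1 << 20):
    # Count entries >= thresh one chunk at a time, so the temporary
    # boolean mask stays at `chunk` elements instead of arr.size.
    flat = arr.ravel()
    total = 0
    for start in range(0, flat.size, chunk):
        total += np.count_nonzero(flat[start:start + chunk] >= thresh)
    return total

rng = np.random.default_rng(42)
arr = rng.random(3_000_000)
assert count_above_chunked(arr, 0.33) == np.count_nonzero(arr >= 0.33)
```

This trades one large allocation for a Python-level loop over chunks, so it mainly helps when memory pressure, not arithmetic, is the bottleneck.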

Using Cython might be a decent alternative.

import numpy as np
cimport numpy as np
cimport cython
from cython.parallel import prange


DTYPE_f64 = np.float64
ctypedef np.float64_t DTYPE_f64_t


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cdef Py_ssize_t count_above_cython(DTYPE_f64_t [:] arr_view, DTYPE_f64_t thresh) nogil:

    # Py_ssize_t (rather than int) avoids overflow on arrays with
    # more than 2**31 elements.
    cdef Py_ssize_t length, i, total
    total = 0
    length = arr_view.shape[0]

    # prange parallelizes the loop; `total` becomes an OpenMP reduction.
    for i in prange(length):
        if arr_view[i] >= thresh:
            total += 1

    return total


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def count_above(np.ndarray arr, DTYPE_f64_t thresh):

    # ravel() yields a 1-D view (or a contiguous copy for
    # non-contiguous input) that the typed memoryview can wrap.
    cdef DTYPE_f64_t [:] arr_view = arr.ravel()
    cdef Py_ssize_t total

    with nogil:
        total = count_above_cython(arr_view, thresh)
    return total

Timings for the different proposed methods:

myarr = np.random.random((1000,1000))
thresh = 0.33

In [6]: %timeit count_above(myarr, thresh)
1000 loops, best of 3: 693 µs per loop

In [9]: %timeit np.count_nonzero(myarr >= thresh)
100 loops, best of 3: 4.45 ms per loop

In [11]: %timeit np.sum(myarr >= thresh)
100 loops, best of 3: 4.86 ms per loop

In [12]: %timeit np.size(np.where(np.reshape(myarr,-1) >= thresh))
10 loops, best of 3: 61.6 ms per loop

With a larger array:

In [13]: myarr = np.random.random(10**8)

In [14]: %timeit count_above(myarr, thresh)
10 loops, best of 3: 63.4 ms per loop

In [15]: %timeit np.count_nonzero(myarr >= thresh)
1 loops, best of 3: 473 ms per loop

In [16]: %timeit np.sum(myarr >= thresh)
1 loops, best of 3: 511 ms per loop

In [17]: %timeit np.size(np.where(np.reshape(myarr,-1) >= thresh))
1 loops, best of 3: 6.07 s per loop
Oceanic answered 26/1, 2014 at 22:56 Comment(1)
I guess it will depend on hardware, and Cython is easier to parallelize. With -O3 in Cython (without it, it's slow) and development numpy, on my machine they perform close to identical (slight edge for Cython, but the numpy code is much faster with non-contiguous arrays, though you can fix that of course). However, you should really use ssize_t/np.intp_t and not int; otherwise it is a bug. – Hiss

You can also use boolean mask indexing:

len(array[array>thresh])

You'll get a decent runtime. I didn't compare it to the Cython version, but it should be faster than the np.where approach in the question.
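Responding to the comment below, a small harness sketch for comparing this against the earlier suggestions (the array size, seed, and repeat count are arbitrary choices, and len(myarr[mask]) materializes the selected elements, so it may lag when many entries match):

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
myarr = rng.random(10**6)
thresh = 0.33

candidates = {
    "len(mask-index)": lambda: len(myarr[myarr >= thresh]),
    "count_nonzero":   lambda: np.count_nonzero(myarr >= thresh),
    "sum":             lambda: int(np.sum(myarr >= thresh)),
}

# All candidates must agree on the count before timing them.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    t = timeit.timeit(fn, number=10) / 10
    print(f"{name:>15}: {t * 1e3:.2f} ms")
```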

Shul answered 22/5, 2024 at 23:9 Comment(1)
It would be good if you could compare your results with the previous ones so that your answer becomes more useful. – Mischievous