I have a NumPy array 'boolarr' of boolean type. I want to count the number of elements whose values are True
. Is there a NumPy or Python routine dedicated for this task? Or, do I need to iterate over the elements in my script?
You have multiple options. Two options are the following.
boolarr.sum()
numpy.count_nonzero(boolarr)
Here's an example:
>>> import numpy as np
>>> boolarr = np.array([[0, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=np.bool)
>>> boolarr
array([[False, False, True],
[ True, False, True],
[ True, False, True]], dtype=bool)
>>> boolarr.sum()
5
Of course, that is a bool
-specific answer. More generally, you can use numpy.count_nonzero
.
>>> np.count_nonzero(boolarr)
5
bool
: boolean values are treated as 1 and 0 in arithmetic operations. See "Boolean Values" in the Python Standard Library documentation. Note that NumPy's bool
and Python bool
are not the same, but they are compatible (see here for more information). –
Avow numpy.count_nonzero
not being in NumPy v1.5.1: you are right. According to this release announcement, it was added in NumPy v1.6.0. –
Avow numpy.count_nonzero
is about a thousand times faster, in my Python interpreter, at least. python -m timeit -s "import numpy as np; bools = np.random.uniform(size=1000) >= 0.5" "np.count_nonzero(bools)"
vs. python -m timeit -s "import numpy as np; bools = np.random.uniform(size=1000) >= 0.5" "sum(bools)"
–
Homs np.sum(bools)
instead! However, np.count_nonzero(bools)
is still ~12x faster. –
Milkmaid np.any(bools)
–
Lynx numpy.count_nonzero
returns wrong results for masked array. If masked array has some mask values and all True values in other cells, it's different from np.sum()
. Is it bug or expected result? –
Standley That question solved a quite similar question for me and I thought I should share :
In raw python you can use sum()
to count True
values in a list
:
>>> sum([True,True,True,False,False])
3
But this won't work :
>>> sum([[False, False, True], [True, False, True]])
TypeError...
sum
is much slower for Pandas DataFrame
s and numpy arrays than their respective sum
methods. –
Nicknack In terms of comparing two numpy arrays and counting the number of matches (e.g. correct class prediction in machine learning), I found the below example for two dimensions useful:
import numpy as np
result = np.random.randint(3,size=(5,2)) # 5x2 random integer array
target = np.random.randint(3,size=(5,2)) # 5x2 random integer array
res = np.equal(result,target)
print result
print target
print np.sum(res[:,0])
print np.sum(res[:,1])
which can be extended to D dimensions.
The results are:
Prediction:
[[1 2]
[2 0]
[2 0]
[1 2]
[1 2]]
Target:
[[0 1]
[1 0]
[2 0]
[0 0]
[2 1]]
Count of correct prediction for D=1: 1
Count of correct prediction for D=2: 2
b[b].size
where b
is the Boolean ndarray in question. It filters b
for True
, and then count the length of the filtered array.
This probably isn't as efficient np.count_nonzero()
mentioned previously, but is useful if you forget the other syntax. Plus, this shorter syntax saves programmer time.
Demo:
In [1]: a = np.array([0,1,3])
In [2]: a
Out[2]: array([0, 1, 3])
In [3]: a[a>=1].size
Out[3]: 2
In [5]: b=a>=1
In [6]: b
Out[6]: array([False, True, True])
In [7]: b[b].size
Out[7]: 2
boolarr.sum(axis=1 or axis=0)
axis = 1 will output number of trues in a row and axis = 0 will count number of trues in columns so
boolarr[[true,true,true],[false,false,true]]
print(boolarr.sum(axis=1))
will be (3,1)
For 1D array, this is what worked for me:
import numpy as np
numbers= np.array([3, 1, 5, 2, 5, 1, 1, 5, 1, 4, 2, 1, 4, 5, 3, 4,
5, 2, 4, 2, 6, 6, 3, 6, 2, 3, 5, 6, 5])
numbersGreaterThan2= np.count_nonzero(numbers> 2)
© 2022 - 2024 — McMap. All rights reserved.