Numpy: use bins with infinite range

3

9

In my Python script I have floats that I want to bin. Right now I'm doing:

import numpy

min_val = 0.0
max_val = 1.0
num_bins = 20
# linspace returns bin edges, so num_bins bins need num_bins + 1 edges
my_bins = numpy.linspace(min_val, max_val, num_bins + 1)
hist, my_bins = numpy.histogram(my_values, bins=my_bins)

But now I want to add two more bins to account for values that are below 0.0 and for those that are at or above 1.0. One bin should include all values in (-inf, 0), the other all values in [1, inf).

Is there any straightforward way to do this while still using numpy's histogram function?

Account answered 24/7, 2012 at 15:10 Comment(0)
12

The function numpy.histogram() happily accepts infinite values in the bins argument:

numpy.histogram(my_values, bins=numpy.r_[-numpy.inf, my_bins, numpy.inf])
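
To make the one-liner above concrete, here is a minimal runnable sketch. The sample data in my_values and the name inner_edges are made up for illustration; the 0.0 to 1.0 range and the 20 bins come from the question.

```python
import numpy as np

# Hypothetical sample data; some values fall outside [0, 1).
my_values = np.array([-2.5, -0.1, 0.0, 0.3, 0.3, 0.99, 1.0, 1.7, 42.0])

num_bins = 20
# num_bins + 1 edges give num_bins equal-width bins between 0.0 and 1.0.
inner_edges = np.linspace(0.0, 1.0, num_bins + 1)

# Pad with infinite edges: one underflow bin in front, one overflow bin at the end.
edges = np.r_[-np.inf, inner_edges, np.inf]

hist, edges = np.histogram(my_values, bins=edges)

print(hist[0])     # number of values below 0.0
print(hist[-1])    # number of values at or above 1.0
print(hist.sum())  # equals len(my_values): every value lands in some bin
```

Because every bin except the last is half-open on the right, a value of exactly 1.0 is counted in the overflow bin, which matches the [1, inf) interval asked for.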

Alternatively, you could use a combination of numpy.searchsorted() and numpy.bincount(), though I don't see much advantage to that approach.
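
For completeness, a rough sketch of what that alternative could look like. The sample data and variable names are assumptions for illustration, and the result is meant to match the padded-edges call to numpy.histogram().

```python
import numpy as np

# Hypothetical sample data; some values fall outside [0, 1).
my_values = np.array([-2.5, -0.1, 0.0, 0.3, 0.3, 0.99, 1.0, 1.7, 42.0])
inner_edges = np.linspace(0.0, 1.0, 21)  # 21 edges, 20 regular bins

# side="right" maps each value to the number of edges that are <= that value:
# 0 for values below 0.0, 1..20 for the regular bins, 21 for values >= 1.0.
idx = np.searchsorted(inner_edges, my_values, side="right")

# One slot per regular bin plus the underflow and overflow slots.
counts = np.bincount(idx, minlength=len(inner_edges) + 1)

print(counts[0], counts[-1])  # underflow and overflow counts
```

No infinite edges are needed here, because everything left of the first edge gets index 0 and everything from the last edge upward gets the final index.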

Tocci answered 24/7, 2012 at 15:16 Comment(1)
matplotlib (plt) uses numpy's histogram internally, but it does not accept inf as a bin edge (drawing infinite boxes is too much? :-)). A VERY large value (compared to the typical range of my data) worked well in my case. - Hotchkiss
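
A possible variation on this workaround, sketched under assumptions: count with true infinite edges in numpy, then build a finite set of edges that is used only for drawing. The sample data, the name plot_edges, and the choice of one regular bin width for the outer boxes are all illustrative, not part of the original comment.

```python
import numpy as np

# Hypothetical sample data; some values fall outside [0, 1).
my_values = np.array([-2.5, -0.1, 0.0, 0.3, 0.99, 1.0, 1.7, 42.0])
inner_edges = np.linspace(0.0, 1.0, 21)

# Count with infinite outer edges so that no value can be missed.
counts, _ = np.histogram(my_values, bins=np.r_[-np.inf, inner_edges, np.inf])

# For drawing only: give the two outer bins a finite, purely cosmetic width
# (here one regular bin width on each side).
width = inner_edges[1] - inner_edges[0]
plot_edges = np.r_[inner_edges[0] - width, inner_edges, inner_edges[-1] + width]

# plot_edges is finite and has len(counts) + 1 entries, so the pair
# (counts, plot_edges) can be passed to a bar-style plotting call.
print(len(counts), len(plot_edges))
```

Unlike a single very large edge value, this keeps the outer boxes the same size as the regular ones, at the cost of the x axis no longer showing where the outliers actually lie.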
3

You can specify numpy.inf as the upper and -numpy.inf as the lower bin limits.

Disused answered 24/7, 2012 at 15:14 Comment(0)
0

NumPy 1.15 and later provide histogram_bin_edges. With it, today's solution is to call histogram_bin_edges to get the bin edges, concatenate -inf and +inf, and pass the result as bins to histogram:

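import numpy as np
a = [1, 2, 3, 4, 2, 3, 4, 7, 4, 6, 7, 5, 4, 3, 2, 3]
# np.NINF and np.PINF were removed in NumPy 2.0; -np.inf and np.inf work in every version
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a), [np.inf])))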

Results in:

(array([0, 1, 3, 0, 4, 0, 4, 1, 0, 1, 0, 2]),
array([-inf,  1. ,  1.6,  2.2,  2.8,  3.4,  4. ,  4.6,  5.2,  5.8,  6.4, 7. ,  inf]))

If you prefer the last (overflow) bin to be empty, as I do, you can use the range parameter and add a small number to the maximum, so that the largest value falls inside the last regular bin:

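a = [1, 2, 3, 4, 2, 3, 4, 7, 4, 6, 7, 5, 4, 3, 2, 3]
# -np.inf and np.inf replace np.NINF and np.PINF, which were removed in NumPy 2.0
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a, range=(np.min(a), np.max(a) + .1)), [np.inf])))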

Results in:

(array([0, 1, 3, 0, 4, 4, 0, 1, 0, 1, 2, 0]),
array([-inf, 1.  , 1.61, 2.22, 2.83, 3.44, 4.05, 4.66, 5.27, 5.88, 6.49, 7.1 ,  inf]))
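
If this is needed in more than one place, the two steps could be wrapped in a small helper. The function name histogram_with_overflow and its value_range parameter are hypothetical, chosen here for illustration; only np.histogram_bin_edges, np.concatenate and np.histogram are actual NumPy calls.

```python
import numpy as np

def histogram_with_overflow(values, bins=10, value_range=None):
    """Hypothetical helper: histogram with extra underflow and overflow bins."""
    # Regular edges, computed exactly as np.histogram would compute them.
    inner = np.histogram_bin_edges(values, bins=bins, range=value_range)
    # Pad with infinite edges on both sides.
    edges = np.concatenate(([-np.inf], inner, [np.inf]))
    return np.histogram(values, bins=edges)

a = [1, 2, 3, 4, 2, 3, 4, 7, 4, 6, 7, 5, 4, 3, 2, 3]

# Default edges span min(a)..max(a); the maximum lands in the overflow bin.
counts, edges = histogram_with_overflow(a)

# A narrower range pushes everything outside 2..6 into the two outer bins.
counts_narrow, edges_narrow = histogram_with_overflow(a, bins=4, value_range=(2, 6))
print(counts_narrow)
```

The second call shows where the outer bins become useful: when the regular range is fixed in advance rather than derived from the data.
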
Witkowski answered 20/3, 2019 at 8:53 Comment(0)
