How to find the unique non-NaN values in a NumPy array?
S

5

24

I would like to know if there is a clean way to handle nan in numpy.

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print my_array1
#[  5.   4.   2.   2.   4.  nan  nan   6.]
print set(my_array1)
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])

I would have thought it should return at most 1 nan value. Why does it return multiple nan values? I would like to know how many unique non nan values I have in a numpy array.

Thanks

Shovelboard answered 9/3, 2015 at 11:3 Comment(0)
M
35

You can use np.unique to find unique values in combination with isnan to filter the NaN values:

In [22]:

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
np.unique(my_array1[~np.isnan(my_array1)])
Out[22]:
array([ 2.,  4.,  5.,  6.])

As to why you get multiple NaN values: NaN never compares equal to anything, including itself:

In [23]:

np.nan == np.nan
Out[23]:
False

So you have to use np.isnan to test for NaN instead of comparing with ==.
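To make the comparison behaviour concrete, here is a small sketch (the variable names are illustrative, not from the question). It assumes nothing beyond NumPy and shows why set keeps one entry per NaN element: iterating over the array yields a separate scalar object for each element, and no two NaNs compare equal.

```python
import numpy as np

arr = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])

# NaN is the only float value that is not equal to itself.
nan_equals_itself = np.nan == np.nan

# np.isnan tests each element and returns a boolean mask.
nan_mask = np.isnan(arr)

# Iterating over the array yields a separate scalar object per element,
# so set() sees two NaNs that are neither identical nor equal.
as_set = set(arr)
nan_entries_in_set = sum(1 for x in as_set if x != x)

print(nan_equals_itself)     # False
print(int(nan_mask.sum()))   # 2
print(nan_entries_in_set)    # 2
```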

Using set:

In [24]:

set(my_array1[~np.isnan(my_array1)])
Out[24]:
{2.0, 4.0, 5.0, 6.0}

You can call len on any of the above to get a size:

In [26]:

len(np.unique(my_array1[~np.isnan(my_array1)]))
Out[26]:
4
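The filtering and counting steps above can be wrapped in one small function. This is only a sketch: count_unique_non_nan is a hypothetical helper name, and it assumes the input can be converted to a float array.

```python
import numpy as np

def count_unique_non_nan(values):
    """Return the number of distinct non-NaN values in a float array."""
    arr = np.asarray(values, dtype=float)
    # Boolean-mask indexing drops the NaN entries before np.unique runs.
    return len(np.unique(arr[~np.isnan(arr)]))

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
print(count_unique_non_nan(my_array1))  # 4
```

Filtering before calling np.unique keeps the result independent of how a given NumPy version treats NaN inside np.unique.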
Milwaukee answered 9/3, 2015 at 11:8 Comment(1)
As of NumPy version 1.21.0, np.unique returns a single NaN. - Weinhardt
T
9

I'd suggest using pandas. I think pd.unique is a direct replacement for np.unique, but it keeps the values in their original order of first appearance, whereas np.unique returns them sorted.

import numpy as np
import pandas as pd

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])

np.unique(my_array1)
# array([ 2.,  4.,  5.,  6., nan, nan])

pd.unique(my_array1)
# array([ 5.,  4.,  2., nan,  6.]) 

I'm using numpy 1.17.4 and pandas 0.25.3. Hope this helps!
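If adding pandas as a dependency is not desirable, the same first-appearance ordering can be approximated in NumPy alone. Note that this is a different technique from the pd.unique call above, and unique_non_nan_in_order is a hypothetical helper name; it also drops NaN entirely, which is what the question asks for.

```python
import numpy as np

def unique_non_nan_in_order(values):
    """Unique non-NaN values, ordered by first appearance in the input."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # drop NaN first (result is 1-D)
    # return_index gives the position of each unique value's first occurrence.
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those positions restores the original order of appearance.
    return arr[np.sort(first_idx)]

my_array1 = np.array([5, 4, 2, 2, 4, np.nan, np.nan, 6])
result = unique_non_nan_in_order(my_array1)
print(result.tolist())  # [5.0, 4.0, 2.0, 6.0]
```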

Taneshatang answered 16/1, 2020 at 19:44 Comment(2)
That still gives you NaNs, so what's the point? - Stephanistephania
Thanks for the question. It removes duplicates and leaves only a single NaN, which is useful for determining the number of unique values. It's just an alternative approach, and it's also good for getting the unique values without losing the order. As per the question: "at most 1 nan value". - Taneshatang
B
2

As previous answers have already stated, NumPy can't count NaNs by equality comparison, because NaN never compares equal to itself. numpy.ma.count_masked is your friend. For example:

>>> import numpy as np
>>> import numpy.ma as ma
>>> a = np.array([ 0.,  1., np.nan, np.nan,  4.])
>>> a
array([ 0.,  1., nan, nan,  4.])
>>> a_masked = ma.masked_invalid(a)
>>> a_masked
masked_array(data=[0.0, 1.0, --, --, 4.0],
             mask=[False, False,  True,  True, False],
       fill_value=1e+20)
>>> ma.count_masked(a_masked)
2
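The same masked array can also answer the original question about unique non-NaN values. The sketch below is an extension of the example, not part of it; the variable names n_invalid, n_valid and unique_valid are illustrative. Keep in mind that ma.masked_invalid masks infinities as well as NaN.

```python
import numpy as np
import numpy.ma as ma

a = np.array([0., 1., np.nan, np.nan, 4.])
a_masked = ma.masked_invalid(a)  # masks NaN and inf entries

n_invalid = ma.count_masked(a_masked)  # number of masked entries
n_valid = ma.count(a_masked)           # number of unmasked entries
# compressed() returns the unmasked data as a plain 1-D ndarray.
unique_valid = np.unique(a_masked.compressed())

print(n_invalid)     # 2
print(n_valid)       # 3
print(unique_valid)  # [0. 1. 4.]
```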
Buckle answered 6/11, 2019 at 10:17 Comment(0)
W
1

As of NumPy version 1.21.0, np.unique returns a single NaN:

>>> a = np.array([8, 1, np.nan, 3, np.inf, np.nan, -np.inf, -2, np.nan, 3])
>>> np.unique(a)
array([-inf,  -2.,   1.,   3.,   8.,  inf,  nan])
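Because the number of NaNs kept by np.unique depends on the NumPy version, code that has to run on both older and newer releases can filter NaN from the result afterwards. A minimal sketch, with u and u_no_nan as illustrative names:

```python
import numpy as np

a = np.array([8, 1, np.nan, 3, np.inf, np.nan, -np.inf, -2, np.nan, 3])

# How many NaNs np.unique keeps depends on the NumPy version,
# so remove them from the result instead of relying on that behaviour.
u = np.unique(a)
u_no_nan = u[~np.isnan(u)]

# Infinities are not NaN, so they stay in the result.
print(u_no_nan.tolist())  # [-inf, -2.0, 1.0, 3.0, 8.0, inf]
```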
Weinhardt answered 15/7, 2021 at 9:39 Comment(0)
R
0

You could use isnan() with your set, then iterate through the result of isnan() and remove all NaN objects.

my_array1=np.array([5,4,2,2,4,np.nan,np.nan,6])
print my_array1
#[  5.   4.   2.   2.   4.  nan  nan   6.]
print set(my_array1)
#set([nan, nan, 2.0, 4.0, 5.0, 6.0])
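my_list = list(my_array1)  # NumPy arrays do not support del, so work on a list copy
for i in reversed(range(len(my_list))):  # go backwards so deletions don't shift pending indices
    if np.isnan(my_list[i]):
        del my_list[i]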
Recollected answered 9/3, 2015 at 11:8 Comment(1)
If you want to remove all NaN elements from an array, a MUCH better way is: my_array1 = my_array1[~np.isnan(my_array1)]. It will operate in a vectorized way (most likely using optimized code) and not iterate at the Python level. Not only is it less code to write, it's also much faster for big arrays. - Narcisanarcissism
