"isnotnan" functionality in numpy, can this be more pythonic?
B

4

98

I need a function that returns non-NaN values from an array. Currently I am doing it this way:

>>> a = np.array([np.nan, 1, 2])
>>> a
array([ NaN,   1.,   2.])

>>> np.invert(np.isnan(a))
array([False,  True,  True], dtype=bool)

>>> a[np.invert(np.isnan(a))]
array([ 1.,  2.])

Python: 2.6.4, numpy: 1.3.0

Please share if you know a better way. Thank you.

Bonacci answered 14/5, 2010 at 2:30 Comment(0)
B
198
a = a[~np.isnan(a)]
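
As a minimal sketch of why this matches the approach in the question: on a boolean array, the `~` operator gives the same mask as `np.invert` and `np.logical_not`. The variable names below are illustrative only.

```python
import numpy as np

a = np.array([np.nan, 1.0, 2.0])

# Three spellings of the same "is not NaN" mask on a boolean array.
mask_tilde = ~np.isnan(a)                # operator form used in this answer
mask_invert = np.invert(np.isnan(a))     # function form used in the question
mask_not = np.logical_not(np.isnan(a))   # explicit logical negation

# Boolean indexing keeps only the elements where the mask is True.
filtered = a[mask_tilde]
print(filtered)
```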
Busby answered 14/5, 2010 at 2:41 Comment(0)
M
70

You are currently testing for anything that is not NaN, and mtrw has the right way to do this. If you are interested in testing for finite numbers (neither NaN nor INF), then you don't need an inversion and can use:

np.isfinite(a)

This is more pythonic and native, it is an easy read, and in my experience, when you want to avoid NaN you often also want to avoid INF.

Just thought I'd toss that out there for folks.
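
A small sketch of the difference, assuming an example array that also contains positive and negative infinity (the array and variable names are made up for illustration):

```python
import numpy as np

a = np.array([np.nan, 1.0, np.inf, -np.inf, 2.0])

# ~np.isnan removes only NaN, so both infinities survive.
not_nan = a[~np.isnan(a)]

# np.isfinite removes NaN and both infinities in one step, with no inversion.
finite = a[np.isfinite(a)]

print(not_nan)
print(finite)
```

Which of the two is appropriate depends on whether infinite values are meaningful in your data.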

Monarch answered 18/11, 2013 at 16:57 Comment(7)
Note: If you want to use isnotnan for filtering pandas, this is the way to go.Predation
@CharlieHayley wouldn't pd.notnull() be a much better option for pandas?Spool
@JoshD. I checked the code and pd.notnull() is for testing objects instead of numeric values, returning negative if an object in an object array is not an instance of an object. It will be slower than np.isfinite() but is able to handle arbitrary object arrays (e.g. arrays of lists). Neat find, and a good idea if your array might include arbitrary objects. I think if you can be confident your array is generally numeric except for NaN and INF then np.isfinite would be faster, so depends on use case. Thanks for bringing that up, I don't think it was around when the answer was posted.Monarch
@EzekielKruglick if the data is already in pandas, not only is pandas actually faster, but it is more functional as well, given that it includes an index you can use to more easily join on: gist.github.com/jaypeedevlin/fdfb88f6fd1031a819f1d46cb36384daSpool
I think leave it in the comments - the original question is not about pandas.Spool
@JoshD. that's incorrect, Numpy is faster. I commented on your Gist: gist.github.com/jaypeedevlin/… . Basically, you did it wrong -- you're performing the operation on the Pandas object, rather than doing it on the ndarray. Performing the operation on the ndarray is about 25x faster.Palaeontology
@philipKahn Hmm, looks like I did make an error. I was imagining that numpy would cast to an ndarray before it did the operations, so that .values was unnecessary - live and learn!Spool
H
4

To get array([ 1., 2.]) from the array arr = np.array([np.nan, 1, 2]), you can do:

arr[~np.isnan(arr)]

or

arr[arr == arr]

(The second form works because np.nan == np.nan is False.)
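
As a quick sketch, comparing a float array with itself produces the same mask as the explicit NaN test, since NaN is the only floating-point value that does not equal itself:

```python
import numpy as np

arr = np.array([np.nan, 1.0, 2.0])

# Elementwise self-comparison: False only where the value is NaN.
self_equal = arr == arr

# The explicit test, for comparison.
explicit = ~np.isnan(arr)

print(self_equal)
print(arr[self_equal])
```

The explicit form is arguably easier to read, because the self-comparison relies on the reader knowing how NaN compares.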

Hersey answered 24/10, 2020 at 8:33 Comment(0)
M
2

I'm not sure whether this is more or less pythonic...

a = [i for i in a if i is not np.nan]
Mcshane answered 9/11, 2019 at 16:49 Comment(1)
It's not appropriate for numpy arrays. Not only do you now get a list back (and thus fundamentally change the nature of the object returned) but this runs in a Python loop and will be orders of magnitude slower than a numpy method. I do not recommend this at allGamopetalous
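
One further caveat, shown as a minimal sketch: an identity test such as `i is not np.nan` does not filter anything when iterating over a NumPy float array, because each element comes back as a fresh `np.float64` object rather than the `np.nan` object itself. A value-based test such as `math.isnan` (swapped in here for illustration) does remove the NaN, although it still runs in a Python loop and returns a list.

```python
import math
import numpy as np

a = np.array([np.nan, 1.0, 2.0])

# Identity check: the NaN element is a new np.float64 object, not np.nan itself,
# so the condition is True for every element and nothing is dropped.
by_identity = [i for i in a if i is not np.nan]

# Value check: math.isnan inspects the value, so the NaN is dropped.
by_value = [i for i in a if not math.isnan(i)]

print(len(by_identity))
print(by_value)
```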
