How do I remove NaN values from a NumPy array?
[1, 2, NaN, 4, NaN, 8] ⟶ [1, 2, 4, 8]
To remove NaN values from a NumPy array x:
x = x[~numpy.isnan(x)]
The inner function numpy.isnan returns a boolean/logical array which has the value True everywhere that x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array with Trues everywhere that x is a valid number.
Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.
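To make the three steps concrete, here is a minimal sketch on the example array from the question. The intermediate names is_nan, keep and result are illustrative only.

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])

# Step 1: boolean mask that is True wherever x is NaN
is_nan = np.isnan(x)
# Step 2: invert the mask, so True now marks the valid numbers
keep = ~is_nan
# Step 3: boolean indexing returns a new 1-D array with only the kept values
result = x[keep]

print(is_nan)  # [False False  True False  True False]
print(result)  # [1. 2. 4. 8.]
```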
x = x[~numpy.isnan(x)], which is equivalent to mutzmatron's original answer, but shorter. In case you want to keep your infinities around, know that numpy.isfinite(numpy.inf) == False, of course, but ~numpy.isnan(numpy.inf) == True. – Imaginary
@Imaginary - thanks for pointing out the shorthand for logical_not, though beware that it is considerably slower - #15998688, #13601488 – Lanta
python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "numpy.logical_not(bools)" vs. python -m timeit -s "import numpy; bools = numpy.random.uniform(size=10000) >= 0.5" "~bools" (numpy.__version__ == '1.8.0') – Imaginary
numpy.invert and numpy.logical_not and got the same result for both as for ~, on numpy v1.7.1. Not sure if architecture affects comparative performance - am testing on my chromebook (armv7l). –
Lanta
np.where(np.isfinite(x), x, 0) – Colleague
x is not a numpy array. If you want to use logical indexing, it must be an array - e.g. x = np.array(x) – Lanta
.any(axis=1). The full code will be x=x[~pd.isnull(x).any(axis=1)] for Pandas or x=x[~np.isnan(x).any(axis=1)] for Numpy. Note that these are working on different types of variables. –
Sexpot
filter(lambda v: v==v, x) works both for lists and numpy arrays since v!=v only for NaN
x to be specified once as opposed to solutions of the type x[~numpy.isnan(x)]. This is convenient when x is defined by a long expression and you don't want to clutter the code by creating a temporary variable to store the result of this long expression. – Nickles
x[~numpy.isnan(x)] – Educate
[v for v in var if v == v] – Athlete
TypeError: ufunc 'isnan' not supported for the input types when the var contains mixtures of nan and strings, as noted by @AustinRichardson –
Athlete
For me the answer by @jmetz didn't work; however, using pandas isnull() did.
import pandas as pd
x = x[~pd.isnull(x)]
x = x[x.notnull()]
–
Stramonium
TypeError: ufunc 'isnan' not supported for the input types. It does not work with strings or object types. This solution did. – Ladanum
NaTs out of the box – Amplifier
Try this:
import math
print([value for value in x if not math.isnan(value)])
For more, read up on list comprehensions.
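A minimal Python 3 sketch, assuming x is a plain list of floats (math.isnan raises TypeError on strings). It also shows the v == v variant mentioned in the comments, which relies on NaN being the only float that is not equal to itself; the sample lists are made up for illustration.

```python
import math

x = [1.0, 2.0, float("nan"), 4.0, float("nan"), 8.0]

# math.isnan checks one Python float at a time
cleaned = [value for value in x if not math.isnan(value)]

# NaN is the only float for which v == v is False, so this filter
# needs no import and also tolerates non-numeric items such as strings
mixed = [float("nan"), "a", 3]
cleaned_mixed = [v for v in mixed if v == v]

print(cleaned)        # [1.0, 2.0, 4.0, 8.0]
print(cleaned_mixed)  # ['a', 3]
```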
print ([value for value in x if not math.isnan(value)])
–
Canale
np package: So returns your list without the nans: [value for value in x if not np.isnan(value)] – Viperous
@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns in matrices.
To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:
x = x[~numpy.isnan(x).any(axis=1)]
See more detail here.
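A minimal sketch contrasting the two behaviours on a made-up 3x3 array: masking with the full 2-D boolean array returns a flat 1-D result, while reducing the mask per row with .any(axis=1) keeps whole rows and the 2-D shape.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

# Indexing with the 2-D mask flattens the result to 1-D
flat = m[~np.isnan(m)]

# .any(axis=1) is True for every row containing at least one NaN;
# inverting it selects the complete rows, so the result stays 2-D
rows_kept = m[~np.isnan(m).any(axis=1)]

print(flat.shape)       # (8,)
print(rows_kept.shape)  # (2, 3)
```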
As shown by others, x[~numpy.isnan(x)] works. But it will throw an error if the numpy dtype is not a native data type, for example if it is object. In that case you can use pandas:
x[~pandas.isna(x)] or x[~pandas.isnull(x)]
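A sketch of the object-dtype case, assuming pandas is installed; the mixed array below is made up for illustration. np.isnan raises TypeError on it, while pd.isna returns a boolean ndarray that flags both NaN and None.

```python
import numpy as np
import pandas as pd

# An object-dtype array mixing strings, NaN and None
x = np.array(["a", np.nan, "b", None], dtype=object)

try:
    np.isnan(x)
except TypeError as err:
    print("np.isnan failed:", err)

# pd.isna treats NaN and None as missing; invert it to keep the rest
cleaned = x[~pd.isna(x)]
print(cleaned)  # ['a' 'b']
```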
If you're using numpy:
import numpy as np
# first get a boolean mask that is True where the values are finite
ii = np.isfinite(x)
# second, use the mask to keep only those values (this drops +/-inf as well as NaN)
x = x[ii]
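One difference worth knowing: np.isfinite is False for infinities as well as NaN, so this version also drops +inf and -inf, whereas ~np.isnan keeps them. A small sketch with an illustrative array:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# ~np.isnan removes only NaN; the infinities survive
without_nan = x[~np.isnan(x)]
# np.isfinite removes NaN, +inf and -inf alike
finite_only = x[np.isfinite(x)]

print(without_nan)
print(finite_only)  # [1. 5.]
```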
The accepted answer changes shape for 2d arrays.
I present a solution here, using the Pandas dropna() functionality.
It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the rows or the columns containing np.nan.
import pandas as pd
import numpy as np
def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )
print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')
print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')
print('\n\n'+'='*20+' x[np.logical_not(np.isnan(x))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\nmasked:\n',y[np.logical_not(np.isnan(y))],sep='')
Result:
==================== 1D Case: ====================
Input:
[1400. 1500. 1600. nan nan nan 1700.]
dropna:
[1400. 1500. 1600. 1700.]
==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
dropna (rows):
[[1400. 1500. 1600.]]
dropna (columns):
[[1500.]
[ 0.]
[1800.]]
==================== x[np.logical_not(np.isnan(x))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
masked:
[1400. 1500. 1600. 0. 1700. 1800.]
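Because the dropna wrapper forwards its extra arguments to pandas' DataFrame.dropna, that method's other options should work through it too. A sketch using how="all", which drops only rows where every value is NaN. It assumes a recent pandas, where DataFrame.dropna options must be passed by keyword; the array z is made up for illustration.

```python
import numpy as np
import pandas as pd

def dropna(arr, *args, **kwargs):
    # Same wrapper idea as in the answer: round-trip through a DataFrame
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwargs).values
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped

z = np.array([[1.0, np.nan],
              [np.nan, np.nan],
              [3.0, 4.0]])

# how="all": only the all-NaN middle row is dropped
print(dropna(z, how="all"))
# default how="any": every row with at least one NaN is dropped
print(dropna(z))
```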
In case it helps, for simple 1d arrays:
x = np.array([np.nan, 1, 2, 3, 4])
x[~np.isnan(x)]
>>> array([1., 2., 3., 4.])
but if you wish to expand to matrices and preserve the shape:
x = np.array([
[np.nan, np.nan],
[np.nan, 0],
[1, 2],
[3, 4]
])
x[~np.isnan(x).any(axis=1)]
>>> array([[1., 2.],
[3., 4.]])
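The same idea works for columns: reduce the mask along axis=0 instead and index the second axis. A sketch with an illustrative array:

```python
import numpy as np

x = np.array([[np.nan, 1.0, 2.0],
              [3.0, 4.0, np.nan],
              [5.0, 6.0, 7.0]])

# .any(axis=0) is True for every column containing at least one NaN;
# the inverted mask selects the NaN-free columns along the second axis
cols_kept = x[:, ~np.isnan(x).any(axis=0)]
print(cols_kept)
```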
I encountered this issue when dealing with pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all costs due to its inefficiency.
Doing the above:
x = x[~numpy.isnan(x)]
or
x = x[numpy.logical_not(numpy.isnan(x))]
I found that reassigning to the same variable (x) did not remove the actual NaN values, and I had to use a different variable. Assigning the result to a different variable removed the NaNs, e.g.
y = x[~numpy.isnan(x)]
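The answer does not say why this happened. One possible explanation, offered only as an assumption: boolean indexing never modifies an array in place. It returns a new array, and x = ... merely rebinds the name, so any other reference to the original data (another variable, or the pandas object it came from) still holds the NaNs. A minimal sketch:

```python
import numpy as np

original = np.array([1.0, np.nan, 3.0])
x = original  # a second name bound to the same array object

# Boolean indexing builds a brand-new array; the assignment only rebinds x
x = x[~np.isnan(x)]

print(x)              # [1. 3.]
print(x is original)  # False
# the first name still refers to the untouched array, NaN included
print(np.isnan(original).any())  # True
```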
x with the new value (i.e. without the NaNs...). Can you provide any more info as to why this could be happening? – Lanta
Simply fill with:
import numpy
x = numpy.array([
    [0.99929941, 0.84724713, -0.1500044],
    [-0.79709026, numpy.nan, -0.4406645],
    [-0.3599013, -0.63565744, -0.70251352]])
# overwrite every NaN in place with .555
x[numpy.isnan(x)] = .555
print(x)
# [[ 0.99929941 0.84724713 -0.1500044 ]
# [-0.79709026 0.555 -0.4406645 ]
# [-0.3599013 -0.63565744 -0.70251352]]
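The assignment above modifies x in place. If the original array should be kept, np.where (also suggested in a comment earlier in the thread) builds a filled copy instead. A sketch with a made-up 2x2 array:

```python
import numpy as np

x = np.array([[0.5, np.nan],
              [np.nan, -1.0]])

# np.where picks 0.555 where x is NaN and the value of x elsewhere,
# returning a new array and leaving x itself untouched
filled = np.where(np.isnan(x), 0.555, x)
print(filled)
```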
pandas provides a way to mark missing values consistently across all data types.
The np.isnan() function is not compatible with all data types, e.g.:
>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The pd.isna() and pd.notna() functions are compatible with many data types, and pandas introduces a pd.NA value:
>>> import numpy as np
>>> import pandas as pd
>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0 NaN
1 x
2 y
dtype: object
>>> values.loc[pd.isna(values)]
0 NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0 <NA>
dtype: object
>>> values
0 <NA>
1 x
2 y
dtype: object
#
# using map with lambda, or a list comprehension
#
>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
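If the goal is to remove the missing entries rather than convert them, Series.dropna works directly on mixed-type data. A short sketch with illustrative values:

```python
import numpy as np
import pandas as pd

values = pd.Series([np.nan, "x", "y", pd.NA, None])

# Series.dropna removes every entry that pd.isna considers missing
# (NaN, None and pd.NA alike), so no conversion step is needed first
print(values.dropna().tolist())  # ['x', 'y']
```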
The simplest way is:
numpy.nan_to_num(x)
Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
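For reference, nan_to_num replaces values rather than removing them. By default NaN becomes 0.0 (and infinities become very large finite numbers), so the array keeps its length. A sketch contrasting it with boolean-mask removal; the nan= keyword assumes NumPy 1.17 or newer.

```python
import numpy as np

x = np.array([1.0, np.nan, 4.0])

# Replacement: NaN becomes 0.0 by default, length unchanged
print(np.nan_to_num(x))  # [1. 0. 4.]
# The nan= keyword chooses a different fill value
print(np.nan_to_num(x, nan=-1.0))
# Removal, by contrast, shortens the array
print(x[~np.isnan(x)])  # [1. 4.]
```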
NaNs with a large number, while the OP asked to entirely remove the elements. –
Jean
x = x[numpy.isfinite(x)]
– Priesthood