How to filter in NaN (pandas)?
A

5

140

I have a pandas dataframe (df), and I want to do something like:

newdf = df[(df.var1 == 'a') & (df.var2 == NaN)]

I've tried replacing NaN with np.NaN, or 'NaN' or 'nan' etc, but nothing evaluates to True. There's no pd.NaN.

I can use df.fillna(np.nan) before evaluating the above expression but that feels hackish and I wonder if it will interfere with other pandas operations that rely on being able to identify pandas-format NaN's later.

I get the feeling there should be an easy answer to this question, but somehow it has eluded me.

Ankylosis answered 31/7, 2014 at 2:57 Comment(2)
NaN when compared to itself returns false. Have you tried df.var2 != df.var2? Swick
There is actually a pd.NA as of Pandas 1.0, not that that would be useful here. Alderney
S
126

This doesn't work because NaN isn't equal to anything, including NaN. Use pd.isnull(df.var2) instead.
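
A minimal sketch of this fix applied to the filter from the question. The sample data is made up for illustration; only the column names var1 and var2 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "var1": ["a", "a", "b", "b"],
    "var2": [1.0, np.nan, np.nan, 4.0],
})

# Comparing with == never matches, because NaN is not equal to NaN.
never_matches = df[(df.var1 == "a") & (df.var2 == np.nan)]

# pd.isnull builds the boolean mask that the == comparison cannot.
newdf = df[(df.var1 == "a") & pd.isnull(df.var2)]
print(newdf)
```

The parentheses around the first condition matter, since & binds more tightly than ==.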

Satinet answered 31/7, 2014 at 3:2 Comment(4)
Thanks for the suggestion and the nice explanation. I see df.var2.isnull() is another variation on this answer. Ankylosis
Though that doesn't cover the case when you don't want to filter out NaN values. Sort of equivalent to df.var2 != NaN. Bishopric
For others like me having @multigoodverse's observation, I found out there's also pd.notnull(). So you can keep only the NaN values with df.loc[pd.isnull(df['var'])] or filter them out with df.loc[pd.notnull(df['var'])]. Metastasis
You can also filter out NaN by negating the mask with the unary operator (~), something like df.loc[~pd.isnull(df['var'])]. Lydgate
B
164
filtered_df = df[df['var2'].isna()]

This returns only the rows that have NaN in the 'var2' column.

Note: "Series.isnull is an alias for Series.isna."
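
As a rough illustration, the same mask can be used to keep the NaN rows or to drop them. The DataFrame below is hypothetical, and Series.notna is the complement of isna rather than something this answer mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"var1": ["a", "b", "c"], "var2": [np.nan, 2.0, np.nan]})

mask = df["var2"].isna()

only_nan = df[mask]                        # rows where var2 is NaN
without_nan = df[~mask]                    # rows where var2 is not NaN
also_without_nan = df[df["var2"].notna()]  # same rows as df[~mask]
```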

Bertabertasi answered 4/12, 2017 at 9:18 Comment(0)
H
29
df[df['var'].isna()]

where "var" is the column name

Heeler answered 26/5, 2021 at 15:57 Comment(1)
This is the same as Gil's answer since "Series.isnull is an alias for Series.isna." Please don't post duplicate answers. Alderney
I
10

Pandas uses numpy's NaN value. Use numpy.isnan to obtain a Boolean vector from a pandas series.
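
A small sketch of where this approach works and where it does not, assuming one float column and one object-dtype column (both made up for illustration). On object dtype np.isnan raises a TypeError, so Series.isna is the safer choice there.

```python
import numpy as np
import pandas as pd

# Float dtype: np.isnan returns a boolean Series.
floats = pd.Series([1.0, np.nan, 3.0])
float_mask = np.isnan(floats)

# Object dtype: np.isnan has no loop for Python objects and raises TypeError.
mixed = pd.Series(["a", np.nan, "c"], dtype=object)
try:
    np.isnan(mixed)
    object_dtype_ok = True
except TypeError:
    object_dtype_ok = False

# Series.isna handles the object-dtype column.
object_mask = mixed.isna()
```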

Isaac answered 31/7, 2014 at 3:3 Comment(2)
You can't use numpy.isnan as an input. Vino
np.isnan doesn't work on "object" dtype. Plus, as of Pandas 1.0 (2020), Pandas has pd.NA, which NumPy can't handle, so it filters through to the result. Alderney
H
2

You can also use query here:

df.query('var2 != var2')

This works since np.nan != np.nan.
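
A hedged sketch of the query approach, including a way it might be combined with the var1 condition from the question. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"var1": ["a", "a", "b"], "var2": [1.0, np.nan, np.nan]})

# A value is unequal to itself only when it is NaN.
nan_rows = df.query("var2 != var2")

# The same trick combined with the other condition from the question.
newdf = df.query("var1 == 'a' and var2 != var2")
```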

Henrion answered 1/5, 2022 at 20:37 Comment(0)
