Why does df.where() not replace all null values?

I have a dataframe with very mixed columns. I am trying to set all occurrences of None or NaN to None.

I am trying the answer to this question: "Use None instead of np.nan for null values in pandas DataFrame". However, the accepted answer does not catch all null instances. Example:

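import pandas as pd

my_array = ['1', '2', None, 4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])
# Originally df.append({'Midday': '10'}, ignore_index=True); append was removed in pandas 2.0
df = pd.concat([df, pd.DataFrame([{'Midday': '10'}])], ignore_index=True)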

which results in

  Morning Midday Evening  Night
0       1      2    None    4.0
1     NaN     10     NaN    NaN

Applying df.where() to find and replace all null values results in:

df.where(df.notnull(), None)

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    NaN

But I want this output:

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    None

What am I getting wrong, or is df.where() getting it wrong?

Ommatidium answered 21/1, 2022 at 15:38 Comment(2)
The null values get treated differently because the dtype of the df['Night'] column is float, which natively represents missing values as NaN, so None is converted back to NaN. The other columns hold strings, so their dtype is object, which can store None as-is. - Decomposer
Retitled, because this question is less general than the title claimed. (Also, df.append was deprecated in pandas 1.4.) - Decomposer
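
To see the dtype difference behind this behaviour, one can inspect the frame directly. The following is only a minimal sketch, assuming a recent pandas where df.append is no longer available, so the question's frame is rebuilt with pd.concat:

```python
import pandas as pd

# Rebuild the frame from the question (pd.concat stands in for df.append).
first_row = pd.DataFrame(
    [['1', '2', None, 4]],
    columns=['Morning', 'Midday', 'Evening', 'Night'],
)
df = pd.concat([first_row, pd.DataFrame([{'Midday': '10'}])], ignore_index=True)

# 'Night' started as an integer column; adding a row with no value for it
# forces an upcast to float so the gap can be stored as NaN.
print(df.dtypes)

# A float column has no slot for the Python object None, so the missing
# cell is NaN both before and after df.where(df.notnull(), None).
replaced = df.where(df.notnull(), None)
print(replaced['Night'].tolist())
```

The string-valued columns are not float, which is why they accept None while 'Night' does not.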

You need to cast the DataFrame to dtype object first:

out = df.astype(object).where(df.notna(), None)
Out[392]: 
  Morning Midday Evening Night
0       1      2    None   4.0
1    None     10    None  None
Chenay answered 21/1, 2022 at 15:43 Comment(3)
Great answer, works for me. But please help me understand: why is this? Both df.where() and df.notna() can be applied to numeric dtypes. Why the cast to object? - Ommatidium
@Ommatidium Correct, the column dtype determines whether NaN can be replaced with None. - Chenay
Useful to explain why: the null values get treated differently because the dtype of the df['Night'] column is float, which natively represents missing values as NaN, so None is converted back to NaN. The other columns hold strings, so their dtype is object, which can store None as-is. - Decomposer
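
The effect of the cast can be checked side by side on a small frame. This is a sketch rather than a definitive recipe: nulls_to_none is a hypothetical helper name introduced here for illustration, and the two-column frame is a reduced stand-in for the one in the question.

```python
import pandas as pd


def nulls_to_none(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast to object first so None can be stored as-is."""
    return frame.astype(object).where(frame.notna(), None)


df = pd.DataFrame({'Midday': ['2', '10'], 'Night': [4.0, float('nan')]})

# Without the cast, the float column turns the None fill value back into NaN.
plain = df.where(df.notna(), None)
print(plain['Night'].dtype)

# With the cast, 'Night' is an object column and the missing cell is a real None.
fixed = nulls_to_none(df)
print(fixed['Night'].tolist())
```

Note that the cast gives up the numeric dtype, so vectorised arithmetic on 'Night' becomes slower. It is usually best done as a last step, for example just before exporting to JSON or a database.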
