Why does df.where() not replace all null values?

Asked 21/1, 2022 at 15:38 Answered 21/1, 2022 at 15:43

I have a dataframe with very mixed columns. I am trying to set all occurrences of None or NaN to None.

I am trying the answer to this question: Use None instead of np.nan for null values in pandas DataFrame. However, the accepted answer does not catch all null instances. Example:

import pandas as pd

my_array = ['1','2',None,4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
df = df.append({'Midday':'10'}, ignore_index=True)

which results in

  Morning Midday Evening  Night
0       1      2    None    4.0
1     NaN     10     NaN    NaN

Applying df.where() to find and replace all null values results in:

df.where(df.notnull(), None)

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    NaN

But I want output

  Morning Midday Evening  Night
0       1      2    None    4.0
1    None     10    None    None

What am I getting wrong, or is df.where() getting it wrong?

Ommatidium asked 21/1, 2022 at 15:38 Comment(2)

The null values get treated differently because the dtype of df['Night'] column is float, which can natively handle NaN values, whereas the integer columns can't natively handle NaN, so they get coerced to 'object'. – Decomposer 9/9, 2023 at 1:30

Retitled, because this question is less general than the title claimed to be. (Also, df.append got deprecated in pandas 1.4) – Decomposer 9/9, 2023 at 1:33
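
For readers on pandas 2.0 or later, where df.append is no longer available, the example from the question can be rebuilt with pd.concat. This is only a sketch: wrapping the new row in a one-row DataFrame (the name new_row is made up here) is an assumption about how best to mirror the original append call, and the exact dtype labels printed may differ between pandas versions.

```python
import pandas as pd

my_array = ['1', '2', None, 4]
df = pd.DataFrame([my_array], columns=['Morning', 'Midday', 'Evening', 'Night'])

# Assumed stand-in for df.append({'Midday': '10'}, ignore_index=True)
new_row = pd.DataFrame([{'Midday': '10'}])
df = pd.concat([df, new_row], ignore_index=True)

# 'Night' held the integer 4; the missing value in the new row
# turns it into a float column that stores NaN.
print(df.dtypes)
print(df)
```

Printing df.dtypes before calling df.where() makes it easier to see which columns are float and which are not.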

You need to change the datatype to object first. A float column cannot store None, so the None passed to where() ends up as NaN again.

out = df.astype(object).where(df.notna(), None)
Out[392]: 
  Morning Midday Evening Night
0       1      2    None   4.0
1    None     10    None  None

Chenay answered 21/1, 2022 at 15:43 Comment(3)

Great answer, works for me. But please help me understand: Why is this? Both df.where() as well as df.notna() can be applied to numeric dtypes. Why the cast to object? – Ommatidium 21/1, 2022 at 15:45

@Ommatidium Correct, the column's dtype determines whether NaN can be replaced with None. – Chenay 21/1, 2022 at 15:48

Useful to explain why: the null values get treated differently because the dtype of df['Night'] column is float, which can natively handle NaN values, whereas the integer columns can't natively handle NaN, so they get coerced to 'object'. – Decomposer 9/9, 2023 at 1:30
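
The dtype explanation in the comments can be checked on a single column. The following is a minimal sketch, assuming a float Series that mimics the 'Night' column from the question; the variable names night, kept_float and as_object are illustrative only.

```python
import numpy as np
import pandas as pd

# Float column with one missing value, like df['Night'] in the question
night = pd.Series([4.0, np.nan], name='Night')

# Without a cast the column stays float, so the missing entry remains NaN
kept_float = night.where(night.notna(), None)

# After casting to object the column can hold arbitrary Python objects, including None
as_object = night.astype(object).where(night.notna(), None)

print(kept_float.dtype, kept_float.tolist())
print(as_object.dtype, as_object.tolist())
```

The cast is what lets None survive: an object column stores references to Python objects, while a float column can only represent a missing value as NaN.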
