Pandas, numpy.where(), and numpy.nan

I want to use numpy.where() to add a column to a pandas.DataFrame. I'd like to use NaN values for the rows where the condition is false (to indicate that these values are "missing").

Consider:

>>> import numpy; import pandas
>>> df = pandas.DataFrame({'A':[1,2,3,4]}); print(df)
   A
0  1
1  2
2  3
3  4
>>> df['B'] = numpy.nan
>>> df['C'] = numpy.where(df['A'] < 3, 'yes', numpy.nan)
>>> print(df)
   A   B    C
0  1 NaN  yes
1  2 NaN  yes
2  3 NaN  nan
3  4 NaN  nan
>>> df.isna()
       A     B      C
0  False  True  False
1  False  True  False
2  False  True  False
3  False  True  False

Why does B show "NaN" but C shows "nan"? And why does DataFrame.isna() fail to detect the NaN values in C?

Should I use something other than numpy.nan inside where? None and pandas.NA both seem to work and can be detected by DataFrame.isna(), but I'm not sure these are the best choice.

Thank you!

Edit: As @Tim Roberts and @DYZ pointed out, numpy.where returns a string-typed array here, so numpy.nan is converted with str. The values in column C are actually the strings "nan". The question remains, however: what is the most elegant thing to do here? Should I use None? Or something else?

Jambalaya asked 10/5, 2021 at 21:12

numpy.where coerces its second and third arguments to a common data type. Since the second argument is a string, the third one is converted to a string too, with the same result as calling str():

str(numpy.nan)
# 'nan'

As a result, the values in column C are all strings.
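
To make the coercion visible, here is a minimal sketch that rebuilds the DataFrame from the question and inspects what numpy.where returns. The variable name result is only illustrative:

```python
import numpy
import pandas

df = pandas.DataFrame({'A': [1, 2, 3, 4]})

# Both branches are promoted to one common dtype; a str and a float
# together end up as a NumPy Unicode string dtype.
result = numpy.where(df['A'] < 3, 'yes', numpy.nan)

print(result.dtype.kind)       # 'U' would indicate a Unicode string array
print(result[2] == 'nan')      # checks whether the entry is the literal text 'nan'
print(pandas.isna(result[2]))  # checks whether pandas treats that entry as missing
```

Because the false rows hold ordinary text rather than a float NaN, isna() has nothing to detect in them.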

You can first fill the rows where the condition is false with None and then convert them to numpy.nan with fillna():

df['C'] = numpy.where(df['A'] < 3, 'yes', None)
df['C'] = df['C'].fillna(numpy.nan)
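
Another option, offered only as a sketch, is to skip numpy.where and use Series.where, which fills the rows where the condition is false with a missing value by default. The helper Series of constant 'yes' values is an illustrative construction and an assumption about what reads best, not the only way to do it:

```python
import pandas

df = pandas.DataFrame({'A': [1, 2, 3, 4]})

# Start from a column of 'yes' and keep it only where the condition holds;
# Series.where replaces the remaining rows with a missing value by default.
df['C'] = pandas.Series('yes', index=df.index).where(df['A'] < 3)

print(df['C'].isna().tolist())
```

With this approach the missing rows are real missing values, so DataFrame.isna() reports them without any extra conversion step.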
Hylan answered 10/5, 2021 at 21:48

B is a pure numeric column. C has a mixture of strings and numerics, so the column has type "object", and it prints differently.

Bearer answered 10/5, 2021 at 21:24
Comment: It does not print differently. It is different. - Hylan
