Pandas dataframe column subtraction, handling NaN
Asked Answered
S

2

5

I have a data frame for example

df = pd.DataFrame([(np.nan, .32), (.01, np.nan), (np.nan, np.nan), (.21, .18)],
                  columns=['A', 'B'])
        A   B
0   NaN     0.32
1   0.01    NaN
2   NaN     NaN
3   0.21    0.18

And I want to subtract column B from A

df['diff'] = df['A'] - df['B']

    A     B      diff
0   NaN   0.32   NaN
1   0.01  NaN    NaN
2   NaN   NaN    NaN
3   0.21  0.18   0.03

Difference returns NaN if one of the columns is NaN. To overcome this I use fillna

df['diff'] = df['A'].fillna(0) - df['B'].fillna(0)

    A     B      diff
0   NaN   0.32   -0.32
1   0.01  NaN    0.01
2   NaN   NaN    0.00
3   0.21  0.18   0.03

This solves NaN coming in the diff column, but for index 2 the result is coming to 0, while I want the difference as NaN since columns A and B are NaN.

Is there a way to explicitly tell pandas to output NaN if both columns are NaN?

Studied answered 15/1, 2021 at 5:38 Comment(0)
S
7

Use Series.sub with fill_value=0 parameter:

df['diff'] = df['A'].sub(df['B'], fill_value=0)
print (df)
      A     B  diff
0   NaN  0.32 -0.32
1  0.01   NaN  0.01
2   NaN   NaN   NaN
3  0.21  0.18  0.03

If need replace NaNs to 0 add Series.fillna:

df['diff'] = df['A'].sub(df['B'], fill_value=0).fillna(0)
print (df)
      A     B  diff
0   NaN  0.32 -0.32
1  0.01   NaN  0.01
2   NaN   NaN  0.00
3  0.21  0.18  0.03
Selfregard answered 15/1, 2021 at 5:39 Comment(3)
Nice one, although there's a NaN at 2nd row. And I realized that it's what OP wants.Chantress
The original solution you wrote works df['diff'] = df['A'].sub(df['B'], fill_value=0) . If I chain .fillna(0) to it I end up with bad output I am getting currently. If you could remove fill_value=0 from answerI can mark this as solution.Studied
can you please update the diff column of row 2 to 0 - which is the right output for the operation performed.Studied
B
0
def subtract_float_columns_simple(df, A, B):
    # Convert columns to float, set invalid parsing to NaN
    df[A] = pd.to_numeric(df[A], errors='coerce')
    df[B] = pd.to_numeric(df[B], errors='coerce')

    # Subtract the columns, NaN will be propagated automatically
    return df[B] - df[A]
Bowdlerize answered 17/10 at 19:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.