I have two datasets like this
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id': [1, 2,3,4,5], 'first': [np.nan,np.nan,1,0,np.nan], 'second': [1,np.nan,np.nan,np.nan,0]})
df2 = pd.DataFrame({'id': [1, 2,3,4,5, 6], 'first': [np.nan,1,np.nan,np.nan,0, 1], 'third': [1,0,np.nan,1,1, 0]})
And I want to get
result = pd.merge(df1, df2, left_index=True, right_index=True,on='id', how= 'outer')
result['first']= result[["first_x", "first_y"]].sum(axis=1)
result.loc[(result['first_x'].isnull()) & (result['first_y'].isnull()), 'first'] = np.nan
result.drop(['first_x','first_y'] , 1)
id second third first
0 1 1.0 1.0 NaN
1 2 NaN 0.0 1.0
2 3 NaN NaN 1.0
3 4 NaN 1.0 0.0
4 5 0.0 1.0 0.0
5 6 NaN 0.0 1.0
The problem is that the real dataset includes about 200 variables and my way is very long. How to make it easier? Thanks