Create new column from multiple columns where value that is not NaN

Asked 22/9, 2020 at 10:40 Answered 22/9, 2020 at 10:47

I'm trying to create a new column from 4 multiple columns that start with the same name. Either 3 or 4 out of these columns are NaN. I'd like to have the new column to have this non-NaN value, if it exists in the particular row. Otherwise the new column should be NaN.

Start with:

    NaN = np.NaN
    pd.DataFrame( {'process_time_1': [5, NaN, NaN, NaN], 'process_time_2': [NaN, NaN, NaN, NaN], 
    'process_time_3': [NaN, NaN, 3, 4], 'process_time_4': [NaN, NaN, NaN, NaN]} )

if would like the new column to look like:

    NaN = np.NaN
    pd.DataFrame( {'process_time': [5, NaN, 3, 4]} )

The code I currently have:

    cols =  [df1.columns.str.startswith('process')]
    df1[cols][df1.notna()]

I'm breaking my head of this, would be great if somebody could help into the right direction:) Thanks in advance !

Orthoptic answered 22/9, 2020 at 10:40 Comment(3)

There is always only one non NaN values per rows? – Masorete 22/9, 2020 at 11:7

yes this is correct. All columns are imported and there are 4 silo's. – Orthoptic 22/9, 2020 at 11:12

Raw data is stored like this – Orthoptic 22/9, 2020 at 11:12

You can use ffill here to forward fill with the valid values, and just keep the last column:

df['process_time'] = df.filter(like='process').ffill(axis=1).iloc[:,-1]

print(df)
   process_time_1  process_time_2  process_time_3  process_time_4  \
0             5.0             NaN             NaN             NaN   
1             NaN             NaN             NaN             NaN   
2             NaN             NaN             3.0             NaN   
3             NaN             NaN             4.0             NaN   

   process_time  
0           5.0  
1           NaN  
2           3.0  
3           4.0

Twelvemonth answered 22/9, 2020 at 10:44 Comment(0)

You can use max along axis=1:

df['process_time'] = df.filter(like='process').max(1)

   process_time_1  process_time_2  process_time_3  process_time_4  process_time
0             5.0             NaN             NaN             NaN           5.0
1             NaN             NaN             NaN             NaN           NaN
2             NaN             NaN             3.0             NaN           3.0
3             NaN             NaN             4.0             NaN           4.0

Celebration answered 22/9, 2020 at 10:47 Comment(4)

you are a minute faster than me :) +1 – Crowned 22/9, 2020 at 10:53

Thanks @AndyL. and sorry for that :-) – Celebration 22/9, 2020 at 11:0

@Masorete dude check the question its mentioned there Either 3 or 4 out of these columns are NaN – Celebration 22/9, 2020 at 11:10

First i deep thinking about it, ya, it is same like only one maximal non NaN value. – Masorete 22/9, 2020 at 11:12

Recommended topics

Hot tags