Replace NaN in one column with value from corresponding row of second column
D

7

195

I am working with this Pandas DataFrame in Python.

File    heat    Farheit Temp_Rating
   1    YesQ         75         N/A
   1    NoR         115         N/A
   1    YesA         63         N/A
   1    NoT          83          41
   1    NoY         100          80
   1    YesZ         56          12
   2    YesQ        111         N/A
   2    NoR          60         N/A
   2    YesA         19         N/A
   2    NoT         106          77
   2    NoY          45          21
   2    YesZ         40          54
   3    YesQ         84         N/A
   3    NoR          67         N/A
   3    YesA         94         N/A
   3    NoT          68          39
   3    NoY          63          46
   3    YesZ         34          81

I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column.

This is what I need:

File        heat    Temp_Rating
   1        YesQ             75
   1         NoR            115
   1        YesA             63
   1         NoT             41
   1         NoY             80
   1        YesZ             12
   2        YesQ            111
   2         NoR             60
   2        YesA             19
   2         NoT             77
   2         NoY             21
   2        YesZ             54
   3        YesQ             84
   3         NoR             67
   3        YesA             94
   3         NoT             39
   3         NoY             46
   3        YesZ             81

If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I am not able to do this while preserving the correct order.

How can I only find Temp_Rating rows with the NaNs and replace them with the value in the same row of the Farheit column?

Dairy answered 20/3, 2015 at 23:43 Comment(0)
S
274

Assuming your DataFrame is in df:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
del df['Farheit']
df.columns = 'File heat Observations'.split()

First replace any NaN values with the corresponding value of df.Farheit. Delete the 'Farheit' column. Then rename the columns. Here's the resulting DataFrame:

    File  heat  Observations
0      1  YesQ            75
1      1   NoR           115
2      1  YesA            63
3      1   NoT            41
4      1   NoY            80
5      1  YesZ            12
6      2  YesQ           111
7      2   NoR            60
8      2  YesA            19
9      2   NoT            77
10     2   NoY            21
11     2  YesZ            54
12     3  YesQ            84
13     3   NoR            67
14     3  YesA            94
15     3   NoT            39
16     3   NoY            46
17     3  YesZ            81
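
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. The four-row frame is made up for illustration and only mirrors the question's column names. It assigns the result of fillna back to the column, which should behave the same across pandas versions, and uses drop(columns=...) as suggested in the comments below.

```python
import numpy as np
import pandas as pd

# Illustrative data only: a small frame with the question's column names.
df = pd.DataFrame({
    "File": [1, 1, 1, 1],
    "heat": ["YesQ", "NoR", "NoT", "NoY"],
    "Farheit": [75, 115, 83, 100],
    "Temp_Rating": [np.nan, np.nan, 41, 80],
})

# fillna with a Series aligns on the index, so each NaN takes the value
# from the same row of Farheit.
df["Temp_Rating"] = df["Temp_Rating"].fillna(df["Farheit"])

# The column is float because it held NaN; cast back if integers are wanted.
df["Temp_Rating"] = df["Temp_Rating"].astype(int)

# Drop the helper column now that its values have been copied over.
df = df.drop(columns="Farheit")
print(df)
```

The astype(int) step is optional; it is only there because a column that contained NaN is stored as float.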
Shermanshermie answered 21/3, 2015 at 0:3 Comment(6)
How would this work if both columns have dtype object and the row has an empty cell instead of N/A? - Signalize
One possible approach to consider: you could first replace the empty string with NaN (see here) and then use this approach. - Dairy
The answer is perfect. If you would like to stay closer to pandas syntax, I'd suggest deleting columns with df.drop("Farheit", axis=1), but that's probably personal preference. - Transfigure
@Transfigure Agreed, drop is now preferred to del in Pandas-land. If using a recent Pandas, I would recommend df = df.drop(columns='Farheit') over numerical axis numbering. - Shermanshermie
How can we use this method if the column names are string labels with spaces? In that case, is it better to use the pop method in the answer below? The pop method worked for me, but I am wondering about other syntax. Thanks. - Toughminded
No, this still works. Any column can be addressed as df['my column with spaces'], and all column names can be set with a list, e.g.: df.columns = ['my file name', 'heat value', 'the temperature in degrees F']. I prefer spaceless column names in order to use the terse df.column_name syntax, but that's a preference, not a strict requirement. Spaces can work. - Shermanshermie
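
A rough sketch of the two cases raised in these comments, combined: object-dtype columns where the missing cells are empty strings, and column labels that contain spaces. The column names and values here are invented for illustration; the idea is just to turn the empty strings into NaN first, then fill as in the answer, using bracket syntax for the labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: everything is stored as strings, and missing
# ratings are empty strings rather than NaN.
df = pd.DataFrame({
    "heat value": ["YesQ", "NoR", "NoT"],
    "temp in F": ["75", "115", "83"],
    "temp rating": ["", "", "41"],
})

# Step 1: turn empty strings into NaN so pandas treats them as missing.
df["temp rating"] = df["temp rating"].replace("", np.nan)

# Step 2: fill the missing cells from the same row of the other column.
# Bracket syntax works for labels with spaces.
df["temp rating"] = df["temp rating"].fillna(df["temp in F"])

print(df)
```

The values stay strings in this sketch; converting them to numbers (for example with pd.to_numeric) would be a separate step.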
A
64

The above mentioned solutions did not work for me. The method I used was:

df.loc[df['foo'].isnull(),'foo'] = df['bar']
Analiese answered 7/7, 2017 at 14:40 Comment(2)
Did it raise an exception or simply not work? Try isna() instead of isnull(). - Olympias
It should probably be like this: df.loc[df['foo'].isna(),'foo'] = df.loc[df['foo'].isna(),'bar'] - Astronomical
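
To compare the two spellings discussed here, a small sketch on made-up data (foo and bar are the placeholder names from the answer). As far as I can tell they give the same result, because pandas aligns the right-hand Series on the index when assigning through .loc, so only the selected rows are written.

```python
import numpy as np
import pandas as pd

# Placeholder frame using the answer's generic column names.
df = pd.DataFrame({"foo": [np.nan, 2.0, np.nan], "bar": [10.0, 20.0, 30.0]})
df_explicit = df.copy()

missing = df["foo"].isna()

# Form from the answer: the whole bar column on the right-hand side.
# Index alignment means only the rows selected by the mask are updated.
df.loc[missing, "foo"] = df["bar"]

# Form from the comment: select the same rows on both sides.
df_explicit.loc[missing, "foo"] = df_explicit.loc[missing, "bar"]

print(df)
print(df.equals(df_explicit))
```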
C
19

@Jonathan's answer is good but overkill; just use pop:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df.pop('Farheit'))
Conspiracy answered 3/10, 2021 at 4:9 Comment(2)
I tried all the answers on this page and this one is the best for what I need. Thanks. - Trey
I <3 this solution! - Entrust
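
A self-contained sketch of the pop variant on made-up data, to show what happens to the frame. DataFrame.pop removes the column and returns it as a Series in one step, so the fill and the column removal happen on the same line.

```python
import numpy as np
import pandas as pd

# Illustrative data with the question's column names.
df = pd.DataFrame({
    "File": [1, 1, 2],
    "heat": ["YesQ", "NoT", "NoY"],
    "Farheit": [75, 83, 45],
    "Temp_Rating": [np.nan, 41, np.nan],
})

# pop returns the Farheit column and deletes it from df at the same time.
df["Temp_Rating"] = df["Temp_Rating"].fillna(df.pop("Farheit"))

print(df)
```

Note that pop modifies df in place, so the Farheit column is gone afterwards even if you do not assign anything else.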
D
8

Another way to solve this problem:

import pandas as pd
import numpy as np

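# The first two rows have no Temp value, so it is given explicitly as NaN
ts_df = pd.DataFrame(
    [[1, "YesQ", 75, np.nan], [1, "NoR", 115, np.nan], [1, "NoT", 63, 13], [2, "YesT", 43, 71]],
    columns=['File', 'heat', 'Farheit', 'Temp'],
)


def fx(x):
    # Use Farheit when Temp is missing, otherwise keep Temp
    if np.isnan(x['Temp']):
        return x['Farheit']
    else:
        return x['Temp']

print(1, ts_df)
# Apply fx to every row (axis=1)
ts_df['Temp'] = ts_df.apply(fx, axis=1)

print(2, ts_df)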

returns:

(1,    File  heat  Farheit  Temp
0     1  YesQ       75   NaN
1     1   NoR      115   NaN
2     1   NoT       63  13.0
3     2  YesT       43  71.0)
(2,    File  heat  Farheit   Temp
0     1  YesQ       75   75.0
1     1   NoR      115  115.0
2     1   NoT       63   13.0
3     2  YesT       43   71.0)
Destruct answered 15/12, 2017 at 13:16 Comment(0)
H
6

You can also use mask which replaces the values where Temp_Rating is NaN by the column Farheit:

df['Temp_Rating'] = df['Temp_Rating'].mask(df['Temp_Rating'].isna(), df['Farheit'])
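
A quick sketch on made-up data, showing mask next to its counterpart Series.where. mask replaces values where the condition is True, while where keeps values where the condition is True, so flipping the condition from isna() to notna() should give the same column.

```python
import numpy as np
import pandas as pd

# Illustrative data with the question's column names.
df = pd.DataFrame({
    "Farheit": [75, 83, 100],
    "Temp_Rating": [np.nan, 41, np.nan],
})

# mask: replace where Temp_Rating is NaN.
filled = df["Temp_Rating"].mask(df["Temp_Rating"].isna(), df["Farheit"])

# where: keep where Temp_Rating is not NaN, replace the rest.
same = df["Temp_Rating"].where(df["Temp_Rating"].notna(), df["Farheit"])

print(filled.equals(same))
df["Temp_Rating"] = filled
```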
Hollerman answered 24/4, 2022 at 20:8 Comment(0)
B
5

The accepted answer uses fillna() which will fill in missing values where the two dataframes share indices. As explained nicely here, you can use combine_first to fill in missing values, rows and index values for situations where the indices of the two dataframes don't match.

df.Col1 = df.Col1.fillna(df.Col2) #fill in missing values if indices match

#or 
df.Col1 = df.Col1.combine_first(df.Col2) #fill in values, rows, and indices
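
To make the difference concrete, here is a small sketch with two made-up Series whose indices only partly overlap (s1 and s2 are illustrative names, not from the question). fillna keeps the index of the Series being filled, while combine_first returns the union of both indices.

```python
import numpy as np
import pandas as pd

# Two illustrative Series: labels 1 and 2 overlap, 0 and 3 do not.
s1 = pd.Series([1.0, np.nan, 3.0], index=[0, 1, 2])
s2 = pd.Series([20.0, 30.0, 40.0], index=[1, 2, 3])

# fillna: only the NaN at label 1 is filled; the result keeps s1's index.
filled = s1.fillna(s2)

# combine_first: same fill, but label 3 (present only in s2) is added too.
combined = s1.combine_first(s2)

print(filled)
print(combined)
```

Within a single DataFrame both columns share one index, so the two methods should agree there; the difference only shows up when the Series come from differently indexed objects.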
Branen answered 5/2, 2021 at 1:40 Comment(1)
Nice answer. In the question here, I didn't expect to have non-overlapping indices in the data, so .fillna() was sufficient. The focus here is on a single column (Temp_Rating) where the NaNs occur, so the two approaches, .fillna() and combine_first(), end up producing equivalent output. Nonetheless, this is a really useful approach. - Dairy
E
0

Coming very late, but I came across a similar problem and this is how I solved it. It seemed a little more concise to me; I hope it works for anyone in a similar situation.

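import pandas as pd

def function_a(row):
    # Fall back to Farheit only when Temp_Rating is missing (NaN or None)
    if pd.isna(row['Temp_Rating']):
        return row['Farheit']
    return row['Temp_Rating']

df['Temp_Rating'] = df.apply(function_a, axis=1)
df1 = df.drop(['Farheit'], axis=1)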
Endothermic answered 12/10, 2022 at 9:6 Comment(0)
