Mean imputation returns None and NaN in DataFrame

I want to fill in the missing values of two columns using mean imputation. Both columns are of type float64:

df['col1'].dtypes
dtype('float64')
df['col2'].dtypes
dtype('float64')

I tried two methods to fill the columns. First, I filled the NaN values with 0:

df.replace(np.nan, 0, inplace=True)

Then I used fillna.mean() method to fill the columns

df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

This returns something like this:

Col1
Nan
Nan
Nan

In the second method, I skipped filling the NaN values with zero and applied mean imputation directly, which returned "None".

I did not understand what was wrong with my implementation. Any help would be appreciated.

Filings answered 17/7, 2023 at 15:30
Please provide a reproducible snippet of your dataframe. - Merbromin

A possible solution (you need to use skipna=True when calculating the mean):

df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True)
Credulity answered 17/7, 2023 at 15:41
I did that. I tried it without filling the NaN values with zero, but it did not change the output. It is still 'None'. - Filings
It is hard to diagnose the problem without a piece of your data, @Encipher. For instance, the following works fine: df = pd.DataFrame({'col1': [1, np.nan, 3], 'col2': ['a', 'b', 'c'], 'col3': [1, 2, 3]}); df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True) - Credulity
Another thing I noticed: before any replacement, the datatype was dtype('float64'). After imputation, when I get the None values, the datatype is dtype('O'). - Filings
I don't know what the problem is, but I found a workaround: I separated only that column from the DataFrame and applied mean imputation to it, and that worked: data = df.col1; data.fillna(data.mean(), inplace=True). I don't know what the explanation could be. - Filings
You might also try: df['col1'].replace(np.nan, df['col1'].mean()). - Credulity
Thank you, it works. Can you please explain the difference between this command and the previous ones? - Filings
Without seeing the data, it is hard to find an explanation, @Encipher. - Credulity
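The workaround in the comments avoids calling inplace=True on a df['col1'] selection. Another way to keep inplace=True is to call fillna on the DataFrame itself with a dict of per-column means. This is a minimal sketch with made-up values; only the column names col1 and col2 come from the question. As far as I know, recent pandas releases warn that an inplace method on a selected column is chained assignment and may not update df under Copy-on-Write, which this form sidesteps.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's two float64 columns.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0],
    "col2": [np.nan, 2.0, 4.0],
})

# One mean per column to impute (NaN is ignored when computing them).
means = {"col1": df["col1"].mean(), "col2": df["col2"].mean()}

# fillna is called on df itself, not on a df['col1'] selection,
# so inplace=True updates df directly. The call returns None,
# so its result is deliberately not assigned to anything.
df.fillna(means, inplace=True)

print(df)
```

Without inplace, df = df.fillna(means) gives the same result.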

Quoting the question: "Then I used fillna.mean() method to fill the columns"

df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

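Remove the inplace argument, since you assign the result back to the column: with inplace=True, fillna modifies the Series in place and returns None, so the assignment overwrites the column with None. This is the first mistake.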
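A minimal sketch of what the question's line does, using a made-up three-row frame: with inplace=True, fillna returns None, and assigning that None to the column replaces every value. That would also fit the dtype('O') reported in the comments. The standalone copy s is only there to show the return value without touching df.

```python
import numpy as np
import pandas as pd

# Made-up data; only the column name comes from the question.
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0]})

# With inplace=True, fillna modifies the Series it is called on and returns None.
s = df["col1"].copy()
returned = s.fillna(s.mean(), inplace=True)
print(returned)  # None

# The question's line then amounts to df['col1'] = None:
# the whole column is replaced by None and its dtype becomes object.
broken = df.copy()
broken["col1"] = returned
print(broken["col1"].dtype)

# Fix: drop inplace and assign the Series that fillna returns.
df["col1"] = df["col1"].fillna(df["col1"].mean())
print(df["col1"].tolist())  # [1.0, 2.0, 3.0]
```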

Once that is fixed, the approach from the older question pandas DataFrame: replace nan values with average of columns still works perfectly, years later:

df.fillna(df.mean())

Verified with a fresh random example:

   col1  col2
0   1.0   NaN
1   0.1   1.0
2   NaN   3.2
3   4.0   NaN
4   8.0   0.0
df.fillna(df.mean())
    col1  col2
0  1.000   1.4
1  0.100   1.0
2  3.275   3.2
3  4.000   1.4
4  8.000   0.0
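A runnable version of the example above, with the frame typed in from the printed table rather than regenerated randomly. df.mean() skips NaN, so the gap in col1 becomes 3.275 (13.1 / 4) and the gaps in col2 become 1.4 (4.2 / 3).

```python
import numpy as np
import pandas as pd

# The frame from the printed example above.
df = pd.DataFrame({
    "col1": [1.0, 0.1, np.nan, 4.0, 8.0],
    "col2": [np.nan, 1.0, 3.2, np.nan, 0.0],
})

# Each column's NaN is replaced by the mean of that column's observed values.
filled = df.fillna(df.mean())
print(filled)
```

If the real DataFrame also has non-numeric columns, df.mean() raises a TypeError in pandas 2.0 and later; df.mean(numeric_only=True) restricts the means to numeric columns.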

There is no need to replace NaN with 0 in the first place (nor to pass skipna=True when calculating the means, as another answer suggested, since that is already the default).
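A short sketch of both points, using col1 from the example above as a standalone Series: skipna=True is already the default for mean(), and replacing NaN with 0 first leaves fillna nothing to fill while pulling the mean down, because the inserted zero is counted.

```python
import numpy as np
import pandas as pd

# col1 from the example above, as a standalone Series.
s = pd.Series([1.0, 0.1, np.nan, 4.0, 8.0])

# skipna=True is the default, so these two calls compute the same mean.
print(s.mean(), s.mean(skipna=True))

# Replacing NaN with 0 first leaves no missing values for fillna to fill...
zeroed = s.replace(np.nan, 0)
print(zeroed.isna().sum())  # 0

# ...and the inserted 0 is counted, so the mean drops from 13.1 / 4 to 13.1 / 5.
print(zeroed.mean())
```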

Kalasky answered 18/7, 2023 at 9:27
