Mean imputation returns None and NaN in DataFrame

Asked 17/7, 2023 at 15:30 Answered 18/7, 2023 at 9:27

I want to fill in the missing values of two columns with their means. The dtype of both columns is float64.

df['col1'].dtypes
dtype('float64')
df['col2'].dtypes
dtype('float64')

I tried two approaches to fill the columns. In the first, I replaced the NaN values with 0:

df.replace(np.nan,0, inplace=True )

Then I used fillna() with the column mean to fill the column:

df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

This returns something like this:

Col1
Nan
Nan
Nan

In the second approach, I skipped the zero-filling step and applied mean imputation directly, which returned None.

I do not understand what is wrong with my implementation. Any help would be appreciated.

Filings answered 17/7, 2023 at 15:30 Comment(1)

Please provide a reproducible snippet of your dataframe. – Merbromin 17/7, 2023 at 16:4

A possible solution (you need to use skipna=True when calculating the mean):

df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True)

Credulity answered 17/7, 2023 at 15:41 Comment(7)

I did that. I tried it without filling the NaN values with zero. However, it did not change the output. It is still None. – Filings 17/7, 2023 at 15:45

It is hard to diagnose the problem without a piece of your data, @Encipher. For instance, the following works fine:

df = pd.DataFrame({
    'col1': [1, np.nan, 3],
    'col2': ['a', 'b', 'c'],
    'col3': [1, 2, 3]})
df['col1'].fillna(df['col1'].mean(skipna=True), inplace=True)

– Credulity 17/7, 2023 at 15:46

Another thing I noticed: when I first checked the datatype, before any replacement, it was dtype('float64'). After imputation, when the values became None, the datatype was dtype('O'). – Filings 17/7, 2023 at 15:51

I don't know what the problem is. I found a workaround: I extracted only that column from the dataframe and applied mean imputation to it, and that worked well. I don't know what the explanation could be. "data = df.col1", data.fillna(data.mean(), inplace = True) – Filings 17/7, 2023 at 15:56

You might also try: df['col1'].replace(np.nan, df['col1'].mean()). – Credulity 17/7, 2023 at 15:59

Thank you. It works. Can you please explain the difference between this command and the previous ones? – Filings 17/7, 2023 at 16:45

Without seeing the data, it is hard to find an explanation, @Encipher. – Credulity 17/7, 2023 at 16:57

Quoting the question: "Then I used fillna.mean() method to fill the columns"

df['col1']=df['col1'].fillna(df['col1'].mean(), inplace=True)

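Remove the inplace argument, since you assign the result back to the column: with inplace=True, fillna returns None, so the assignment overwrites the column with None. This is the first mistake.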
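To make the None result concrete, here is a minimal sketch on made-up data (the frame below is an assumption, not the asker's dataframe). It shows what fillna hands back when inplace=True is passed, what assigning that value to the column does, and the corrected call without inplace.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the asker's frame
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0]})

# With inplace=True, fillna modifies its target and returns None
result = df["col1"].copy().fillna(0.0, inplace=True)
print(result)

# Assigning that return value back overwrites the whole column with None
# (the asker reported dtype('O') at this point)
broken = df.copy()
broken["col1"] = result
print(broken["col1"].dtype)

# Fix: drop inplace and assign the Series that fillna returns
df["col1"] = df["col1"].fillna(df["col1"].mean())
print(df["col1"].tolist())  # [1.0, 2.0, 3.0]
```

Either form works on its own (assignment without inplace, or inplace without assignment); combining the two is what replaces the column.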

After fixing this mistake, the approach from the older question "pandas DataFrame: replace nan values with average of columns" still works perfectly, years later:

df.fillna(df.mean())

Verified with a fresh random example:

   col1  col2
0   1.0   NaN
1   0.1   1.0
2   NaN   3.2
3   4.0   NaN
4   8.0   0.0

df.fillna(df.mean())
    col1  col2
0  1.000   1.4
1  0.100   1.0
2  3.275   3.2
3  4.000   1.4
4  8.000   0.0

There is no need to replace NaN with 0 in the first place (nor to skipna when calculating the means, as another answer suggested).
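As a rough sketch of that last point, using the same values as the example frame above: mean() skips NaN by default, so no skipna argument is needed, while replacing NaN with 0 first leaves nothing for fillna to fill. The names cols, filled, zeroed and zeroed_then_filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the example frame above
df = pd.DataFrame({
    "col1": [1.0, 0.1, np.nan, 4.0, 8.0],
    "col2": [np.nan, 1.0, 3.2, np.nan, 0.0],
})

# Impute only the selected columns; mean() ignores NaN by default
cols = ["col1", "col2"]
filled = df.copy()
filled[cols] = filled[cols].fillna(filled[cols].mean())
print(filled)

# Replacing NaN with 0 first removes every NaN, so the later fillna
# has nothing to fill and the former gaps stay at 0
zeroed = df.replace(np.nan, 0)
zeroed_then_filled = zeroed.fillna(zeroed.mean())
print(zeroed_then_filled)
```

Restricting the fill to cols keeps any other columns of a larger dataframe untouched.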

Kalasky answered 18/7, 2023 at 9:27 Comment(0)
