How to fill nan values with rolling mean in pandas
B

3

14

I have a dataframe that contains NaN values in a few places. I am trying to clean the data by filling each NaN with the mean of its previous five instances. To do so, I have come up with the following.

input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(5).mean(), inplace=True)

But this is not working: it isn't filling the NaN values, and the dataframe's null count is the same before and after the operation. Assuming I have a dataframe with just an integer column, how can I fill NaN values with the mean of the previous five instances? Thanks in advance.

Bayly answered 8/3, 2018 at 12:17 Comment(5)
I did put inplace=True in the fillna callBayly
hmm, are you possibly setting values on a copy of the df? - what is var_list? pandas.pydata.org/pandas-docs/stable/…Peddling
var_list is a list that contains column names.Bayly
have you tried running the op without the var_list selector?Peddling
yes, I did try it without var_list.Bayly
H
13

This should work:

input_data_frame[var_list] = input_data_frame[var_list].fillna(input_data_frame[var_list].rolling(6, min_periods=1).mean())

Note that the window is 6 because it includes the NaN row itself (which is not counted in the average), leaving the five rows before it. Other NaN values in the window are also skipped, so if fewer than 5 values are found in the window, the average is calculated on the values that are actually present.

Example:

import numpy as np
import pandas as pd

data = {'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]}
df = pd.DataFrame(data=data)
print(df)

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   NaN
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  NaN

Output:

      a
0   1.0
1   1.0
2   2.0
3   3.0
4   4.0
5   5.0
6   3.0
7   1.0
8   1.0
9   2.0
10  3.0
11  4.0
12  5.0
13  3.0
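
To see why the original attempt filled nothing, here is a minimal sketch that reuses the example column above. It assumes a recent pandas version with the `.rolling()` method. The names `strict`, `relaxed` and `filled` are only illustrative. Separately, selecting with a list of columns (`input_data_frame[var_list]`) returns a copy, so `inplace=True` on that selection does not reach the original frame; assigning the result back avoids this.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 4, 5, np.nan, 1, 1, 2, 3, 4, 5, np.nan]})

# Default min_periods equals the window size, so any window that
# contains a NaN yields NaN and there is nothing to fill with.
strict = df['a'].rolling(5).mean()

# Window of 6 = the NaN row itself plus the 5 rows before it;
# min_periods=1 lets the mean skip the NaN.
relaxed = df['a'].rolling(6, min_periods=1).mean()

# fillna only touches the NaN rows; assign the result instead of using inplace
filled = df['a'].fillna(relaxed)
print(pd.DataFrame({'a': df['a'], 'strict': strict,
                    'relaxed': relaxed, 'filled': filled}))
```

In the printed table, `strict` is NaN on exactly the rows where `a` is NaN, which is why the null count never changed.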
Haemostatic answered 8/3, 2018 at 12:32 Comment(4)
Thank you so much. The mistake I made was that I did not put min_periods=1 parameter in the rolling().mean(). Also, thank you for letting me know that the window size should be 6.Bayly
Any idea how I can achieve this row-wise? I mean replacing a null value (say df.iat[i,j]) with the average of df.iat[i,j-2], df.iat[i,j-1], df.iat[i,j+1], df.iat[i,j+2]Yam
But what if I want to keep filling up NaN values with the previous rolling mean? This only fills NaN values correctly when there's only one NaN in between data rows.Fidelia
The question isn't totally clear, so it may be better to post a new one. But maybe what you need is to build a list of the indices that contain NaN and use it to vary the rolling window in the function.Haemostatic
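
For the row-wise case asked about in the comments, one possible sketch (not from the answer itself) is to transpose the frame, take a centered rolling mean, and transpose back. The small frame here is made-up data for illustration. With `center=True` and a window of 5, the window spans columns j-2 to j+2, and the NaN at j is skipped by the mean.

```python
import numpy as np
import pandas as pd

# Made-up data: each row has one missing cell
df = pd.DataFrame([[1.0, 2.0, np.nan, 4.0, 5.0],
                   [10.0, np.nan, 30.0, 40.0, 50.0]])

# Transpose so the rolling window runs along each original row.
# min_periods=1 allows shorter windows near the edges.
row_means = df.T.rolling(5, center=True, min_periods=1).mean().T

filled = df.fillna(row_means)
print(filled)
```

Near the edges the window is truncated, so the second row's gap at column 1 is filled from columns 0, 2 and 3 only.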
M
14

pd.rolling_mean was deprecated in pandas 0.18 and later removed; use the .rolling(...).mean() method instead. To fill the entire dataset, you can use:

filled_dataset = dataset.fillna(dataset.rolling(6,min_periods=1).mean())
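
An alternative that states "previous five rows" more literally is to shift the data by one row and then use a window of 5. This is a sketch under assumptions: the frame and the contents of `var_list` below are hypothetical stand-ins for the asker's data, and `prev5_mean` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame and column list, standing in for the asker's data
input_data_frame = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, np.nan, 7],
    'y': [10, np.nan, 30, 40, 50, 60, np.nan],
})
var_list = ['x', 'y']

cols = input_data_frame[var_list]

# shift(1) drops the current row from the window, so a window of 5
# covers exactly the previous five rows.
prev5_mean = cols.shift(1).rolling(5, min_periods=1).mean()

# Assign back rather than relying on inplace=True on a selection
input_data_frame[var_list] = cols.fillna(prev5_mean)
print(input_data_frame)
```

At the NaN positions this should give the same fill values as the window-of-6 version, since the NaN row contributes nothing to that mean.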
Megan answered 18/5, 2020 at 21:57 Comment(1)
But what if I want to keep filling up NaN values with the previous rolling mean? This only fills NaN values correctly when there's only one NaN in between data rows.Fidelia
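
For runs of consecutive NaNs, where each fill should feed into the next mean, a vectorised `fillna` does not help because the rolling mean is computed from the unfilled data. One possible approach is an explicit loop; this is only a sketch, and `fill_with_running_mean` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

def fill_with_running_mean(series, window=5):
    """Fill NaNs in order; each fill uses the previous `window` values,
    including values filled earlier in the same pass."""
    values = series.to_numpy(dtype=float, copy=True)
    for i in range(len(values)):
        if np.isnan(values[i]):
            prev = values[max(0, i - window):i]
            prev = prev[~np.isnan(prev)]
            if prev.size:  # leading NaNs with no history stay NaN
                values[i] = prev.mean()
    return pd.Series(values, index=series.index, name=series.name)

s = pd.Series([1, 2, 3, 4, 5, np.nan, np.nan, np.nan])
result = fill_with_running_mean(s)
print(result)
```

Here the three trailing NaNs become 3.0, 3.4 and 3.68, each computed from the five values immediately before it.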
S
-1

You can simply use interpolate():

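import numpy as np
import pandas as pd

data = {'a': [1, 5, np.nan, np.nan, np.nan, 2, 5, np.nan]}
df = pd.DataFrame(data=data)
print(df)

# Linear interpolation between the neighbouring non-NaN values
print(df['a'].interpolate())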
Serieswound answered 9/2, 2022 at 8:23 Comment(2)
Interpolation and rolling averages are both techniques for filling NaN values, but they are two different things. A rolling average fills a missing value with the average of the adjacent values, whereas interpolate fills NaNs between two values by assuming steady growth or decline between those two points.Possing
I wanted to fill nan values with mean of the previous 5 instances. Interpolation is a different thing.Bayly
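
The difference raised in these comments can be checked side by side. This is a small sketch on the same values as the answer above; the column names in the printed frame are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 5, np.nan, np.nan, np.nan, 2, 5, np.nan])

# Linear interpolation: evenly spaced steps between 5 and 2
interpolated = s.interpolate()

# Mean of the previous five rows (window of 6 includes the NaN row itself)
rolled = s.fillna(s.rolling(6, min_periods=1).mean())

print(pd.DataFrame({'original': s,
                    'interpolate': interpolated,
                    'rolling_mean': rolled}))
```

Interpolation fills the gap with 4.25, 3.5 and 2.75, while the rolling mean fills all three rows with 3.0, the average of the earlier values 1 and 5.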
