Pandas TimeSeries resample produces NaNs

I am resampling a Pandas TimeSeries. The time series consists of binary values (it is a categorical variable) with no missing values, but after resampling, NaNs appear. How is this possible?

I can't post any example data here since it is sensitive info, but I create and resample the series as follows:

series = pd.Series(data, ts)
series_rs = series.resample('60T', how='mean')
Linalool answered 27/10, 2015 at 9:50 Comment(1)
If you upsample, the default is to introduce NaN values. Besides, without representative sample code it's difficult to comment further. - Empress

Upsampling converts the series to a regular time interval, so any interval that contains no samples gets NaN.

You can fill the missing values backward with fill_method='bfill', or forward with fill_method='ffill' or fill_method='pad'.

import pandas as pd

ts = pd.date_range('1/1/2015', periods=10, freq='100T')
data = range(10)
series = pd.Series(data, ts)
print(series)
#2015-01-01 00:00:00    0
#2015-01-01 01:40:00    1
#2015-01-01 03:20:00    2
#2015-01-01 05:00:00    3
#2015-01-01 06:40:00    4
#2015-01-01 08:20:00    5
#2015-01-01 10:00:00    6
#2015-01-01 11:40:00    7
#2015-01-01 13:20:00    8
#2015-01-01 15:00:00    9
#Freq: 100T, dtype: int64
series_rs = series.resample('60T', how='mean')
print(series_rs)
#2015-01-01 00:00:00     0
#2015-01-01 01:00:00     1
#2015-01-01 02:00:00   NaN
#2015-01-01 03:00:00     2
#2015-01-01 04:00:00   NaN
#2015-01-01 05:00:00     3
#2015-01-01 06:00:00     4
#2015-01-01 07:00:00   NaN
#2015-01-01 08:00:00     5
#2015-01-01 09:00:00   NaN
#2015-01-01 10:00:00     6
#2015-01-01 11:00:00     7
#2015-01-01 12:00:00   NaN
#2015-01-01 13:00:00     8
#2015-01-01 14:00:00   NaN
#2015-01-01 15:00:00     9
#Freq: 60T, dtype: float64
series_rs = series.resample('60T', how='mean', fill_method='bfill')
print(series_rs)
#2015-01-01 00:00:00    0
#2015-01-01 01:00:00    1
#2015-01-01 02:00:00    2
#2015-01-01 03:00:00    2
#2015-01-01 04:00:00    3
#2015-01-01 05:00:00    3
#2015-01-01 06:00:00    4
#2015-01-01 07:00:00    5
#2015-01-01 08:00:00    5
#2015-01-01 09:00:00    6
#2015-01-01 10:00:00    6
#2015-01-01 11:00:00    7
#2015-01-01 12:00:00    8
#2015-01-01 13:00:00    8
#2015-01-01 14:00:00    9
#2015-01-01 15:00:00    9
#Freq: 60T, dtype: float64
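One way to see where the NaNs come from is to count the samples that fall into each hourly bin. The following is a minimal sketch, assuming a recent pandas version in which resample() returns a resampler object; it uses the 'min' frequency alias in place of 'T', and the variable names are only illustrative.

```python
import pandas as pd

# Same toy data as above: one sample every 100 minutes.
ts = pd.date_range('2015-01-01', periods=10, freq='100min')
series = pd.Series(range(10), index=ts)

hourly = series.resample('60min')
means = hourly.mean()    # NaN wherever a bin holds no samples
counts = hourly.count()  # number of samples per hourly bin

# The bins with a count of zero are exactly the bins whose mean is NaN.
empty_bins = counts[counts == 0].index
nan_bins = means[means.isna()].index
print(list(empty_bins) == list(nan_bins))
print(len(empty_bins))   # 6 empty bins out of 16
```

So the input has no missing values; the NaNs mark hourly bins that simply received no observation.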
Venter answered 27/10, 2015 at 11:43 Comment(2)
And what do the different fill methods do? The pandas documentation on them is rather limited. ffill and bfill are self-explanatory, but what about pad? - Linalool
I think the docs explain it better than I can. Instead of fillna you can use resample. - Venter

Please note that fill_method has now been deprecated. resample() now returns a resampling object on which you can perform operations just like a groupby object.

Common downsampling operations:

.mean()
.sum()
.agg()
.apply()

Upsampling operations:

.ffill()
.bfill()

See the what's-new entry in the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0180-breaking-resample

So the example would become:

series_rs = series.resample('60T').mean()
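As a rough sketch of how the old fill_method arguments might translate to the newer API, the fill can be chained after the aggregation. This assumes a recent pandas version and uses the 'min' alias instead of 'T'; the names filled_back, filled_fwd and daily are only illustrative.

```python
import pandas as pd

ts = pd.date_range('2015-01-01', periods=10, freq='100min')
series = pd.Series(range(10), index=ts)

# Aggregate first; empty hourly bins come out as NaN.
means = series.resample('60min').mean()

# Then fill the gaps, roughly what fill_method='bfill' / 'ffill' used to do.
filled_back = means.bfill()  # take the next available value
filled_fwd = means.ffill()   # carry the previous value forward

# Downsampling works like a groupby: one aggregated value per bin.
daily = series.resample('D').sum()

print(filled_back.head(4))
print(filled_fwd.head(4))
print(daily)
```

For binary data such as in the question, ffill() or bfill() keeps the values at 0 or 1, whereas averaging over bins that hold several samples can produce fractions.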
Lunneta answered 12/5, 2018 at 10:7 Comment(0)

When upsampling a time series, after calling .resample() you still need to call .interpolate() on the desired column in order to fill in those NaNs.

df = df.resample('15min').mean()
df['my_column'] = df['my_column'].interpolate()
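A fuller, self-contained version of the same idea might look like the sketch below. The DataFrame, its values and the column name my_column are made up for illustration, and the default linear interpolation is assumed.

```python
import pandas as pd

# Hypothetical hourly data with a single numeric column.
idx = pd.date_range('2022-01-01', periods=3, freq='60min')
df = pd.DataFrame({'my_column': [0.0, 4.0, 8.0]}, index=idx)

# Upsample to 15 minutes: the new, empty slots are NaN after mean().
upsampled = df.resample('15min').mean()
print(upsampled['my_column'].isna().sum())  # 6 NaNs before interpolating

# Fill the gaps by linear interpolation between the known points.
upsampled['my_column'] = upsampled['my_column'].interpolate()
print(upsampled)
```

Note that interpolation produces intermediate values, so for a binary or categorical column a forward or backward fill is probably the better fit. For irregularly spaced timestamps, interpolate(method='time') takes the actual time gaps into account.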
Mccullers answered 15/4, 2022 at 21:6 Comment(1)
Can you expand on this by editing your answer to include additional details, possibly including a fuller code example? - Carnage
