If your data has unevenly spaced intervals or temporal gaps, and you want a rolling window defined by a time span rather than by a number of periods, you can easily end up in a situation where x.iloc[-1] - x.iloc[0] doesn't return the result you expect. Pandas can construct windows that contain exactly one point, in which case x.iloc[-1] == x.iloc[0] and the diff for that window is 0.
Sometimes this is the desired outcome, but other times you might want to use the last-known value from before the start of each window.
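To make the problem concrete, here is a minimal sketch with made-up values (the dates and numbers are purely illustrative). With a 2-day time-based window, any observation that follows a gap is alone in its window, so its diff comes out as 0:

```python
import pandas as pd

# Hypothetical, irregularly spaced observations (values invented for illustration)
y = pd.Series(
    [1.0, 2.0, 4.0, 7.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-10", "2020-01-11"]),
)

# Time-based window: each window covers the interval (t - 2 days, t]
diff = y.rolling("2D").apply(lambda x: x.iloc[-1] - x.iloc[0])
print(diff)
# 2020-01-01 and 2020-01-10 each sit alone in their window, so their diff is 0.0;
# the other two rows give 1.0 and 3.0.
```

The jump from 2.0 to 4.0 across the gap never shows up in the result, which is exactly the case where you may prefer to carry the last-known value forward.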
A general solution (perhaps not the most efficient) is to first construct an artificial, evenly spaced series, interpolate or fill the data as needed (e.g. using Series.ffill), and then use the .rolling() techniques described in other answers.
import pandas as pd

# Data with temporal gaps
y = pd.Series(..., index=pd.DatetimeIndex(...))
# Your desired frequency ('M' means month-end; pandas >= 2.2 spells it 'ME')
freq = '1M'
# Construct a new Index with this frequency, using your data ranges
idx_artificial = pd.date_range(y.index.min(), y.index.max(), freq=freq)
# Artificially expand the data to the evenly-spaced index.
# Only timestamps that already exist in y keep their values;
# every other point is inserted as null/NaN.
y_artificial = y.reindex(idx_artificial)
# Fill the empty values with the last-known value
# This part will vary depending on your needs
y_artificial = y_artificial.ffill()
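# Now compute the diffs, using the forward-filled artificially-spaced data.
# The spacing is now regular, so the window is given as a number of periods.
n_periods = 2  # window length in units of freq; adjust to your needs
y_diff = y_artificial.rolling(n_periods).apply(lambda x: x.iat[-1] - x.iat[0])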
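As a worked illustration of the steps above, here is a small runnable sketch. The data, the daily frequency, and the 3-period window are assumptions chosen only to keep the numbers easy to follow:

```python
import pandas as pd

# Hypothetical data with a gap between Jan 2 and Jan 6
y = pd.Series(
    [1.0, 2.0, 5.0, 6.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-06", "2020-01-07"]),
)

freq = "1D"
idx_artificial = pd.date_range(y.index.min(), y.index.max(), freq=freq)

# Expand to the daily grid (gap days become NaN), then forward-fill them
y_artificial = y.reindex(idx_artificial).ffill()

# The grid is regular, so a window of 3 rows always spans 3 days
y_diff = y_artificial.rolling(3).apply(lambda x: x.iat[-1] - x.iat[0])
print(y_diff)
# The first two rows are NaN (incomplete windows);
# the remaining rows are 1.0, 0.0, 0.0, 3.0, 4.0.
```

Here the jump across the gap is picked up on 2020-01-06 (5.0 minus the carried-forward 2.0), because the window now reaches back to the last-known value.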
And here are some helper functions to implement the above, for your copy-paste pleasure (warning: lightly-tested code written by a complete stranger, use with caution):
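import pandas as pd


def date_range_from_index(index, freq=None, start=None, end=None, **kwargs):
    """Build a date_range spanning the index (or start/end) at the given frequency."""
    if start is None:
        start = index.min()
    if end is None:
        end = index.max()
    if freq is None:
        # Fall back to the index's own frequency, if it has one
        freq = getattr(index, 'freq', None)
    if freq is None:
        raise ValueError('Frequency not provided and input has no set frequency.')
    return pd.date_range(start, end, freq=freq, **kwargs)


def fill_dtindex(y, freq=None, start=None, end=None, fill=None):
    """Reindex y onto an evenly spaced DatetimeIndex, optionally filling the gaps."""
    new_index = date_range_from_index(y.index, freq=freq, start=start, end=end)
    y = y.reindex(new_index)
    if fill is not None:
        if isinstance(fill, str):
            # A string names a fill method: 'ffill' or 'bfill'
            # (fillna(method=...) is deprecated in recent pandas)
            if fill not in ('ffill', 'bfill'):
                raise ValueError("fill must be 'ffill', 'bfill', or a fill value.")
            y = y.ffill() if fill == 'ffill' else y.bfill()
        else:
            # Anything else is used as a constant fill value
            y = y.fillna(fill)
    return y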
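Once the series is evenly spaced and filled, the last-minus-first difference over a fixed number of periods does not strictly need .apply(). The sketch below, using made-up data and an assumed window of 3 rows, compares the .apply() version with a shift-based one:

```python
import pandas as pd

# Hypothetical evenly spaced, already forward-filled series
y_artificial = pd.Series(
    [1.0, 2.0, 2.0, 2.0, 2.0, 5.0, 6.0],
    index=pd.date_range("2020-01-01", periods=7, freq="D"),
)

n_periods = 3  # window length in rows (an assumption for this example)

# Last-minus-first over each full window, via .apply() ...
via_apply = y_artificial.rolling(n_periods).apply(lambda x: x.iat[-1] - x.iat[0])

# ... and the same quantity without .apply(): the current value minus
# the value n_periods - 1 rows earlier
via_shift = y_artificial - y_artificial.shift(n_periods - 1)

# Series.equals treats NaNs in matching positions as equal
print(via_apply.equals(via_shift))
```

The shift-based form only matches when the window is a fixed number of rows on a regular grid, which is what the reindexing step provides; y_artificial.diff(n_periods - 1) should give the same result.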
NaN. However, I would say it's not efficient, since pd.Series.apply is not vectorised but a thinly veiled loop. – Polly