Pandas .rolling specifying time window and win_type

I want to compute a moving average using a time window over an irregular time series using pandas. Ideally, the window should be exponentially weighted using pandas.DataFrame.ewm, but the arguments (e.g. span) do not accept time-based windows. If we try to use pandas.DataFrame.rolling, we realise that we cannot combine time-based windows with win_type.

import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, 3, 4]},
                   index = pd.Index([pd.Timestamp('20130101 09:00:00'),
                                     pd.Timestamp('20130101 09:00:02'),
                                     pd.Timestamp('20130101 09:00:03'),
                                     pd.Timestamp('20130101 09:00:05'),
                                     pd.Timestamp('20130101 09:00:06')],
                                    name='foo'))
dft.rolling('2s', win_type='triang').sum()
>>> ValueError: Invalid window 2s

How can I calculate a non-uniformly weighted, time-based moving average over an irregular time series?

The expected output for dft.ewm(alpha=0.9, adjust=False).sum() associated with a window of '2s' would be [0*1, 1*1, 2*1+1*0.9, 3*1, 4*1+3*0.9]
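To make the arithmetic behind these expected values explicit, here is a minimal sketch that computes them by hand. It assumes that each window covers the interval (t - 2s, t], which is the default for offset-based rolling windows, and that the weight decays by a factor of 0.9 per sample step back from the newest observation. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

window = pd.Timedelta("2s")
decay = 0.9  # assumed per-sample decay factor, taken from the expected output

expected = []
for t in dft.index:
    # Observations that fall inside the half-open interval (t - 2s, t]
    mask = (dft.index > t - window) & (dft.index <= t)
    values = dft.loc[mask, "B"].to_numpy()
    # Newest observation gets weight 1, the one before it 0.9, and so on
    weights = decay ** np.arange(len(values) - 1, -1, -1)
    expected.append(float(weights @ values))

print(expected)  # approximately [0.0, 1.0, 2.9, 3.0, 6.7]
```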

Reames answered 19/11, 2017 at 13:47 Comment(5)
How about using dft.resample('1s') before applying rolling()? This way, you can use a rolling function that is based on the size of the window and not time. – Oakland
If the time stamp is in milliseconds, this approach would be too computationally and memory expensive. – Reames
What is your expected output from this data? – Entero
I've edited the question with the expected output from an exponentially weighted moving average. – Reames
@Reames Did you ever find a solution to this? I'm having the same trouble now. – Marienthal
1

From the documentation, it seems that when win_type is given, the window parameter must be a number of samples and not a time interval, as you would like. You could try resampling your time series first so that it becomes a regular time series. Something like this:

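import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, 3, 4]},
                   index = pd.Index([pd.Timestamp('20130101 09:00:00'),
                                     pd.Timestamp('20130101 09:00:02'),
                                     pd.Timestamp('20130101 09:00:03'),
                                     pd.Timestamp('20130101 09:00:05'),
                                     pd.Timestamp('20130101 09:00:06')],
                                    name='foo'))
X = 1  # resample step in seconds (example value)
Y = 2  # window length in samples (example value)
dft = dft.resample(rule=f'{X}s').mean()  # regular grid; empty bins become NaN
dft.rolling(Y, win_type='triang').sum()  # win_type requires SciPy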

Where X is the resampling interval in seconds and Y is the integer window size (in samples) for the window parameter.
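As a rough sketch of what resampling does to the example data, the snippet below puts it on a 1-second grid and then applies an ordinary sample-based ewm. The 1-second step and span=2 are example values chosen here, not something from the question. Note that the grid grows from 5 to 7 rows and the empty seconds become NaN, which is the memory concern raised in the comments for finer resolutions.

```python
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Regular 1-second grid: seconds without an observation become NaN rows.
regular = dft.resample("1s").mean()

# On a regular grid, a span counted in samples corresponds to a span in seconds.
smoothed = regular.ewm(span=2, adjust=False).mean()

# Keep only the timestamps that were present in the original irregular index.
smoothed_at_obs = smoothed.loc[dft.index]

print(regular)
print(smoothed_at_obs)
```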

Marginal answered 6/7, 2021 at 15:45 Comment(0)
1

The pandas documentation is misleading. As you found out, you can't pass an offset while using win_type. As a workaround, you can pass your own function using .apply. For example, if you want to use triangular windows:

import pandas as pd
from scipy.signal.windows import triang

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.Index(
        [
            pd.Timestamp("20130101 09:00:00"),
            pd.Timestamp("20130101 09:00:02"),
            pd.Timestamp("20130101 09:00:03"),
            pd.Timestamp("20130101 09:00:05"),
            pd.Timestamp("20130101 09:00:06"),
        ],
        name="foo",
    ),
)


def triangle_sum(window):
    weights = triang(len(window))
    return (weights * window).sum()


dft.rolling("2s").apply(triangle_sum, raw=True)

You can define your own weighting scheme and use Numba for performance, if that's a concern.
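As one possible custom weighting scheme, the sketch below weights each observation by how long ago it occurred rather than by its position in the window. It relies on raw=False, so that the function receives a Series that still carries its timestamps. The function name time_decayed_sum and the halflife_s parameter are made up for this illustration, and the 1-second half-life is an arbitrary example value.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)


def time_decayed_sum(window: pd.Series, halflife_s: float = 1.0) -> float:
    # Age of each observation in seconds, relative to the newest one in the window
    age_s = (window.index[-1] - window.index).total_seconds().to_numpy()
    # Weight halves for every `halflife_s` seconds of age
    weights = 0.5 ** (age_s / halflife_s)
    return float(np.sum(weights * window.to_numpy()))


# raw=False passes a Series (with its DatetimeIndex) instead of a bare ndarray
result = dft["B"].rolling("2s").apply(time_decayed_sum, raw=False)
print(result)  # values: 0.0, 1.0, 2.5, 3.0, 5.5
```

Passing a Series per window is slower than raw=True, so this is a trade-off between access to the timestamps and speed.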

Theadora answered 18/9, 2021 at 21:32 Comment(0)
0

Note that this is not an exact solution, since your desired output is also irregularly sampled (you seem to want to retain the time index of dft).

What I would propose is a package called tsflex. tsflex lets you define a time-based window and stride for feature extraction on multivariate and irregularly sampled data.

For your described use case, this would be:

import pandas as pd
from tsflex.features import FeatureCollection, FuncWrapper, FeatureDescriptor

# ---- construct the test data
dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.Index(
        [
            pd.Timestamp("20130101 09:00:00"),
            pd.Timestamp("20130101 09:00:02"),
            pd.Timestamp("20130101 09:00:03"),
            pd.Timestamp("20130101 09:00:05"),
            pd.Timestamp("20130101 09:00:06"),
        ],
        name="foo",
    ),
)

# ----- (1) define the feature function and (2) feature-extraction configuration
# (1)
def series_ewm(a: pd.Series, alpha):
    ewm = a.ewm(alpha=alpha, adjust=False)
    print(a.values, ewm.mean().values)
    # IMO, it doesn't make sense to return the sum, as this is sample dependent
    #   -> so I return the last value of the mean-ewm-array
    return ewm.mean().iloc[-1]

# (2)
fc = FeatureCollection(
    FeatureDescriptor(
        function=FuncWrapper(series_ewm, "ewm", input_type=pd.Series, alpha=0.9),
        series_name="B",
        window="2s",
        stride="1s",
    )
)

Finally, run the feature_calculation on dft:

[Image in the original answer: output of the feature calculation on dft]

It really is an interesting repo, so you might want to have a look at the examples: https://github.com/predict-idlab/tsflex/tree/main/examples

Apsis answered 30/9, 2021 at 8:57 Comment(0)
-2

This should be working:

dft.rolling(2, freq='s', win_type='triang').sum()

Respirable answered 17/12, 2017 at 19:28 Comment(0)