Pandas .rolling specifying time window and win_type

Asked 19/11, 2017 at 13:47 Answered 30/9, 2021 at 8:57

python pandas time-series moving-average

I want to compute a moving average over an irregular time series using a time-based window in pandas. Ideally, the window would be exponentially weighted via pandas.DataFrame.ewm, but its arguments (e.g. span) do not accept time-based windows. pandas.DataFrame.rolling does accept them, but a time-based window cannot be combined with win_type.

import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, 3, 4]},
                   index = pd.Index([pd.Timestamp('20130101 09:00:00'),
                                     pd.Timestamp('20130101 09:00:02'),
                                     pd.Timestamp('20130101 09:00:03'),
                                     pd.Timestamp('20130101 09:00:05'),
                                     pd.Timestamp('20130101 09:00:06')],
                                    name='foo'))
dft.rolling('2s', win_type='triang').sum()
# ValueError: Invalid window 2s

How can I compute a non-uniformly weighted, time-based moving average over an irregular time series?

For an exponentially weighted sum in the spirit of dft.ewm(alpha=0.9, adjust=False).sum(), restricted to a window of '2s', the expected output would be [0*1, 1*1, 2*1 + 1*0.9, 3*1, 4*1 + 3*0.9].

Reames answered 19/11, 2017 at 13:47 Comment(5)

How about using dft.resample('1s') before applying rolling()? This way, you can use a rolling function that is based on the size of the window and not time. – Oakland 19/11, 2017 at 17:55

If the timestamps are in milliseconds, this approach would be too expensive in both computation and memory – Reames 19/11, 2017 at 18:2

What is your expected output from this data? – Entero 19/11, 2017 at 18:19

I've edited the question with the expected output from an exponentially weighted moving average. – Reames 19/11, 2017 at 19:23

@Reames Did you ever find a solution to this? I'm having the same trouble now. – Marienthal 27/3, 2019 at 2:32

From the documentation, it seems that when win_type is used the window parameter must be a number of samples, not a time interval as you would like. You could try resampling your time series first so that it becomes a "regular" time series. Something like this:

dft = pd.DataFrame({'B': [0, 1, 2, 3, 4]},
                   index = pd.Index([pd.Timestamp('20130101 09:00:00'),
                                     pd.Timestamp('20130101 09:00:02'),
                                     pd.Timestamp('20130101 09:00:03'),
                                     pd.Timestamp('20130101 09:00:05'),
                                     pd.Timestamp('20130101 09:00:06')],
                                    name='foo'))
dft = dft.resample(rule='Xs').mean()
dft.rolling(Y, win_type='triang').sum()

Here X is the resampling interval in seconds and Y is the integer window size, in samples, passed as the window parameter.
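As one hedged illustration of the resampling idea, the sketch below assumes X = 1 second and, instead of win_type, applies the exponentially weighted mean the question asked about. The value alpha=0.5 is an arbitrary choice for illustration. On the regular grid every step is exactly one second, so a sample-based decay corresponds to a time-based one.

```python
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Put the series on a regular 1-second grid; seconds without data become NaN.
regular = dft["B"].resample("1s").mean()

# Sample-based exponential weighting now corresponds to a fixed time step.
# alpha=0.5 is an assumption made for this example only.
smoothed = regular.ewm(alpha=0.5, adjust=False).mean()

# Keep only the original timestamps (all of them lie on the 1-second grid here).
result = smoothed.reindex(dft.index)
print(result)
```

With the default ignore_na=False, the empty seconds still count as elapsed steps, so older samples keep decaying across gaps. As noted in the comments, the regular grid can become large when timestamps have millisecond resolution.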

Marginal answered 6/7, 2021 at 15:45 Comment(0)

The pandas documentation is misleading. As you found out, you can't pass an offset while using win_type. As a workaround, you can pass your own function via .apply. For example, to use a triangular window:

import pandas as pd
from scipy.signal.windows import triang

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.Index(
        [
            pd.Timestamp("20130101 09:00:00"),
            pd.Timestamp("20130101 09:00:02"),
            pd.Timestamp("20130101 09:00:03"),
            pd.Timestamp("20130101 09:00:05"),
            pd.Timestamp("20130101 09:00:06"),
        ],
        name="foo",
    ),
)


def triangle_sum(window):
    weights = triang(len(window))
    return (weights * window).sum()


dft.rolling("2s").apply(triangle_sum, raw=True)

You can define your own weighting scheme in the same way, and use Numba for performance if that's a concern.
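For instance, the expected output given in the question can be reproduced with a custom weighting function. This is a minimal sketch: decayed_sum and its decay parameter are hypothetical names, and the weights depend on a sample's position within the window, not on its exact timestamp.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)


def decayed_sum(window, decay=0.9):
    # With raw=True, `window` is a NumPy array ordered oldest to newest.
    # The newest sample gets weight 1, the one before it `decay`, and so on.
    weights = decay ** np.arange(len(window) - 1, -1, -1)
    return float((weights * window).sum())


# Each '2s' window covers (t - 2s, t], so a sample exactly 2 s old is excluded.
result = dft["B"].rolling("2s").apply(decayed_sum, raw=True)
print(result)  # approximately 0.0, 1.0, 2.9, 3.0, 6.7
```

To weight by elapsed time instead of position, the same pattern should work with raw=False, where the function receives a Series whose index carries the timestamps, at some cost in speed.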

Theadora answered 18/9, 2021 at 21:32 Comment(0)

Note that this is not an exact solution, since your desired output is also irregularly sampled (you seem to want to retain the time index of dft).
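If keeping the original timestamps matters, one option that stays within pandas is the times argument of ewm, which (in pandas 1.1 and later, as far as I know) lets halflife be given as a time span. A minimal sketch, assuming a half-life of one second chosen arbitrarily; it supports only mean(), and it decays weights over the whole history instead of cutting off at a 2-second window.

```python
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# A sample's weight halves for every `halflife` of elapsed time between it
# and the current row. The 1-second half-life is an assumption made for
# this example only.
ewm_mean = dft["B"].ewm(halflife=pd.Timedelta("1s"), times=dft.index).mean()
print(ewm_mean)
```

The result has one value per original row, so the irregular index of dft is preserved.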

What I would propose is that you use a package called tsflex. tsflex lets you define a time-based window and stride for feature extraction on multivariate and irregularly sampled data.

For your described use case, this would be:

import pandas as pd
from tsflex.features import FeatureCollection, FuncWrapper, FeatureDescriptor

# ---- construct the test data
dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.Index(
        [
            pd.Timestamp("20130101 09:00:00"),
            pd.Timestamp("20130101 09:00:02"),
            pd.Timestamp("20130101 09:00:03"),
            pd.Timestamp("20130101 09:00:05"),
            pd.Timestamp("20130101 09:00:06"),
        ],
        name="foo",
    ),
)

# ----- (1) define the feature function and (2) feature-extraction configuration
# (1)
def series_ewm(a: pd.Series, alpha):
    ewm = a.ewm(alpha=alpha, adjust=False)
    print(a.values, ewm.mean().values)
    # IMO, it doesn't make sense to return the sum, as this is sample dependent
    #   -> so I return the last value of the mean-ewm-array
    # use .iloc for positional access, since the index holds timestamps
    return ewm.mean().iloc[-1]

# (2)
fc = FeatureCollection(
    FeatureDescriptor(
        function=FuncWrapper(series_ewm, "ewm", input_type=pd.Series, alpha=0.9),
        series_name="B",
        window="2s",
        stride="1s",
    )
)

Finally, run the feature calculation on dft, e.g. with fc.calculate(data=dft, return_df=True).

It really is an interesting repo, so you might want to take a look at the examples: https://github.com/predict-idlab/tsflex/tree/main/examples

Apsis answered 30/9, 2021 at 8:57 Comment(0)

-2

This should be working:

dft.rolling(2, freq='s', win_type='triang').sum()

Respirable answered 17/12, 2017 at 19:28 Comment(0)
