Python seasonal_decompose: determining the freq parameter

Although this question seems to have been tackled many times, I cannot figure out why seasonal_decompose doesn't work in my case, even though I am passing a dataframe with a DatetimeIndex as input. Here is an example of my dataset:

    Customer order actual date  Sales Volumes
0   01/01/1900                           300
1   10/03/2008                          3000
2   15/11/2013                            10
3   23/12/2013                           200
4   04/03/2014                             5
5   17/03/2014                            30
6   22/04/2014                             1
7   26/06/2014                           290
8   30/06/2014                            40

The code snippet is shown below:

from statsmodels.tsa.seasonal import seasonal_decompose
df_agg['Customer order actual date'] = pd.to_datetime(df_agg['Customer order actual date'])
df_agg = df_agg.set_index('Customer order actual date')
df_agg.reset_index().sort_values('Customer order actual date', ascending=True)
decomposition = seasonal_decompose(np.asarray(df_agg['Sales Volumes'] ), model = 'multiplicative')

But I systematically get the following error:

: You must specify a freq or x must be a pandas object with a timeseries index witha freq not set to None

Could you please explain why I should provide a freq input even though I am using a dataframe with a DatetimeIndex? Does it make sense to give a frequency as an input parameter when the seasonality is what I expect seasonal_decompose to output?

Izolaiztaccihuatl answered 31/5, 2018 at 5:12 Comment(0)

The seasonal_decompose function gets the frequency from the index's inferred_freq attribute. Here is the link - https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DatetimeIndex.html

inferred_freq, in turn, is computed by infer_freq, and infer_freq, when passed a Series, uses the values of the Series and not its index. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.infer_freq.html

This might be the reason why freq needs to be set to a value even with a timeseries index: when no regular frequency can be inferred from the timestamps, inferred_freq is None.
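To make this concrete, here is a minimal sketch of how inferred_freq behaves. The dates are illustrative (the irregular ones are loosely based on the question's sample), and only pandas is assumed:

```python
import pandas as pd

# Evenly spaced daily timestamps: no freq was set, but one can be inferred.
regular = pd.DatetimeIndex(["2014-03-01", "2014-03-02", "2014-03-03", "2014-03-04"])
print(regular.freq)           # None, nothing was set explicitly
print(regular.inferred_freq)  # 'D'

# Irregularly spaced timestamps, like order dates: nothing can be inferred.
irregular = pd.DatetimeIndex(["2013-11-15", "2013-12-23", "2014-03-04", "2014-03-17"])
print(irregular.inferred_freq)  # None
```

With an index like the second one, seasonal_decompose has nothing to infer the seasonal period from, which is consistent with the error in the question.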

In case you want to know what frequency means in seasonal_decompose(): it is a property of your data. So if you collected your data month by month, it has a monthly frequency.

The method used in seasonal_decompose() to calculate frequency is: _maybe_get_pandas_wrapper_freq().

I did some research on seasonal_decompose(), and here are some links that might help you understand the function's source code:

source code of seasonal decomposition - https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/seasonal.py

Check out - _maybe_get_pandas_wrapper_freq https://searchcode.com/codesearch/view/86129760/

Hope this helps! Let me know if you find something interesting in addition to it.

Commendam answered 7/6, 2018 at 1:41 Comment(1)
And if you are collecting your data every 10 minutes over a month, what would the frequency be? - Plated

Two points on your code snippet.

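  1. On line 4 of your code you are resetting the index and sorting, but you are not assigning the result to anything, so df_agg is unchanged. Either assign the result to a variable or sort in place, e.g. df_agg.sort_index(inplace=True); note that inplace=True on the chained call would only affect the temporary frame returned by reset_index().
  2. seasonal_decompose works on time series, so your data needs to have a datetime index (you can set it while loading the CSV, or you can use the pd.to_datetime() function).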
Kozlowski answered 10/7, 2019 at 15:21 Comment(0)

First of all, if you hand an np.asarray(...) to seasonal_decompose, it will see only a plain array; your index is gone. So get rid of the np.asarray.
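A small sketch of what gets lost in the conversion, using three made-up rows in the shape of the question's data:

```python
import numpy as np
import pandas as pd

# Illustrative series with a datetime index (values taken from the sample rows).
sales = pd.Series(
    [5, 30, 1],
    index=pd.to_datetime(["2014-03-04", "2014-03-17", "2014-04-22"]),
    name="Sales Volumes",
)

as_array = np.asarray(sales)

print(type(sales.index).__name__)  # DatetimeIndex
print(hasattr(as_array, "index"))  # False, the array carries no dates at all
```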

Secondly, if you look at df_agg['Sales Volumes'].index, you will see freq=None; that's what causes the function to complain. You need an actual frequency such as D or M. You can set one via asfreq, for example df_agg = df_agg.asfreq('D') (asfreq returns a new object, so the result has to be assigned).

Last, but not least: your sample data do not follow any regular frequency. asfreq will fill in the missing dates, but you will get lots of NaN values.
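A quick sketch of that effect, using the last five rows of the sample from the question. dayfirst=True is an assumption about the date format, based on values such as 15/11/2013:

```python
import pandas as pd

# Last five rows of the question's sample data.
raw = pd.DataFrame({
    "Customer order actual date": ["04/03/2014", "17/03/2014", "22/04/2014",
                                   "26/06/2014", "30/06/2014"],
    "Sales Volumes": [5, 30, 1, 290, 40],
})
# Assumption: dates are day/month/year.
raw["Customer order actual date"] = pd.to_datetime(
    raw["Customer order actual date"], dayfirst=True
)
sales = raw.set_index("Customer order actual date")["Sales Volumes"].sort_index()

# asfreq inserts every missing calendar day and fills it with NaN.
daily = sales.asfreq("D")
print(daily.index.freq)    # <Day>
print(len(daily))          # number of calendar days between first and last order
print(daily.isna().sum())  # all but the five original rows are NaN
```

So the index now has a frequency, but the gaps still have to be dealt with (for example by aggregating to a coarser period) before a decomposition is meaningful.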

If you want to look up the abbreviations for freqs, they are listed as offset aliases in the pandas time series documentation.

Adversity answered 29/5, 2020 at 13:16 Comment(0)
