pandas.DatetimeIndex frequency is None and can't be set
S

6

34

I created a DatetimeIndex from a "date" column:

sales.index = pd.DatetimeIndex(sales["date"])

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
                   '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
                   '2003-01-11', '2003-01-13',
                   ...
                   '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
                   '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
                   '2016-07-30', '2016-07-31'],
                  dtype='datetime64[ns]', name='date', length=4393, freq=None)

As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly, I get a ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                          'dates does not conform to passed '
    400                                          'frequency {1}'
--> 401                                          .format(inferred, freq.freqstr))
    402 
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

So apparently a frequency has been inferred, but is stored neither in the freq nor inferred_freq attribute of the DatetimeIndex - both are None. Can someone clear up the confusion?

Salpingotomy answered 14/9, 2017 at 11:10 Comment(4)
Does sales.index = pd.DatetimeIndex(sales["date"].asfreq(freq='D')) work? - Champ
No. "ValueError: Length mismatch: Expected axis has 218153 elements, new values have 1 elements" - Salpingotomy
Your data sample does not have a frequency per se. Judging by the information you provide, 2003-01-05 and 2003-01-12 are missing. Moreover, 2003-01-02 + 4393 days makes 2015-01-12, not 2016-07-31. - Minorca
I'm not sure why @EdChum's answer wouldn't work. Maybe a syntax issue? See my answer, where I applied asfreq to the whole dataframe rather than just the index. If that's not the issue, it may be hard to say unless you can post a smaller sample dataframe that exhibits the same issue. - Tetrafluoroethylene
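As a rough check of the point made in the comments, the sketch below rebuilds the first few dates of the question's index (including the gap at 2003-01-05) and shows that pandas cannot infer a regular frequency from them, which is why freq="D" is rejected. The variable names are illustrative and not from the original post.

```python
import pandas as pd

# First dates from the question's index; 2003-01-05 is missing.
idx = pd.DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06'])

# The spacing is irregular, so no frequency can be inferred.
inferred = pd.infer_freq(idx)
print(inferred, idx.freq)

# Forcing a daily frequency is validated against the dates and fails.
try:
    pd.DatetimeIndex(idx, freq='D')
    conforms = True
except ValueError as exc:
    conforms = False
    print(exc)
```

Read this way, the "Inferred frequency None" in the error message reports that inference found no frequency at all, which is consistent with freq and inferred_freq both being None.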
B
23

You have a couple of options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset

"I suspect that errors down the road are caused by the missing freq."

You are absolutely right. Here's what I use often:

import pandas as pd


def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """
    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            freq = pd.infer_freq(idx)
        else:
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found for `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

print(add_freq(idx, freq='D'))  # explicit (output from an older pandas; newer versions raise ValueError, see the comments below)
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')

Using asfreq will actually reindex (fill) missing dates, so be careful of that if that's not what you're looking for.

From the pandas documentation: "The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin, but convenient wrapper around reindex which generates a date_range and calls reindex."
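To make that description concrete, here is a minimal sketch (the sample data and variable names are made up) that compares asfreq with building a date_range by hand and calling reindex. It is only meant to illustrate the equivalence for a simple daily case.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']),
)

# asfreq fills in the missing calendar days with NaN.
via_asfreq = s.asfreq('D')

# The same result, spelled out: build the full range, then reindex.
full_range = pd.date_range(s.index.min(), s.index.max(), freq='D')
via_reindex = s.reindex(full_range)

print(via_asfreq)
print(via_asfreq.equals(via_reindex))
```

Both results have five rows, two of them NaN, so any downstream code has to be able to cope with the inserted gaps.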

Blades answered 14/9, 2017 at 12:51 Comment(2)
In Python 3.7.10 this code produces an error. Specifically, the line print(add_freq(idx, freq='D')) produces ValueError: Inferred frequency B from passed values does not conform to passed frequency D - Knick
I can confirm this error. - Scofield
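The error reported in these comments suggests that newer pandas versions check the dates against the requested frequency. A possible variant of the helper that embraces this check is sketched below; with_freq is a hypothetical name, not part of pandas or of the answer above, and it relies on the DatetimeIndex constructor to do the validation.

```python
import pandas as pd


def with_freq(idx, freq=None):
    """Return a new DatetimeIndex with `freq` set (hypothetical helper).

    If `freq` is None it is inferred with pd.infer_freq. A ValueError is
    raised when nothing can be inferred or the dates do not conform.
    """
    if freq is None:
        freq = pd.infer_freq(idx)
        if freq is None:
            raise ValueError("no regular frequency could be inferred; "
                             "fill the gaps first or pass `freq` explicitly")
    # The constructor validates the dates against `freq`.
    return pd.DatetimeIndex(idx, freq=freq)


idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])

business = with_freq(idx)  # inferred as business days
print(business.freq)

try:
    with_freq(idx, freq='D')  # 2003-01-04 and 2003-01-05 are missing
    daily_ok = True
except ValueError as exc:
    daily_ok = False
    print(exc)
```

If a daily frequency is really what you need, the gaps have to be filled first, for example with asfreq or reindex.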
T
13

It seems to relate to missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D'), as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up:

df=pd.DataFrame({ 'x':[1,2,4] }, 
   index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) )

df
Out[756]: 
            x
2003-01-02  1
2003-01-03  2
2003-01-06  4

df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], 
          dtype='datetime64[ns]', freq=None)

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]: 
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('D').index
Out[759]: 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')

More generally, and depending on what exactly you are trying to do, you might want to check out the following for other options like reindex & resample: Add missing dates to pandas dataframe
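As one illustration of the resample option mentioned above, the sketch below reuses the made-up sample data and shows two ways of filling the inserted days. Which one is appropriate depends on what the values mean, so treat it as a starting point only.

```python
import pandas as pd

df = pd.DataFrame(
    {'x': [1, 2, 4]},
    index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']),
)

# Aggregate per day: days without observations sum to 0.
daily_sum = df.resample('D').sum()

# Upsample and carry the last known value forward.
daily_ffill = df.resample('D').ffill()

print(daily_sum)
print(daily_ffill)
```

Either way the resulting index is regular and carries a daily freq.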

Tetrafluoroethylene answered 14/9, 2017 at 12:6 Comment(1)
This pointed me in the right direction. What I did was to simply rebuild the broken index like this: df.index = pd.date_range(start=df.index[0], end=df.index[-1], freq="h") - Calvaria
B
8

I'm not sure if earlier versions of Python have this, but 3.6 has this simple solution:

# 'b' stands for business days
# 'w' for weekly, 'd' for daily, and you get the idea...
df.index.freq = 'b' 
Bosom answered 26/10, 2018 at 17:31 Comment(1)
For my index: DatetimeIndex(['2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-01-27'], dtype='datetime64[ns]', name='Date', freq=None) this produced nothing. Not sure why. - Iiette
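One possible explanation for the comment above (an assumption, since the commenter did not post their code) is that the final date, 2022-01-27, does not fit the year-end pattern of the other dates, so no single frequency describes the whole index. The sketch below checks this with pd.infer_freq; the variable names are illustrative.

```python
import pandas as pd

idx = pd.DatetimeIndex(
    ['2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31',
     '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31',
     '2022-01-27'],
    name='Date',
)

# With the odd last date included, nothing can be inferred.
whole = pd.infer_freq(idx)

# Without it, pandas finds a year-end frequency
# (the exact alias spelling depends on the pandas version).
trimmed = pd.infer_freq(idx[:-1])

print(whole, trimmed)
```

A business-day frequency such as 'b' would not describe yearly data like this either.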
E
3

This can happen if, for example, the dates you pass aren't sorted.

Look at this example:

import numpy as np
import pandas as pd

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[-1:],
                                               example_ts.index[:-1]]), freq='D')

The code above raises your error because the dates are not in sequential order.

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[:-1],
                                               example_ts.index[-1:]]), freq='D')

This one, by contrast, runs correctly.
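If unsorted dates are the cause, sorting before setting the frequency should be enough. Here is a minimal sketch along the lines of the example above; the names shuffled and ts_sorted are made up for illustration.

```python
import pandas as pd

dates = pd.date_range('2020-01-01', '2020-01-10', freq='D')

# Move the last date to the front so the index is no longer sorted.
shuffled = dates[-1:].append(dates[:-1])
ts = pd.Series(range(10), index=shuffled)

# No frequency can be inferred from dates that are out of order.
unsorted_freq = pd.infer_freq(ts.index)
print(unsorted_freq)

# Sort first, then attach the frequency.
ts_sorted = ts.sort_index()
ts_sorted.index = pd.DatetimeIndex(ts_sorted.index, freq='D')
print(ts_sorted.index.freq)
```

Note that sort_index keeps each value attached to its date, so only the row order changes.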

Evieevil answered 27/4, 2021 at 14:52 Comment(0)
C
1

It seems to be an issue with missing values in the index. I simply rebuilt the index from the original one, using the frequency I needed:

df.index = pd.date_range(start=df.index[0], end=df.index[-1], freq="h")
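One caveat worth checking before using this: assigning a date_range directly only works when it has exactly as many entries as the frame has rows. The sketch below uses made-up hourly data with one missing hour to show the failure and a reindex-based alternative; it is an illustration, not the answerer's code.

```python
import pandas as pd

# Made-up hourly data; the 02:00 observation is missing.
df = pd.DataFrame(
    {'x': [1.0, 2.0, 4.0]},
    index=pd.to_datetime(['2022-01-01 00:00', '2022-01-01 01:00',
                          '2022-01-01 03:00']),
)

full = pd.date_range(start=df.index[0], end=df.index[-1], freq='h')
print(len(df), len(full))

# Direct assignment needs equal lengths, so it fails here.
try:
    tmp = df.copy()
    tmp.index = full
    assigned = True
except ValueError as exc:
    assigned = False
    print(exc)

# reindex aligns on the timestamps and marks the gap with NaN.
df_full = df.reindex(full)
print(df_full)
```

When the lengths do match, direct assignment relabels the rows by position, so it is only safe if you are sure no timestamps are actually missing.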
Calvaria answered 17/11, 2022 at 8:11 Comment(0)
H
0

Similar to some of the other answers here, my problem was that my data had missing dates.

Instead of dealing with this issue in Python, I opted to change my SQL query that I was using to source the data. So instead of skipping dates, I wrote the query such that it would fill in missing dates with the value 0.
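For readers who would rather do the same zero-filling in pandas than in SQL, a rough equivalent is sketched below. The sales figures are invented, and fill_value=0 is only sensible when a missing date really means "nothing sold".

```python
import pandas as pd

sales = pd.Series(
    [5, 3, 7],
    index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']),
)

# Insert the missing calendar days and set their value to 0.
filled = sales.asfreq('D', fill_value=0)
print(filled)
```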

Hagioscope answered 6/10, 2022 at 20:11 Comment(0)
