how to get the shifted index value of a dataframe in Pandas?
Asked Answered
J

3

7

Consider the simple example below:

date = pd.date_range('1/1/2011', periods=5, freq='H')

df = pd.DataFrame({'cat' : ['A', 'A', 'A', 'B',
                         'B']}, index = date)
df
Out[278]: 
                    cat
2011-01-01 00:00:00   A
2011-01-01 01:00:00   A
2011-01-01 02:00:00   A
2011-01-01 03:00:00   B
2011-01-01 04:00:00   B

I want to create a variable that contains the lagged/lead value of the index. That is something like:

df['index_shifted']=df.index.shift(1)

So, for instance, at time 2011-01-01 01:00:00 I expect the variable index_shifted to be 2011-01-01 00:00:00

How can I do that? Thanks!

Janaye answered 14/6, 2016 at 19:2 Comment(1)
df.index gives etc etc '2016-06-13 16:29:00'], dtype='datetime64[ns]', length=2471070, freq=None) Is this a problem?Polyandrist
M
10

I think you need Index.shift with -1:

df['index_shifted']= df.index.shift(-1)
print (df)
                    cat       index_shifted
2011-01-01 00:00:00   A 2010-12-31 23:00:00
2011-01-01 01:00:00   A 2011-01-01 00:00:00
2011-01-01 02:00:00   A 2011-01-01 01:00:00
2011-01-01 03:00:00   B 2011-01-01 02:00:00
2011-01-01 04:00:00   B 2011-01-01 03:00:00

For me it works without freq, but maybe it is necessary in real data:

df['index_shifted']= df.index.shift(-1, freq='H')
print (df)
                    cat       index_shifted
2011-01-01 00:00:00   A 2010-12-31 23:00:00
2011-01-01 01:00:00   A 2011-01-01 00:00:00
2011-01-01 02:00:00   A 2011-01-01 01:00:00
2011-01-01 03:00:00   B 2011-01-01 02:00:00
2011-01-01 04:00:00   B 2011-01-01 03:00:00

EDIT:

If freq of DatetimeIndex is None, you need add freq to shift:

import pandas as pd

date = pd.date_range('1/1/2011', periods=5, freq='H').union(pd.date_range('5/1/2011', periods=5, freq='H'))


df = pd.DataFrame({'cat' : ['A', 'A', 'A', 'B',
                         'B','A', 'A', 'A', 'B',
                         'B']}, index = date)

print (df.index)
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00', '2011-05-01 00:00:00',
               '2011-05-01 01:00:00', '2011-05-01 02:00:00',
               '2011-05-01 03:00:00', '2011-05-01 04:00:00'],
              dtype='datetime64[ns]', freq=None)

df['index_shifted']= df.index.shift(-1, freq='H')
print (df)
                    cat       index_shifted
2011-01-01 00:00:00   A 2010-12-31 23:00:00
2011-01-01 01:00:00   A 2011-01-01 00:00:00
2011-01-01 02:00:00   A 2011-01-01 01:00:00
2011-01-01 03:00:00   B 2011-01-01 02:00:00
2011-01-01 04:00:00   B 2011-01-01 03:00:00
2011-05-01 00:00:00   A 2011-04-30 23:00:00
2011-05-01 01:00:00   A 2011-05-01 00:00:00
2011-05-01 02:00:00   A 2011-05-01 01:00:00
2011-05-01 03:00:00   B 2011-05-01 02:00:00
2011-05-01 04:00:00   B 2011-05-01 03:00:00
Mornings answered 14/6, 2016 at 19:5 Comment(0)
T
2

What's wrong with df['index_shifted']=df.index.shift(-1)?

(Genuine question, not sure if I missed something)

Tame answered 14/6, 2016 at 19:4 Comment(3)
I get ValueError: Cannot shift with no freqPolyandrist
@Noobie: weird, it works fine locally. What's the result of df.index.freq? It should be <Hour>.Tame
please look at my comment abovePolyandrist
H
1

This is an old question, but if your timestamps have gaps or you do not want to specify the frequency, AND you are not dealing with timezones the following will work:

df['index_shifted'] = pd.Series(df.index).shift(-1).values

If you are dealing with Timezones the following will work:

df['index_shifted'] = pd.to_datetime(pd.Series(df.index).shift(-1).values, utc=True).tz_convert('America/New_York')
Heavierthanair answered 15/12, 2022 at 6:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.