How to round a Pandas `DatetimeIndex`?

4

14

I have a pandas.DatetimeIndex, e.g.:

pd.date_range('2012-1-1 02:03:04.000',periods=3,freq='1ms')
>>> [2012-01-01 02:03:04, ..., 2012-01-01 02:03:04.002000]

I would like to round the dates (Timestamps) to the nearest second. How do I do that? The expected result is similar to:

[2012-01-01 02:03:04.000000, ..., 2012-01-01 02:03:04.000000]

Is it possible to accomplish this by rounding a Numpy datetime64[ns] to seconds without changing the dtype [ns]?

np.array(['2012-01-02 00:00:00.001'],dtype='datetime64[ns]')
Bike answered 9/12, 2012 at 8:42 Comment(4)
@hayden, see my edit. I just want to round to the nearest second. - Bike
date_range defaults to day frequency; I assume you meant pd.date_range('2012-1-1 00:00.000', periods=2, freq='S'). - Wuhan
@MattiJohn, see my correction. I meant for the same second to repeat. - Bike
Good solution here - Multiphase
16

Update: if you're doing this to a DatetimeIndex or a datetime64 column, a better way is to use np.round directly rather than going through apply/map:

np.round(dtindex_or_datetime_col.astype(np.int64), -9).astype('datetime64[ns]')
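The same idea can also be written with plain integer arithmetic on the underlying nanosecond values. This is only a minimal sketch: the names idx, ns and one_second are illustrative and not part of the original answer, and adding half a second before the floor division means exact ties round upward.

```python
import numpy as np
import pandas as pd

# Example index (illustrative values).
idx = pd.DatetimeIndex([
    "2012-01-01 02:03:04.000",
    "2012-01-01 02:03:04.002",
    "2012-01-01 02:03:04.501",
])

# Express the timestamps as int64 nanoseconds since the epoch.
ns = idx.values.astype("datetime64[ns]").astype("int64")

one_second = 1_000_000_000  # nanoseconds per second

# Add half a second, then floor-divide: nearest second, ties go up.
rounded_ns = (ns + one_second // 2) // one_second * one_second

# Convert back; the dtype is still datetime64[ns].
rounded = pd.DatetimeIndex(rounded_ns.astype("datetime64[ns]"))
print(rounded)
```

Because only the integer values change, the result keeps nanosecond resolution, which is what the question asks for.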

Old answer (with some more explanation):

Whilst @Matti's answer is clearly the correct way to deal with your situation, I thought I would add an answer showing how you might round a single Timestamp to the nearest second:

from pandas import Timestamp

t1 = Timestamp('2012-1-1 00:00:00')
t2 = Timestamp('2012-1-1 00:00:00.000333')

In [4]: t1
Out[4]: <Timestamp: 2012-01-01 00:00:00>

In [5]: t2
Out[5]: <Timestamp: 2012-01-01 00:00:00.000333>

In [6]: t2.microsecond
Out[6]: 333

In [7]: t1.value
Out[7]: 1325376000000000000

In [8]: t2.value
Out[8]: 1325376000000333000

# Note: t2.value - t2.value % 1000000000 would round down, not to the nearest second
In [9]: int(round(t2.value, -9)) # round away milli-, micro- and nano-seconds
Out[9]: 1325376000000000000

In [10]: Timestamp(int(round(t2.value, -9)))
Out[10]: <Timestamp: 2012-01-01 00:00:00>

Hence you can apply this to the entire index:

def to_the_second(ts):
    # ts.value is nanoseconds since the epoch; round it to a multiple of 10**9
    return Timestamp(int(round(ts.value, -9)))

dtindex.map(to_the_second)
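To see how the modulo variant differs from round(), here is a small sketch on a single Timestamp. The value and the variable names are made up for illustration; the only assumption is that Timestamp.value holds nanoseconds since the epoch, as in the session above.

```python
import pandas as pd

t = pd.Timestamp("2012-01-01 00:00:00.700")

one_second = 10**9  # nanoseconds per second
value = int(t.value)  # nanoseconds since the epoch as a plain Python int

# Modulo variant: drops the sub-second part, i.e. always rounds down.
floored = pd.Timestamp(value - value % one_second)

# round() with ndigits=-9: nearest multiple of 10**9 nanoseconds.
nearest = pd.Timestamp(round(value, -9))

print(floored)
print(nearest)
```

With 700 ms past the second, the modulo variant stays on the same second while round() moves to the next one.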
Teaspoon answered 9/12, 2012 at 14:28 Comment(4)
It seems like 1000000 should be replaced with 1000000000. - Bike
Importantly, I would like to do it on a DatetimeIndex. - Bike
@user1579844 you're right of course! I forgot about milliseconds... whoops! I have corrected this and added how to apply this to the entire dt_index. - Teaspoon
"Alternatively: t2.value - t2.value % 1000000000": I believe this will round t2 down, not to the nearest second. - Offering
15

A round() method was added to DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta in pandas 0.18.0. Now we can do the following:

In[114]: index = pd.DatetimeIndex([pd.Timestamp('2012-01-01 02:03:04.000'), pd.Timestamp('2012-01-01 02:03:04.002'), pd.Timestamp('20130712 02:03:04.500'), pd.Timestamp('2012-01-01 02:03:04.501')])

In[115]: index.values
Out[115]: 
array(['2012-01-01T02:03:04.000000000', '2012-01-01T02:03:04.002000000',
       '2013-07-12T02:03:04.500000000', '2012-01-01T02:03:04.501000000'], dtype='datetime64[ns]')

In[116]: index.round('S')
Out[116]: 
DatetimeIndex(['2012-01-01 02:03:04', '2012-01-01 02:03:04',
               '2013-07-12 02:03:04', '2012-01-01 02:03:05'],
              dtype='datetime64[ns]', freq=None)

round() accepts a frequency parameter. The string aliases for it are listed in the pandas documentation on offset aliases.
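For comparison, a short sketch of round() next to its floor() and ceil() siblings, also applied to a single Timestamp and to a Series through the .dt accessor. The sample values are illustrative, and the lowercase 's' alias is used here because recent pandas releases prefer it over 'S'.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2012-01-01 02:03:04.250",
    "2012-01-01 02:03:04.750",
])

nearest = idx.round("s")  # nearest second
down = idx.floor("s")     # always towards the earlier second
up = idx.ceil("s")        # always towards the later second

# The same method exists on a single Timestamp ...
single = pd.Timestamp("2012-01-01 02:03:04.750").round("s")

# ... and on a datetime column through the .dt accessor.
column = pd.Series(idx).dt.round("s")

print(nearest, down, up, single, column, sep="\n")
```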

Monnet answered 3/9, 2016 at 19:2 Comment(0)
4

There is little point in changing the index itself, since you can just generate it using date_range with the desired frequency parameter, as in your question.

I assume what you are trying to do is change the frequency of a time series that contains data, in which case you can use resample (documentation). For example, if you have the following time series:

dt_index = pd.date_range('2012-1-1 00:00.001',periods=3, freq='1ms')
ts = pd.Series(np.random.randn(3), index=dt_index)  # requires: import numpy as np


2012-01-01 00:00:00           0.594618
2012-01-01 00:00:00.001000    0.874552
2012-01-01 00:00:00.002000   -0.700076
Freq: L

Then you can change the frequency to seconds using resample, specifying how you want to aggregate the values (mean, sum etc.):

ts.resample('S', how='sum')

2012-01-01 00:00:00    0.594618
2012-01-01 00:00:01    0.174475
Freq: S
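In more recent pandas versions the how= keyword is no longer accepted and the aggregation is chained as a method instead. The following is a sketch under that assumption, with fixed made-up values in place of the random ones so the result is easy to check:

```python
import pandas as pd

# Three samples, one millisecond apart, all within the same second.
dt_index = pd.date_range("2012-01-01 00:00:00", periods=3, freq="ms")
ts = pd.Series([0.5, 1.0, -0.25], index=dt_index)

# Aggregate into one-second bins by chaining the aggregation method.
per_second = ts.resample("s").sum()
print(per_second)
```

Note that resampling collapses the three rows into one per second, so it changes the shape of the data rather than just relabelling the index.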
Wuhan answered 9/12, 2012 at 14:17 Comment(2)
This eliminates some of the rows. I would like only to change the index value by rounding it to the nearest second. - Bike
Ah, sorry, I hadn't realised you wanted duplicate values. It looks like @hayden's updated answer will do what you want. - Wuhan
3

For more general rounding, you can make use of the fact that Pandas Timestamp objects mostly use the standard library datetime.datetime API, including the datetime.datetime.replace() method.

So, to solve your microsecond rounding problem, you could do:

import datetime
import pandas as pd

times = pd.date_range('2012-1-1 02:03:04.499',periods=3,freq='1ms')
# Add 5e5 microseconds and truncate to simulate rounding
times_rounded = [(x + datetime.timedelta(microseconds=5e5)).replace(microsecond=0) for x in times]

from IPython.display import display
print('Before:')
display(list(times))
print('After:')
display(list(times_rounded))

Output:

Before:
[Timestamp('2012-01-01 02:03:04.499000', offset='L'),
 Timestamp('2012-01-01 02:03:04.500000', offset='L'),
 Timestamp('2012-01-01 02:03:04.501000', offset='L')]
After:
[Timestamp('2012-01-01 02:03:04', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L')]

You can use the same technique to, e.g., round to the nearest day (as long as you're not concerned about leap seconds and the like):

times = pd.date_range('2012-1-1 08:00:00', periods=3, freq='4H')
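# Add 12 hours and truncate to midnight; minute must be reset too for inputs that are not on the hour
times_rounded = [(x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0) for x in times]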

Inspired by this SO post: https://mcmap.net/q/791972/-round-timestamp-to-nearest-day-in-python
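A small self-contained sketch of the day-rounding variant on explicit values, one of which is not on the hour. The sample timestamps and the name half_day are illustrative. Because half a day is added before truncating, an input at exactly noon rounds up to the next day.

```python
import datetime

import pandas as pd

times = [
    pd.Timestamp("2012-01-01 08:00:00"),
    pd.Timestamp("2012-01-01 12:00:00"),
    pd.Timestamp("2012-01-01 16:30:00"),
]

half_day = datetime.timedelta(hours=12)

# Shift by half a day, then truncate every sub-day field to zero.
times_rounded = [
    (x + half_day).replace(hour=0, minute=0, second=0, microsecond=0, nanosecond=0)
    for x in times
]

for before, after in zip(times, times_rounded):
    print(before, "->", after)
```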

Unsuccessful answered 28/8, 2015 at 22:1 Comment(0)
