Convert pandas DatetimeIndex to Unix Time?

7

70

What is the idiomatic way of converting a pandas DatetimeIndex to (an iterable of) Unix time? This is probably not the way to go:

[time.mktime(t.timetuple()) for t in my_data_frame.index.to_pydatetime()]
Artel answered 4/3, 2013 at 14:17 Comment(0)
H
125

Since a DatetimeIndex is backed by a NumPy ndarray, you can do the conversion without a list comprehension, which is much faster.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: from datetime import datetime

In [4]: dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
   ...: index = pd.DatetimeIndex(dates)
   ...: 
In [5]: index.astype(np.int64)
Out[5]: array([1335830400000000000, 1335916800000000000, 1336003200000000000], 
        dtype=int64)

In [6]: index.astype(np.int64) // 10**9
Out[6]: array([1335830400, 1335916800, 1336003200], dtype=int64)

%timeit [t.value // 10 ** 9 for t in index]
10000 loops, best of 3: 119 us per loop

%timeit index.astype(np.int64) // 10**9
100000 loops, best of 3: 18.4 us per loop
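
The astype(np.int64) // 10**9 recipe above relies on the index storing nanoseconds. A minimal sketch of a variant that does not depend on the stored unit, assuming a timezone-naive index: subtract the epoch and floor-divide by a one-second Timedelta. The names index and unix_seconds are only illustrative.

```python
import pandas as pd

index = pd.DatetimeIndex(["2012-05-01", "2012-05-02", "2012-05-03"])

# Subtracting the epoch gives a TimedeltaIndex; floor-dividing by one second
# turns it into whole seconds, whatever unit the index uses internally.
unix_seconds = (index - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

unix_seconds.tolist()  # [1335830400, 1335916800, 1336003200]
```

This is probably slower than the raw integer cast, so it trades a little speed for not having to know the internal resolution.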
Hydromel answered 4/3, 2013 at 14:47 Comment(4)
I was annoyed I couldn't remember how to access this as an array, of course it's .astype(np.int64) :) Duvalier
@AndyHayden -- well usually it's the other way around :) Hydromel
In case it's not clear, index.astype(np.int64) returns the time in nanoseconds, not seconds. Vetter
Is there a way to preserve NaT or NaN with this method? Else you get a large negative value. Keijo
D
47

Note: a Timestamp stores Unix time in nanoseconds (so divide its value by 10**9):

[t.value // 10 ** 9 for t in tsframe.index]

For example:

In [1]: t = pd.Timestamp('2000-02-11 00:00:00')

In [2]: t
Out[2]: <Timestamp: 2000-02-11 00:00:00>

In [3]: t.value
Out[3]: 950227200000000000L

In [4]: time.mktime(t.timetuple())
Out[4]: 950227200.0

As @root points out it's faster to extract the array of values directly:

tsframe.index.astype(np.int64) // 10 ** 9
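
One caveat about the comparison above: time.mktime interprets the time tuple as local time, so it only matches t.value // 10**9 when the machine's timezone is UTC. A small sketch, assuming the Timestamp is timezone-naive and meant as UTC, that uses calendar.timegm for the cross-check instead:

```python
import calendar

import pandas as pd

t = pd.Timestamp("2000-02-11 00:00:00")

# Nanoseconds since the epoch, reduced to whole seconds.
from_value = t.value // 10**9

# calendar.timegm treats the time tuple as UTC, unlike time.mktime.
from_timegm = calendar.timegm(t.timetuple())

(from_value, from_timegm)  # (950227200, 950227200)
```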
Duvalier answered 4/3, 2013 at 14:31 Comment(2)
this is quite embarrassingly easy… (and I could have sworn I tried t.value, turns out I only tried tsframe.index.value) Artel
@ChristianGeier It's only easy when you know the answer! It's crazy that tsframe.index.values is different... confusing. Duvalier
E
13

A summary of other answers:

df['<time_col>'].astype(np.int64) // 10**9

If you want millisecond precision, divide by 10**6 instead.
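
As a rough illustration of the seconds versus milliseconds choice on a DataFrame column, here is a sketch that uses epoch subtraction rather than the integer cast, so it does not assume the column stores nanoseconds. The column names when, unix_s and unix_ms are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"when": pd.to_datetime(["2012-05-01 00:00:00.250", "2012-05-02 12:30:00.000"])}
)
epoch = pd.Timestamp("1970-01-01")

# Floor-divide the elapsed time by the unit you want to count in.
df["unix_s"] = (df["when"] - epoch) // pd.Timedelta("1s")
df["unix_ms"] = (df["when"] - epoch) // pd.Timedelta("1ms")

df["unix_s"].tolist()   # [1335830400, 1335961800]
df["unix_ms"].tolist()  # [1335830400250, 1335961800000]
```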

Embowel answered 21/9, 2018 at 20:5 Comment(0)
M
2

Complementing the other answers: // 10**9 performs floor division, which yields the number of whole elapsed seconds rather than the nearest second. A simple way to round to the nearest second instead, if that is desired, is to add 5*10**8 - 1 before the floor division (an exact half second then rounds down).
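
To make the difference concrete, here is a small sketch on three hand-picked nanosecond values around 1.5 seconds. The array ns is illustrative, and the half_up variant (adding 5*10**8 exactly) is an extra shown only for comparison.

```python
import numpy as np

# Nanosecond counts just below, exactly at, and just above 1.5 seconds.
ns = np.array([1_499_999_999, 1_500_000_000, 1_500_000_001], dtype=np.int64)

floored = ns // 10**9                        # whole elapsed seconds
half_down = (ns + 5 * 10**8 - 1) // 10**9    # nearest second, ties round down
half_up = (ns + 5 * 10**8) // 10**9          # nearest second, ties round up

floored.tolist()    # [1, 1, 1]
half_down.tolist()  # [1, 1, 2]
half_up.tolist()    # [1, 2, 2]
```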

Marseillaise answered 9/6, 2019 at 16:35 Comment(0)
H
2

The solutions above convert NaT to a large negative integer. To avoid that, a possible solution in pandas>=0.24 is:

def datetime_to_epoch(ser):
    """Don't convert NaT to large negative values."""
    if ser.hasnans:
        res = ser.dropna().astype('int64').astype('Int64').reindex(index=ser.index)
    else:
        res = ser.astype('int64')

    return res // 10**9

In the case of missing values this will return the nullable int type 'Int64' (ExtensionType pd.Int64Dtype):

In [5]: dt = pd.to_datetime(pd.Series(["2019-08-21", "2018-07-28", np.nan]))
In [6]: datetime_to_epoch(dt)
Out[6]: 
0    1566345600
1    1532736000
2           NaN
dtype: Int64

Otherwise a regular int64:

In [7]: datetime_to_epoch(dt[:2])
Out[7]: 
0    1566345600
1    1532736000
dtype: int64
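
An alternative sketch that avoids the integer cast altogether, assuming a reasonably recent pandas with the nullable 'Int64' dtype: compute elapsed seconds as floats (where NaT becomes NaN), floor them, and convert to 'Int64'. Unlike the function above, it always returns 'Int64', and the names ser and result are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.to_datetime(pd.Series(["2019-08-21", "2018-07-28", None]))

# total_seconds() yields float seconds, with NaN where the input is NaT.
seconds = (ser - pd.Timestamp("1970-01-01")).dt.total_seconds()

# Floor to whole seconds, then switch to the nullable integer dtype so the
# missing entry stays missing instead of becoming a large negative number.
result = np.floor(seconds).astype("Int64")
```

Going through floats is fine at whole-second granularity, but it would not be a good fit if exact nanosecond values were needed.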
Hyperboloid answered 23/8, 2019 at 14:16 Comment(0)
L
1

The code from the other answers

dframe['datetime'].astype(np.int64) // 10**9

prints the following warning as of the time of my post:

FutureWarning: casting datetime64[ns] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.

So use the following instead:

dframe['datetime'].view(np.int64) // 10 ** 9
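
Whether .astype or .view triggers a warning depends on the pandas version, so another option is to drop down to the underlying NumPy array and let NumPy do the unit conversion. This is only a sketch, assuming a timezone-naive column; a NaT would still come out as a large negative integer.

```python
import numpy as np
import pandas as pd

dframe = pd.DataFrame({"datetime": pd.to_datetime(["2012-05-01", "2012-05-02"])})

# Plain NumPy datetime64 array backing the column.
values = dframe["datetime"].to_numpy()

# Convert to second resolution first, then reinterpret as integers.
unix_seconds = values.astype("datetime64[s]").astype(np.int64)

unix_seconds.tolist()  # [1335830400, 1335916800]
```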
Leges answered 27/1, 2022 at 19:2 Comment(0)
F
0

If you have tried this on the datetime column of your dataframe:

dframe['datetime'].astype(np.int64) // 10**9

and you get the following error: TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp', you can use these two lines instead:

dframe.index = pd.DatetimeIndex(dframe['datetime'])
dframe['datetime'] = dframe.index.astype(np.int64) // 10**9
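
One plausible cause of that TypeError (an assumption, since the answer does not say) is that the column has object dtype and holds Timestamp objects rather than being a real datetime64 column. A sketch under that assumption, which converts the column with pd.to_datetime and leaves the DataFrame's index alone; the unix column name is made up.

```python
import pandas as pd

# An object-dtype column that merely contains Timestamp objects.
dframe = pd.DataFrame(
    {
        "datetime": pd.Series(
            [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02")], dtype=object
        )
    }
)

# Convert to a proper datetime64 column, then count seconds since the epoch.
converted = pd.to_datetime(dframe["datetime"])
dframe["unix"] = (converted - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

dframe["unix"].tolist()  # [1335830400, 1335916800]
```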
Fisherman answered 13/9, 2019 at 14:29 Comment(0)
