How to convert Pandas Series of dates string into date objects?

Asked 31/5, 2013 at 8:23 Answered 11/4, 2023 at 4:36

Solved python pandas date time-series

152

I have a Pandas data frame and one of the columns, have dates in string format of YYYY-MM-DD.

For e.g. : '2013-10-28'

At the moment the dtype of the column is object.

How do I convert the column values to Pandas date format?

Hylo answered 31/5, 2013 at 8:23 Comment(1)

Pandas has a native DATETIME type (datetime64); it doesn't have a native DATE dtype (any column containing DATE objects will be object dtype). It's much faster to work with datetime64 instead of dtype=object column of date objects. You can see here that something as simple as adding timedelta to timestamps is 100x faster on datetime64. If you really need DATE objects (e.g. need to dump the data into a database), then it's probably better to work on datetime64 while in pandas and convert to dates before storing the data elsewhere. – Macon 15/4 at 20:32

138

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Jaffna answered 31/5, 2013 at 8:36 Comment(7)

Nice - thank you - how do I get rid of the 00:00:00 at the end of each date? – Hylo 31/5, 2013 at 8:39

The pandas timestamp have both date and time. Do you mean convert it into python date object? – Jaffna 31/5, 2013 at 8:42

Yep, but I guess the python date object is not supported by Pandas. – Hylo 31/5, 2013 at 10:28

You can convert it by df['time'] = [time.date() for time in df['time']] – Jaffna 31/5, 2013 at 10:30

what does the [ns] mean, can you make the text string a date and remove the time part of that date? – Plunkett 13/3, 2014 at 23:40

@Plunkett it's nanoseconds, and is the way the dates are stored under the hood once converted properly (epoch-time in nanoseconds). – Bauske 14/3, 2014 at 1:35

Why use this over pandas.to_datetime() ? – Rachealrachel 18/2, 2020 at 23:57

156

Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality e.g. dayfirst):

In [11]: df
Out[11]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [12]: pd.to_datetime(df['time'])
Out[12]:
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]

In [13]: df['time'] = pd.to_datetime(df['time'])

In [14]: df
Out[14]:
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Handling ValueErrors
If you run into a situation where doing

df['time'] = pd.to_datetime(df['time'])

Throws a

ValueError: Unknown string format

That means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:

df['time'] = pd.to_datetime(df['time'], errors='coerce')

Bauske answered 31/5, 2013 at 9:46 Comment(10)

Hi Guys, @AndyHayden can you remove the time part from the date? I don't need that part? – Plunkett 14/3, 2014 at 1:31

In pandas' 0.13.1 the trailing 00:00:00s aren't displayed. – Bauske 14/3, 2014 at 1:33

and what about in other versions, how do we remove / and or not display them? – Plunkett 14/3, 2014 at 1:37

I don't think this can be done in a nice way, there is discussion to add date_format like float_format (which you've seen). I recommend upgrading anyway. – Bauske 14/3, 2014 at 1:39

my problem is my date is in this format... 41516.43, and I get this error. I would expect it to return something like 2014-02-03 in the new column?! THE ERROR: #convert date values in the "load_date" column to dates budget_dataset['date_last_load'] = pd.to_datetime(budget_dataset['load_date']) budget_dataset -c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead – Plunkett 14/3, 2014 at 2:7

@Plunkett if the date isn't being converted properly that sounds like a different problem. Could you post this as a separate question, tricky to diagnose here. – Bauske 14/3, 2014 at 2:17

yeah its a numerical representation of a date, but it has two decimal places. Like 41516.426 is actually = 30/08/2013 10:13:26 AM, I'd like first to be able to return the (date and time), and then i'd like to split them into two columns. – Plunkett 14/3, 2014 at 2:20

If more than 1 datetime columns conversion:df[['col1', 'col2']].apply(pd.to_datetime, axis=0) – Fotina 10/8, 2020 at 13:41

@yoshiserry, is the date 41516.43 from Excel? This might help: https://mcmap.net/q/160046/-convert-excel-style-date-with-pandas/13608599 – Zanezaneski 16/8, 2020 at 7:15

Seems a better answer to me. Specially, because of controlling how the strings are parsed (dayfirst, yearfirst, etc.). With astype, doesn't seem to provide such control in a easy way. – Humane 27/10, 2023 at 16:54

138

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Jaffna answered 31/5, 2013 at 8:36 Comment(7)

Nice - thank you - how do I get rid of the 00:00:00 at the end of each date? – Hylo 31/5, 2013 at 8:39

The pandas timestamp have both date and time. Do you mean convert it into python date object? – Jaffna 31/5, 2013 at 8:42

Yep, but I guess the python date object is not supported by Pandas. – Hylo 31/5, 2013 at 10:28

You can convert it by df['time'] = [time.date() for time in df['time']] – Jaffna 31/5, 2013 at 10:30

what does the [ns] mean, can you make the text string a date and remove the time part of that date? – Plunkett 13/3, 2014 at 23:40

@Plunkett it's nanoseconds, and is the way the dates are stored under the hood once converted properly (epoch-time in nanoseconds). – Bauske 14/3, 2014 at 1:35

Why use this over pandas.to_datetime() ? – Rachealrachel 18/2, 2020 at 23:57

I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:

dfcsv = pd.read_csv('xyz.csv', parse_dates=[0]) where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

You can also select column(s) to parse by name rather than position, e.g. parse_dates=['thedate']

Heighttopaper answered 19/3, 2014 at 4:15 Comment(1)

Thanks, that was exactly what I needed. The documentation has moved, though, you can find it here: pandas.pydata.org/pandas-docs/stable/reference/api/… – Tungstic 30/1, 2020 at 7:40

Now you can do df['column'].dt.date

Note that for datetime objects, if you don't see the hour when they're all 00:00:00, that's not pandas. That's iPython notebook trying to make things look pretty.

Smriti answered 7/11, 2015 at 0:22 Comment(4)

This one does not work for me, it complains: Can only use .dt accessor with datetimelike values – Elinaelinor 23/5, 2018 at 15:26

you may have to do df[col] = pd.to_datetime(df[col]) first to convert your column to date time objects. – Smriti 24/5, 2018 at 23:42

The issue with this answer is that it converts the column to dtype = object which takes up considerably more memory than a true datetime dtype in pandas. – Calefactory 10/1, 2019 at 15:19

This person didn't say anything about the size of the dataset, or about parquet. – Smriti 17/2, 2021 at 17:41

If you want to get the DATE and not DATETIME format:

df["id_date"] = pd.to_datetime(df["id_date"]).dt.date

Dermatoid answered 6/12, 2019 at 19:50 Comment(2)

I agree with @Asclepius. I've had a lot more issues with using .dt.date than just converting to datetime first. – Distinguishing 14/9, 2023 at 15:5

we are getting following error: could not convert string to float: '12/12/2010'. How to solve it ? – Microsporophyll 15/4 at 16:32

Another way to do this and this works well if you have multiple columns to convert to datetime.

cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)

Adrenalin answered 29/4, 2019 at 14:18 Comment(3)

Question ask for date not datetime. – Haiku 3/7, 2019 at 14:34

@MarkAndersen aslong as you have date only values in your columns, convertion to datetime will retain pertaining information only. If you explicity convert using df['datetime_col'].dt.date that will result to an object dtype; loss in memory management. – Fotina 10/8, 2020 at 13:33

There is no reason to use .apply here, considering a direct use of pd.to_datetime works. – Dicta 14/2, 2021 at 18:51

It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.

#set an index by dates
df.set_index(['time'], drop=True, inplace=True)

After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

#Convert to monthly dates
df.index = df.index.to_period(freq='M')

#Convert to strings
df.index = df.index.strftime('%Y-%m')

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

For brevity, I don't show that I run the following code after each line above:

print(df.index)
print(df.index.dtype)
print(type(df.index))

This gives me the following output:

Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>

Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Fronton answered 17/10, 2017 at 21:30 Comment(0)

For the sake of completeness, another option, which might not be the most straightforward one, a bit similar to the one proposed by @SSS, but using rather the datetime library is:

import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%d-%m').date())

Marieann answered 16/8, 2020 at 2:13 Comment(0)

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  object
 1   endDay          110526 non-null  object

import pandas as pd

df['startDay'] = pd.to_datetime(df.startDay)

df['endDay'] = pd.to_datetime(df.endDay)

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  datetime64[ns]
 1   endDay          110526 non-null  datetime64[ns]

Cithara answered 24/6, 2020 at 3:28 Comment(1)

No, this converts it to a 'datetime64[ns]' type not a 'date' type. Those are different things. – Agc 23/3, 2021 at 22:20

You can use the pandas.to_datetime()

Wineglass answered 11/4, 2023 at 4:36 Comment(0)

-1

Try to convert one of the rows into timestamp using the pd.to_datetime function and then use .map to map the formular to the entire column

Storm answered 13/2, 2020 at 16:59 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags