How to convert Pandas Series of dates string into date objects?
Asked Answered
H

11

152

I have a Pandas data frame and one of the columns, have dates in string format of YYYY-MM-DD.

For e.g. : '2013-10-28'

At the moment the dtype of the column is object.

How do I convert the column values to Pandas date format?

Hylo answered 31/5, 2013 at 8:23 Comment(1)
Pandas has a native DATETIME type (datetime64); it doesn't have a native DATE dtype (any column containing DATE objects will be object dtype). It's much faster to work with datetime64 instead of dtype=object column of date objects. You can see here that something as simple as adding timedelta to timestamps is 100x faster on datetime64. If you really need DATE objects (e.g. need to dump the data into a database), then it's probably better to work on datetime64 while in pandas and convert to dates before storing the data elsewhere.Macon
J
138

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00
Jaffna answered 31/5, 2013 at 8:36 Comment(7)
Nice - thank you - how do I get rid of the 00:00:00 at the end of each date?Hylo
The pandas timestamp have both date and time. Do you mean convert it into python date object?Jaffna
Yep, but I guess the python date object is not supported by Pandas.Hylo
You can convert it by df['time'] = [time.date() for time in df['time']]Jaffna
what does the [ns] mean, can you make the text string a date and remove the time part of that date?Plunkett
@Plunkett it's nanoseconds, and is the way the dates are stored under the hood once converted properly (epoch-time in nanoseconds).Bauske
Why use this over pandas.to_datetime() ?Rachealrachel
B
156

Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality e.g. dayfirst):

In [11]: df
Out[11]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [12]: pd.to_datetime(df['time'])
Out[12]:
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]

In [13]: df['time'] = pd.to_datetime(df['time'])

In [14]: df
Out[14]:
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00

Handling ValueErrors
If you run into a situation where doing

df['time'] = pd.to_datetime(df['time'])

Throws a

ValueError: Unknown string format

That means you have invalid (non-coercible) values. If you are okay with having them converted to pd.NaT, you can add an errors='coerce' argument to to_datetime:

df['time'] = pd.to_datetime(df['time'], errors='coerce')
Bauske answered 31/5, 2013 at 9:46 Comment(10)
Hi Guys, @AndyHayden can you remove the time part from the date? I don't need that part?Plunkett
In pandas' 0.13.1 the trailing 00:00:00s aren't displayed.Bauske
and what about in other versions, how do we remove / and or not display them?Plunkett
I don't think this can be done in a nice way, there is discussion to add date_format like float_format (which you've seen). I recommend upgrading anyway.Bauske
my problem is my date is in this format... 41516.43, and I get this error. I would expect it to return something like 2014-02-03 in the new column?! THE ERROR: #convert date values in the "load_date" column to dates budget_dataset['date_last_load'] = pd.to_datetime(budget_dataset['load_date']) budget_dataset -c:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value insteadPlunkett
@Plunkett if the date isn't being converted properly that sounds like a different problem. Could you post this as a separate question, tricky to diagnose here.Bauske
yeah its a numerical representation of a date, but it has two decimal places. Like 41516.426 is actually = 30/08/2013 10:13:26 AM, I'd like first to be able to return the (date and time), and then i'd like to split them into two columns.Plunkett
If more than 1 datetime columns conversion:df[['col1', 'col2']].apply(pd.to_datetime, axis=0)Fotina
@yoshiserry, is the date 41516.43 from Excel? This might help: https://mcmap.net/q/160046/-convert-excel-style-date-with-pandas/13608599Zanezaneski
Seems a better answer to me. Specially, because of controlling how the strings are parsed (dayfirst, yearfirst, etc.). With astype, doesn't seem to provide such control in a easy way.Humane
J
138

Use astype

In [31]: df
Out[31]: 
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [32]: df['time'] = df['time'].astype('datetime64[ns]')

In [33]: df
Out[33]: 
   a                time
0  1 2013-01-01 00:00:00
1  2 2013-01-02 00:00:00
2  3 2013-01-03 00:00:00
Jaffna answered 31/5, 2013 at 8:36 Comment(7)
Nice - thank you - how do I get rid of the 00:00:00 at the end of each date?Hylo
The pandas timestamp have both date and time. Do you mean convert it into python date object?Jaffna
Yep, but I guess the python date object is not supported by Pandas.Hylo
You can convert it by df['time'] = [time.date() for time in df['time']]Jaffna
what does the [ns] mean, can you make the text string a date and remove the time part of that date?Plunkett
@Plunkett it's nanoseconds, and is the way the dates are stored under the hood once converted properly (epoch-time in nanoseconds).Bauske
Why use this over pandas.to_datetime() ?Rachealrachel
H
47

I imagine a lot of data comes into Pandas from CSV files, in which case you can simply convert the date during the initial CSV read:

dfcsv = pd.read_csv('xyz.csv', parse_dates=[0]) where the 0 refers to the column the date is in.
You could also add , index_col=0 in there if you want the date to be your index.

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

You can also select column(s) to parse by name rather than position, e.g. parse_dates=['thedate']

Heighttopaper answered 19/3, 2014 at 4:15 Comment(1)
Thanks, that was exactly what I needed. The documentation has moved, though, you can find it here: pandas.pydata.org/pandas-docs/stable/reference/api/…Tungstic
S
30

Now you can do df['column'].dt.date

Note that for datetime objects, if you don't see the hour when they're all 00:00:00, that's not pandas. That's iPython notebook trying to make things look pretty.

Smriti answered 7/11, 2015 at 0:22 Comment(4)
This one does not work for me, it complains: Can only use .dt accessor with datetimelike valuesElinaelinor
you may have to do df[col] = pd.to_datetime(df[col]) first to convert your column to date time objects.Smriti
The issue with this answer is that it converts the column to dtype = object which takes up considerably more memory than a true datetime dtype in pandas.Calefactory
This person didn't say anything about the size of the dataset, or about parquet.Smriti
D
12

If you want to get the DATE and not DATETIME format:

df["id_date"] = pd.to_datetime(df["id_date"]).dt.date
Dermatoid answered 6/12, 2019 at 19:50 Comment(2)
I agree with @Asclepius. I've had a lot more issues with using .dt.date than just converting to datetime first.Distinguishing
we are getting following error: could not convert string to float: '12/12/2010'. How to solve it ?Microsporophyll
A
7

Another way to do this and this works well if you have multiple columns to convert to datetime.

cols = ['date1','date2']
df[cols] = df[cols].apply(pd.to_datetime)
Adrenalin answered 29/4, 2019 at 14:18 Comment(3)
Question ask for date not datetime.Haiku
@MarkAndersen aslong as you have date only values in your columns, convertion to datetime will retain pertaining information only. If you explicity convert using df['datetime_col'].dt.date that will result to an object dtype; loss in memory management.Fotina
There is no reason to use .apply here, considering a direct use of pd.to_datetime works.Dicta
F
2

It may be the case that dates need to be converted to a different frequency. In this case, I would suggest setting an index by dates.

#set an index by dates
df.set_index(['time'], drop=True, inplace=True)

After this, you can more easily convert to the type of date format you will need most. Below, I sequentially convert to a number of date formats, ultimately ending up with a set of daily dates at the beginning of the month.

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

#Convert to monthly dates
df.index = df.index.to_period(freq='M')

#Convert to strings
df.index = df.index.strftime('%Y-%m')

#Convert to daily dates
df.index = pd.DatetimeIndex(data=df.index)

For brevity, I don't show that I run the following code after each line above:

print(df.index)
print(df.index.dtype)
print(type(df.index))

This gives me the following output:

Index(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='object', name='time')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03'], dtype='datetime64[ns]', name='time', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

PeriodIndex(['2013-01', '2013-01', '2013-01'], dtype='period[M]', name='time', freq='M')
period[M]
<class 'pandas.core.indexes.period.PeriodIndex'>

Index(['2013-01', '2013-01', '2013-01'], dtype='object')
object
<class 'pandas.core.indexes.base.Index'>

DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01'], dtype='datetime64[ns]', freq=None)
datetime64[ns]
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Fronton answered 17/10, 2017 at 21:30 Comment(0)
M
1

For the sake of completeness, another option, which might not be the most straightforward one, a bit similar to the one proposed by @SSS, but using rather the datetime library is:

import datetime
df["Date"] = df["Date"].apply(lambda x: datetime.datetime.strptime(x, '%Y-%d-%m').date())
Marieann answered 16/8, 2020 at 2:13 Comment(0)
C
0
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  object
 1   endDay          110526 non-null  object

import pandas as pd

df['startDay'] = pd.to_datetime(df.startDay)

df['endDay'] = pd.to_datetime(df.endDay)

 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   startDay        110526 non-null  datetime64[ns]
 1   endDay          110526 non-null  datetime64[ns]
Cithara answered 24/6, 2020 at 3:28 Comment(1)
No, this converts it to a 'datetime64[ns]' type not a 'date' type. Those are different things.Agc
W
0

You can use the pandas.to_datetime()

Wineglass answered 11/4, 2023 at 4:36 Comment(0)
S
-1

Try to convert one of the rows into timestamp using the pd.to_datetime function and then use .map to map the formular to the entire column

Storm answered 13/2, 2020 at 16:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.