Convert Pandas Column to DateTime

Asked 5/11, 2014 at 17:24 Answered 29/1, 2023 at 18:39

Solved python pandas dataframe date datetime

483

I have one field in a pandas DataFrame that was imported as string format.

It should be a datetime variable. How do I convert it to a datetime column, and then filter based on date?

Example:

raw_data = pd.DataFrame({'Mycol': ['05SEP2014:00:00:00.000']})

Curator answered 5/11, 2014 at 17:24 Comment(0)

820

Use the to_datetime function, specifying a format to match your data.

df['Mycol'] = pd.to_datetime(df['Mycol'], format='%d%b%Y:%H:%M:%S.%f')

Penicillin answered 5/11, 2014 at 17:50 Comment(3)

Note: the format argument isn't required. to_datetime is smart. Go ahead and try it without trying to match your data. – Woodrowwoodruff 22/4, 2017 at 18:54

format is not required but passing it makes the conversion run much, much faster. See this answer for more info. – Mowry 29/1, 2023 at 18:45

More correctly, in the case of the OP, format is required, otherwise DateParseError occurs. pandas can infer some string formats, but, as point out, using format greatly improves performance. – Slavey 26/5, 2023 at 15:4

110

If you have more than one column to be converted you can do the following:

df[["col1", "col2", "col3"]] = df[["col1", "col2", "col3"]].apply(pd.to_datetime)

Sparling answered 17/3, 2019 at 13:52 Comment(1)

If you have different datetime formats in these columns, you can try using format parameter like: apply(pd.to_datetime, format='mixed') – Underpass 12/7, 2023 at 16:47

edit: recommending to use pd.to_datetime() instead of this because .apply() is generally slower.

You can use the DataFrame method .apply() to operate on the values in Mycol:

>>> df = pd.DataFrame(['05SEP2014:00:00:00.000'], columns=['Mycol'])
>>> df
                    Mycol
0  05SEP2014:00:00:00.000
>>> import datetime as dt
>>> df['Mycol'] = df['Mycol'].apply(lambda x: 
...     dt.datetime.strptime(x, '%d%b%Y:%H:%M:%S.%f'))
>>> df
       Mycol
0 2014-09-05

Asch answered 5/11, 2014 at 17:51 Comment(2)

Why use this over pd.to_datetime? – Casseycassi 24/5, 2023 at 1:36

i probably hadn't yet seen pd.to_datetime when i wrote this. added a recommendation to use pd.to_datetime. thanks for the comment. – Asch 26/5, 2023 at 5:30

Use the pandas to_datetime function to parse the column as DateTime. Also, by using infer_datetime_format=True, it will automatically detect the format and convert the mentioned column to DateTime.

import pandas as pd
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], infer_datetime_format=True)

Pudendum answered 23/9, 2019 at 10:30 Comment(0)

Time Saver:

raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'])

Jahnke answered 29/10, 2021 at 16:44 Comment(1)

This doesn't work for this specific use case. It gives a ParserError: Unknown string format: 05SEP2014:00:00:00.000. – Format 5/10, 2022 at 10:30

Setting the correct `format=` is much faster than letting pandas find out¹

Long story short, passing the correct format= from the beginning as in chrisb's post is much faster than letting pandas figure out the format, especially if the format contains time component. The runtime difference for dataframes greater than 10k rows is huge (~25 times faster, so we're talking like a couple minutes vs a few seconds). All valid format options can be found at https://strftime.org/.

`errors='coerce'` is useful

If some rows are not in the correct format or not datetime at all, errors= parameter is very useful, so that you can convert the valid rows and handle the rows that contained invalid values later.

df['date'] = pd.to_datetime(
    df['date'], format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

# for multiple columns
df[['start', 'end']] = df[['start', 'end']].apply(
    pd.to_datetime, format='%d%b%Y:%H:%M:%S.%f', errors='coerce')

To silence `SettingWithCopyWarning`

On a side note, if you got this warning, then that means your dataframe was probably created by filtering another dataframe. Enable copy-on-write and you're good to go. (see this post for more about it).

pd.set_option('copy_on_write', True)
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y:%H:%M:%S.%f')

¹ Code used to produce the timeit test plot.

import perfplot
from random import choices
from datetime import datetime

mdYHMSf = range(1,13), range(1,29), range(2000,2024), range(24), *[range(60)]*2, range(1000)
perfplot.show(
    kernels=[lambda x: pd.to_datetime(x), 
             lambda x: pd.to_datetime(x, format='%m/%d/%Y %H:%M:%S.%f'), 
             lambda x: pd.to_datetime(x, infer_datetime_format=True),
             lambda s: s.apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))],
    labels=["pd.to_datetime(df['date'])", 
            "pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S.%f')", 
            "pd.to_datetime(df['date'], infer_datetime_format=True)", 
            "df['date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M:%S.%f'))"],
    n_range=[2**k for k in range(20)],
    setup=lambda n: pd.Series([f"{m}/{d}/{Y} {H}:{M}:{S}.{f}" 
                               for m,d,Y,H,M,S,f in zip(*[choices(e, k=n) for e in mdYHMSf])]),
    equality_check=pd.Series.equals,
    xlabel='len(df)'
)

If the column contains multiple formats, see Convert a column of mixed format strings to a datetime Dtype.

Mowry answered 29/1, 2023 at 18:39 Comment(2)

If I use pd.set_option('copy_on_write', True) I get a KeyError for the column I want to convert. Do you have any idea why? – Forecastle 20/3 at 9:59

@Forecastle without looking at your specific code, I can’t say with certainty but KeyError usually occurs when there is a misspelling (“Date” vs “date”) or if the wrong axis is indexed or if there is no column with that name in the first place. – Mowry 20/3 at 15:35

-2

Just like we convert object data type to float or int, use astype().

raw_data['Mycol'] = raw_data['Mycol'].astype('datetime64[ns]')

Treasonable answered 23/8, 2022 at 8:12 Comment(1)

This doesn't work for this specific use case. It gives a ParserError: Unknown string format: 05SEP2014:00:00:00.000. – Format 5/10, 2022 at 10:28

Setting the correct `format=` is much faster than letting pandas find out¹

`errors='coerce'` is useful

To silence `SettingWithCopyWarning`

Recommended topics

Hot tags

Setting the correct format= is much faster than letting pandas find out1

errors='coerce' is useful

To silence SettingWithCopyWarning

Recommended topics

Hot tags

Setting the correct `format=` is much faster than letting pandas find out¹

`errors='coerce'` is useful

To silence `SettingWithCopyWarning`