How can I convert my datetime column in pandas to a single timezone?

4

34

I have a dataframe with a DateTime column whose values carry different timezone offsets. The timestamps look like they are expressed relative to UTC, but converting the column with pd.to_datetime fails. That is problem #1. Because the conversion fails, I cannot do any datetime operations on the column, such as grouping by date, working out the day, or grouping by hour of the day. Here's my dataframe df_res:

    DateTime
    2017-11-02 19:49:28-07:00
    2017-11-27 07:32:22-08:00
    2017-12-27 17:01:15-08:00

OUTPUT for the command

      df_res["DateTime"] = df_res["DateTime"].dt.tz_convert('America/New_York')

AttributeError: Can only use .dt accessor with datetimelike values

When I convert to datetime:

   df_res['DateTime'] = pd.to_datetime(df_res['DateTime'])

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

I feel I am going around in circles. I need to convert the column to datetime in order to perform operations, and to do that the values need to share a timezone, but I cannot give them the same timezone until the column is a datetime object. How can I best approach this? I did refer to previous posts, but they seem to assume the conversion to datetime just works:

Convert datetime columns to a different timezone pandas
Convert pandas timezone-aware DateTimeIndex to naive timestamp, but in certain timezone

Assiduity answered 27/3, 2019 at 19:57 Comment(2)
How are you creating the "DateTime" column values in the first place? - Maliamalice
I extract the datetime field from a JSON file. - Assiduity
41

I think that it is not necessary to apply lambdas:

df_res['DateTime'] = pd.to_datetime(df_res['DateTime'], utc=True)

documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
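As a minimal sketch of how this plays out on the question's sample data: parsing with utc=True gives one tz-aware column in UTC, after which tz_convert and the .dt accessors work as usual. The column name DateTime_NY and the variables per_day and per_hour are made up for illustration.

```python
import pandas as pd

# Sample values from the question: the offsets differ from row to row.
df_res = pd.DataFrame({
    "DateTime": [
        "2017-11-02 19:49:28-07:00",
        "2017-11-27 07:32:22-08:00",
        "2017-12-27 17:01:15-08:00",
    ]
})

# utc=True converts every value to UTC, so the column gets a single tz-aware dtype.
df_res["DateTime"] = pd.to_datetime(df_res["DateTime"], utc=True)

# Hypothetical extra column: the same instants shown in New York time.
df_res["DateTime_NY"] = df_res["DateTime"].dt.tz_convert("America/New_York")

# Datetime operations now work, for example counting rows per local date and hour.
per_day = df_res.groupby(df_res["DateTime_NY"].dt.date).size()
per_hour = df_res.groupby(df_res["DateTime_NY"].dt.hour).size()
```

The original offsets are not kept, only the instants. If the per-row local wall time matters, it has to be extracted before normalizing to UTC.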

Can answered 6/5, 2020 at 18:18 Comment(3)
This would only work if your datetimes are actually in UTC, right? - Schulte
This was a seriously helpful line of code for me. Much appreciated. - Happiness
This works even when the input datetimes are not in UTC. The utc argument appears to refer only to the output timezone. I see that non-UTC inputs are converted correctly. - Trounce
7

You can check this:

import pandas as pd

df = pd.DataFrame({
    'time': [
        '2017-11-02 19:49:28-08:00',
        '2017-11-27 07:32:22-07:00',
        '2017-12-27 17:01:15-07:00'
    ]
})

# Mixed offsets cannot share one tz-aware dtype, so normalize to UTC while parsing
df['time'] = pd.to_datetime(df['time'], utc=True)

# The column is already tz-aware, so use tz_convert (tz_localize is only for naive values)
df['time'].dt.tz_convert('US/Eastern')
0   2017-11-02 23:49:28-04:00
1   2017-11-27 09:32:22-05:00
2   2017-12-27 19:01:15-05:00
Name: time, dtype: datetime64[ns, US/Eastern]
Counterpunch answered 27/3, 2019 at 20:44 Comment(5)
Thanks. What if my dataframe has 10k+ datetime entries? The DateTime column is of type object too; do I need to convert them all? - Assiduity
@py_noob: Maybe too late for you, but I ran into the same problem: after converting to datetime, the column is still of type object. When I check each value, though, they are all datetimes. It's strange, but I don't think it's a problem, is it? - Speaks
In my case I needed to pass a utc=True arg to pd.to_datetime(), as in df['time'] = pd.to_datetime(df['time'], utc=True) - Fideicommissary
Copy-pasting your code throws TypeError: Cannot localize tz-aware Timestamp, use tz_convert for conversions - Lactone
apply was the solution for me to transform a series with mixed time zones into a series of local, naive times. - Jegger
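The last comment's use case (keeping each row's own wall-clock time and dropping the offset) could look roughly like the sketch below. This is one reading of that comment, not the commenter's actual code; the names raw and local_naive are made up for illustration.

```python
import pandas as pd

# Strings with different UTC offsets, as in the question.
raw = pd.Series([
    "2017-11-02 19:49:28-07:00",
    "2017-11-27 07:32:22-08:00",
])

# Parse each value on its own, then strip the offset with tz_localize(None),
# which keeps the local wall-clock time instead of converting to UTC.
local_naive = pd.to_datetime(
    raw.apply(lambda s: pd.Timestamp(s).tz_localize(None))
)
```

The result is a naive datetime column, so rows are no longer comparable as instants; it is only useful when the local time of day is what matters.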
1

Posting here because I spent a few hours figuring out the answer to the title of this question for the general case, where the naive datetimes may be in a timezone other than UTC.

Here is the solution I came up with. I'd be happy if someone with a deeper understanding of pandas/numpy could point out any way to improve its performance. Even as it is, it might come in handy as a starting point for someone with a similar issue.

from datetime import datetime

import pandas as pd
from pandas import Series
from pandas.api.types import is_datetime64_any_dtype as is_datetime


def ensure_datetime(series: Series, timezone: str):
    """
    Ensures that the `series` is a datetime series of dtype datetime64[ns, timezone]

    - Convert tz aware values to `timezone`.
    - Assume naive values are on `timezone` and make them aware.
    - Handle None values and convert them to NaT (so we can accomplish the dtype requirement).
    """
    if series.dtype == pd.DatetimeTZDtype(tz=timezone):
        return series

    are_datetime = series.apply(lambda x: isinstance(x, datetime)).astype(bool)

    # Convert only values that are not already datetime, otherwise if there are
    # tz-aware values pandas will raise: Tz-aware datetime.datetime cannot
    # be converted to datetime64 unless utc=True.
    # We cannot set utc=True because pandas will assume naive values to be on UTC
    # but we need naive values to be considered on `timezone`.
    series = series.mask(
        ~are_datetime, pd.to_datetime(series[~are_datetime], errors="coerce")
    )

    # Localize naive values to `timezone`
    are_unaware = series.apply(lambda x: not pd.isna(x) and x.tzinfo is None).astype(
        bool
    )
    series = series.mask(
        are_unaware, pd.to_datetime(series[are_unaware]).dt.tz_localize(timezone)
    )

    # Now that we don't have any naive value we can normalize all to UTC and
    # then convert to `timezone`.
    series = pd.to_datetime(series, utc=True).dt.tz_convert(timezone)

    return series

def test_ensure_datetime():
    series = pd.Series(
        ["2022-12-31 16:00:00-08:00", "2023-01-01", "2023-01-01 12:30", None]
    )

    series = ensure_datetime(series, "America/New_York")

    assert is_datetime(series)
    assert list(series) == [
        pd.Timestamp("2022-12-31 19:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 12:30", tz="America/New_York"),
        pd.NaT,
    ]

    series = ensure_datetime(series.dt.date, "America/New_York")

    assert is_datetime(series)
    assert list(series) == [
        pd.Timestamp("2022-12-31 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.Timestamp("2023-01-01 00:00", tz="America/New_York"),
        pd.NaT,
    ]

    # Mix aware timestamps with naive
    series = pd.Series(
        [
            pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
            pd.Timestamp("2022-12-31 12:00"),
        ]
    )
    series = ensure_datetime(series, "America/New_York")
    assert list(series) == [
        pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
        pd.Timestamp("2022-12-31 12:00", tz="America/New_York"),
    ]
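For the narrower case where every value is naive (no offsets anywhere in the column), the per-element apply calls are probably unnecessary. A minimal sketch, assuming all inputs are naive strings or None; the names naive and aware are made up for illustration:

```python
import pandas as pd

# Assumption: no value carries an offset, so the whole column can be parsed at once.
naive = pd.Series(["2023-01-01 00:00", "2023-01-01 12:30", None])

# Parse to naive datetimes (None becomes NaT), then attach the timezone in one
# vectorized step; the wall-clock times are kept as they are.
aware = pd.to_datetime(naive).dt.tz_localize("America/New_York")
```

tz_localize raises by default for wall-clock times that are ambiguous or nonexistent around DST transitions; its ambiguous and nonexistent parameters control that behavior.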
Diorio answered 31/8, 2023 at 1:58 Comment(0)
0

I am unsure if this problem still exists but I came across a simple solution:

df['UTC Time'] = df['time'].apply(lambda x: pd.to_datetime(x).tz_convert('UTC'))
Sagerman answered 29/3 at 0:46 Comment(1)
Welcome. Though this looks like an inefficient (apply) version of the accepted answer and a duplicate of another answer. - Caryncaryo
