How to convert time durations to numeric in polars?

Is there any built-in function in polars or a better way to convert time durations to numeric by defining the time resolution (e.g.: days, hours, minutes)?

import polars as pl

df = pl.DataFrame({
    "from": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "to": ["2023-01-04", "2023-01-05", "2023-01-06"],
})

My current approach:

# Convert to date and calculate the time difference
df = (
    df.with_columns(
        pl.col("to", "from").str.to_date().name.suffix("_date")
    )
    .with_columns((pl.col("to_date") - pl.col("from_date")).alias("time_diff"))
)

# Convert the time difference to int (in days)
df = df.with_columns(
    ((pl.col("time_diff") / (24 * 60 * 60 * 1000)).cast(pl.Int8)).alias("time_diff_int")
)

Output:

shape: (3, 6)
┌────────────┬────────────┬────────────┬────────────┬──────────────┬───────────────┐
│ from       ┆ to         ┆ to_date    ┆ from_date  ┆ time_diff    ┆ time_diff_int │
│ ---        ┆ ---        ┆ ---        ┆ ---        ┆ ---          ┆ ---           │
│ str        ┆ str        ┆ date       ┆ date       ┆ duration[ms] ┆ i8            │
╞════════════╪════════════╪════════════╪════════════╪══════════════╪═══════════════╡
│ 2023-01-01 ┆ 2023-01-04 ┆ 2023-01-04 ┆ 2023-01-01 ┆ 3d           ┆ 3             │
│ 2023-01-02 ┆ 2023-01-05 ┆ 2023-01-05 ┆ 2023-01-02 ┆ 3d           ┆ 3             │
│ 2023-01-03 ┆ 2023-01-06 ┆ 2023-01-06 ┆ 2023-01-03 ┆ 3d           ┆ 3             │
└────────────┴────────────┴────────────┴────────────┴──────────────┴───────────────┘
Vigen answered 13/2, 2023 at 15:46 Comment(0)

The dt accessor lets you express a duration as a whole number of days, hours, minutes, and so on. Is that what you're looking for?

df["time_diff"].dt.days()
Series: 'time_diff' [i64]
[
    3
    3
    3
]

df["time_diff"].dt.hours()
Series: 'time_diff' [i64]
[
    72
    72
    72
]

df["time_diff"].dt.minutes()
Series: 'time_diff' [i64]
[
    4320
    4320
    4320
]

docs: API reference, series/timeseries

Armitage answered 13/2, 2023 at 16:1 Comment(4)
Is it possible to get years? Except the obvious df["time_diff"].dt.days() / 365? - Decimate
@Björn I don't fully understand your comment; are you asking how to get years, or are you suggesting a solution to this? In general, note that 'year' is an ambiguous duration; not all years have 365 days. - Armitage
The former. I was wondering: if I subtract two date objects and want the resulting pl.Duration in years, what would be the best way to obtain this? I guess a rough approximation with / 365 is good enough in most cases, because you are absolutely correct that a year is ambiguous (if you want extremely high precision). - Decimate
Yeah, it gets ambiguous starting with months, so anything below that should be covered by the duration type. Anything above: you're on your own, afaik. Dividing the day count by 365 or 365.25 should be OK in most cases to get fractional years. - Armitage
