I am new to Polars and I am not sure whether I am using .with_columns() correctly.
Here's a situation I encounter frequently: I have a dataframe and, in .with_columns(), I apply some operation to a column. For example, I convert some dates from str to date type and then want to compute the duration between the start and end dates. I'd implement this as follows.
    import polars as pl

    pl.DataFrame(
        {
            "start": ["01.01.2019", "01.01.2020"],
            "end": ["11.01.2019", "01.05.2020"],
        }
    ).with_columns(
        pl.col("start").str.to_date(),
        pl.col("end").str.to_date(),
    ).with_columns(
        (pl.col("end") - pl.col("start")).alias("duration"),
    )
First, I convert the two columns; next, I call .with_columns() again.
Something shorter like this does not work:
    pl.DataFrame(
        {
            "start": ["01.01.2019", "01.01.2020"],
            "end": ["11.01.2019", "01.05.2020"],
        }
    ).with_columns(
        pl.col("start").str.to_date(),
        pl.col("end").str.to_date(),
        (pl.col("end") - pl.col("start")).alias("duration"),
    )
    # InvalidOperationError: sub operation not supported for dtypes `str` and `str`
Is there a way to avoid calling .with_columns() twice and to write this in a more compact way?
fmt argument is slightly different for start and end in the actual data I use, but I'll keep the suggestions in mind :) – Costplus