import polars as pl

df = pl.DataFrame(
    {"name": list("abcdef"), "age": [21, 31, 32, 53, 45, 26], "country": list("AABBBC")}
)

df.group_by("country").agg(
    pl.col("name").sort_by("age").first().alias("age_sort_1"),
    pl.col("name").sort_by("age").get(2).alias("age_sort_2"),  # OutOfBoundsError: index out of bounds
    # pl.col("name").sort_by("age").arr.get(2, null_on_oob=True).alias("age_2"),
    # SchemaError: invalid series dtype: expected `FixedSizeList`, got `str`
    pl.col("name").sort_by("age").last().alias("age_sort_-1"),
)

As the code above shows, for each country I want the name of the person at a given position once the group is sorted by age. However, Expr.get does not provide a null_on_oob parameter. How can I get null automatically when the index is out of bounds?

The .arr.get method does provide a null_on_oob parameter, but it fails with SchemaError: invalid series dtype: expected `FixedSizeList`, got `str`. I don't understand what this error means or how to fix it.

PS: The code above repeats pl.col("name").sort_by("age") several times. Is there a more concise way to write this?

.group_by it will be a list by default, so an additional .get(0) is needed. – Fiann