Here's a way to get both the max and the column name at once using a fold
(which is how the horizontal functions work behind the scenes anyway).
df.lazy().with_columns(
    max_col=(max_struct := pl.fold(
        acc=pl.struct(value=-1e20, name=pl.lit("default low value")),
        function=lambda x, y: (
            pl.when(x.struct.field('value') > y)
            .then(x)
            .otherwise(pl.struct(value=y, name=pl.lit(y.name)))
        ),
        exprs=pl.all(),
    )).struct.field('name'),
    max_value=max_struct.struct.field('value'),
).collect()
A reduce is an operation that takes two columns at a time, passes them to a function, and keeps the output. If there are more than two columns, it passes that output along with the next column to the function, and so on. A fold is the same thing except that it starts with an accumulator, which is just a default first column. The first pair is therefore the accumulator and the actual first column; after that it works the same way.
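To make the difference concrete, here is a minimal sketch in plain Python rather than Polars. The `columns` data and the `elementwise_max` helper are made up for illustration; `functools.reduce` stands in for both operations, with its optional third argument playing the role of the accumulator.

```python
from functools import reduce

# Three "columns" of equal length, stored as plain lists (hypothetical data).
columns = {
    "a": [1, 7, 3],
    "b": [4, 2, 9],
    "c": [5, 6, 0],
}


def elementwise_max(left, right):
    """Combine two columns into one by taking the larger value in each row."""
    return [max(l, r) for l, r in zip(left, right)]


# reduce: the first call combines "a" and "b"; that output is then combined with "c".
reduced = reduce(elementwise_max, columns.values())

# fold: same idea, but the first call combines the accumulator with "a".
accumulator = [float("-inf")] * 3
folded = reduce(elementwise_max, columns.values(), accumulator)

print(reduced)  # [5, 7, 9]
print(folded)   # [5, 7, 9]
```

Both give the same row-wise maximum here; the accumulator only matters when you want the running result to have a different shape from the inputs, which is what the struct above is for.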
The fold receives Series, which carry their own names, so the function can simply return a struct holding the larger value together with the name of the Series it came from. That way we get both outputs at once.
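The same idea can be modelled in plain Python, as a rough sketch and not how Polars implements it: the accumulator holds a `(value, name)` pair per row, and each step keeps the existing pair or swaps in the current column's value and name. The `columns` data and the `keep_larger` helper are hypothetical.

```python
from functools import reduce

# Hypothetical named columns; in Polars these would be Series with a .name.
columns = {
    "a": [1.0, 7.0, 3.0],
    "b": [4.0, 2.0, 9.0],
    "c": [5.0, 6.0, 0.0],
}
n_rows = 3


def keep_larger(acc, named_column):
    """Per row, keep the accumulated (value, name) pair unless the column is not smaller."""
    name, values = named_column
    return [
        (best_value, best_name) if best_value > value else (value, name)
        for (best_value, best_name), value in zip(acc, values)
    ]


# The accumulator mirrors the struct used as `acc` in the fold above.
start = [(-1e20, "default low value")] * n_rows
result = reduce(keep_larger, columns.items(), start)

max_value = [value for value, _ in result]
max_col = [name for _, name in result]

print(max_col)    # ['c', 'a', 'b']
print(max_value)  # [5.0, 7.0, 9.0]
```

Because the comparison is a strict `>`, a tie goes to the later column, the same as in the `when/then/otherwise` expression above.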
In the above I use the walrus operator so that we can take the struct apart in the same context where we create it. I make the df lazy so that the query optimizer's common subexpression elimination (CSE) caches the fold result and reuses it rather than computing it twice.
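For anyone unfamiliar with the walrus pattern, here is a small plain-Python sketch of binding a value inside one keyword argument and reusing it in the next. The `expensive` and `describe` functions are made up for illustration; it relies on Python evaluating call arguments left to right.

```python
calls = []  # records each invocation of expensive()


def expensive():
    """Stand-in for building the fold; hypothetical."""
    calls.append(1)
    return {"value": 9.0, "name": "b"}


def describe(**kwargs):
    """Stand-in for with_columns; just returns the keyword arguments."""
    return kwargs


out = describe(
    # `pair` is bound here, while the first argument is being evaluated ...
    max_col=(pair := expensive())["name"],
    # ... so it is already available when the second argument is evaluated.
    max_value=pair["value"],
)

print(out)         # {'max_col': 'b', 'max_value': 9.0}
print(len(calls))  # 1
```

In this sketch the value itself is computed once. In the Polars version the walrus only reuses the expression object, so avoiding the double computation is left to the lazy engine.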
Performance comparison
Starting with
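import numpy as np
import polars as pl

# One million rows of uniform random floats in each of six columns
n = int(1e6)
df = pl.DataFrame({
    'a': np.random.uniform(-5, 5, n),
    'b': np.random.uniform(-5, 5, n),
    'c': np.random.uniform(-5, 5, n),
    'd': np.random.uniform(-5, 5, n),
    'e': np.random.uniform(-5, 5, n),
    'f': np.random.uniform(-5, 5, n),
})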
The results with lazy operations:
shape: (3, 2)
┌──────────┬───────────┐
│ user     ┆ timeit_ms │
│ ---      ┆ ---       │
│ str      ┆ i64       │
╞══════════╪═══════════╡
│ jqurious ┆ 103       │  # coalesce
│ Dean     ┆ 257       │  # fold
│ Hericks  ┆ 1110      │  # concat_list
└──────────┴───────────┘
{v:i for i,v in enumerate(df.columns)} to cover all the columns – Dorkas