Apply function to all columns of a Polars-DataFrame

I know how to apply a function to all columns of a Pandas-DataFrame. However, I have not yet figured out how to do the same with a Polars-DataFrame.

I checked the section of the Polars User Guide devoted to this topic, but I have not found the answer. Below is a code snippet with my unsuccessful attempts.

import numpy as np
import polars as pl
import seaborn as sns

# Loading toy dataset as Pandas DataFrame using Seaborn
df_pd = sns.load_dataset('iris')

# Converting Pandas DataFrame to Polars DataFrame
df_pl = pl.DataFrame(df_pd)

# Dropping the non-numeric column...
df_pd = df_pd.drop(columns='species')                     # ... using Pandas
df_pl = df_pl.drop('species')                             # ... using Polars

# Applying function to the whole DataFrame...
df_pd_new = df_pd.apply(np.log2)                          # ... using Pandas
# df_pl_new = df_pl.apply(np.log2)                        # ... using Polars?

# Applying lambda function to the whole DataFrame...
df_pd_new = df_pd.apply(lambda c: np.log2(c))             # ... using Pandas
# df_pl_new = df_pl.apply(lambda c: np.log2(c))           # ... using Polars?

Thanks in advance for your help and your time.

Decode answered 4/6, 2021 at 9:33
Deniable: Can you change the tag to python-polars?
Decode: Of course. I just added the python-polars tag to the question's tags.

You can use the expression syntax: select all columns with pl.all() and then use map_batches to apply NumPy's np.log2 function to each column.

df.select(
    pl.all().map_batches(np.log2)
)

Note that we use map_batches here because map_elements would call the function once for every individual value:

map_elements = pl.Series(np.log2(value) for value in pl.Series([1, 2, 3]))

np.log2, however, can be called once on a whole array of values, which is faster:

map_batches = np.log2(pl.Series([1, 2, 3]))

See the Polars User Guide for more:

  • map_elements: Call a function separately on each value in the Series.
  • map_batches: Always passes the full Series to the function.

NumPy

Polars expressions also support NumPy universal functions (ufuncs).

That means you can pass a Polars expression directly to a NumPy ufunc:

df.select(
    np.log2(pl.all())
)
Deniable answered 11/6, 2021 at 9:30
