In pandas, one can do:
import pandas as pd
d = {"foo":[1,2,3, None], "bar":[4,None, None, 6]}
df_pandas = pd.DataFrame.from_dict(d)
dict(df_pandas.isnull().sum())
[out]:
{'foo': 1, 'bar': 2}
In polars it's possible to do the same by looping through the columns:
import polars as pl
d = {"foo":[1,2,3, None], "bar":[4,None, None, 6]}
df_polars = pl.from_dict(d)
{col:df_polars[col].is_null().sum() for col in df_polars.columns}
Looping through the columns in polars is particularly painful when using a LazyFrame, because the .collect() then has to be done in chunks to do the aggregation.
Is there a way to find the number of nulls in every column of a polars dataframe without looping through each column?
df_polars.collect().null_count()? How does that work with LazyFrame? – Crinite
df_polars.collect() is not the best thing to do for a large dataset. – Crinite