I want to count how many records are True in a Boolean column of a grouped Spark DataFrame, but I don't know how to do that in Python. For example, I have data with `Region`, `Salary`, and `IsUnemployed` columns, where `IsUnemployed` is a Boolean. I want to see how many unemployed people there are in each region. I know we could `filter` and then `groupby`, but I want to produce both aggregations at the same time, as below:
```python
from pyspark.sql import functions as F

data.groupby("Region").agg(F.avg("Salary"), F.count("IsUnemployed"))
```