aggregrated_table = df_input.groupBy('city', 'income_bracket') \
.agg(
count('suburb').alias('suburb'),
sum('population').alias('population'),
sum('gross_income').alias('gross_income'),
sum('no_households').alias('no_households'))
Would like to group by city and income bracket but within each city certain suburbs have different income brackets. How do I group by the most frequently occurring income bracket per city?
for example:
city1 suburb1 income_bracket_10
city1 suburb1 income_bracket_10
city1 suburb2 income_bracket_10
city1 suburb3 income_bracket_11
city1 suburb4 income_bracket_10
Would be grouped by income_bracket_10