Is there a way to specify a custom aggregation function for Spark DataFrames that operates over multiple columns?
I have a table of the form (name, item, price):
john | tomato | 1.99
john | carrot | 0.45
bill | apple | 0.99
john | banana | 1.29
bill | taco | 2.59
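For reference, here is a minimal snippet that builds this table as a DataFrame (the column names name, item, and price are just what I'm using for this example):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("multi-column-aggregation")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data matching the table above; column names are illustrative.
val df = Seq(
  ("john", "tomato", 1.99),
  ("john", "carrot", 0.45),
  ("bill", "apple", 0.99),
  ("john", "banana", 1.29),
  ("bill", "taco", 2.59)
).toDF("name", "item", "price")
```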
I would like to aggregate each item and its price for each person into a list, like this:
john | (tomato, 1.99), (carrot, 0.45), (banana, 1.29)
bill | (apple, 0.99), (taco, 2.59)
Is this possible with DataFrames? I recently learned about collect_list, but it appears to work on only one column.
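Continuing from the snippet above, this is roughly what I tried; collect_list gathers the items per name, but I don't see how to keep each price paired with its item:

```scala
import org.apache.spark.sql.functions.collect_list

// Collects only the item column per name; the prices are lost.
val items = df.groupBy("name")
  .agg(collect_list("item").as("items"))

items.show(truncate = false)
// Roughly: john -> [tomato, carrot, banana], bill -> [apple, taco]
```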