I am analysing some data with PySpark DataFrames. Suppose I have a DataFrame df
that I am aggregating:
(df.groupBy("group")
.agg({"money":"sum"})
.show(100)
)
This will give me:
group  SUM(money#2L)
A      137461285853
B      172185566943
C      271179590646
The aggregation works just fine, but I dislike the new column name SUM(money#2L).
Is there a way to rename this column to something human-readable from within the .agg
method? Maybe something more similar to what one would do in dplyr:
df %>% group_by(group) %>% summarise(sum_money = sum(money))
If you have added the alias part but don't see it taking effect, pay attention to your
parentheses. alias('string') belongs inside the agg call, otherwise you're aliasing the
entire DataFrame, not only the column. – Storer