I have a SparkR DataFrame and I want to get the mode (most frequent) value for each unique name. How can I do this? There doesn't seem to be a built-in mode function. Either a SparkR or PySpark solution will do.
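# Create DF (assumes Spark 2.0+; older versions need createDataFrame(sqlContext, df))
library(SparkR)
sparkR.session()
df <- data.frame(name = c("Thomas", "Thomas", "Thomas", "Bill", "Bill", "Bill"),
                 value = c(5, 5, 4, 3, 3, 7))
DF <- createDataFrame(df)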
name   | value
-----------------
Thomas | 5
Thomas | 5
Thomas | 4
Bill   | 3
Bill   | 3
Bill   | 7
# What I want to get
name   | mode(value)
-----------------
Thomas | 5
Bill   | 3
Window.partitionBy('name').orderBy(desc('count')) does? I'm also having trouble converting this code to SparkR, though the windowPartitionBy command exists there. – Evocative
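
On what that window expression does: as far as I can tell, it splits the rows into one group per name and sorts each group so that the highest count comes first, which lets you keep just the top row per name. Below is a rough sketch of that logic in plain Python (no Spark needed), using the values from the question's code. The function name mode_per_name is made up for illustration, and the Spark calls named in the comments are my assumption about the code being discussed, since it isn't shown here.

```python
from collections import Counter

# Same rows as the data.frame in the question.
rows = [("Thomas", 5), ("Thomas", 5), ("Thomas", 4),
        ("Bill", 3), ("Bill", 3), ("Bill", 7)]


def mode_per_name(rows):
    """Return {name: most frequent value} (hypothetical helper)."""
    # Step 1: count each (name, value) pair.
    # Assumed Spark analogue: df.groupBy('name', 'value').count()
    counts = Counter(rows)

    # Step 2: group by name, and within each name put the largest count first.
    # Assumed Spark analogue: Window.partitionBy('name').orderBy(desc('count'))
    ranked = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))

    # Step 3: keep only the first row seen for each name.
    # Assumed Spark analogue: keep rows where row_number() over the window is 1.
    result = {}
    for (name, value), _count in ranked:
        result.setdefault(name, value)
    return result


modes = mode_per_name(rows)
print(modes)  # {'Bill': 3, 'Thomas': 5}
```

One caveat: when two values tie for the highest count, this sketch keeps whichever appeared first in the input, whereas a Spark window with only orderBy(desc('count')) gives no guarantee about which tied row ends up first.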