I would like to select the row with the maximum value in each group using dplyr.
First, I generate some random data to illustrate the question:
set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))
In plyr, I can use a custom function to select that row:
library(plyr)
ddply(df, .(A, B), function(x) x[which.max(x$value),])
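The plyr version relies on which.max(), which returns the position of the first maximum only. The small sketch below uses a made-up three-row subset (hypothetical data, not the df above) to show that indexing by which.max() keeps exactly one row even when two rows tie.

```r
# Hypothetical toy subset with a tie at the maximum (rows 2 and 3).
x <- data.frame(C = 1:3, value = c(0.2, 0.9, 0.9))

# which.max() returns the index of the FIRST maximum, so the tie is dropped.
idx <- which.max(x$value)
idx  # 2

# Indexing with that single position keeps one row (C = 2).
x[idx, ]
```

This is why the ddply call returns one row per (A, B) group.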
In dplyr, the following code gives me the maximum value, but not the row that contains it (so column C is lost in this case).
library(dplyr)
df %>% group_by(A, B) %>%
  summarise(max = max(value))
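One way to keep whole rows is to filter within each group instead of summarising. A minimal sketch, assuming a current dplyr release (the dplyr_0.2 listed in the sessionInfo is much older): summarise() collapses each group to one row of summary columns, whereas filter() keeps the original columns, including C.

```r
library(dplyr)

set.seed(1)
df <- expand.grid(list(A = 1:5, B = 1:5, C = 1:5))
df$value <- runif(nrow(df))

# Keep every row whose value equals the maximum of its (A, B) group.
# filter() preserves all columns, so C is retained.
res_filter <- df %>%
  group_by(A, B) %>%
  filter(value == max(value)) %>%
  ungroup()

res_filter
```

Because the values come from runif(), ties within a group are practically impossible here, so this yields one row per group; with tied maxima it would return all of them.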
How can I achieve this? Thanks for any suggestions.
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.2 plyr_1.8.1
loaded via a namespace (and not attached):
[1] assertthat_0.1.0.99 parallel_3.1.0 Rcpp_0.11.1
[4] tools_3.1.0
The filter approach would return all maximum values (rows) per group, while the OP's ddply approach with which.max would only return one maximum (the first) per group. To replicate that behavior, another option is to use slice(which.max(value)) in dplyr. – Finedraw
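The difference described in the comment only shows up when a group has tied maxima. The sketch below uses a hypothetical three-row data frame (not the df from the question) with a tie in group 1, and assumes a recent dplyr; slice_max() with with_ties = FALSE requires dplyr 1.0.0 or later and is included only as a newer alternative.

```r
library(dplyr)

# Hypothetical toy data: group 1 has two rows tied at the maximum.
toy <- data.frame(g = c(1, 1, 2), v = c(5, 5, 3))

# filter() keeps every tied maximum: 2 rows for g = 1, 1 row for g = 2.
all_max <- toy %>% group_by(g) %>% filter(v == max(v)) %>% ungroup()

# slice(which.max(v)) keeps only the first maximum per group,
# matching the ddply + which.max behaviour.
first_max <- toy %>% group_by(g) %>% slice(which.max(v)) %>% ungroup()

# dplyr >= 1.0.0 alternative: one row per group, ties broken by position.
first_max2 <- toy %>%
  group_by(g) %>%
  slice_max(v, n = 1, with_ties = FALSE) %>%
  ungroup()
```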