I found an answer (now deleted) to this question, and I'm curious why it doesn't work.
The question is: return the row corresponding to the minimum value, by group.
So for example, given the dataset:
df <- data.frame(State = c(rep('AK', 4), rep('RI', 4)),
                 Company = LETTERS[1:8],
                 Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))
...the correct answer is:
   State Company Employees
1:    AK       D        24
2:    RI       E        19
as can be obtained, for example, by
library(data.table); setDT(df)[ , .SD[which.min(Employees)], by = State]
My question is why this plyr::ddply command doesn't work:
library(plyr)
ddply(df, .(State), summarise, Employees=min(Employees),
      Company=Company[which.min(Employees)])
# returns:
#   State Employees Company
# 1    AK        24       A
# 2    RI        19       E
In other words, why is which.min(Employees) returning 1 for each group, instead of c(4,1)? Note that outside of ddply, this works:
summarise(df, minEmp = min(Employees), whichMin = which.min(Employees))
#   minEmp whichMin
# 1     19        5
I don't use plyr much, but I'd like to know the right way to do it, if there's a reasonable one.
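A possible explanation, offered as a sketch rather than a definitive account: plyr's summarise evaluates its arguments one at a time, and each result becomes visible to later arguments under its new name. If that is what happens here, Employees=min(Employees) replaces the Employees column with a single number before which.min(Employees) runs, so which.min always sees a length-1 vector and returns 1. The summarise call outside ddply would avoid this because minEmp does not reuse the name Employees. Under that assumption, either computing Company first or passing an anonymous function should give the expected rows (the names res and res2 are just for illustration):

```r
library(plyr)

df <- data.frame(State = c(rep('AK', 4), rep('RI', 4)),
                 Company = LETTERS[1:8],
                 Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))

# Option 1: compute Company first, while Employees still refers to the
# full per-group column, and only then reduce Employees to its minimum.
res <- ddply(df, .(State), summarise,
             Company = Company[which.min(Employees)],
             Employees = min(Employees))

# Option 2: skip summarise and return the whole minimum row per group.
res2 <- ddply(df, .(State), function(d) d[which.min(d$Employees), ])

res
res2
```

Both calls should pick company D for AK and company E for RI. Option 2 is closer in spirit to the data.table version, since it returns every column of the matching row without naming them.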