Stata has a very nice command, egen, which makes it easy to compute statistics over groups of observations. For instance, it is possible to compute the max, the mean and the min for each group and add them as variables in the detailed data set. The Stata command is one line of code:
by group : egen max = max(x)
I've never found an equivalent command in R. summarise in the dplyr package makes it easy to compute statistics for each group, but then I have to run a loop to associate the statistic with each observation:
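library("dplyr")
# Simulated data: 1000 observations spread over up to 100 groups
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))
table(tf$group)
# One row per group holding the group maximum
mtf <- summarise(group_by(tf, group), max = max(x))
# Copy each group's maximum back onto its observations
tf$max <- NA
for (i in seq_len(nrow(mtf))) {
  tf$max[tf$group == mtf$group[i]] <- mtf$max[i]
}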
Does anyone have a better solution?
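For comparison, here is a minimal sketch of one loop-free way to get the same result, assuming base R's ave() is acceptable. ave() applies a function within each group and returns a vector as long as its input, which is close to what by: egen does. The set.seed() call and the extra mean and min columns are additions for illustration and are not part of the original code.

```r
# Sketch in base R: ave() returns one value per observation,
# computed within the group that observation belongs to.
set.seed(1)  # assumption: added only to make the example reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# Group statistics attached directly to each row, with no loop and no merge
tf$max  <- ave(tf$x, tf$group, FUN = max)
tf$mean <- ave(tf$x, tf$group, FUN = mean)
tf$min  <- ave(tf$x, tf$group, FUN = min)

head(tf)
```

With dplyr already loaded, replacing summarise() with mutate() after group_by() should give the same per-observation columns, since mutate() keeps every row instead of collapsing each group to one.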
egen (I wrote some of the functions) but even from a Stata viewpoint it is just a handy collection of stuff for creating variables. There's no central idea that maps onto anything that would be a central idea in R. Even the convenience of producing summary statistics by group is not in fact part of the definition or role of egen but just something possible with some of its components. I won't speak for R but I suspect some of its packages are also a bit miscellaneous. – Lifeless