Is there an equivalent to Stata's egen function? [duplicate]

Stata has a very nice command, egen, which makes it easy to compute statistics over groups of observations. For instance, you can compute the maximum, mean and minimum of each group and add them as variables to the detailed data set. In Stata this takes a single line of code:

bysort group: egen max = max(x)

I've never found an equivalent command in R. summarise() in the dplyr package makes it easy to compute statistics for each group, but then I have to run a loop to attach each statistic to the matching observations:

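library(dplyr)
N  <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))
table(tf$group)  # number of observations per group
# one row per group holding that group's maximum
mtf <- summarise(group_by(tf, group), max = max(x))
# copy each group's maximum back onto the matching observations
tf$max <- NA
for (i in seq_len(nrow(mtf))) {
  tf$max[tf$group == mtf$group[i]] <- mtf$max[i]
}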

Does anyone have a better solution?
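The loop above only copies each group's summary onto the matching rows, which is exactly what a many-to-one join does. A minimal sketch, assuming dplyr is installed and using a small made-up data set in place of tf, replaces the loop with left_join():

```r
library(dplyr)

# Small made-up data set standing in for tf
tf <- data.frame(group = c(1, 2, 1, 3, 2), x = c(0.5, -1, 2, 3, 4))

# One row per group, as in the question
mtf <- summarise(group_by(tf, group), max = max(x))

# Attach each group's maximum to its observations with a join instead of a loop;
# left_join() keeps every row of tf in its original order
tf <- left_join(tf, mtf, by = "group")
print(tf)
```

The join scales to several summary columns at once, because every column of mtf other than group is carried over.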

Josie answered 11/6, 2014 at 11:13 Comment(3)
Holliehollifield: There are a number of alternatives. Your question shows a lack of research (you didn't even study the dplyr package vignette). -1
Lifeless: I have no bias against egen (I wrote some of its functions), but even from a Stata viewpoint it is just a handy collection of tools for creating variables. There's no central idea in it that maps onto a central idea in R. Even the convenience of producing summary statistics by group is not part of the definition or role of egen; it is just something some of its functions make possible. I won't speak for R, but I suspect some of its packages are also a bit miscellaneous.
Josie: I agree with you, but it is still really useful.

Here are a few approaches:

dplyr

library(dplyr)

tf %>% group_by(group) %>% mutate(max = max(x))
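Since dplyr 1.1.0, mutate() also accepts a .by argument that groups for that single call only, so no group_by()/ungroup() pair is needed and the result is not left grouped. The sketch below assumes dplyr >= 1.1.0 and uses a small made-up data set; it adds the per-group minimum, mean and maximum mentioned in the question in one step:

```r
library(dplyr)  # .by in mutate() needs dplyr >= 1.1.0

# Small made-up data set standing in for tf
tf <- data.frame(group = c(1, 2, 1, 3, 2), x = c(0.5, -1, 2, 3, 4))

# Per-group statistics added as new columns; .by groups only for this call,
# so the result is ungrouped and keeps the original row order
tf <- mutate(tf,
             min  = min(x),
             mean = mean(x),
             max  = max(x),
             .by  = group)
print(tf)
```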

ave

This uses only base R:

transform(tf, max = ave(x, group, FUN = max))
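ave(x, group, FUN = f) returns a vector as long as x in which each element is replaced by f applied to its own group, which is why it can go straight into transform(). FUN defaults to mean, so only other statistics need it spelled out. A small sketch with made-up data:

```r
# Small made-up data set standing in for tf
tf <- data.frame(group = c(1, 2, 1, 3, 2), x = c(0.5, -1, 2, 3, 4))

# Each ave() call returns one value per row: the statistic of that row's group
tf <- transform(tf,
                mean = ave(x, group),                # FUN defaults to mean
                max  = ave(x, group, FUN = max),
                n    = ave(x, group, FUN = length))  # group size
print(tf)
```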

data.table

library(data.table)

dt <- data.table(tf)
dt[, max:=max(x), by=group]
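With data.table, := adds columns by reference, and its functional form `:=`(...) assigns several columns at once. A sketch assuming data.table is installed; it converts a small made-up data frame in place with setDT() instead of copying it with data.table():

```r
library(data.table)

# Small made-up data set standing in for tf
tf <- data.frame(group = c(1, 2, 1, 3, 2), x = c(0.5, -1, 2, 3, 4))
setDT(tf)  # convert to a data.table by reference, without copying

# Several per-group statistics added in place; rows keep their original order
tf[, `:=`(min = min(x), mean = mean(x), max = max(x)), by = group]
print(tf)
```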
Capitular answered 11/6, 2014 at 11:18 Comment(0)
