What's the difference between gen and egen in Stata 12?

Is there a reason why there are two different commands to generate a new variable?

Is there a simple way to remember when to use gen and when to use egen?

Monosymmetric answered 20/10, 2012 at 23:30 Comment(0)

Both commands create a new variable, but they work with different sets of functions. You will typically use gen for simple transformations of other variables in your dataset, such as

gen newvar = oldvar1^2 * oldvar2

In my workflow, egen usually appears when I need functions that work across all observations, like in

egen max_var = max(var)

or more complex instructions

egen newvar = rowmax(oldvar1 oldvar2)

to calculate, for each observation, the larger of oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
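
A minimal sketch of the three commands above on a made-up three-row dataset. The id variable, the values, and the name row_max are assumptions for illustration; row_max is used because newvar already exists after the gen line.

```stata
* Toy data; id and the values are illustrative only
clear
input id oldvar1 oldvar2
1 1 5
2 7 2
3 3 3
end

* gen: transformation applied observation by observation
gen newvar = oldvar1^2 * oldvar2

* egen max(): one value computed over all observations
egen max_var = max(oldvar1)

* egen rowmax(): larger of the two variables within each observation
egen row_max = rowmax(oldvar1 oldvar2)

list
```

With these values, max_var should be 7 in every row, while row_max should be 5, 7 and 3 for ids 1, 2 and 3.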

Scrawly answered 20/10, 2012 at 23:52 Comment(6)
There's a pretty clear logic, actually. If the task can be done with the existing mathematical functions, you use generate. If this is something more complicated, e.g. needs to be done on groups of observations (which are not very easily addressed in Stata), you would need to look for an appropriate egen function. - Cornet
Agree. But I still don't see the logic of having two separate commands. - Scrawly
I think Stata's logic is very clear: egen is used whenever gen isn't :) - Monosymmetric
@griverorz, there are differences in implementation. generate is a fast internal command. egen is being parsed by Stata, and you can write extensions to it using Stata ado-code. You cannot do that with generate. This is a rather painful legacy of the 80s as compared to R where you can define a function inline and forget it after it was used. - Cornet
StasK is correct. Technically, egen is an "extension" to the generate command because it reaches beyond simple computations (var1 + var2, log(var1), etc.) to add descriptive stats, standardizations and more. Some of the stuff that can be done with plyr and apply in R is therefore done with statsby and egen in Stata. I use it to gently hack confidence intervals and scatterplots. - Automate
So is it about backwards compatibility? egen functions not breaking or being ambiguous with gen functions? - Carnivore
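
To illustrate the comments about grouped calculations: a by-group mean takes one line with egen, and roughly the same result can be built by hand with generate and replace. This is only a sketch on made-up data (the variable names group, x, mean_egen and mean_gen are assumptions), not a description of how egen is implemented internally.

```stata
* Toy data; names and values are illustrative only
clear
input group x
1 2
1 4
2 10
2 20
2 30
end

* One step with egen
bysort group: egen mean_egen = mean(x)

* By hand: running sum divided by running count of nonmissing values,
* then copy the last (complete) value to every row of the group
bysort group: gen mean_gen = sum(x) / sum(x < .)
by group: replace mean_gen = mean_gen[_N]

list
```

Both variables should hold 3 for group 1 and 20 for group 2.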

gen

generate may be abbreviated to gen or even g, and it accepts the following arithmetic operators:

  • + addition
  • - subtraction
  • * multiplication
  • / division
  • ^ power

A large number of functions is available. Here are some examples:

  • abs(x) absolute value of x
  • exp(x) exponential of x (e^x, the antilog of the natural logarithm)
  • int(x) or trunc(x) truncation to integer value
  • ln(x), log(x) natural logarithm of x
  • round(x) x rounded to the nearest integer
  • round(x,y) x rounded in units of y (e.g., round(x,.1) rounds to one decimal place)
  • sqrt(x) square root of x
  • runiform() returns uniformly distributed numbers between 0 and nearly 1
  • rnormal() returns numbers that follow a standard normal distribution
  • rnormal(x,y) returns numbers that follow a normal distribution with a mean of x and a s.d. of y
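
A short sketch that exercises a few of these operators and functions with generate. The five-observation dataset, the seed, and all variable names are made up for illustration.

```stata
* Toy data: five observations with x = -2, -1, 0, 1, 2
clear
set obs 5
set seed 12345

gen x = _n - 3

* Arithmetic operators and simple functions
gen absx  = abs(x)
gen xsq   = x^2
gen third = round(x / 3, .1)
gen half  = int(x / 2)

* Random draws: uniform, and normal with mean 10 and s.d. 2
gen u = runiform()
gen z = rnormal(10, 2)

list
```

Every expression here is evaluated within each observation, which is the typical use of generate.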

egen

egen implements a number of more complex operations, as in the following examples:

  • egen nkids = anycount(pers1 pers2 pers3 pers4 pers5), value(1)
  • egen v323r = rank(v323)
  • egen myindex = rowmean(var15 var17 var18 var20 var23)
  • egen nmiss = rowmiss(x1-x10 var15-var23)
  • egen total = rowtotal(x1-x10 var15-var23)
  • egen incomst = std(income)
  • bysort v3: egen mincome = mean(income)

Detailed usage explanations can be found at this link.
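
To see what some of these egen functions return, here is a small sketch on a made-up four-row dataset with missing values. The variable names id, group, x1 to x3 and the new variable names are assumptions for illustration.

```stata
* Toy data with some missing values; illustrative only
clear
input id group x1 x2 x3
1 1 1 2 3
2 1 4 . 6
3 2 . . 9
4 2 2 2 2
end

* Row-wise functions work across variables within each observation
egen rmean  = rowmean(x1 x2 x3)
egen nmiss  = rowmiss(x1 x2 x3)
egen rtotal = rowtotal(x1 x2 x3)

* Column-wise functions work across observations
egen rk  = rank(x3)
egen zx3 = std(x3)

* With a by prefix, the function is computed within each group
bysort group: egen gmean = mean(x1)

list
```

For id 2, for example, rowmean ignores the missing x2 and returns 5, rowmiss returns 1, and rowtotal returns 10.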

Overtrick answered 17/6, 2018 at 22:45 Comment(0)