Is there a reason why there are two different commands to generate a new variable?
Is there a simple way to remember when to use gen and when to use egen?
They both create a new variable, but work with different sets of functions. You will typically use gen when you have simple transformations of other variables in your dataset, like
gen newvar = oldvar1^2 * oldvar2
In my workflow, egen usually appears when I need functions that work across all observations, like in
egen max_var = max(var)
or more complex instructions such as
egen newvar = rowmax(oldvar1 oldvar2)
to calculate the maximum for each observation between oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
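To make the distinction concrete, here is a minimal sketch on a made-up three-row dataset (the variable names x, a, b and the new variables are placeholders, not from the question). It also illustrates a common point of confusion: max() inside gen compares its arguments within each observation, while max() inside egen takes the maximum of one variable across observations.

```stata
clear
input x a b
3 2 5
7 4 1
5 6 6
end

* gen: computed separately for every observation
generate newvar = a^2 * b

* gen's max() compares its arguments within each observation
generate gen_max = max(a, b)

* egen's max() returns one value taken across all observations
egen egen_max = max(x)

* egen's rowmax() is the row-wise counterpart for a list of variables
egen row_max = rowmax(a b)

list
```

With this data, gen_max and row_max agree in every row (5, 4, 6), while egen_max is 7 for all observations.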
egen is used whenever gen isn't :) – Monosymmetric
generate is a fast internal command. egen is parsed by Stata, and you can write extensions to it using Stata ado-code. You cannot do that with generate. This is a rather painful legacy of the 80s as compared to R, where you can define a function inline and forget it after it was used. – Cornet
egen is an "extension" to the gen command because it reaches beyond simple computations (var1 + var2, log(var1), etc.) to add descriptive stats, standardizations and more. Some of the stuff that can be done with plyr and apply in R is therefore done with statsby and egen in Stata. I use it to gently hack confidence intervals and scatterplots. – Automate
gen
generate may be abbreviated by gen or even g and can be used with the following mathematical operators and functions:
+    addition
-    subtraction
*    multiplication
/    division
^    power
A large number of functions is available. Here are some examples:
abs(x)    absolute value of x
exp(x)    antilog of x
int(x) or trunc(x)    truncation to integer value
ln(x), log(x)    natural logarithm of x
round(x)    rounds to the nearest integer of x
round(x,y)    x rounded in units of y (i.e., round(x,.1) rounds to one decimal place)
sqrt(x)    square root of x
runiform()    returns uniformly distributed numbers between 0 and nearly 1
rnormal()    returns numbers that follow a standard normal distribution
rnormal(x,y)    returns numbers that follow a normal distribution with a mean of x and a s.d. of y
egen
A number of more complex possibilities have been implemented in the egen command, as in the following examples:
egen nkids = anycount(pers1 pers2 pers3 pers4 pers5), value(1)
egen v323r = rank(v323)
egen myindex = rowmean(var15 var17 var18 var20 var23)
egen nmiss = rowmiss(x1-x10 var15-var23)
egen rtotal = rowtotal(x1-x10 var15-var23)
egen incomst = std(income)
bysort v3: egen mincome = mean(income)
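A few of the functions listed above can be tried on a small made-up dataset. This is only a sketch: the variables group, income, x1, x2, x3 and all new variable names are hypothetical and chosen for illustration.

```stata
clear
input group income x1 x2 x3
1 1000 1 . 3
1 3000 2 2 .
2 2000 . . 6
2 4000 4 5 6
end

* gen with observation-level functions
generate income_k = round(income/1000, .1)
generate thirds   = int(income/3000)

* egen row-wise functions: each result uses only the values in that row
egen row_mean = rowmean(x1 x2 x3)
egen n_miss   = rowmiss(x1 x2 x3)
* rowtotal() treats missing values as 0
egen row_sum  = rowtotal(x1 x2 x3)

* egen functions that use all observations
egen inc_std  = std(income)
egen inc_rank = rank(income)

* group-wise mean: one value per level of group
bysort group: egen mincome = mean(income)

list
```

For the first input row, row_mean is 2 (the missing x2 is ignored), n_miss is 1 and row_sum is 4; mincome is 2000 for group 1 and 3000 for group 2.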
Detailed usage explanations can be found at this link.
generate. If this is something more complicated, e.g. needs to be done on groups of observations (which are not very easily addressed in Stata), you would need to look for an appropriate egen function. – Cornet