Standardize a variable by group in Stata

I need to generate a new variable that is a standardized value of another variable but by a group (SAT scores by year). I calculated it using the following code:

egen mean_sat = mean(sat), by(year)
egen sd_sat = sd(sat), by(year)
gen std_sat = (sat - mean_sat) / sd_sat

Is there another more direct way to do that? I tried the following with no success...

. by year, sort : egen float std_SAT = std(sat)
egen ... std() may not be combined with by
r(190);

. egen std_SAT = std(sat), by(year)
egen ... std() may not be combined with by
Laminous asked 14/1, 2015 at 17:48

At present, the official egen function std() does not support by-group operations, whether through the by: prefix or the by() option. I can't identify a statistical or computational reason for that, but it is well documented. (Why you need luck to get past a documented limitation I don't understand.)

In principle, any user could write their own egen function so that what you want is implemented in a one-line call. In practice, no one seems bothered enough to write it, given the easy work-around that you have used; these things get written when someone gets irritated at typing three lines of code repeatedly. A more positive reason why the code you cite is useful is that, statistically, you should usually want to keep track of the means and standard deviations anyway.
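
As a rough sketch of what such a convenience command could look like, the three lines can be wrapped in a small program. This is an ordinary program rather than a true egen function, and the name zbygroup and its by() and generate() options are made up for illustration; the shipped auto dataset is used only to have something to run on.

```stata
* Hypothetical helper (not part of Stata): standardize a variable within groups.
capture program drop zbygroup
program define zbygroup
    version 13
    syntax varname(numeric), BY(varlist) GENerate(name)
    confirm new variable `generate'
    tempvar m s
    quietly {
        * Group mean and standard deviation, held in temporary variables
        egen double `m' = mean(`varlist'), by(`by')
        egen double `s' = sd(`varlist'), by(`by')
        generate double `generate' = (`varlist' - `m') / `s'
    }
end

* Usage on a shipped example dataset
sysuse auto, clear
zbygroup price, by(foreign) generate(price_z)
tabstat price_z, by(foreign) statistics(mean sd n)
```

Because the helper discards the group means and standard deviations, the explicit three-line version is still preferable whenever those values are wanted later.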

EDIT 20 July 2020

Update to Stata 16.1

update 30jun2020

  1. egen has the following updates:

    c. egen function std() now allows by varlist:. When used with by varlist:, values are standardized within each group defined by varlist. The option specifying a value for the standard deviation has been renamed sd() (the old option name std() continues to work as well).
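
With Stata 16.1 updated to 30jun2020 or later, the one-line call from the question should therefore work. A minimal sketch, using the shipped auto dataset as a stand-in for the SAT data; the variable names are illustrative only:

```stata
sysuse auto, clear

* Standardize price within each level of foreign in one call
bysort foreign: egen price_z = std(price)

* Per the update note, sd() sets the target standard deviation
* (std() is the old option name); mean() sets the target mean
bysort foreign: egen price_t = std(price), mean(50) sd(10)

tabstat price_z price_t, by(foreign) statistics(mean sd)
```

On an installation older than that update, both egen calls would stop with the "may not be combined with by" error shown in the question.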

Lidialidice answered 14/1, 2015 at 18:28
Thank you. A "side effect" of my question is that I wanted someone to take a look at the code I have, to make sure that it will yield correct results :) - Laminous

egen's std() function did not support by-groups when this was written, but you can very easily do the calculation yourself. Here is how:

1- You can't standardize by group in one step, but you can compute the mean and the standard deviation by group. So (a) take the mean by group, (b) take the standard deviation by group, and finally (c) compute standardized_variable = (the_var - mean_of_the_var) / sd_of_the_var.

2- Example: let's standardize the variable "sales" by "company". Here's the code:

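* Numeric group identifier (works whether company is string or numeric)
egen company_group = group(company)
sort company_group

* Group mean and standard deviation, then the standardized value
by company_group: egen sales_mean= mean(sales)
by company_group: egen sales_sd  = sd(sales)
* The by prefix is harmless but not needed here: the calculation is row-wise
by company_group: gen  sales_std = (sales-sales_mean)/sales_sd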
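
One way to check that this three-step approach gives sensible results is to confirm that the standardized variable has mean 0 and standard deviation 1 within every group. The sketch below does so on the shipped auto dataset, since the sales data is not available here; the chk_ variable names are illustrative. Note that a group with a single observation, or with no variation, gets missing values, because its standard deviation is missing or zero.

```stata
sysuse auto, clear
egen double price_mean = mean(price), by(foreign)
egen double price_sd   = sd(price), by(foreign)
generate double price_std = (price - price_mean) / price_sd

* Within each group the result should have mean 0 and sd 1
egen double chk_mean = mean(price_std), by(foreign)
egen double chk_sd   = sd(price_std), by(foreign)
assert abs(chk_mean) < 1e-8
assert abs(chk_sd - 1) < 1e-8
```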
Illjudged answered 28/12, 2015 at 23:47
This is just the code the OP put in their question! The difference is largely cosmetic: whether the by operation is performed using an option or using a prefix command. - Lidialidice
