Using if qualifier with egen in Stata

Asked 6/3, 2012 at 4:16 Answered 20/2, 2013 at 10:52

I'm using Stata, and I'm trying to compute the average price of firms' rivals in a market. I have data that looks like:

Market    Firm   Price
----------------------
1         1      100
1         2      150
1         3      125
2         1      50
2         2      100
2         3      75
3         1      100
3         2      200
3         3      200

And I'm trying to compute the average price of each firm's rivals, so I want to generate a new field that is the average values of the other firms in a market. It would look like:

Market    Firm   Price    AvRivalPrice
------------------------------------
1         1      100      137.2
1         2      150      112.5
1         3      125      125
2         1      50       87.5
2         2      100      62.5
2         3      75       75
3         1      100      200
3         2      200      150
3         3      200      150

To do the average by group, I could use the egen command:

egen AvPrice = mean(price), by(Market)

But that wouldn't exclude the firm's own price in the average, and to the best of my knowledge, using the if qualifier would only change the observations it operated on, not the groups it averaged over. Is there a simple way to do this, or do I need to create loops and generate each average manually?

Spaceless answered 6/3, 2012 at 4:16 Comment(3)

Can u give a small example of ur calculation... It wud make much clear – Adage 6/3, 2012 at 4:19

I did (the data table is the example of what I). – Spaceless 6/3, 2012 at 4:31

The if being talked about is not the if command, but the if qualifier. I've edited accordingly. – Plagiarize 28/10, 2014 at 23:24

This is a way that avoids explicit loops, though it takes several lines of code:

by Market: egen Total = total(Price)
replace Total = Total - Price
by Market: gen AvRivalPrice = Total / (_N-1)
drop Total

Aurangzeb answered 6/3, 2012 at 11:29 Comment(1)

This will produce an error message if the data are not sorted by Market. It has to be bysort, at least in the first case. – Lawrence 7/3, 2012 at 2:48

This is an old thread still of interest, so materials and techniques overlooked first time round still apply.

The more general technique is to work with totals. At its simplest, total of others = total of all - this value. In a egen framework that is going to look like

egen total = total(price), by(market) 
egen n = total(!missing(price)), by(market) 
gen avprice = (total - cond(missing(price), 0, price)) / cond(missing(price), n, n - 1)

The total() function of egen ignores missing values in its argument. If there are missing values, we don't want to include them in the count, but we can use !missing() which yields 1 if not missing and 0 if missing. egen's count() is another way to do this.

Code given earlier gives the wrong answer if missings are present as they are included in the count _N.

Even if a value is missing, the average of the other values still makes sense.

If no value is missing, the last line above simplifies to

gen avprice = (total - price) / (n - 1)

So far, this possibly looks like no more than a small variant on previous code, but it does extend easily to using weights. Presumably we want a weighted average of others' prices given some weight. We can exploit the fact that total() works on expressions, which can be more complicated than just variable names. Indeed the code above did that already, but it is often overlooked.

egen wttotal = total(weight * price), by(market) 
egen sumwt = total(weight), by(market) 
gen avprice = (wttotal - price * weight) / (sumwt - weight)

As before, if price or weight is ever missing, you need more complicated code, or just to ensure that you exclude such observations from the calculations.

See also the Stata FAQ

How do I create variables summarizing for each individual properties of the other members of a group?

http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/

for a wider-ranging discussion.

(If the numbers get big, work with doubles.)

EDIT 2 March 2018 That was a newer post in an old thread, which in turn needs updating. rangestat (SSC) can be used here and gives one-line solutions. Not surprisingly, the option excludeself was explicitly added for these kinds of problem. But while the solution for means is easy using an identity

mean for others = (total - value for self) / (count - 1)

many other summary measures don't yield to a similar, simple trick and in that sense rangestat includes much more general coding.

clear
input Market    Firm   Price
1         1      100
1         2      150
1         3      125
2         1      50
2         2      100
2         3      75
3         1      100
3         2      200
3         3      200
end 

rangestat (mean) Price, interval(Firm . .) by(Market) excludeself 

list, sepby(Market) 

     +----------------------------------+
     | Market   Firm   Price   Price_~n |
     |----------------------------------|
  1. |      1      1     100      137.5 |
  2. |      1      2     150      112.5 |
  3. |      1      3     125        125 |
     |----------------------------------|
  4. |      2      1      50       87.5 |
  5. |      2      2     100       62.5 |
  6. |      2      3      75         75 |
     |----------------------------------|
  7. |      3      1     100        200 |
  8. |      3      2     200        150 |
  9. |      3      3     200        150 |
     +----------------------------------+

Plagiarize answered 20/2, 2013 at 10:52 Comment(0)

This is a way that avoids explicit loops, though it takes several lines of code:

by Market: egen Total = total(Price)
replace Total = Total - Price
by Market: gen AvRivalPrice = Total / (_N-1)
drop Total

Aurangzeb answered 6/3, 2012 at 11:29 Comment(1)

This will produce an error message if the data are not sorted by Market. It has to be bysort, at least in the first case. – Lawrence 7/3, 2012 at 2:48

Here's a shorter solution with fewer lines that kind of combines your original thought and @onestop's solution:

      egen AvPrice = mean(price), by(Market)
      bysort Market: replace AvPrice = (AvPrice*_N - price)/(_N-1)

This is all good for a census of firms. If you have a sample of the firms, and you need to apply the weights, I am not sure what a good solution would be. We can brainstorm it if needed.

Lawrence answered 7/3, 2012 at 2:46 Comment(1)

Thanks, I was able to handle the modifications to convert it for a sample (took a few extra lines), but this got me thinking in the right direction. – Spaceless 7/3, 2012 at 16:51

Recommended topics

Hot tags