I'd like to split a sample according to a specific variable, creating four sub-samples, one for each quartile of the variable's distribution. The aim is to show that the level of this variable influences the outcome of a regression, for example whether a coefficient is significant or not.
How to split a sample according to a certain variable in Stata?
The easiest way to do this is to use egen's cut() function with the group(4) option, which divides your variable into four groups of approximately equal frequency (quartile groups), coded 0 to 3.
Example:
. sysuse auto, clear
(1978 Automobile Data)
. sum price, detail
                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188
. egen price_cut = cut(price), group(4)
. table price_cut, contents(n price min price max price)
----------------------------------------------
price_cut |   N(price)  min(price)  max(price)
----------+-----------------------------------
        0 |         18       3,291       4,187
        1 |         19       4,195       4,934
        2 |         18       5,079       6,303
        3 |         19       6,342      15,906
----------------------------------------------
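If you would rather choose the break points yourself, cut() also takes an at() option. The sketch below is only an illustration: it reuses the 25%, 50% and 75% percentiles from the summarize output above, while the outer bounds 3000 and 16000 and the variable name price_at are arbitrary choices of mine, picked so that every price falls inside some interval.

```stata
sysuse auto, clear

* Explicit break points: intervals are closed on the left, open on the right.
* 3000 and 16000 are assumed outer bounds that cover the observed min and max;
* values outside [3000, 16000) would be set to missing.
egen price_at = cut(price), at(3000 4195 5006.5 6342 16000) icodes

* icodes stores 0, 1, 2, 3 instead of the left-hand end of each interval
tabulate price_at
```

Without icodes, the new variable would hold the left-hand end of each interval (3000, 4195, and so on) rather than an integer code.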
I hope this helps you.
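This is the easiest way you can go about it. xtile creates a new variable (here xx) that holds the quartile, 1 to 4, each observation falls into:
xtile xx = yourvariable, nq(4)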
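To tie this back to the regression in the question, here is a minimal sketch on the auto dataset. The model (mpg on weight) and the variable name price_q are placeholders I picked for illustration; substitute your own outcome, regressors and splitting variable.

```stata
sysuse auto, clear

* Quartile indicator for the splitting variable (values 1 to 4)
xtile price_q = price, nq(4)

* Check how many observations land in each quartile
tabulate price_q

* Run the same regression separately within each quartile
bysort price_q: regress mpg weight
```

Each by-group gets its own coefficient table, so you can compare estimates and significance across the four sub-samples.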
I hope this helps.
xtile does what it is designed to do; egen's cut() allows other ways of subdivision too. – Crucifix