Regression with Heteroskedasticity Corrected Standard Errors
Asked Answered
W

3

30

I would like to find the R implementation that most closely resembles Stata output for fitting a least squares regression function with Heteroskedastic Corrected Standard Errors.

Specifically, I would like the corrected standard errors to be in the "summary" and not have to do additional calculations for my initial round of hypothesis testing. I am looking for a solution that is as "clean" as what Eviews and Stata provide.

So far, using the "lmtest" package, the best I can come up with is:

model <- lm(...)
coeftest(model, vcov = hccm)

This gives me the output that I want, but it does not seem to be using "coeftest" for its stated purpose. I would also have to use the summary with the incorrect standard errors to read off the R^2 and F stat, etc. I feel that there should exist a "one line" solution to this problem given how dynamic R is.

Waligore answered 8/12, 2010 at 8:24 Comment(1)
You should note that you need package car also for hccm(). It took me a few minutes to work out where that was coming from.Lilienthal
L
39

I think you are on the right track with coeftest in package lmtest. Take a look at the sandwich package which includes this functionality and is designed to work hand in hand with the lmtest package you have already found.

> # generate linear regression relationship
> # with Homoskedastic variances
> x <- sin(1:100)
> y <- 1 + x + rnorm(100)
> ## model fit and HC3 covariance
> fm <- lm(y ~ x)
> vcovHC(fm)
            (Intercept)           x
(Intercept) 0.010809366 0.001209603
x           0.001209603 0.018353076
> coeftest(fm, vcov. = vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To get the F test, look at function waldtest():

> waldtest(fm, vcov = vcovHC)
Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

You could always cook up a simple function to combine these two for you if you wanted the one-liner...

There are lots of examples in the Econometric Computing with HC and HAC Covariance Matrix Estimators vignette that comes with the sandwich package of linking lmtest and sandwich to do what you want.

Edit: A one-liner could be as simple as:

mySummary <- function(model, VCOV) {
    print(coeftest(model, vcov. = VCOV))
    print(waldtest(model, vcov = VCOV))
}

Which we can use like this (on the examples from above):

> mySummary(fm, vcovHC)

t test of coefficients:

            Estimate Std. Error t value  Pr(>|t|)    
(Intercept)  1.01973    0.10397  9.8081 3.159e-16 ***
x            0.93992    0.13547  6.9381 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Wald test

Model 1: y ~ x
Model 2: y ~ 1
  Res.Df Df      F    Pr(>F)    
1     98                        
2     99 -1 48.137 4.313e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Lilienthal answered 8/12, 2010 at 9:27 Comment(4)
@Jerome: If you are happy with an answer, upvote or accept it (that's how things work here on SO). OK, you pay me what you paid for Stata and I'll write the one-liner for you ;-) Seriously, R is developed by volunteers and individuals. These people develop code they need to use and do it how they feel it should be done. lmtest is about providing targeted tests of linear models. The authors probably felt it better to provide the raw tools to do these things and didn't have need of the summary-sugar you want. I'll show you how easy it is to write a one-line wrapper in an edit to my Answer.Lilienthal
any idea what STATA does here, cause I was trying to reproduce results somebody else calculated with STATA's robust option.Immaterial
Ah, I can answer my own question I here (the one I raised on the previous comment): The equivalent to what STATA does would be to use typ="HC1" with vcovHC.Immaterial
A brief explanation why lmtest does not provide mysummary(..., robust = TRUE). First, "robust" is frequently misinterpreted - it's just a very specific type of robustness. Second, the inevitable question would be: What is the "residual standard error" and "R-squared" if you modify the covariance matrix estimate? Answer: Not so clear. You explicitly account for heteroscedasticity so that there is not one residual variance. Similarly, the decomposition of the sums of squares does not work "as usual" for the R-squared. Hence users have to ask specifically for a coeftest() or waldtest().Uncouth
F
10

I found an R function that does exactly what you are looking for. It gives you robust standard errors without having to do additional calculations. You run summary() on an lm.object and if you set the parameter robust=T it gives you back Stata-like heteroscedasticity consistent standard errors.

summary(lm.object, robust=T)

You can find the function on https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r/

Ferdinana answered 23/8, 2016 at 18:1 Comment(1)
This does not work without loading an external script provided by the author which changes the summary function. Without loading that script, R simply ignores the robust variable and outputs the standard heteroscedasticity inconsistent standard errors.Transfiguration
J
2

There is now a one-line solution using lm_robust from the estimatr package, which you can install from CRAN install.packages(estimatr).

> library(estimatr)
> lmro <- lm_robust(mpg ~ hp, data = mtcars, se_type = "stata")
> summary(lmro)

Call:
lm_robust(formula = mpg ~ hp, data = mtcars, se_type = "stata")

Standard error type:  HC1 

Coefficients:
            Estimate Std. Error  Pr(>|t|) CI Lower CI Upper DF
(Intercept) 30.09886    2.07661 4.348e-15 25.85785 34.33987 30
hp          -0.06823    0.01356 2.132e-05 -0.09592 -0.04053 30

Multiple R-squared:  0.6024 ,   Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

You can also get tidy output:

> tidy(lmro)
         term    estimate std.error      p.value    ci.lower
1 (Intercept) 30.09886054 2.0766149 4.347723e-15 25.85784704
2          hp -0.06822828 0.0135604 2.131785e-05 -0.09592231
     ci.upper df outcome
1 34.33987404 30     mpg
2 -0.04053425 30     mpg

The "stata" standard errors default to "HC1" standard errors, which are the default rob standard errors in Stata. You can also get "classical", "HC0", "HC1", "HC2", "HC3" and various clustered standard errors as well (including those that match Stata).

Jungjungfrau answered 18/4, 2018 at 17:37 Comment(1)
It's generally advisable to avoid the Stata default and to go with HC3.Pus

© 2022 - 2024 — McMap. All rights reserved.