How to conduct linear hypothesis test on regression coefficients with a clustered covariance matrix?
I am interested in calculating estimates and standard errors for linear combinations of coefficients after a linear regression in R. For example, suppose I have the regression and test:

data(mtcars)
library(multcomp)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
summary(glht(lm1, linfct = 'cyl + hp = 0'))

This will estimate the value of the sum of the coefficients on cyl and hp, and provide the standard error based on the covariance matrix produced by lm.

But, suppose I want to cluster my standard errors, on a third variable:

data(mtcars)
library(multcomp)
library(lmtest)
library(multiwayvcov)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)
ct1 <- coeftest(lm1,vcov. = vcv)

ct1 contains the SEs clustered by am. However, if I pass the ct1 object to glht, I get an error:

Error in modelparm.default(model, ...) : no ‘coef’ method for ‘model’ found!

Any advice on how to do the linear hypothesis with the clustered variance covariance matrix?

Thanks!

Investigate answered 19/10, 2016 at 0:20 Comment(3)
Check out this. Lots of info on how to incorporate clustered SEs into linear hypothesis testing. Looks basically the same as what @Zheyuan offers. – Pandemonium
summary(glht(ct1, linfct = 'cyl + hp = 0')) – Investigate
It seems both this method and the glht function in R only work for categorical variables, right? What if I have a continuous variable? For example, I am testing the interaction term year:agegroup in a logistic regression model, where year is a continuous variable (coded as 1, 2, 3, ..., 14) and agegroup is categorical. How can I test the significance of the odds ratio comparing different agegroups, controlling for year, with this approach? Thanks. – Wise

glht(ct1, linfct = 'cyl + hp = 0') won't work, because ct1 is not a glht object and cannot be coerced to one via as.glht. I don't know whether a package or an existing function does this, but it is not a difficult job to work out ourselves. The following small function does it:

LinearCombTest <- function (lmObject, vars, .vcov = NULL) {
  ## if `.vcov` missing, use the one returned by `lm`
  if (is.null(.vcov)) .vcov <- vcov(lmObject)
  ## estimated coefficients
  beta <- coef(lmObject)
  ## sum of `vars`
  sumvars <- sum(beta[vars])
  ## get standard errors for sum of `vars`
  ## standard error for the sum of `vars`: sqrt of the sum of the
  ## relevant block of the covariance matrix
  se <- sum(.vcov[vars, vars]) ^ 0.5
  ## perform t-test on `sumvars`
  tscore <- sumvars / se
  pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
  ## return a matrix
  matrix(c(sumvars, se, tscore, pvalue), nrow = 1L,
         dimnames = list(paste0(paste0(vars, collapse = " + "), " = 0"),
                         c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
}

Let's have a test:

data(mtcars)
lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
library(multiwayvcov)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)

If we leave .vcov unspecified in LinearCombTest, it is the same as multcomp::glht:

LinearCombTest(lm1, c("cyl","hp"))

#              Estimate Std. Error   t value     Pr(>|t|)
#cyl + hp = 0 -2.283815  0.5634632 -4.053175 0.0003462092

library(multcomp)
summary(glht(lm1, linfct = 'cyl + hp = 0'))

#Linear Hypotheses:
#              Estimate Std. Error t value Pr(>|t|)    
#cyl + hp == 0  -2.2838     0.5635  -4.053 0.000346 ***

If we provide the clustered covariance matrix, it does what you want:

LinearCombTest(lm1, c("cyl","hp"), vcv)

#              Estimate Std. Error  t value    Pr(>|t|)
#cyl + hp = 0 -2.283815  0.7594086 -3.00736 0.005399071
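As an aside, it may not be necessary to bypass glht at all: glht forwards extra arguments to modelparm, which (in the versions I have seen) accepts a vcov. argument that can be a matrix or a function, replacing the default vcov(model). Assuming your version of multcomp supports this, the clustered test can be run directly; this should reproduce the clustered standard error 0.7594086 computed above:

```r
library(multcomp)
library(multiwayvcov)

lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
vcv <- cluster.vcov(lm1, cluster = mtcars$am)

## pass the clustered covariance matrix through to modelparm;
## `vcov` partially matches modelparm's `vcov.` argument
summary(glht(lm1, linfct = 'cyl + hp = 0', vcov = vcv))
```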

Remark

An upgraded LinearCombTest is given at Get p-value for group mean difference without refitting linear model with a new reference level, where we can test any linear combination with combination coefficients alpha:

alpha[1] * vars[1] + alpha[2] * vars[2] + ... + alpha[k] * vars[k]

rather than just the sum

vars[1] + vars[2] + ... + vars[k]
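A minimal sketch of that generalization, using the standard result Var(a'b) = a' V a for the variance of a linear combination (the name LinearCombTest2 is made up here, not from any package; alpha = c(1, 1) reduces to the sum tested above):

```r
LinearCombTest2 <- function (lmObject, vars, alpha, .vcov = NULL) {
  ## if `.vcov` missing, use the one returned by `lm`
  if (is.null(.vcov)) .vcov <- vcov(lmObject)
  beta <- coef(lmObject)
  ## point estimate of alpha[1]*vars[1] + ... + alpha[k]*vars[k]
  est <- sum(alpha * beta[vars])
  ## Var(a'b) = a' V a, restricted to the rows/columns in `vars`
  se <- sqrt(drop(t(alpha) %*% .vcov[vars, vars] %*% alpha))
  tscore <- est / se
  pvalue <- 2 * pt(abs(tscore), lmObject$df.residual, lower.tail = FALSE)
  c(Estimate = est, "Std. Error" = se, "t value" = tscore, "Pr(>|t|)" = pvalue)
}

lm1 <- lm(mpg ~ cyl + hp, data = mtcars)
## alpha = c(1, 1) matches summary(glht(lm1, linfct = 'cyl + hp = 0')) above
LinearCombTest2(lm1, c("cyl", "hp"), c(1, 1))
```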
Predatory answered 19/10, 2016 at 1:36 Comment(0)
