Weighted linear regression in R with lm() and svyglm(): same model, different results

I want to run a linear regression with survey weights in R (RStudio). I have seen that this can be done with lm(), whose weights argument lets me specify the weights to use. It can also be done with svyglm() from the survey package, which fits the regression on a survey design object whose weights are set to the desired variable.

In theory I see no reason for these two models to give different results, and indeed the beta estimates are identical. However, the standard errors differ between the models, which leads to different p-values and therefore to different conclusions about significance.
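For reference, the difference can be written out in the usual notation. This is a sketch, assuming X is the n x p model matrix, W = diag(w_1, ..., w_n), x_i^T is row i of X, e_i = y_i - x_i^T beta-hat, and p is the number of coefficients. Both functions solve the same weighted least-squares equations, so beta-hat is identical; they differ only in the variance formula. The svyglm line is the with-replacement linearization (sandwich) estimator which, to my understanding, is what applies to a design with ids = ~1 and no strata or fpc.

```latex
\hat\beta = (X^\top W X)^{-1} X^\top W y

\widehat{V}_{\mathrm{lm}}(\hat\beta) = \hat\sigma^2 (X^\top W X)^{-1},
\qquad \hat\sigma^2 = \frac{1}{n-p} \sum_{i=1}^{n} w_i e_i^2

\widehat{V}_{\mathrm{svy}}(\hat\beta) = (X^\top W X)^{-1}
\left( \frac{n}{n-1} \sum_{i=1}^{n} w_i^2 e_i^2 \, x_i x_i^\top \right)
(X^\top W X)^{-1}
```

The lm() formula is only valid if Var(y_i) = sigma^2 / w_i, i.e. if the weights really are inverse variances. The sandwich form makes no such assumption, which is why the two sets of standard errors generally disagree when the weights are sampling weights.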

Which model is more appropriate? Any help would be greatly appreciated.

Here is the R code:

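library(survey)  # provides svydesign() and svyglm()
dat <- read.csv("https://raw.githubusercontent.com/LucasTremlett/questions/master/questiondata.csv")
# lm(): 'weights' are treated as precision (inverse-variance) weights
model.weighted1 <- lm(DV ~ IV1 + IV2 + IV3, data = dat, weights = weight)
summary(model.weighted1)
# svydesign(): 'weights' are sampling weights; ids = ~1 means no clustering
dat.weighted <- svydesign(ids = ~1, data = dat, weights = ~weight)
model.weighted2 <- svyglm(DV ~ IV1 + IV2 + IV3, design = dat.weighted)
summary(model.weighted2)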
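A minimal base-R sketch of what the two functions compute, using simulated data (all variable names here are hypothetical, not from the survey data set). It fits one weighted lm(), then builds both covariance matrices by hand: the model-based one that summary(lm) reports, and a sandwich (linearization) one. The sandwich formula is my understanding of what svyglm() uses for a design with ids = ~1 and no strata or fpc; if you have the survey package installed, comparing against sqrt(diag(vcov(model.weighted2))) on your own data is a quick way to check that assumption.

```r
# Simulated data; names and values are illustrative only
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
w  <- runif(n, 0.5, 3)                                # hypothetical sampling weights
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 1 + abs(x1))    # heteroscedastic errors

fit <- lm(y ~ x1 + x2, weights = w)
X   <- model.matrix(fit)
e   <- residuals(fit)                # unweighted residuals, y - fitted
p   <- ncol(X)

# "Bread": (X' W X)^{-1}; the point estimates are the same under both views
bread <- solve(crossprod(X, w * X))

# Precision-weight (model-based) covariance, as reported by summary(lm)
sigma2 <- sum(w * e^2) / (n - p)
V_lm   <- sigma2 * bread

# Sampling-weight (sandwich / linearization) covariance
U     <- w * e * X                   # row i is w_i * e_i * x_i'
meat  <- n / (n - 1) * crossprod(U)
V_svy <- bread %*% meat %*% bread

se <- cbind(lm = sqrt(diag(V_lm)), sandwich = sqrt(diag(V_svy)))
print(cbind(beta = coef(fit), se))
```

The coefficient column is shared; only the two standard-error columns differ, which mirrors what the question observes.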
Holzer asked 27/9/2020 at 15:08. Comments (3):
Weighting is tricky; the mathematical/statistical definition of weights differs across contexts. Which method is appropriate probably depends on what the weights actually mean in your problem. notstatschat.rbind.io/2020/08/04/weights-in-statistics is a very good (IMO) explanation of the differences. - Cashmere
I see... Thanks for the helpful link. Based on the article, I think I want "sampling weights", as this is data from the European Voter Election Study (which is a survey). Does this mean the second model is more appropriate, since it comes from the "survey" package? The documentation does not really specify which of the three kinds of weights these are, but it does provide the means for weighted and unweighted samples (europeanelectionstudies.net/wp-content/uploads/2019/11/…). From the article, it seems that lm() treats its weights argument as precision weights. - Holzer
Yes, if you're working in a survey-data context, it's highly likely that you want svyglm. - Cashmere

Mostly to confirm what is in the comments already:

  • lm and svyglm will always give the same point estimates (given the same weights), but will typically give different standard errors. In the terminology I use here, which @BenBolker already links to (thanks!), lm assumes precision weights and svyglm assumes sampling weights.
  • For that particular survey data set, you have sampling weights and want svyglm.
  • From the description of the survey, you'd also expect a stratum variable, but it looks as though they don't supply one. If they did, it would go into svydesign and would be used to reduce the standard errors in svyglm.
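With the survey package, a stratum variable would be passed as svydesign(ids = ~1, strata = ~stratum, weights = ~weight, data = dat), where `stratum` is a hypothetical column name (the data set as described does not include one). The base-R sketch below, on simulated data, is my reading of what that changes in the sandwich variance: the per-observation score contributions are centred within each stratum rather than overall, so variation between strata no longer counts toward the standard errors. With a single stratum it reduces to the unstratified formula.

```r
# Simulated data; all names are hypothetical
set.seed(2)
n       <- 300
stratum <- rep(1:3, each = n / 3)                # hypothetical stratum labels
x1      <- rnorm(n)
w       <- runif(n, 0.5, 3)                      # hypothetical sampling weights
y       <- 1 + 2 * x1 + 3 * stratum + rnorm(n)   # outcome level differs by stratum

fit   <- lm(y ~ x1, weights = w)
X     <- model.matrix(fit)
U     <- w * residuals(fit) * X                  # score contributions w_i * e_i * x_i'
bread <- solve(crossprod(X, w * X))

# Meat for a stratified single-stage design: centre scores within each stratum
stratified_meat <- function(U, strata) {
  meat <- matrix(0, ncol(U), ncol(U))
  for (h in unique(strata)) {
    Uh   <- U[strata == h, , drop = FALSE]
    nh   <- nrow(Uh)
    Uc   <- sweep(Uh, 2, colMeans(Uh))           # subtract the stratum mean of the scores
    meat <- meat + nh / (nh - 1) * crossprod(Uc)
  }
  meat
}

V_unstrat <- bread %*% stratified_meat(U, rep(1, n)) %*% bread
V_strat   <- bread %*% stratified_meat(U, stratum) %*% bread
print(cbind(unstratified = sqrt(diag(V_unstrat)),
            stratified   = sqrt(diag(V_strat))))
```

In this simulation the residuals differ systematically by stratum, so the intercept's standard error shrinks noticeably once strata are accounted for; how much stratification helps on real data depends on how strongly the strata explain the outcome.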
Redcoat answered 27/9/2020 at 21:59. Comments (1):
Answers are better than comments anyway. - Cashmere
