How to use sample weights in GAM (mgcv) on survey data for Logit regression?


Asked 26/5, 2019 at 13:16; answered 7/7, 2019 at 6:08

I'm interested in performing a GAM regression on data from a nationwide survey that includes sample weights. I read this post with interest. I selected my variables of interest into a data frame:

library(dplyr)  # provides %>% and select()

nhanesAnalysis <- nhanesDemo %>%
  select(fpl,
         age,
         gender,
         persWeight,
         psu,
         strata)

Then, as far as I understand, I created a survey design object (rather than a weighted data frame) with the following code:

library(survey)
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

Suppose I want to restrict the analysis to subjects aged 30 or older:

ageDesign <- subset(nhanesDesign, age >= 30)

Now I would like to fit a GAM (fpl ~ s(age) + gender) with the mgcv package. Is it possible to do this with the weights argument, or by using the svydesign object ageDesign?

EDIT

I was wondering whether it is correct to extract the computed weights from an svyglm object and pass them to the weights argument of gam().

Oren asked 26/5, 2019 at 13:16

Does this do what you want? gam(formula = fpl ~ s(age) + gender, weights = nhanesAnalysis$persWeight, data = nhanesAnalysis) – Lipcombe 3/6, 2019 at 16:05

Thank you @SantiagoI.Hurtado. This is what I would like to know: I'm not sure that the weights argument is enough. – Oren 3/6, 2019 at 16:32

Maybe this can help you: stats.stackexchange.com/questions/273296/… – Lipcombe 3/6, 2019 at 16:57

@SantiagoI.Hurtado thank you for helping, but unfortunately that post does not go deep into the problem of sample weights in a complex survey design. – Oren 4/6, 2019 at 12:13

This is more difficult than it looks. There are two issues:

You want to get the right amount of smoothing.
You want valid standard errors.

Just giving the sampling weights to mgcv::gam() won't do either of these: gam() treats the weights as frequency weights and so will think it has a lot more data than it actually has. You will get undersmoothing and underestimated standard errors because of the weights, and you will also likely get underestimated standard errors because of the cluster sampling.
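A minimal simulation sketch of the frequency-weight problem, assuming a binary outcome fitted with the fixed-scale binomial family (the logit case in the question). The data, the weight range and the names dat, w, w_norm, fit_raw and fit_norm are all hypothetical. The same model is fitted once with raw weights and once with the weights rescaled to mean 1, and the intercept standard error and total effective degrees of freedom are compared.

```r
library(mgcv)

set.seed(1)
n <- 200
dat <- data.frame(age = runif(n, 30, 80))
# Hypothetical binary outcome with a smooth dependence on age
dat$y <- rbinom(n, 1, plogis(-2 + 0.04 * dat$age))
# Hypothetical integer survey weights in the thousands, so that
# sum(dat$w) is far larger than the n = 200 actual observations
dat$w <- sample(1000:5000, n, replace = TRUE)

# Raw weights: gam() treats a weight of 3000 like 3000 identical
# observations, so it behaves as if it had about sum(dat$w) data points
fit_raw <- gam(y ~ s(age), family = binomial, weights = w,
               data = dat, method = "REML")

# Weights rescaled to mean 1: the apparent sample size is back to n.
# binomial() warns about non-integer successes here, hence suppressWarnings()
dat$w_norm <- dat$w / mean(dat$w)
fit_norm <- suppressWarnings(
  gam(y ~ s(age), family = binomial, weights = w_norm,
      data = dat, method = "REML")
)

# Total effective degrees of freedom (intercept plus smooth)
c(edf_raw = sum(fit_raw$edf), edf_norm = sum(fit_norm$edf))
# Model-based standard error of the intercept
c(se_raw  = sqrt(vcov(fit_raw)[1, 1]),
  se_norm = sqrt(vcov(fit_norm)[1, 1]))
```

The raw-weight fit should report a far smaller standard error (and typically more effective degrees of freedom) than the rescaled fit. Rescaling only removes the inflated apparent sample size; it does nothing about clustering and stratification, so the standard errors from the rescaled fit are still not design-based.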

The simple work-around is to use regression splines (splines package) instead. These aren't quite as good as the penalised splines used by mgcv, but the difference usually isn't a big deal, and they work straightforwardly with svyglm. You do need to choose how many degrees of freedom to assign.

library(survey)
library(splines)
svyglm(fpl ~ ns(age, 4) + gender, design = nhanesDesign)  # for a logit model, add family = quasibinomial()
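A self-contained sketch of the same regression-spline approach on simulated data, assuming a binary fpl so that a logit model applies, and restricting to age >= 30 by subsetting the design as in the question. Everything in the hypothetical data frame sim (the 3 strata × 4 PSUs layout, the weight range, the outcome model) is made up for illustration. regTermTest() from the survey package is used for a design-based Wald test of the whole spline term.

```r
library(survey)
library(splines)

set.seed(42)
n <- 600
# Hypothetical stand-in for nhanesAnalysis: 3 strata x 4 PSUs x 50 people.
# PSU labels 1-4 are reused inside each stratum, which is why nest = TRUE.
sim <- data.frame(
  strata     = rep(1:3, each = 200),
  psu        = rep(rep(1:4, each = 50), times = 3),
  age        = runif(n, 20, 80),
  gender     = factor(sample(c("female", "male"), n, replace = TRUE)),
  persWeight = runif(n, 500, 5000)
)
# Hypothetical binary outcome, so that a logit model makes sense
sim$fpl <- rbinom(n, 1, plogis(-1 + 0.03 * (sim$age - 50)))

des <- svydesign(id = ~psu, strata = ~strata, weights = ~persWeight,
                 nest = TRUE, data = sim)

# Subset the design object (not the data frame) so that variance
# estimation still reflects the full sampling design
des30 <- subset(des, age >= 30)

# Natural cubic regression spline with 4 df; quasibinomial() avoids the
# non-integer-weights warning that binomial() would produce
fit <- svyglm(fpl ~ ns(age, 4) + gender, design = des30,
              family = quasibinomial())
summary(fit)

# Design-based joint Wald test of all 4 spline coefficients
regTermTest(fit, ~ ns(age, 4))
```

Changing the 4 in ns(age, 4) is how the degrees of freedom are chosen by hand. With only 12 hypothetical PSUs in 3 strata (9 design degrees of freedom) there is little room for a richer basis here.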

Rumelia answered 7/7, 2019 at 6:08
