Pearson correlation coefficient in R's survey package

Asked 22/12, 2015 at 15:10 Answered 8/12, 2016 at 3:6

Sorry if this is really obvious, but I can't see how to do a simple Pearson correlation between two variables in the survey package. My data has strata so it would be the equivalent to finding r for api00 and api99 in apistrat.

library(survey)
data(api)

dstrat <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

I'm sure there must be a simple way of doing it using svyvar or svyglm or something but I can't see it?

Brumfield answered 22/12, 2015 at 15:10 Comment(3)

cor(apistrat$api00, apistrat$api99) [1] 0.9741931 The default is Pearson correlation. – Footrest 22/12, 2015 at 17:16

Thanks but I don't think that would take account of the strata or the weights. That's why I was hoping for something in the survey package. – Brumfield 22/12, 2015 at 18:10

wtd.cor in the weights package might do the job, but I'm not sure. It also wouldn't help with more complex sampling designs. Something in the survey package would certainly be the most convenient option. – Brumfield 22/12, 2015 at 18:14

You can use svyvar to estimate the variance-covariance matrix, and then scale it to the correlation:

library(survey)
data(api)

dstrat <- svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
v <- svyvar(~api00+api99, dstrat)

as.matrix(v)
cov2cor(as.matrix(v))

This works for any number of correlations and any design.

Mussorgsky answered 8/12, 2016 at 3:6 Comment(2)

Thanks Thomas, this is much simpler – Brumfield 9/12, 2016 at 10:24

How could this method be extended for obtaining confidence intervals of the correlations? @ThomasLumley – Laminitis 8/8, 2021 at 19:10

I've been thinking around the problem a bit and I'm starting to think the best way forward might be to just scale both of the variables first, presumably using svymean and svyvar.

dstrat2 <- transform(dstrat, 
                     z_api99 = (api99 - svymean(~api99,    dstrat))/sqrt(svyvar(~api99, dstrat)), 
                     z_api00 = (api00 - svymean(~api00, dstrat))/sqrt(svyvar(~api00, dstrat))) 

svyglm(z_api99 ~ z_api00, dstrat2)$coefficients

This gives 9.759047e-01, which is the same result as using:

library(weights)
wtd.cor(apistrat$api99, apistrat$api00, weight = apistrat$pw)

It has the advantage that it could be used with pretty much any survey design type. It also provides a way of getting the standardised beta coefficients if there are more variables. It isn't quite what I was after with the original question, but it might be the best way if there isn't a specific option.

If anyone else can confirm if this works or not, or if there is a better way, then I'd be very grateful for any further comments.

Brumfield answered 30/12, 2015 at 21:22 Comment(0)

Recommended topics

Hot tags