What to use to do multiple correlation?

I am trying to use Python to compute a multiple linear regression and the multiple correlation between a response array and a set of predictor arrays. I saw the very simple example for computing a multiple linear regression, which is easy. But how do I compute the multiple correlation with statsmodels, or with anything else as an alternative? I guess I could use rpy and R, but I'd prefer to stay in Python if possible.

Edit (clarification): Consider a situation like the one described here: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_MultivariableMethods/ In addition to the regression coefficients and the other regression parameters, I would also like to compute the multiple correlation coefficients for the predictors.

Schwinn answered 19/11, 2012 at 10:56 Comment(5)
Should I perhaps use the GLM (generalized linear model)? - Schwinn
Maybe you could explain in a little more detail what exactly you are trying to do. - Pieria
So, imagine a situation like the one described here: sph.bu.edu/otlt/lamorte/EP713/Web_Pages/EP713_Regression/… I would also like to compute multiple correlation coefficients for the predictors, in addition to the regression coefficients and other regression parameters. - Schwinn
@Paul It would be better to edit your question rather than give the information as a comment, I think. - Biisk
Maybe update the link? Not a big deal since I know what I'm after, but it would be nice. - Amharic

You could certainly do this with statsmodels and pandas. Something like this might get you started:

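import pandas
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: one row per subject
data = pandas.DataFrame([["A", 4, 0, 1, 27],
                         ["B", 7, 1, 1, 29],
                         ["C", 6, 1, 0, 23],
                         ["D", 2, 0, 0, 20],
                         ["etc.", 3, 0, 1, 21]],
                        columns=["ID", "score", "male", "age20", "BMI"])
# Pairwise correlations between the numeric columns
# (numeric_only=True skips the string "ID" column; needs pandas >= 1.5)
print(data.corr(numeric_only=True))

# Multiple linear regression of BMI on the three predictors
model = ols("BMI ~ score + male + age20", data=data).fit()
print(model.params)
print(model.summary())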

Have a look at the documentation:

http://statsmodels.sourceforge.net/devel/

http://pandas.pydata.org/

Edit: I'm not familiar with the term "multiple correlation coefficient", but I believe it is just the square root of the R-squared of a multiple regression model, no?

print(model.rsquared ** 0.5)
print(model.rsquared_adj ** 0.5)

Is this what you're after?
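As a cross-check on the square-root-of-R-squared idea, here is a minimal sketch that uses only NumPy rather than statsmodels (a substitution made for illustration, not part of the answer's code). It assumes the model includes an intercept, in which case the multiple correlation coefficient R is the plain correlation between the observed response and the fitted values. The helper name `fit_ols` is hypothetical; the data are the toy rows from above.

```python
import numpy as np

# Toy data from the answer: predictors are score, male, age20; response is BMI.
X = np.array([[4, 0, 1],
              [7, 1, 1],
              [6, 1, 0],
              [2, 0, 0],
              [3, 0, 1]], dtype=float)
y = np.array([27, 29, 23, 20, 21], dtype=float)


def fit_ols(X, y):
    """Return least-squares fitted values for y on X plus an intercept (hypothetical helper)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


fitted = fit_ols(X, y)

# Multiple correlation coefficient: correlation between observed and fitted values.
R = np.corrcoef(y, fitted)[0, 1]

# R-squared from its definition, 1 - SS_residual / SS_total.
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(R, np.sqrt(r_squared))  # the two values should agree up to rounding
```

If the two printed numbers match, `model.rsquared ** 0.5` is the quantity being asked about, at least for a model with an intercept.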

Auer answered 19/11, 2012 at 14:44 Comment(3)
+1. Is the formula API available in 0.4, or are you using a development version here? - Biisk
It was added in 0.5. A 0.5 prerelease with the formula framework is available on PyPI. The final release should hopefully be out before the end of the year. - Auer
I am getting an absurdly high correlation coefficient using this method despite no strong pairwise correlations. Anyone have suggestions as to what might be going on? - Amerind
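One possible explanation, offered as an assumption because the comment gives no details about the data: when the number of observations is close to the number of fitted parameters, R is pushed toward 1 even if the predictors carry no information. The sketch below illustrates this with pure random noise, using NumPy only; the helper name `multiple_r` and the sample sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def multiple_r(n_obs, n_predictors):
    """Multiple correlation R when predictors and response are independent noise."""
    X = rng.normal(size=(n_obs, n_predictors))
    y = rng.normal(size=n_obs)
    design = np.column_stack([np.ones(n_obs), X])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(y, design @ coef)[0, 1]


# 5 observations and 5 parameters (intercept + 4 slopes): the fit is exact.
r_small = multiple_r(n_obs=5, n_predictors=4)

# Same number of predictors, many more observations: R falls toward 0.
r_large = multiple_r(n_obs=500, n_predictors=4)

print(r_small, r_large)
```

The toy example in the answer has 5 rows and 4 parameters, so it leaves only one residual degree of freedom. Comparing `model.rsquared` with `model.rsquared_adj` is a quick way to see how much of a high R is due to this effect.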
