Why would R-squared decrease when I add an exogenous variable in OLS using Python statsmodels?

If I understand the OLS model correctly, adding a variable should never lower R-squared, should it?

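from statsmodels.regression.linear_model import OLS

# trades is an existing pandas DataFrame with the columns used below
trades['const'] = 1  # explicit intercept column
Y = trades['ret'] + trades['comms']  # dependent variable
#X = trades[['potential', 'pVal', 'startVal', 'const']]  # with the constant
X = trades[['potential', 'pVal', 'startVal']]  # without the constant

ols = OLS(Y, X)
res = ols.fit()
print(res.summary())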

If I turn the const on, I get an R-squared of 0.22, and with it off I get 0.43. How is that even possible?

Vicereine answered 16/4, 2015 at 2:45

See the answer to "Statsmodels: Calculate fitted values and R squared".

R-squared is defined differently depending on whether or not the model contains a constant.

In a linear model with a constant, R-squared uses the standard definition, which compares the fit against a mean-only model as the reference. The total sum of squares is demeaned, i.e. computed around the mean of the dependent variable.
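A small NumPy sketch of the centered definition, assuming a design matrix whose first column is all ones. The function name `centered_r2` and the simulated data are hypothetical; the check at the end relies on the textbook fact that, for a simple regression with an intercept, R-squared equals the squared correlation between x and y.

```python
import numpy as np

def centered_r2(y, X):
    """R-squared for a model whose design matrix X includes a constant column.

    The total sum of squares is taken around the mean of y, so the reference
    model is the mean-only model that always predicts y.mean().
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid
    centered_tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / centered_tss

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
x = rng.normal(10.0, 1.0, 100)
y = 5.0 + 0.5 * x + rng.normal(0.0, 2.0, 100)
X = np.column_stack([np.ones_like(x), x])  # constant column plus regressor

r2 = centered_r2(y, X)
# Simple regression with an intercept: R^2 equals the squared correlation.
print(r2, np.corrcoef(x, y)[0, 1] ** 2)
```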

In a linear model without a constant, R-squared compares the fit against a model with no regressors at all, i.e. one in which the effect of the constant is zero and every prediction is zero. In this case the total sum of squares is not demeaned.
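The uncentered version can be sketched the same way; `uncentered_r2` and the simulated data are hypothetical names chosen for illustration. Because least-squares residuals are orthogonal to the fitted values, the sum of squares splits as `y @ y == y_hat @ y_hat + resid @ resid`, so this R-squared also equals `sum(y_hat**2) / sum(y**2)`.

```python
import numpy as np

def uncentered_r2(y, X):
    """R-squared for a model without a constant.

    The reference model predicts zero for every observation, so the total
    sum of squares is sum(y**2) with no demeaning.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid
    uncentered_tss = y @ y
    return 1.0 - ssr / uncentered_tss

# Hypothetical data for illustration.
rng = np.random.default_rng(2)
x = rng.normal(10.0, 1.0, 100)
y = 5.0 + 0.5 * x + rng.normal(0.0, 2.0, 100)
X = x[:, None]  # single regressor, no constant column

r2 = uncentered_r2(y, X)

# Same quantity via the fitted values.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
print(r2, (y_hat @ y_hat) / (y @ y))
```

When y has a mean far from zero, `y @ y` is large, which is why the uncentered R-squared tends to look flattering.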

Since the definition changes when we add or drop a constant, R-squared can move in either direction. The fit itself never gets worse: adding explanatory variables never increases the residual sum of squares (equivalently, the explained sum of squares measured against the same total sum of squares never decreases), and it stays unchanged if the new variable contributes nothing.
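A NumPy sketch of that argument on hypothetical simulated data: adding the constant lowers (or keeps) the residual sum of squares, yet switching from the uncentered to the centered total sum of squares can still make the reported R-squared drop. The helper `ssr` and the variable names are illustrative, not part of statsmodels.

```python
import numpy as np

def ssr(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Hypothetical data for illustration.
rng = np.random.default_rng(3)
x = rng.normal(10.0, 1.0, 200)
y = 5.0 + 0.5 * x + rng.normal(0.0, 2.0, 200)

X_small = x[:, None]                           # x only
X_big = np.column_stack([x, np.ones_like(x)])  # x plus a constant

ssr_small = ssr(y, X_small)
ssr_big = ssr(y, X_big)  # never larger than ssr_small

uncentered_tss = y @ y
centered_tss = np.sum((y - y.mean()) ** 2)

# Each model scored with the definition that matches it.
r2_small = 1 - ssr_small / uncentered_tss
r2_big = 1 - ssr_big / centered_tss

# Both models scored against the same (uncentered) total sum of squares.
r2_big_same_tss = 1 - ssr_big / uncentered_tss

print(r2_small, r2_big, r2_big_same_tss)
```

Comparing `r2_small` with `r2_big_same_tss` is the like-for-like comparison; `r2_big` is lower only because its denominator changed.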

Vercingetorix answered 16/4, 2015 at 4:59

© 2022 - 2024 — McMap. All rights reserved.