Reviewing linear regressions via statsmodels OLS fit I see you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However my only understanding of intercepts in this context would be the value of y for our line when our x equals 0, so I'm not clear what purpose always just injecting a '1' here serves. What is this constant actually telling the OLS fit?
statsmodels add_constant for OLS intercept, what is this actually doing?
It doesn't add a constant to your values; it adds a constant term to the linear equation it is fitting. In the single-predictor case, it's the difference between fitting the line y = mx to your data versus fitting y = mx + b.
So all the constant is doing is indicating there is a "b" in the equation? – Parrott
@TimLindsey: In essence, yes. It tells the model to fit a value for b as well as coefficients for your predictors. I've never really understood why statsmodels requires you to add this explicitly, since, as described here, you pretty much always want an intercept unless you have a specific justification for leaving it out. – Paraclete

statsmodels' sm.add_constant is the counterpart of the fit_intercept parameter in scikit-learn's LinearRegression().
If you don't call sm.add_constant, or if you pass LinearRegression(fit_intercept=False), both algorithms assume that b = 0 in y = mx + b. They will fit the model with b fixed at 0 instead of estimating b from your data.
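To show the equivalence on the scikit-learn side, here is a sketch with the same made-up y = 2x + 5 data; fit_intercept=True (the default) plays the role of sm.add_constant:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y = 2x + 5
x = np.arange(10, dtype=float).reshape(-1, 1)  # sklearn wants 2-D features
y = 2 * x.ravel() + 5

# fit_intercept=True is sklearn's counterpart of adding the constant column
with_b = LinearRegression(fit_intercept=True).fit(x, y)

# fit_intercept=False forces b = 0, like omitting sm.add_constant
without_b = LinearRegression(fit_intercept=False).fit(x, y)

print(with_b.intercept_, with_b.coef_)        # b ≈ 5, m ≈ 2
print(without_b.intercept_, without_b.coef_)  # b = 0, slope distorted upward
```

Either way, the takeaway is the same: the column of 1s (or the fit_intercept flag) is what gives the model a b term to estimate.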