statsmodels add_constant for OLS intercept, what is this actually doing?
Reviewing linear regressions via statsmodels OLS fit, I see you have to use add_constant to add a constant '1' to all your points in the independent variable(s) before fitting. However, my only understanding of intercepts in this context is the value of y for our line when x equals 0, so I'm not clear what purpose always injecting a '1' here serves. What is this constant actually telling the OLS fit?

Parrott answered 31/12, 2016 at 2:8 Comment(0)
It doesn't add a constant to your values, it adds a constant term to the linear equation it is fitting. In the single-predictor case, it's the difference between fitting a line y = mx to your data vs fitting y = mx + b.
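To see this concretely: add_constant just prepends a column of ones to your design matrix, and the coefficient fitted for that column is b. Here is a minimal NumPy sketch (np.linalg.lstsq stands in for the OLS solver, so statsmodels isn't required; the data values are made up for illustration):

```python
import numpy as np

# Noise-free data satisfying y = 2x + 5 exactly
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 5.0

# Without a constant: the design matrix is just x, so we fit y = m*x
m_only = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

# With a constant: a column of ones is added (what add_constant does),
# so the design matrix is [1, x] and the model becomes y = b + m*x
X = np.column_stack([np.ones_like(x), x])
b, m = np.linalg.lstsq(X, y, rcond=None)[0]

print(m_only)  # slope of the best line through the origin, not 2
print(b, m)    # recovers b = 5, m = 2
```

Without the ones column the fit is forced through the origin, so even on perfect data it cannot recover the true slope; with it, OLS recovers both b and m exactly.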

Paraclete answered 31/12, 2016 at 2:10 Comment(2)
so all the constant is doing is indicating there is a "b" in the equation? – Parrott
@TimLindsey: In essence, yes. It tells the model to fit a value for b as well as coefficients for your predictors. I've never really understood why statsmodels requires you to add this explicitly, since as described here you pretty much always want to do it unless you have a specific justification for not doing so. – Paraclete

statsmodels' sm.add_constant plays the same role as the fit_intercept parameter (which defaults to True) in scikit-learn's LinearRegression().

If you don't do sm.add_constant, or if you do LinearRegression(fit_intercept=False), both algorithms assume that b = 0 in y = mx + b. Therefore, they will fit the model using b = 0 instead of estimating what b should be from your data.
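One observable consequence of forcing b = 0: with an intercept, OLS residuals always sum to (numerically) zero, but without one they generally don't when the data has a nonzero intercept. A small NumPy sketch with made-up data (lstsq stands in for both libraries' solvers):

```python
import numpy as np

# Made-up data roughly following y = 2x + 8
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 12.1, 13.9, 16.1, 18.0])

# b forced to 0 (like skipping add_constant / fit_intercept=False)
m0 = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]
resid0 = y - m0 * x

# b estimated from the data (like add_constant / fit_intercept=True)
b, m = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
resid = y - (b + m * x)

print(resid.sum())   # ~0: the intercept absorbs the mean offset
print(resid0.sum())  # clearly nonzero: the line is pinned to the origin
```

This is why omitting the constant on data with a real offset biases the slope: the model has no way to absorb the offset except by tilting the line.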

Ingravescent answered 13/4, 2017 at 16:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.