Statsmodels.formula.api OLS does not show statistical values of intercept

I am running the following source code:

import numpy as np
import statsmodels.formula.api as sm

# Add one column of ones for the intercept term
X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)

regressor_OLS = sm.OLS(endog=y, exog=X).fit()
print(regressor_OLS.summary())

where

X is a 50x5 numpy array (before adding the intercept term) which looks like this:

[[0 1 165349.20 136897.80 471784.10]
 [0 0 162597.70 151377.59 443898.53]...]

and y is a 50x1 numpy array with float values for the dependent variable.

The first two columns of X encode a dummy variable with three different values. The remaining three columns are three different independent variables.

Although statsmodels.formula.api.OLS is said to add an intercept term automatically (see @stellacia's answer here: OLS using statsmodel.formula.api versus statsmodel.api), its summary does not show the statistics for the intercept term, as is evident below in my case:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     727.1
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           7.87e-42
Time:                        21:40:23   Log-Likelihood:                -545.15
No. Observations:                  50   AIC:                             1100.
Df Residuals:                      45   BIC:                             1110.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3464.4536   4905.406      0.706      0.484   -6415.541    1.33e+04
x2          5067.8937   4668.238      1.086      0.283   -4334.419    1.45e+04
x3             0.7182      0.066     10.916      0.000       0.586       0.851
x4             0.3113      0.035      8.885      0.000       0.241       0.382
x5             0.0786      0.023      3.429      0.001       0.032       0.125
==============================================================================
Omnibus:                        1.355   Durbin-Watson:                   1.288
Prob(Omnibus):                  0.508   Jarque-Bera (JB):                1.241
Skew:                          -0.237   Prob(JB):                        0.538
Kurtosis:                       2.391   Cond. No.                     8.28e+05
==============================================================================

For this reason, I added the following line to my source code:

X = np.append(arr= np.ones((50, 1)).astype(int), values=X, axis=1)

as you can see at the beginning of my post. With it, the statistics for the intercept/constant are shown as below:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     169.9
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           1.34e-27
Time:                        20:25:21   Log-Likelihood:                -525.38
No. Observations:                  50   AIC:                             1063.
Df Residuals:                      44   BIC:                             1074.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.013e+04   6884.820      7.281      0.000    3.62e+04     6.4e+04
x1           198.7888   3371.007      0.059      0.953   -6595.030    6992.607
x2           -41.8870   3256.039     -0.013      0.990   -6604.003    6520.229
x3             0.8060      0.046     17.369      0.000       0.712       0.900
x4            -0.0270      0.052     -0.517      0.608      -0.132       0.078
x5             0.0270      0.017      1.574      0.123      -0.008       0.062
==============================================================================
Omnibus:                       14.782   Durbin-Watson:                   1.283
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.266
Skew:                          -0.948   Prob(JB):                     2.41e-05
Kurtosis:                       5.572   Cond. No.                     1.45e+06
==============================================================================

Why are the statistics for the intercept not shown when I do not add an intercept term myself, even though statsmodels.formula.api.OLS is said to add one automatically?

Bookout answered 1/7, 2018 at 20:52 Comment(1)
Related, with some additional explanation: OLS using statsmodel.formula.api versus statsmodel.api - Chapa

"No constant is added by the model unless you are using formulas." Therefore, try something like the example below. Variable names should be chosen to match your data set.

Use,

import statsmodels.formula.api as smf
regressor_OLS = smf.ols(formula='Y_variable ~ X_variable', data=df).fit()

instead of,

regressor_OLS = sm.OLS(endog=y, exog=X).fit()
Hedges answered 2/7, 2018 at 6:28 Comment(1)
Thanks for the answer. You are right, and this is actually also stated by @Brad here (#30650757), but unfortunately that answer is not marked as the accepted one, so I did not pay attention to it initially. - Bookout

You can use X = sm.add_constant(X) (with statsmodels.api imported as sm).

Micromillimeter answered 16/12, 2020 at 6:51 Comment(1)
While this will ensure an intercept, the question is more about why the intercept is not added by default. Please read the accepted answer: an intercept is added automatically only when you use the formula interface in statsmodels.formula.api. - Truesdale
