How to get the regression intercept using Statsmodels.api
I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use:

import statsmodels.api as sm

It prints all the regression analysis except the intercept.

but when I use:

from pandas.stats.api import ols

My code for pandas:

Regression = ols(y= Sorted_Data3['net_realization_rate'],x = Sorted_Data3[['Cohort_2','Cohort_3']])
print Regression  

I get the intercept, along with a warning that this library will be deprecated in the future, so I am trying to use statsmodels instead.

the warning that I get while using pandas.stats.api:

Warning (from warnings module):
  File "C:\Python27\lib\idlelib\run.py", line 325
    exec code in self.locals
FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html

My code for Statsmodels:

import pandas as pd
import numpy as np
from pandas.stats.api import ols
import statsmodels.api as sm

Data1 = pd.read_csv('C:\Shank\Regression.csv')  #Importing CSV
print Data1

running some cleaning code

sm_model = sm.OLS(Sorted_Data3['net_realization_rate'],Sorted_Data3[['Cohort_2','Cohort_3']])
results = sm_model.fit()
print '\n'
print results.summary()

I even tried the formula syntax from statsmodels.formula.api:

sm_model = sm.OLS(formula ="net_realization_rate ~ Cohort_2 + Cohort_3", data = Sorted_Data3)
results = sm_model.fit()
print '\n'
print result.params
print '\n'
print results.summary()

but I get the error:

TypeError: __init__() takes at least 2 arguments (1 given)

Final output (image not shown): the first is from pandas and the second is from statsmodels. I want statsmodels to report the intercept value the same way pandas does.

Chane answered 8/8, 2016 at 18:49 Comment(2)
you imported ols but didn't use it. try: sm_model = ols(... - Higgledypiggledy
Yes, I used it. ols gives me the result, but also a warning that pandas.stats.api will be deprecated in the future, so I am trying to use another library, statsmodels.api. - Chane

statsmodels has an add_constant function that you need to call to add the intercept column explicitly; sm.OLS does not add one for you. IMHO, this is better than the R alternative, where the intercept is added by default.

In your case, you need to do this:

import statsmodels.api as sm
endog = Sorted_Data3['net_realization_rate']
exog = sm.add_constant(Sorted_Data3[['Cohort_2','Cohort_3']])

# Fit and summarize OLS model
mod = sm.OLS(endog, exog)
results = mod.fit()
print(results.summary())

Note that you can place the constant column before your array or after it by passing True (the default) or False to the prepend kwarg of sm.add_constant.


Or, not recommended, but you can use Numpy to explicitly add a constant column like so:

exog = np.concatenate((np.repeat(1, len(Sorted_Data3))[:, None], 
                       Sorted_Data3[['Cohort_2','Cohort_3']].values),
                       axis = 1)
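As a rough illustration of why the column of ones matters, the sketch below builds the same kind of design matrix with plain NumPy and solves the least-squares problem directly. np.linalg.lstsq stands in for sm.OLS here, and the data is made up (no noise, true intercept 2.0); the coefficient on the ones column is the intercept.

```python
import numpy as np

# Made-up regressors and a noise-free response with a known intercept of 2.0
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Put a column of ones in front of the regressors
exog = np.column_stack((np.ones(len(X)), X))

# Ordinary least squares via NumPy; coef[0] belongs to the ones column
coef, *_ = np.linalg.lstsq(exog, y, rcond=None)
intercept = coef[0]
print(intercept, coef[1:])
```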
Rabbitfish answered 8/8, 2016 at 21:9 Comment(1)
It seems like this website stole/uses your post: newbedev.com/… - Novosibirsk

You can also do something like this:

df['intercept'] = 1

Here you are explicitly creating a column for the intercept.

Then you can just use the sm.OLS method like so:

lm = sm.OLS(df['y_column'], df[['intercept', 'x_column']])
results = lm.fit()
print(results.summary())
Fruge answered 18/4, 2019 at 18:25 Comment(0)
