How to extract the regression coefficient from statsmodels.api?

result = sm.OLS(gold_lookback, silver_lookback).fit()

After I get the result, how can I get the coefficient and the constant?

In other words, if y = ax + c, how do I get the values of a and c?

Totem answered 20/11, 2017 at 9:1 Comment(1)
For accessing coefficients individually: #29166101 - Polyvalent

You can use the params property of a fitted model to get the coefficients.

For example, the following code:

import statsmodels.api as sm
import numpy as np
np.random.seed(1)
X = sm.add_constant(np.arange(100))
y = np.dot(X, [1,2]) + np.random.normal(size=100)
result = sm.OLS(y, X).fit()
print(result.params)

prints the NumPy array [ 0.89516052 2.00334187]: the estimates of the intercept and the slope, respectively.

If you want more information, you can call result.summary(), which returns an object containing three detailed tables that describe the model.

Calumet answered 20/11, 2017 at 9:19 Comment(2)
The first one is the constant and the second one is the coefficient? - Totem
Exactly! That's how sm.add_constant() works: it takes a matrix (or a vector, as in my case) and adds a column of ones as its leftmost column. The coefficient corresponding to this column is the intercept. - Calumet

Cribbing from the answer to "Converting statsmodels summary object to Pandas Dataframe": result.summary() holds a set of tables, which you can export as HTML and then read back with pandas into a DataFrame. That lets you index the values you want directly.

So, for your case (putting the answer from the above link into one line):

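from io import StringIO
import pandas as pd

# Newer pandas versions deprecate passing a literal HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(result.summary().tables[1].as_html()), header=0, index_col=0)[0]

And then (assuming the constant is the first row of the table):

a = df['coef'].values[1]
c = df['coef'].values[0]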
Donahue answered 10/10, 2019 at 9:15 Comment(1)
Great! However, that does not work with summary2(), whose output is more detailed! - Overstrung

Adding some details to @IdiotTom's answer.

You may use:

tables = pd.read_html(res.summary2().as_html())  # parse the HTML once
head = tables[0]
body = tables[1]

Not as nice, but the info is there.

Overstrung answered 11/3, 2020 at 13:44 Comment(0)

The coefficients are stored in result.params. When the model is fit from a formula or from pandas inputs, this is a pandas Series, so it can be indexed by name like a dictionary. With the formula interface, the constant term is stored as Intercept, as others pointed out, and the variable terms are stored under their variable names. So, if your model is y ~ x, the regression coefficients are available as result.params['Intercept'] (that's b) and result.params['x'] (that's a) for the equation y = a*x + b.
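
A minimal sketch of that lookup using the formula interface; the small data frame is toy data chosen only for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [0, 1, 2, 3, 4],
                   "y": [0.1, 0.2, 0.5, 0.5, 0.8]})

result = smf.ols("y ~ x", data=df).fit()

b = result.params["Intercept"]   # constant term
a = result.params["x"]           # slope on x
print(f"y = {a:.2f}*x + {b:.2f}")

# The whole Series can also be turned into a plain dict
print(result.params.to_dict())
```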

Miscount answered 17/6, 2022 at 12:20 Comment(0)

If the inputs to the API are pandas objects (i.e. a pd.DataFrame for the data, or a pd.Series each for x and y), then .params is a pd.Series, so each coefficient is easily accessible by name.

For example:

import pandas as pd
import statsmodels.api as sm
# sm.__version__ is '0.13.1'

df = pd.DataFrame({'x': [0, 1, 2, 3, 4],
                   'y': [0.1, 0.2, 0.5, 0.5, 0.8]})

# '-1' in the formula drops the intercept term
sm.OLS.from_formula(formula='y~x-1', data=df).fit().params

Outputs the following pd.Series:

x    0.196667
dtype: float64

Allowing for an intercept term (by changing the formula from y~x-1 to y~x) changes the output to include the intercept under the name Intercept:

Intercept    0.08
x            0.17
dtype: float64
Poisonous answered 16/6, 2022 at 10:23 Comment(0)
