How to retrieve model estimates from statsmodels?

From a dataset like this:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x']) 
df = df.set_index(rng)

...and a linear regression model like this:

x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()

... you can easily retrieve some model coefficients this way:

print(model.params)

But I just can't find out how to retrieve all other parameters from the model summary:

print(str(model.summary()))

As stated in the question, I'm particularly interested in R-squared.

From the post How to extract a particular value from the OLS-summary in Pandas? I learned that you could just use print(model.r2) with the old pandas OLS, but that does not seem to work for statsmodels.

Any suggestions?

Piteous answered 30/1, 2018 at 13:27

You can get R-squared like this:

Code:

model.rsquared

Test Code:

import pandas as pd
import numpy as np
import statsmodels.api as sm

# A dataframe with two variables
np.random.seed(123)
rows = 12
rng = pd.date_range('1/1/2017', periods=rows, freq='D')
df = pd.DataFrame(np.random.randint(100,150,size=(rows, 2)), columns=['y', 'x'])
df = df.set_index(rng)

x = sm.add_constant(df['x'])
model = sm.OLS(df['y'], x).fit()

print(model.params)
print(model.rsquared)
print(str(model.summary()))

Results:

const    176.636417
x         -0.357185
dtype: float64

0.338332793094

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.338
Model:                            OLS   Adj. R-squared:                  0.272
Method:                 Least Squares   F-statistic:                     5.113
Date:                Tue, 30 Jan 2018   Prob (F-statistic):             0.0473
Time:                        05:36:04   Log-Likelihood:                -41.442
No. Observations:                  12   AIC:                             86.88
Df Residuals:                      10   BIC:                             87.85
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        176.6364     20.546      8.597      0.000     130.858     222.415
x             -0.3572      0.158     -2.261      0.047      -0.709      -0.005
==============================================================================
Omnibus:                        1.934   Durbin-Watson:                   1.182
Prob(Omnibus):                  0.380   Jarque-Bera (JB):                1.010
Skew:                          -0.331   Prob(JB):                        0.603
Kurtosis:                       1.742   Cond. No.                     1.10e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.1e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Finding All Attribute Names:

With a small bit of code:

for attr in dir(model):
    if not attr.startswith('_'):
        print(attr)

You can see all of the attributes on an object:

HC0_se
HC1_se
HC2_se
HC3_se
aic
bic
bse
centered_tss
compare_f_test
compare_lm_test
compare_lr_test
condition_number
conf_int
conf_int_el
cov_HC0
cov_HC1
cov_HC2
cov_HC3
cov_kwds
cov_params
cov_type
df_model
df_resid
eigenvals
el_test
ess
f_pvalue
f_test
fittedvalues
fvalue
get_influence
get_prediction
get_robustcov_results
initialize
k_constant
llf
load
model
mse_model
mse_resid
mse_total
nobs
normalized_cov_params
outlier_test
params
predict
pvalues
remove_data
resid
resid_pearson
rsquared
rsquared_adj
save
scale
ssr
summary
summary2
t_test
tvalues
uncentered_tss
use_t
wald_test
wald_test_terms
wresid
Garett answered 30/1, 2018 at 13:38

Comments:

Thank you! That extra info about how to find all attribute names was great! (Piteous)

Please correct me if I am wrong, but this is not a list of attributes. For example, conf_int is not an attribute, it's a method. Calling print(model.conf_int) throws a TypeError. (Miscegenation)

@AnthonyNash Attributes are names that each point to a Python object. Some of those objects may be callable, and thus would generally be referred to as methods. Showing the attribute list this way is just a quick hack to make it easier to find what you are looking for on an object, but what each attribute refers to is not guaranteed to be anything in particular. (Garett)

Awesome answer, especially the extra info about how to find all attribute names. (Gaige)
M
1

You can use attributes such as:

  • model.f_pvalue to get the p-value of the F-statistic
  • model.rsquared to get the R-squared value of the model

Refer to the documentation at https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.html

Magnetic answered 10/7, 2020 at 9:42

© 2022 - 2024 — McMap. All rights reserved.