Print 'std err' value from statsmodels OLS results

5

47

(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs)

I'm doing a linear regression using statsmodels, basically:

import statsmodels.api as sm
model = sm.OLS(y,x)
results = model.fit()

I know that I can print out the full set of results with:

print(results.summary())

which outputs something like:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.952
Model:                            OLS   Adj. R-squared:                  0.951
Method:                 Least Squares   F-statistic:                     972.9
Date:                Mon, 20 Jul 2015   Prob (F-statistic):           5.55e-34
Time:                        15:35:22   Log-Likelihood:                -78.843
No. Observations:                  50   AIC:                             159.7
Df Residuals:                      49   BIC:                             161.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             1.0250      0.033     31.191      0.000         0.959     1.091
==============================================================================
Omnibus:                       16.396   Durbin-Watson:                   2.166
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                3.480
Skew:                          -0.082   Prob(JB):                        0.175
Kurtosis:                       1.718   Cond. No.                         1.00
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

I need a way to print out only the values of coef and std err.

I can access coef with:

print(results.params)

but I've found no way to print out std err.

How can I do this?

Shoshone answered 20/7, 2015 at 18:38 Comment(2)
A temporary, but most likely permanent, replacement for the documentation on SourceForge is here: statsmodels.github.io/dev/generated/… Pachalic
Didn't know that, thank you! Shoshone
85

Applying the answer given here I used dir() to print all the attributes of the results object.

After that I searched for the one that contained the std err value and it turned out to be:

print(results.bse)

(Not sure what the b stands for in bse, but I guess the se stands for "standard error")

Shoshone answered 20/7, 2015 at 19:58 Comment(4)
The b is a historical artifact from when params were called b, as in the linear model y = X b + u; it should properly be called params_se. Pachalic
Thanks for the explanation @user333700! Shoshone
@Pachalic Do you mean results.params_se? It doesn't seem to work. Sidwell
It's still called "bse" as in the answer. The name was never changed. Pachalic
8

results.bse provides standard errors for the coefficients, identical to those listed in results.summary().

The standard error of the regression is obtained using results.scale**.5.

This is identical to np.sqrt(np.sum(results.resid**2)/results.df_resid), where results is your fitted model.

Horbal answered 12/9, 2021 at 19:58 Comment(0)
1

Statistically, the standard error of the estimate equals the square root of the mean squared error of the residuals. It can be obtained from the results with np.sqrt(results.mse_resid).

Multinuclear answered 12/8, 2021 at 5:49 Comment(1)
The question is about standard errors of the parameter estimates, not the residual standard error. Pachalic
0

The following function gives an overview of the regression result. The parameter ols_model is the fitted regression model produced by statsmodels.formula.api. The output is a pandas DataFrame containing the regression coefficients with significance stars, the standard errors, the number of observations, AIC, and adjusted R-squared. The standard errors are shown in parentheses. ***, **, and * mark significance at the 0.01, 0.05, and 0.1 levels:

import pandas as pd


def output_regres_result(ols_model, variable_list: list):
    """
    Create a pandas DataFrame summarizing the regression analysis result.
    :param ols_model: fitted results of a linear model.
    type: statsmodels.regression.linear_model.RegressionResultsWrapper
    The model must be fitted on pandas data (e.g. via statsmodels.formula.api)
    so that params, pvalues and bse are pandas Series with a to_dict() method.
    :param variable_list: a list of variable names of interest
    :return: a pandas DataFrame holding the regression coefficients with
    significance stars, standard errors in parentheses, number of observations,
    AIC, and adjusted R-squared
    """
    coef_dict = ols_model.params.to_dict()  # coefficient dictionary
    pval_dict = ols_model.pvalues.to_dict()  # p-value dictionary
    std_error_dict = ols_model.bse.to_dict()  # standard error dictionary
    num_observs = int(ols_model.nobs)  # built-in int; np.int was removed from NumPy
    aic_val = round(ols_model.aic, 2)  # AIC value
    adj_rsquared = round(ols_model.rsquared_adj, 3)  # adjusted R-squared
    info_index = ['Num', 'AIC', 'Adjusted R2']
    index_list = variable_list + info_index

    # Fail early if a requested variable is not in the fitted model.
    for variable in variable_list:
        assert variable in coef_dict, 'Something wrong with variable name!'

    coef_vals = []

    for variable in variable_list:
        std_val = std_error_dict[variable]
        coef_val = coef_dict[variable]
        p_val = pval_dict[variable]
        # Choose the significance stars from the p-value.
        if p_val <= 0.01:
            stars = '***'
        elif p_val <= 0.05:
            stars = '**'
        elif p_val <= 0.1:
            stars = '*'
        else:
            stars = ''
        coef_vals.append('{}{}({})'.format(round(coef_val, 4), stars, round(std_val, 3)))

    coef_vals.extend([num_observs, aic_val, adj_rsquared])

    result_data = pd.DataFrame()
    result_data['coef'] = coef_vals
    result_data_reindex = result_data.set_index(pd.Index(index_list))

    return result_data_reindex
Halfcock answered 5/3, 2021 at 1:16 Comment(1)
@Bright Chang, it throws an error: AttributeError: 'numpy.ndarray' object has no attribute 'to_dict' Kori
0

I like Topchi's method but an identical result can be pulled with slightly less code. This is for residual standard error, rather than standard errors of parameter estimates which others have already shared in the thread :)

np.sqrt(results.scale)
Syverson answered 24/10, 2022 at 11:35 Comment(0)
