statsmodels: printing the summaries of more than one regression model together

In the Python library Statsmodels, you can print out the regression results with print(results.summary()). How can I print the summaries of more than one regression in one table, for easier comparison?

A linear regression, code taken from statsmodels documentation:

import numpy as np
import statsmodels.api as sm

nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2))
beta = np.array([0.1, 10])
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

# OLS without an intercept term
model = sm.OLS(y, X)
results_noconstant = model.fit()

Then I add a constant to the model and run the regression again:

beta = np.array([1, 0.1, 10])
X = sm.add_constant(X)
y = np.dot(X, beta) + e 

model = sm.OLS(y, X)
results_withconstant = model.fit()

I'd like to see the summaries of results_noconstant and results_withconstant printed out in one table. This should be a very useful function, but I didn't find any instruction about this in the statsmodels documentation.

EDIT: The regression table I had in mind would be something like this. I wonder whether there is ready-made functionality to do this.

Sauerbraten answered 28/1, 2016 at 2:10 Comment(1)
Also see #23576828 - Gummosis
W
3

I am sure there are a number of ways to do that. It depends on what you can or want to use to achieve it.

The starting point will most likely be the same:

Calling .fit() on a statsmodels linear model returns a RegressionResults object, whose summary2() method returns a Summary object with a few convenience attributes and methods.

One of them, .tables, is a list of pandas.DataFrame objects.

Here is how you could use this:

import pandas as pd

results = {'Noconst': results_noconstant.summary2(),
           'withcon': results_withconstant.summary2()}
frames = []
for mod, summ in results.items():
    info = summ.tables[0]  # header table: label columns alternate with value columns
    for col in info.columns:
        if col % 2 == 0:
            frames.append(pd.DataFrame({'Model': [mod] * info[col].size,
                                        'Param': info[col].values,
                                        'Value': info[col + 1].values}))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat(frames)

print(df)

Which yields:

     Model                Param             Value
0  Noconst               Model:               OLS
1  Noconst  Dependent Variable:                 y
2  Noconst                Date:  2016-01-29 00:33
3  Noconst    No. Observations:               100
4  Noconst            Df Model:                 2
5  Noconst        Df Residuals:                98
6  Noconst           R-squared:             1.000
0  Noconst      Adj. R-squared:             1.000
1  Noconst                 AIC:          296.0102
2  Noconst                 BIC:          301.2205
3  Noconst      Log-Likelihood:           -146.01
4  Noconst         F-statistic:         9.182e+06
5  Noconst  Prob (F-statistic):         4.33e-259
6  Noconst               Scale:            1.1079
0  withcon               Model:               OLS
1  withcon  Dependent Variable:                 y
2  withcon                Date:  2016-01-29 00:33
3  withcon    No. Observations:               100
4  withcon            Df Model:                 2
5  withcon        Df Residuals:                97
6  withcon           R-squared:             1.000
0  withcon      Adj. R-squared:             1.000
1  withcon                 AIC:          297.8065
2  withcon                 BIC:          305.6220
3  withcon      Log-Likelihood:           -145.90
4  withcon         F-statistic:         4.071e+06
5  withcon  Prob (F-statistic):         1.55e-239
6  withcon               Scale:            1.1170

What you can do with this is limited only by your ability to use pandas, a powerful Python data analysis toolkit.

Wombat answered 28/1, 2016 at 21:52 Comment(1)
Thanks. I thought statsmodels might have some ready-made functionality for making regression tables, which is why I asked the question. The regression table I had in mind would be something like this link, but I guess I can tweak the dataframe somehow to make it similar to that one. I'll add the link to the main question in case there is a ready-made way to produce that kind of table. - Sauerbraten
L
9

There is summary_col, which AFAIR is still missing from the documentation.

I have not really tried it out much, but I found a related example from an issue to remove some of the "nuisance" parameters.

"""
mailing list, and issue https://github.com/statsmodels/statsmodels/pull/1638
"""

import pandas as pd
import numpy as np
import string
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

df = pd.DataFrame({'A' : list(string.ascii_uppercase)*10,
                   'B' : list(string.ascii_lowercase)*10,
                   'C' : np.random.randn(260),
                   'D' : np.random.normal(size=260),
                   # random_integers is deprecated in NumPy; randint excludes the upper bound, hence 11
                   'E' : np.random.randint(0, 11, 260)})

m1 = smf.ols('E ~ D',data=df).fit()
m2 = smf.ols('E ~ D + C',data=df).fit()
m3 = smf.ols('E ~ D + C + B',data=df).fit()
m4 = smf.ols('E ~ D + C + B + A',data=df).fit()

print(summary_col([m1,m2,m3,m4]))

There is still room for improvement.
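For what it is worth, summary_col accepts a few keyword arguments that get closer to a publication-style table. The following is only a sketch with made-up data; the model labels "(1)" and "(2)" and the "N" row are choices for this example, not defaults of the library.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"C": rng.normal(size=260), "D": rng.normal(size=260)})
df["E"] = 2 + 0.5 * df["D"] - 0.3 * df["C"] + rng.normal(size=260)

m1 = smf.ols("E ~ D", data=df).fit()
m2 = smf.ols("E ~ D + C", data=df).fit()

table = summary_col(
    [m1, m2],
    model_names=["(1)", "(2)"],               # column headers
    stars=True,                               # significance stars
    float_format="%.3f",
    regressor_order=["Intercept", "D", "C"],  # row order of the coefficients
    # extra rows computed from each fitted result
    info_dict={"N": lambda res: f"{int(res.nobs)}"},
)
print(table)
```

The returned object also has as_text(), as_html() and as_latex() methods, and its .tables[0] is a plain DataFrame if you want to post-process it with pandas.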

Launceston answered 29/1, 2016 at 3:11 Comment(1)
Yeah, it's still far from pretty. But glad to know someone is working on it! - Sauerbraten
G
6

There is now a Python version of the well-known stargazer R package, which does exactly this.

See also this related question: https://economics.stackexchange.com/q/11774/24531

Gummosis answered 12/10, 2019 at 14:52 Comment(1)
Good stuff. Worth noting that this currently only works for OLS. - Oller
W
1

Here is a possible implementation:

import pandas as pd

def compare_statsmodels_ols(estimators, indice=0):
    # estimators: dict mapping a label to a fitted OLS results object
    if not isinstance(estimators, dict) or len(estimators) < 2:
        raise TypeError('estimators must be a dictionary of at least two fitted models')
    if indice not in (0, 2):
        raise ValueError('Not working for the coeff table')

    data_dict = {}
    for k, est in estimators.items():
        # the values sit in the odd columns of the summary2 table
        data_dict[k] = est.summary2().tables[indice].iloc[:, 1::2].stack().values

    # the labels sit in the even columns; read them from any one model
    # (next(iter(...)) avoids popitem(), which would mutate the caller's dict)
    first = next(iter(estimators.values()))
    index = first.summary2().tables[indice].iloc[:, 0::2].stack().values
    df = pd.DataFrame.from_dict(data_dict)
    df.index = index
    return df

estimators = {'m1': m1, 'm2': m2}
compare_statsmodels_ols(estimators, 0)

Here m1 and m2 are the prefitted models. This solution works only for the first (indice=0) and third (indice=2) OLS summary tables.

Output:

                                   m1                m2
Model:                            OLS               OLS
Adj. R-squared:                 0.882             0.864
Dependent Variable:               Mpg               Mpg
AIC:                         254.6367          273.3427
Date:                2016-12-14 16:28  2016-12-14 16:28
BIC:                         389.3848          310.7728
No. Observations:                 312               312
Log-Likelihood:               -91.318           -126.67
Df Model:                          35                 9
F-statistic:                    67.12             220.9
Df Residuals:                     276               302
Prob (F-statistic):         1.06e-114         3.28e-127
R-squared:                      0.895             0.868
Scale:                        0.11885           0.13624

Whitecap answered 14/12, 2016 at 15:42 Comment(0)
