Preserve variable names in summary from statsmodels

Asked 23/10, 2018 at 6:49 Answered 26/8, 2021 at 21:58

I am using OLS from statsmodels, following the example at https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html

import statsmodels.api as sm

# USD only
X = sm.add_constant(USD)
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.265
Model:                                      OLS   Adj. R-squared:                  0.265
Method:                           Least Squares   F-statistic:                     352.4
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):           2.35e-67
Time:                                  17:30:24   Log-Likelihood:                -8018.8
No. Observations:                           977   AIC:                         1.604e+04
Df Residuals:                               975   BIC:                         1.605e+04
Df Model:                                     1                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       1843.1414    149.675     12.314      0.000    1549.418    2136.864
USD         3512.5040    187.111     18.772      0.000    3145.318    3879.690
==============================================================================
Omnibus:                      276.458   Durbin-Watson:                   0.009
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               74.633
Skew:                           0.438   Prob(JB):                     6.22e-17
Kurtosis:                       1.967   Cond. No.                         10.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified

As you can see, the regressor is labelled USD in the summary, which is what I want. However, after adding a second variable:

#JPY + USD
X = sm.add_constant(JPY)
X = np.column_stack((X, USD))
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())


                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.641
Model:                                      OLS   Adj. R-squared:                  0.640
Method:                           Least Squares   F-statistic:                     868.8
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):          2.80e-217
Time:                                  17:39:19   Log-Likelihood:                -7669.4
No. Observations:                           977   AIC:                         1.534e+04
Df Residuals:                               974   BIC:                         1.536e+04
Df Model:                                     2                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1559.5880    149.478    -10.434      0.000   -1852.923   -1266.253
x1            78.6589      2.466     31.902      0.000      73.820      83.497
x2          -366.5850    178.672     -2.052      0.040    -717.211     -15.958
==============================================================================
Omnibus:                       24.957   Durbin-Watson:                   0.031
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               27.278
Skew:                           0.353   Prob(JB):                     1.19e-06
Kurtosis:                       3.415   Cond. No.                         743.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The summary now shows x1 and x2 instead of JPY and USD. Is there a way to fix this? I searched Google but found nothing.

Despair answered 23/10, 2018 at 6:49 Comment(5)

What do USD and JPY actually mean? And what is their value when you add them as constants? – Forgo 23/10, 2018 at 9:58

The problem, as I found out, is that np.column_stack returns only the values, so the column headers (JPY and USD) are not included. However, I haven't found a way to solve it. – Despair 23/10, 2018 at 10:5

That wasn't my question; you're just restating your original question. – Forgo 23/10, 2018 at 10:14

Of course, it's a nicety of statsmodels to keep track of the names of constants and such where it can. When resorting to using NumPy arrays as input, statsmodels can't do that anymore. I'm doubtful you can retrieve the names manually, because if you could, it's likely that it would have been programmed into statsmodels already. – Forgo 23/10, 2018 at 10:15

The real question is: why does it matter? – Forgo 23/10, 2018 at 10:15

Since my question is only about how the summary is displayed, keeping the column headers solves the problem. I am posting my solution in case someone else runs into the same issue.

#JPY + USD
X = JPY.join(USD)
X = sm.add_constant(X)
#X = np.column_stack((X, USD))
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())


                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.641
Model:                                      OLS   Adj. R-squared:                  0.640
Method:                           Least Squares   F-statistic:                     868.8
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):          2.80e-217
Time:                                  22:51:43   Log-Likelihood:                -7669.4
No. Observations:                           977   AIC:                         1.534e+04
Df Residuals:                               974   BIC:                         1.536e+04
Df Model:                                     2                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1559.5880    149.478    -10.434      0.000   -1852.923   -1266.253
JPY           78.6589      2.466     31.902      0.000      73.820      83.497
USD         -366.5850    178.672     -2.052      0.040    -717.211     -15.958
==============================================================================
Omnibus:                       24.957   Durbin-Watson:                   0.031
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               27.278
Skew:                           0.353   Prob(JB):                     1.19e-06
Kurtosis:                       3.415   Cond. No.                         743.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Despair answered 23/10, 2018 at 11:54 Comment(3)

Perhaps the real solution was that USD and JPY needed to be kept as pandas DataFrames? – Jonas 23/10, 2018 at 12:4

Yes, they need to be pd.DataFrame objects so that we can use the join function. – Despair 23/10, 2018 at 12:6

You can also use R-style formulas, which may be easier to read both in your code and in the summary, especially when you need to transform a variable, such as taking its log. – Consolation 26/10, 2018 at 14:56
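The formula suggestion in the last comment might look roughly like the sketch below. The DataFrame df, the column name price, and the synthetic values are assumptions made for illustration; only statsmodels.formula.api.ols and the formula syntax come from the library itself.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; the column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "JPY": rng.uniform(80.0, 120.0, size=50),
    "USD": rng.uniform(0.6, 1.0, size=50),
})
df["price"] = 1000 + 50 * df["JPY"] - 300 * df["USD"] + rng.normal(scale=10, size=50)

# Terms in the formula are column names, so the summary is labelled automatically.
results = smf.ols("price ~ JPY + USD", data=df).fit()
print(list(results.params.index))

# Transformations are written inline and appear in the labels as well.
log_results = smf.ols("price ~ np.log(JPY) + USD", data=df).fit()
print(list(log_results.params.index))
```

The formula interface adds the intercept itself, so sm.add_constant is not needed here.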

Here's an easy fix using pandas. The scaler returns a NumPy array, so the column names are lost; you only need to pass a list of feature names to summary() through its xname argument.

from sklearn.preprocessing import MinMaxScaler
import statsmodels.api as sm

# list of feature names
features = list(df.iloc[:, 0:-1].columns)  # exclude last column (label)

# scale features
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# train MLR model
regressor = sm.OLS(y_train, X_train).fit()

regressor.summary(xname=features)

Flaxman answered 26/8, 2021 at 21:58 Comment(0)
