Preserve variable names in summary from statsmodels
I am using OLS from statsmodels, following the example at https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html

import statsmodels.api as sm

# USD only
X = sm.add_constant(USD)
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.265
Model:                                      OLS   Adj. R-squared:                  0.265
Method:                           Least Squares   F-statistic:                     352.4
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):           2.35e-67
Time:                                  17:30:24   Log-Likelihood:                -8018.8
No. Observations:                           977   AIC:                         1.604e+04
Df Residuals:                               975   BIC:                         1.605e+04
Df Model:                                     1                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       1843.1414    149.675     12.314      0.000    1549.418    2136.864
USD         3512.5040    187.111     18.772      0.000    3145.318    3879.690
==============================================================================
Omnibus:                      276.458   Durbin-Watson:                   0.009
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               74.633
Skew:                           0.438   Prob(JB):                     6.22e-17
Kurtosis:                       1.967   Cond. No.                         10.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

As you can see, the regressor is labeled USD in the summary, which is what I want. However, after adding a second variable:

#JPY + USD
X = sm.add_constant(JPY)
X = np.column_stack((X, USD))
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())


                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.641
Model:                                      OLS   Adj. R-squared:                  0.640
Method:                           Least Squares   F-statistic:                     868.8
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):          2.80e-217
Time:                                  17:39:19   Log-Likelihood:                -7669.4
No. Observations:                           977   AIC:                         1.534e+04
Df Residuals:                               974   BIC:                         1.536e+04
Df Model:                                     2                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1559.5880    149.478    -10.434      0.000   -1852.923   -1266.253
x1            78.6589      2.466     31.902      0.000      73.820      83.497
x2          -366.5850    178.672     -2.052      0.040    -717.211     -15.958
==============================================================================
Omnibus:                       24.957   Durbin-Watson:                   0.031
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               27.278
Skew:                           0.353   Prob(JB):                     1.19e-06
Kurtosis:                       3.415   Cond. No.                         743.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The summary no longer shows JPY and USD; it shows x1 and x2 instead. Is there a way to fix this? I searched Google but found nothing.

Despair answered 23/10, 2018 at 6:49 Comment(5)
What do USD and JPY actually mean? And what is their value when you add them as constants? - Forgo
The problem, I found, is that np.column_stack returns only the values, so the headers (JPY and USD) are not included. However, I haven't found a way to solve it. - Despair
That wasn't my question; you're just restating your original question. - Forgo
Of course, it's a nicety of statsmodels to keep track of the names of constants and such where it can. When you resort to NumPy arrays as input, statsmodels can't do that anymore. I'm doubtful you can retrieve the names manually, because if you could, it would likely have been programmed into statsmodels already. - Forgo
The real question is: why does it matter? - Forgo

Since my question is only about how the summary is displayed, keeping the column headers solves the problem. I am posting my solution in case someone else runs into the same issue.

#JPY + USD
X = JPY.join(USD)
X = sm.add_constant(X)
#X = np.column_stack((X, USD))
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())


                                 OLS Regression Results                                 
========================================================================================
Dep. Variable:     All Ordinaries closing price   R-squared:                       0.641
Model:                                      OLS   Adj. R-squared:                  0.640
Method:                           Least Squares   F-statistic:                     868.8
Date:                          Tue, 23 Oct 2018   Prob (F-statistic):          2.80e-217
Time:                                  22:51:43   Log-Likelihood:                -7669.4
No. Observations:                           977   AIC:                         1.534e+04
Df Residuals:                               974   BIC:                         1.536e+04
Df Model:                                     2                                         
Covariance Type:                      nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -1559.5880    149.478    -10.434      0.000   -1852.923   -1266.253
JPY           78.6589      2.466     31.902      0.000      73.820      83.497
USD         -366.5850    178.672     -2.052      0.040    -717.211     -15.958
==============================================================================
Omnibus:                       24.957   Durbin-Watson:                   0.031
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               27.278
Skew:                           0.353   Prob(JB):                     1.19e-06
Kurtosis:                       3.415   Cond. No.                         743.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Despair answered 23/10, 2018 at 11:54 Comment(3)
Perhaps the real solution was that USD and JPY needed to be kept as pandas DataFrames? - Jonas
Yes, they need to be pd.DataFrame objects so that the join function can be used. - Despair
You can also use R-style formulas, which may be easier to read both in your code and in the output, especially when you need to adjust a variable, for example by taking its log. - Consolation
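
The R-style formula suggestion from the last comment could look roughly like the sketch below. It uses statsmodels.formula.api on synthetic data; the column name AORD, the coefficients, and the seed are hypothetical, and the log transform is included only to illustrate the kind of adjustment the comment mentions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; the column names echo the question, the values are made up.
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "JPY": rng.normal(100.0, 5.0, n),
    "USD": rng.normal(0.75, 0.05, n),
})
# AORD is a hypothetical short name for the dependent variable.
df["AORD"] = 5.0 + 1.5 * df["JPY"] - 2.0 * df["USD"] + rng.normal(0.0, 1.0, n)

# The formula names the columns directly, and a transformation such as a log
# is written inline. An intercept is added automatically.
results = smf.ols("AORD ~ JPY + np.log(USD)", data=df).fit()
print(results.summary())

# The fitted parameters are labeled by term rather than by x1, x2.
param_names = list(results.params.index)
print(param_names)
```

With a formula there is no separate add_constant step, and the labels in the summary come from the formula terms themselves.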

Here's an easy fix using pandas: pass a list of feature names to summary() through its xname argument.

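import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler

# list of feature names; assumes the label is the last column of df
features = list(df.columns[:-1])

# scale features (the scaler returns NumPy arrays, so the column names are lost)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# train MLR model
regressor = sm.OLS(y_train, X_train).fit()

# pass the names back in; xname needs one entry per column of X_train
regressor.summary(xname=features)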
Flaxman answered 26/8, 2021 at 21:58 Comment(0)
