How to get standardised (beta) coefficients for multiple linear regression using statsmodels
When using the .summary() method on a fitted statsmodels OLS model (with data in a pandas DataFrame), the OLS Regression Results include the following fields:

coef    std err          t      P>|t|      [0.025      0.975]

How can I get the standardised coefficients (which exclude the intercept), similar to what is available in SPSS?

Asbestosis answered 13/6, 2018 at 16:47 Comment(0)

You just need to standardize your original DataFrame using z-scores first and then perform the linear regression.

Assume your DataFrame is named df and has independent variables x1, x2, and x3 and dependent variable y. Consider the following code:

import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.formula.api as smf

# standardizing dataframe
df_z = df.select_dtypes(include=[np.number]).dropna().apply(stats.zscore)

# fitting regression
formula = 'y ~ x1 + x2 + x3'
result = smf.ols(formula, data=df_z).fit()

# checking results
result.summary()

Now the coef column will show the standardized (beta) coefficients, so you can compare their relative influence on your dependent variable.

Notes:

  1. Please keep in mind that you need .dropna(). Otherwise, stats.zscore will return all NaN for a column if it has any missing values.
  2. Instead of using .select_dtypes(), you can select the columns manually, but make sure all the columns you select are numeric.
  3. If you only care about the standardized (beta) coefficients, you can use result.params to return just those. They are usually displayed in scientific notation; you can use something like round(result.params, 5) to round them (see the sketch after this list).
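For example, a minimal sketch, assuming the result object fitted above with the formula API (so the constant is labelled 'Intercept'):

# standardized (beta) coefficients as a pandas Series
betas = result.params

# the intercept of a regression on z-scored data is ~0, so drop it
betas = betas.drop('Intercept')

# round for readability instead of scientific notation
print(round(betas, 5))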
Ane answered 12/2, 2019 at 14:14 Comment(0)

We can just transform the estimated params by the standard deviation of the exog. results.t_test(transformation) computes the parameter table for the linearly transformed variables.

AFAIR, the following should produce the beta coefficients and corresponding inferential statistics.

Compute the standard deviations of the exog columns, but set the one for the constant to 1.

std = model.exog.std(0)
std[0] = 1

Then use results.t_test and look at the params_table. np.diag(std) creates a diagonal matrix that transforms the params.

tt = results.t_test(np.diag(std))
print(tt.summary())
tt.summary_frame()
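A minimal end-to-end sketch of this approach, assuming y is a 1-D array and x_raw is a hypothetical 2-D array of predictors (the constant is placed in the first column by sm.add_constant, matching std[0] = 1 above):

import numpy as np
import statsmodels.api as sm

# design matrix with the constant in the first column
x = sm.add_constant(x_raw)

model = sm.OLS(y, x)
results = model.fit()

# standard deviations of the exog columns, with 1 for the constant
std = model.exog.std(0)
std[0] = 1

# each row of the diagonal matrix rescales one coefficient by its regressor's std
tt = results.t_test(np.diag(std))
print(tt.summary())
print(tt.summary_frame())

As discussed in the comments below, the usual textbook definition also divides by the standard deviation of y; using np.diag(std / y.std()) would give that version.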
Exempt answered 13/6, 2018 at 19:16 Comment(4)
What is the "model" here?Ramekin
model is any of the model instances, e.g. OLS or GLM. results is the corresponding Results instance returned by model.fit(), e.g. model = OLS(y, x) and results = model.fit()Exempt
Did you standardize the response variable too?Orva
My mistake, the usual definition for a linear model also standardizes y: github.com/statsmodels/statsmodels/issues/…Exempt

You can convert unstandardized coefficients to standardized ones by scaling them with the standard deviations; the standardized coefficient (beta) is what driver analysis requires. Below is the code that works for me. X holds the independent variables, y is the dependent variable, and coefficients holds the unstandardized coefficients extracted via model.params from the OLS fit.

sd_x = X.std()
sd_y = y.std()
beta_coefficients = []

# Iterate through independent variables and calculate beta coefficients
# (coefficients must be aligned with X.columns, i.e. with the intercept excluded)
for i, col in enumerate(X.columns):
    beta = coefficients[i] * (sd_x[col] / sd_y)
    beta_coefficients.append([col, beta])

# Print beta coefficients
for var, beta in beta_coefficients:
    print(f' {var}: {beta}')
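
The same computation can be written in a vectorized way with pandas (a sketch, assuming coefficients is the model.params Series with the intercept dropped, so it is indexed by the column names of X):

# beta_j = b_j * sd(x_j) / sd(y), computed for all predictors at once
betas = coefficients[X.columns] * X.std() / y.std()
print(betas)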

Celanese answered 27/1, 2023 at 5:53 Comment(0)
