When it comes to measuring goodness of fit, R-squared seems to be a commonly understood (and accepted) measure for "simple" linear models. But in statsmodels (as in other statistical software), RLM does not include an R-squared with its regression results.
Is there a way to calculate it "manually", perhaps in a way similar to how it is done in Stata? Or is there another measure that can be used or calculated from the results produced by sm.RLM?
This is what Statsmodels is producing:
import numpy as np
import statsmodels.api as sm
# Sample Data with outliers
nsample = 50
x = np.linspace(0, 20, nsample)
x = sm.add_constant(x)
sig = 0.3
beta = [5, 0.5]
y_true = np.dot(x, beta)
y = y_true + sig * 1. * np.random.normal(size=nsample)
y[[39,41,43,45,48]] -= 5 # add some outliers (10% of nsample)
# Regression with Robust Linear Model
res = sm.RLM(y, x).fit()
print(res.summary())
Which outputs:
                    Robust linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            RLM   Df Residuals:                       48
Method:                          IRLS   Df Model:                            1
Norm:                          HuberT
Scale Est.:                       mad
Cov Type:                          H1
Date:                Mon, 27 Jul 2015
Time:                        10:00:00
No. Iterations:                    17
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          5.0254      0.091     55.017      0.000         4.846     5.204
x1             0.4845      0.008     61.555      0.000         0.469     0.500
==============================================================================
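The comments below compare against the R-squared and BIC of a plain OLS fit on the same data. Assuming that comparison fit was obtained in the straightforward way, continuing the snippet above (reusing y and x), it would look something like this; the exact numbers depend on the random draw:
# Plain (non-robust) OLS fit on the same data for comparison.
ols_res = sm.OLS(y, x).fit()
print(ols_res.rsquared)  # ordinary R-squared of the least-squares fit
print(ols_res.bic)       # BIC of the OLS fit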
Comments:
Try wls_results = WLS(mod.endog, mod.exog, weights=mod.weights).fit(), where mod is the RLM model after fit. No guarantees for this. The rsquared of a WLS result is the R-squared of the weighted residuals, which would be the measure that down-weights the outliers. However, I don't think you can compare models by rsquared if they differ by the weights. – Partizan
Tried mod = sm.RLM(y, x); r2_wls = sm.WLS(mod.endog, mod.exog, weights=mod.fit().weights).fit().rsquared. Compare to R2 of OLS = 0.731. Looks like "too good to be true" :-) – Nought
OLS ... But BIC went down from 181 to 177 (is it a significant shift?). Is there another measure to prove that RLM clearly and numerically shows a "better fit"? – Nought
There is also a .scale property; however, I could not find any explanation of how to interpret this parameter or what it actually means. While searching I came across a couple of papers that might be of interest: "Robust version of R-squared" and "Robust AIC". – Nought
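Putting the suggestion from the first comment together, here is a minimal, self-contained sketch of the WLS-based workaround (variable names are illustrative, and the caveat above about comparing weighted and unweighted R-squared still applies):
import numpy as np
import statsmodels.api as sm

# Same setup as in the question: a line with a few downward outliers.
nsample = 50
x = sm.add_constant(np.linspace(0, 20, nsample))
y = np.dot(x, [5, 0.5]) + 0.3 * np.random.normal(size=nsample)
y[[39, 41, 43, 45, 48]] -= 5

rlm_res = sm.RLM(y, x).fit()

# Refit with WLS using the final IRLS weights from the robust fit.
# Its rsquared is an R-squared of the weighted residuals, i.e. a measure
# that down-weights the outliers.
wls_res = sm.WLS(rlm_res.model.endog,
                 rlm_res.model.exog,
                 weights=rlm_res.weights).fit()
print(wls_res.rsquared)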