Different Linear Regression Coefficients with statsmodels and sklearn

I was planning to use sklearn's linear_model to plot the linear regression result, and statsmodels.api to get a detailed summary of the fit. However, the two packages produce very different results on the same input.

For example, the constant term from sklearn is 7.8e-14, but the constant term from statsmodels is 48.6. (I added a column of 1's to x for the constant term when using both methods.) My code for both methods is succinct:

import statsmodels.api as sm
from sklearn import linear_model

# Use statsmodels linear regression to get a result (summary) for the model.
def reg_statsmodels(y, x):
    results = sm.OLS(y, x).fit()
    return results

# Use sklearn linear regression to compute the coefficients for the prediction.
def reg_sklearn(y, x):
    lr = linear_model.LinearRegression()
    lr.fit(x, y)
    return lr.coef_

The input is too complicated to post here. Is it possible that a singular (rank-deficient) input x caused this problem?
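For what it's worth, here is a quick check I could run for rank deficiency (a sketch, assuming x is a NumPy array):

import numpy as np

# rank < number of columns means x is singular (perfectly collinear columns)
print(np.linalg.matrix_rank(x), x.shape[1])
# a very large condition number means x is close to singular
print(np.linalg.cond(x))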

A 3-D plot of the data (projected with PCA) suggests that the sklearn result is not a good approximation. What are some possible explanations? I still want to make a visualization, so it would be very helpful to fix the issue with the sklearn linear regression fit.

Bloodstock answered 19/7, 2016 at 6:59

You say that

I added a column of 1's in x for constant term when using both methods

But the documentation of LinearRegression says that

LinearRegression(fit_intercept=True, [...])

that is, it fits an intercept by default. Because you also added a column of 1's, sklearn centers the data before fitting, your constant column becomes all zeros, and its coefficient comes out as essentially zero (your 7.8e-14); the actual intercept is stored in lr.intercept_, which your function never returns. This explains the difference in the constant term.
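A minimal sketch of two ways to make the fits comparable (assuming x does not already contain the constant column in the first variant):

import numpy as np
import statsmodels.api as sm
from sklearn import linear_model

# Variant 1: let each library add the intercept itself.
results = sm.OLS(y, sm.add_constant(x)).fit()
lr = linear_model.LinearRegression(fit_intercept=True).fit(x, y)
# compare results.params[0] with lr.intercept_

# Variant 2: keep the manual column of 1's and disable sklearn's intercept.
x1 = np.column_stack([np.ones(len(x)), x])
results = sm.OLS(y, x1).fit()
lr = linear_model.LinearRegression(fit_intercept=False).fit(x1, y)
# now results.params and lr.coef_ should match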

Now for the other coefficients: differences can occur when two of the variables are highly correlated. Consider the most extreme case, where two of your columns are identical. Then any reduction in the coefficient of one can be compensated by an increase in the coefficient of the other, so the least-squares solution is not unique and different solvers can legitimately return different coefficients. This is the first thing I'd check.
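To illustrate with made-up data: duplicate one column and the problem no longer has a unique solution, which different solvers resolve differently:

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a])                 # two identical columns
y = 3 * a + rng.normal(scale=0.1, size=100)

# Any coefficients b1, b2 with b1 + b2 = 3 fit equally well,
# so the reported coefficients depend on the solver.
print(np.linalg.matrix_rank(X))             # 1, not 2: X is rank-deficient
print(np.linalg.lstsq(X, y, rcond=None)[0]) # minimum-norm solution: ~[1.5, 1.5]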

Oily answered 19/7, 2016 at 9:6
