How to compute AIC for linear regression model in Python?
Asked Answered
P

2

18

I want to compute AIC for linear models to compare their complexity. I did it as follows:

regr = linear_model.LinearRegression()
regr.fit(X, y)

aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)

def aic(y, y_pred, k):
   resid = y - y_pred.ravel()
   sse = sum(resid ** 2)

   AIC = 2*k - 2*np.log(sse)

return AIC

But I receive a divide by zero encountered in log error.

Projectionist answered 11/7, 2017 at 11:58 Comment(0)
D
21

sklearn's LinearRegression is good for prediction but pretty barebones as you've discovered. (It's often said that sklearn stays away from all things statistical inference.)

statsmodels.regression.linear_model.OLS has a property attribute AIC and a number of other pre-canned attributes.

However, note that you'll need to manually add a unit vector to your X matrix to include an intercept in your model.

from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant

regr = OLS(y, add_constant(X)).fit()
print(regr.aic)

Source is here if you are looking for an alternative way to write manually while still using sklearn.

Dunnite answered 11/7, 2017 at 14:6 Comment(0)
L
1

Here is an example implementation of AIC from the link that was given in the previous answer.

from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
import numpy as np
from sklearn import linear_model

x = np.array([1,2,3,4,5])
y = np.array([0.3, 0.4, 0.4, 0.5, 0.6])

# 1 feature and constant
p = 1+1 

lr = linear_model.LinearRegression()
lr = lr.fit(x.reshape(-1, 1), y)
pr = lr.predict(x.reshape(-1, 1))

def llf_(y, X, pr):
    # return maximized log likelihood
    nobs = float(X.shape[0])
    nobs2 = nobs / 2.0
    nobs = float(nobs)
    resid = y - pr
    ssr = np.sum((resid)**2)
    llf = -nobs2*np.log(2*np.pi) - nobs2*np.log(ssr / nobs) - nobs2
    return llf

def aic(y, X, pr, p):
    # return aic metric
    llf = llf_(y, X, pr)
    return -2*llf+2*p

regr = OLS(y, add_constant(x.reshape(-1, 1))).fit()
print(regr.aic)
print(aic(y, x, pr, p))

Output:

-18.903519181693923 # OLS AIC
-18.903519181693916 # our AIC
Lumpfish answered 29/4, 2023 at 8:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.