Extract the coefficients for the best tuning parameters of a glmnet model in caret

I am running elastic net regularization in caret using glmnet.

I pass a grid of candidate values for alpha and lambda to train via tuneGrid, then use repeated cross-validation (repeatedcv) to find the optimal alpha and lambda.

Here is an example where the optimal tunings for alpha and lambda are 0.7 and 0.5 respectively:

library(caret)   # provides train() and trainControl()
library(glmnet)  # fitted below via method = "glmnet"

# Toy data: 24 observations, binary outcome (asthma)
age     <- c(4, 8, 7, 12, 6, 9, 10, 14, 7, 6, 8, 11, 11, 6, 2, 10, 14, 7, 12, 6, 9, 10, 14, 7)
gender  <- make.names(as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)))
bmi_p   <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88, 0.83, 0.48, 0.99, 0.80, 0.85,
             0.50, 0.91, 0.29, 0.88, 0.99, 0.84, 0.80, 0.85, 0.88, 0.99)
m_edu   <- make.names(as.factor(c(0, 1, 1, 2, 2, 3, 2, 0, 1, 1, 0, 1, 2, 2, 1, 2, 0, 1, 1, 2, 2, 0, 1, 0)))
p_edu   <- make.names(as.factor(c(0, 2, 2, 2, 2, 3, 2, 0, 0, 0, 1, 2, 2, 1, 3, 2, 3, 0, 0, 2, 0, 1, 0, 1)))
f_color <- make.names(as.factor(c("blue", "blue", "yellow", "red", "red", "yellow",
                                  "yellow", "red", "yellow", "blue", "blue", "yellow", "red", "red", "yellow",
                                  "yellow", "red", "yellow", "yellow", "red", "blue", "yellow", "yellow", "red")))
asthma  <- make.names(as.factor(c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)))
x <- data.frame(age, gender, bmi_p, m_edu, p_edu, f_color, asthma)

tuneGrid <- expand.grid(alpha = seq(0, 1, 0.05), lambda = seq(0, 0.5, 0.05))
fitControl <- trainControl(method = 'repeatedcv', number = 3, repeats = 5, classProbs = TRUE, summaryFunction = twoClassSummary) 

set.seed(1352)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                       family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                       metric = "ROC")

model.test$bestTune
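
For reference, caret also stores the resampled performance for every alpha/lambda pair in the grid, which can be sorted to see the winning combinations (a sketch using the standard results component of a train object):

# Resampled ROC for each alpha/lambda pair, best first
head(model.test$results[order(-model.test$results$ROC), ])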

My question:

When I run as.matrix(coef(model.test$finalModel)), which I would assume gives me the coefficients corresponding to the best model, I get 100 different sets of coefficients.

So how do I get the coefficients corresponding to the best tuning?

I've seen a recommendation to get the best model with coef(model.test$finalModel, model.test$bestTune$lambda). However, this returns NULL coefficients, and in any case would only account for the best tuning of lambda, not of alpha as well.

EDIT:

After searching everywhere on the internet, all I can find that points me toward the correct answer is this blog post, which says that model.test$finalModel returns the model corresponding to the best alpha tuning, and that coef(model.test$finalModel, model.test$bestTune$lambda) returns the set of coefficients corresponding to the best value of lambda. If this is true, then this answers my question. However, as this is a single blog post and I can't find anything else to back up the claim, I am still skeptical. Can anyone confirm that model.test$finalModel returns the model corresponding to the best alpha? If so, this question would be solved. Thanks!

Seaton answered 3/1, 2018 at 14:45

After playing with your code a bit, I find it very odd that caret's glmnet train chooses different lambda ranges depending on the seed. Here is an example:

library(caret)
library(glmnet)
set.seed(13)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                           family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                           metric = "ROC")

c(head(model.test$finalModel$lambda, 5), tail(model.test$finalModel$lambda, 5))
#output
 [1] 3.7796447301 3.4438715094 3.1379274562 2.8591626295 2.6051625017
 [6] 0.0005483617 0.0004996468 0.0004552595 0.0004148155 0.0003779645

The optimal lambda is:

model.test$finalModel$lambdaOpt
#output
#[1] 0.05

and this works:

coef(model.test$finalModel, model.test$finalModel$lambdaOpt)
#12 x 1 sparse Matrix of class "dgCMatrix"
                        1
(Intercept)   -0.03158974
age            0.03329806
genderX1      -1.24093677
bmi_p          1.65156913
m_eduX1        0.45314106
m_eduX2       -0.09934991
m_eduX3       -0.72360297
p_eduX1       -0.51949828
p_eduX2       -0.80063642
p_eduX3       -2.18231433
f_colorred     0.87618211
f_coloryellow -1.52699254

giving the coefficients at the best alpha and lambda.

When using this model to predict, some y are predicted as X1 and some as X0:

predict(model.test, x)
#output
 [1] X1 X1 X0 X1 X1 X0 X0 X1 X1 X1 X0 X1 X1 X1 X0 X0 X0 X1 X1 X1 X1 X0 X1 X1
Levels: X0 X1
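
As a cross-check (a sketch; X, y, and refit are names introduced here, not caret objects), one can refit glmnet directly at the tuned alpha and ask for the coefficients at the tuned lambda. The values should closely match the coef() call above, up to small interpolation differences along glmnet's lambda path:

library(glmnet)

# Rebuild the design matrix the same way caret's formula interface does
X <- model.matrix(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color,
                  data = x)[, -1]   # drop the intercept column
y <- factor(x$asthma)

# Refit at the tuned alpha; glmnet picks its own lambda path
refit <- glmnet(X, y, family = "binomial", alpha = model.test$bestTune$alpha)

# Coefficients at the tuned lambda (interpolated along the stored path)
coef(refit, s = model.test$bestTune$lambda)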

Now, with the seed you used:

set.seed(1352)
model.test <- caret::train(asthma ~ age + gender + bmi_p + m_edu + p_edu + f_color, data = x, method = "glmnet", 
                           family = "binomial", trControl = fitControl, tuneGrid = tuneGrid, 
                           metric = "ROC")

c(head(model.test$finalModel$lambda, 5), tail(model.test$finalModel$lambda, 5))
#output
 [1] 2.699746e-01 2.459908e-01 2.241377e-01 2.042259e-01 1.860830e-01
 [6] 3.916870e-05 3.568906e-05 3.251854e-05 2.962968e-05 2.699746e-05

The lambda values are more than ten times smaller here, and this gives empty coefficients, since lambdaOpt (0.5) falls outside the range of fitted lambda values:

coef(model.test$finalModel, model.test$finalModel$lambdaOpt)
#output
12 x 1 sparse Matrix of class "dgCMatrix"
              1
(Intercept)   .
age           .
genderX1      .
bmi_p         .
m_eduX1       .
m_eduX2       .
m_eduX3       .
p_eduX1       .
p_eduX2       .
p_eduX3       .
f_colorred    .
f_coloryellow .

model.test$finalModel$lambdaOpt
#output
#[1] 0.5
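
A quick sanity check using the objects above makes the problem explicit: lambdaOpt lies above the entire fitted path, so every coefficient collapses to zero:

range(model.test$finalModel$lambda)
#output
#[1] 2.699746e-05 2.699746e-01

model.test$finalModel$lambdaOpt > max(model.test$finalModel$lambda)
#output
#[1] TRUE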

Now, when predicting with this model, only X0 (the first level) is predicted:

predict(model.test, x)
#output
 [1] X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0 X0
Levels: X0 X1

Quite odd behavior, probably worth reporting.

Bent answered 5/1, 2018 at 13:57

Comments:
Thanks! That is really weird... Regarding the coefficients for lambda and alpha: how does one know that coef(model.test$finalModel, model.test$finalModel$lambdaOpt) gives the coefficients corresponding to the best alpha in addition to the best lambda? Is that explicitly stated somewhere? It's strange, and seems arbitrary, that finalModel would provide the coefficients for all sub-optimal fits of lambda but not for all sub-optimal fits of alpha. – Seaton

@steve zissou: the 100-lambda behavior is most likely caused by glmnet's defaults. In glmnet, by default, the model is fit at 100 different lambda values for any specified alpha. When tuning hyperparameters, caret builds the final model using the best parameter combination. It seems to me that caret passes the best alpha to build the final model, and then, when predicting from that model, passes the best lambda, just as glmnet expects. By the way, I updated the post with some relevant info, like the predictions from the two models. – Bent

You can try to dig into the code and check, but to me it seems the fit is made by passing just the alpha parameter, while the prediction is made using predict(modelFit, newdata, s = modelFit$lambdaOpt), i.e. with lambdaOpt. I think this question will have to wait for topepo. – Bent

For a data set with 24 samples, I don't think it is surprising that you get different results across different seeds. Try using repeated CV to drive down the variation. – Allotrope

Yes, glmnet can make predictions at almost any lambda for a fixed value of alpha (see this article for more info). The lambdaOpt component attached to the model fit is a side effect of how glmnet works and ensures that the lambda found during the tuning process is used for prediction. – Allotrope

Also, you can adapt the code in getModelInfo("glmnet", FALSE)[[1]]$varImp to get the coefficients. – Allotrope
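
Building on that last comment, a minimal sketch of the adaptation (assuming the model.test object from the question; caret's varImp code for glmnet essentially takes the absolute value of these same coefficients):

# Look at caret's own variable-importance code for glmnet:
library(caret)
getModelInfo("glmnet", regex = FALSE)[[1]]$varImp

# The same idea, returning the raw coefficients at the tuned lambda
# (the tuned alpha is already baked into finalModel):
as.matrix(coef(model.test$finalModel, s = model.test$finalModel$lambdaOpt))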
