Getting glmnet coefficients at 'best' lambda

I am using the following code with glmnet:

> library(glmnet)
> fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
> plot(fit, xvar='lambda')

[Plot: glmnet coefficient paths against log(lambda)]

However, I want to print out the coefficients at the best lambda, as is done in ridge regression. I see the following structure of fit:

> str(fit)
List of 12
 $ a0       : Named num [1:79] 20.1 21.6 23.2 24.7 26 ...
  ..- attr(*, "names")= chr [1:79] "s0" "s1" "s2" "s3" ...
 $ beta     :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:561] 0 4 0 4 0 4 0 4 0 4 ...
  .. ..@ p       : int [1:80] 0 0 2 4 6 8 10 12 14 16 ...
  .. ..@ Dim     : int [1:2] 10 79
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:10] "cyl" "disp" "hp" "drat" ...
  .. .. ..$ : chr [1:79] "s0" "s1" "s2" "s3" ...
  .. ..@ x       : num [1:561] -0.0119 -0.4578 -0.1448 -0.7006 -0.2659 ...
  .. ..@ factors : list()
 $ df       : int [1:79] 0 2 2 2 2 2 2 2 2 3 ...
 $ dim      : int [1:2] 10 79
 $ lambda   : num [1:79] 5.15 4.69 4.27 3.89 3.55 ...
 $ dev.ratio: num [1:79] 0 0.129 0.248 0.347 0.429 ...
 $ nulldev  : num 1126
 $ npasses  : int 1226
 $ jerr     : int 0
 $ offset   : logi FALSE
 $ call     : language glmnet(x = as.matrix(mtcars[-1]), y = mtcars[, 1])
 $ nobs     : int 32
 - attr(*, "class")= chr [1:2] "elnet" "glmnet"

But I am not able to get the best lambda and the corresponding coefficients from it. Thanks for your help.

Pewter answered 1/6, 2015 at 4:3 Comment(2)
@smci, you have a typo in your example. The sign should be on the from parameter, e.g. lambda = 10^seq(from=-10, to=15, by=1/3) (see the sketch below these comments). – Domineer
@smci do you have a citation for the advice? I can't find anything stating not to use the default lambda sequence. Although I understand why it might be good to supply a user-specified one, I was hoping for a source. – Minorca
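For context, here is a minimal sketch of how the corrected grid from the first comment would be supplied (an assumption for illustration; note that glmnet expects a decreasing lambda sequence, so the grid is sorted first):

library(glmnet)
# hypothetical user-specified grid from the comment above, sorted as glmnet expects
lambdas = sort(10^seq(from=-10, to=15, by=1/3), decreasing=TRUE)
fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1], lambda=lambdas)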

Try this:

fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1], 
    lambda=cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.1se)
coef(fit)

Or you can specify a lambda value in coef:

fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1])
coef(fit, s = cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.1se)

You need to pick a "best" lambda, and lambda.1se is a reasonable, justifiable one to pick. But you could use cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])$lambda.min, or any other value of lambda that you settle on as "best" for you.
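As a side note, the inline cv.glmnet calls above rerun the cross-validation each time. A minimal sketch that runs it once and reuses the result (coef on a cv.glmnet object accepts s = "lambda.min" or s = "lambda.1se" directly):

library(glmnet)
x = as.matrix(mtcars[-1])
y = mtcars[,1]

set.seed(1)              # CV folds are random; a seed makes the result reproducible
cvfit = cv.glmnet(x, y)  # run the cross-validation once

cvfit$lambda.min         # lambda with the minimum mean CV error
cvfit$lambda.1se         # largest lambda within 1 SE of that minimum

coef(cvfit, s = "lambda.min")  # coefficients at lambda.min
coef(cvfit, s = "lambda.1se")  # coefficients at lambda.1se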

Alvarado answered 1/6, 2015 at 4:10 Comment(11)
The log of lambda.min from cv.glmnet comes to -0.5. Is it all right if I mark this point on the x-axis of the plot(fit) from glmnet above? Is the log lambda indicated on the x-axis of that plot from the same vector that lambda.min came from? – Pewter
The log lambda on the x-axis is from the same vector of lambda values that lambda.min came from. Just be aware that, due to the nature of cross-validation, you can get different values of lambda.min if you run cv.glmnet again. So your mark on the x-axis would be the lambda.min from a particular call of cv.glmnet. – Alvarado
Would it be correct to say that if width = lambda.1se - lambda.min, then lambda.min +/- 1.96*width is the 95% confidence interval for lambda.min, or the region where the true regression model is likely to lie? – Pewter
I'm no expert, so call me out if I get something wrong here, but I don't think it would be right to say that there is a "true" regression model. The lambda.min value is the lambda that produces the lowest CV error, and lambda.1se is the lambda giving a more regularized (i.e. more shrinkage) model that is still within 1 SE of the minimum error. Anyway, have a look at plot(cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])) to see that you're not dealing with a normal distribution. – Alvarado
One thing to note: as Frank says, there will be some (or a lot of) variation in the minimum lambda if you rerun the cross-validation. ?cv.glmnet notes that "the results of cv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glmnet many times, and averaging the error curves." I rerun the CV 100 times, average the curves, and then find the minimum of this average curve (or the 1se point, if you prefer). – Antiquate
Please change it to use lambda.1se instead of lambda.min. – Tyrolienne
@Tyrolienne Well, since you said "please." I thought it was fine as it was, since I showed using both lambda.min and lambda.1se and mentioned that you have to pick what you consider "best." – Alvarado
@Antiquate et al: so what's the best practice? Say we fit for 10 different random seeds: when getting the 'optimal' coefficients, do we use the lambda.1se of the 'majority', or the individual lambda.1se of each fit (e.g. in order to study variance in coefficients across multiple runs, for a given alpha value)? – Tyrolienne
@Tyrolienne Maybe asking on Cross Validated is a good idea? I would say your idea sounds reasonable, but I would also say "best practice" may depend on your goals. Are you going for parsimony? Predictive power? Feature selection? – Alvarado
@smci What I do is use one random seed, but with an outer loop so that the CV is run multiple (N) times. This produces N lambda-vs-MSE curves. I then average the N MSEs across the curves at each lambda, and find the lambda that minimises this averaged MSE (see the sketch after these comments). – Antiquate
@Jota: good points. I didn't expect the answer would differ too much for each of those: for predictive power, use each individual lambda.1se; for parsimony, use the lambda.1se nearest to the mean or median (like @user2957945); for feature selection, can you please tell me what you've seen recommended? – Tyrolienne
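For reference, a minimal sketch of the repeat-and-average idea described in the comments above. It assumes a fixed, shared lambda sequence so the error curves line up across runs; n_runs and avg_error are illustrative names, not part of glmnet:

library(glmnet)
x = as.matrix(mtcars[-1])
y = mtcars[,1]

lambdas = glmnet(x, y)$lambda  # one fixed lambda sequence for every CV run

n_runs = 100
cv_errors = sapply(seq_len(n_runs), function(i) {
  cv.glmnet(x, y, lambda = lambdas)$cvm  # mean CV error at each lambda
})

avg_error = rowMeans(cv_errors)  # average the N error curves
best_lambda = lambdas[which.min(avg_error)]
best_lambda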

To extract the optimal lambda, you could type fit$lambda.min (this assumes fit is the object returned by cv.glmnet; see the comments below).

To obtain the coefficients corresponding to the optimal lambda, use coef(fit, s = fit$lambda.min); see p. 6 of the glmnet vignette.

I think the coefficients are produced by a model fitted to the full data, not just the cross-validation folds, as mentioned on this page.

Sundaysundberg answered 3/2, 2020 at 21:30 Comment(3)
Welcome to SO! We can go beyond just answering the question at times by including a recommendation about how the asker can do even better. In this case, consider pointing the asker toward the glmnet vignette or the cv.glmnet function, which would help them find a value of lambda that will generalize better. – Psalter
Hey, if you do fit = glmnet(as.matrix(mtcars[-1]), mtcars[,1]), there is no fit$lambda.min. You only get that by calling cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1]) (see the sketch after these comments). – Natividadnativism
Also, as the accepted answer above notes, lambda.min can be used, but lambda.1se is commonly used because it chooses a more parsimonious model. stats.stackexchange.com/questions/138569/… – Natividadnativism
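Putting the correction from these comments into code, a minimal sketch:

library(glmnet)
cvfit = cv.glmnet(as.matrix(mtcars[-1]), mtcars[,1])  # lambda.min lives on this object, not on a plain glmnet fit
cvfit$lambda.min
coef(cvfit, s = "lambda.min")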

boxcox() from the MASS package produces a maximum-likelihood plot showing which value of lambda gives the best fit in a linear model.

boxcox(lm.fit) draws this maximum-likelihood plot over a wide range of lambda values for the linear model lm.fit; pick the lambda with the highest likelihood.

boxcox(lm.fit, lambda = seq(-0.1, 0.1, 0.01)) zooms in: if, for example, the best lambda is around 0.04, this plots the likelihood between lambda = -0.1 and lambda = 0.1 in 0.01 increments.
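A minimal sketch of that workflow (the lm fit on mtcars is illustrative; note that this lambda is the Box-Cox transformation parameter, not glmnet's penalty):

library(MASS)
lm.fit = lm(mpg ~ ., data = mtcars)            # illustrative linear model
bc = boxcox(lm.fit, lambda = seq(-2, 2, 0.1))  # plots the profile log-likelihood
best_lambda = bc$x[which.max(bc$y)]            # lambda with the highest likelihood
boxcox(lm.fit, lambda = seq(best_lambda - 0.1, best_lambda + 0.1, 0.01))  # zoomed-in plot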

Pomace answered 22/2, 2018 at 2:12 Comment(0)
