Summary statistics in glmnet

I have been working on a data set and using glmnet for linear LASSO/Ridge regressions.

For the sake of simplicity, let's assume that the model I am using is the following:

cv.glmnet(train.features, train.response, alpha=1, nlambda=100, type.measure = "mse", nfolds = 10)

I'm preparing a presentation for a client and need to show the t-statistics of the variables and the R-squared value. In addition, I also need to plot the residuals against the fitted values of the model.

Before writing functions to do this from scratch, I wanted to ask whether this is already covered by the library. I have checked the glmnet vignette but did not find anything.

Thanks for your help!

Selwyn answered 5/7, 2015 at 6:0 Comment(1)
glmnet is used for prediction, not inference (although it does do a form of variable selection). I think there is still no agreed method for generating standard errors, and the only way I have seen CIs produced is by bootstrapping (not included in glmnet). For the R-squared, you could take the correlation between the observed and the predicted values and square it, but this does not account for model complexity. – Mattiematting
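As a rough sketch of what this comment suggests (my own illustration, not part of glmnet; it assumes a gaussian cv.glmnet fit like the one in the question and takes fitted values at lambda.min):

library(glmnet)

# Toy stand-ins for train.features / train.response
set.seed(1)
x <- matrix(rnorm(100 * 10), nrow = 100)
y <- drop(x %*% rnorm(10)) + rnorm(100)

cvfit <- cv.glmnet(x, y, alpha = 1, nlambda = 100, type.measure = "mse", nfolds = 10)

# Fitted values at the lambda selected by cross-validation
fitted <- as.vector(predict(cvfit, newx = x, s = "lambda.min"))

# "R-squared" as the squared correlation between observed and fitted values
# (an in-sample measure; as noted above, it does not account for model complexity)
rsq <- cor(y, fitted)^2
rsq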

A partial answer to your question: The plotres function in the plotmo R package is an easy way to plot residuals for a wide variety of models, including glmnet and cv.glmnet models. The plotres vignette included with the package has details. For example

library(glmnet)
data(longley)                                           # Longley economic data shipped with R
mod <- glmnet(data.matrix(longley[,1:6]), longley[,7])  # x: first six columns, y: Employed
library(plotmo)                                         # for plotres
plotres(mod)                                            # residual plots for the glmnet model

gives the following plot. You can select subplots and modify the plots by passing the appropriate arguments to plotres.

[plot: residual diagnostics produced by plotres]
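plotres accepts cv.glmnet objects too. If you prefer to build just the residuals-versus-fitted panel yourself (as asked in the question), a minimal base-R sketch along these lines should work; this is my own illustration, reusing the longley data above and taking fitted values at lambda.min:

library(glmnet)
data(longley)
x <- data.matrix(longley[, 1:6])
y <- longley[, 7]

cvmod  <- cv.glmnet(x, y, nfolds = 5)   # small nfolds because longley has only 16 rows
fitted <- as.vector(predict(cvmod, newx = x, s = "lambda.min"))

plot(fitted, y - fitted, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)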

Schoolmate answered 22/8, 2016 at 18:0 Comment(0)

The two packages "yardstick" and "modelr" can help.

I used caret to invoke glmnet via train(), and the returned object has a $resample element that contains RMSE, Rsquared, and MAE for each cross-validation fold.

library( tictoc ) # If you don't want to install this, just take out the calls to tic() and toc()
library( caret )
library( tidyverse )

# dmv is my own data frame (not shown); duration_avg is the response column.
# Folds are built on the outcome and handed to trainControl via index.
training_folds <- createFolds( dmv$duration_avg, k = 5, returnTrain = TRUE )

ctl <- trainControl( method = "cv", number = 5, index = training_folds )
tic()
dmv_pp <- preProcess( dmv, method = c( "nzv", "center", "scale" ))
toc() # This can take a while

dmv_train <- predict( dmv_pp, dmv )
# Using just a subset of the data, because otherwise I run out of memory.
mdl <- train( duration_avg ~ ., data = dmv_train[1:1E4,], trControl = ctl, method = "glmnet",
              tuneGrid = expand.grid(
                alpha = c( 0, 0.5, 1 ),
                lambda = c( 0.001, 0.01 )
              )
          )

mdl$resample %>% names() # RMSE, Rsquared, MAE, Resample

mdl %>%
    listviewer::jsonedit() # Browse the fitted object; it contains $resample

dmv_train <- dmv_train %>%
    modelr::add_predictions( mdl, var = "predicted_duration_avg" ) # Should work with any model that has a predict() method

dmv_train %>%
  yardstick::metrics( duration_avg, predicted_duration_avg )
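To look at the per-fold numbers directly (rather than metrics recomputed on the training data), something like the following should work; the column names assume caret's regression output, where $resample holds RMSE, Rsquared and MAE for the best tuning parameters:

mdl$resample                  # one row per cross-validation fold

mdl$resample %>%              # average of each metric across folds
  summarise( across( c( RMSE, Rsquared, MAE ), mean ) )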
Brainpan answered 8/11, 2019 at 22:41 Comment(0)
