When using LASSO with glmnet
, you only need to tune s
. This is the important parameter that is used when the model predicts to new data.
Parameter lambda
has absolutely no influence due to the way the package is coded on the prediction. If you set s
different to whatever lambda
values have been chosen, the model will be refitted with s
as the penalization term.
By default, several models with various lambda
values are fitted during the train
call. However, for prediction a new model will be fitted using the best lambda
value. So in fact the tuning is done in the prediction step.
Good default ranges for s
can be chosen by
- Training the model with the defaults from
glmnet
- Check min and max values of
lambda
- Use these as lower and upper bounds for
s
that is then tuned using mlr
See also this discussion.
library(mlr)
#> Loading required package: ParamHelpers
lrn_glmnet <- makeLearner("regr.glmnet",
alpha = 1,
intercept = FALSE)
# check lambda
glmnet_train = mlr::train(lrn_glmnet, bh.task)
summary(glmnet_train$learner.model$lambda)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 143.5 157.4 172.8 174.3 189.6 208.1
# set limits
ps_glmnet <- makeParamSet(makeNumericParam("s", lower = 140, upper = 208))
# tune params in parallel using a grid search for simplicity
tune.ctrl = makeTuneControlGrid()
inner <- makeResampleDesc("CV", iters = 2)
configureMlr(on.learner.error = "warn", on.error.dump = TRUE)
library(parallelMap)
parallelStart(mode = "multicore", level = "mlr.tuneParams", cpus = 4,
mc.set.seed = TRUE) # only parallelize the tuning
#> Starting parallelization in mode=multicore with cpus=4.
set.seed(12345)
params_tuned_glmnet = tuneParams(lrn_glmnet, task = bh.task, resampling = inner,
par.set = ps_glmnet, control = tune.ctrl,
measure = list(rmse))
#> [Tune] Started tuning learner regr.glmnet for parameter set:
#> Type len Def Constr Req Tunable Trafo
#> s numeric - - 140 to 208 - TRUE -
#> With control class: TuneControlGrid
#> Imputation value: Inf
#> Mapping in parallel: mode = multicore; cpus = 4; elements = 10.
#> [Tune] Result: s=140 : rmse.test.rmse=17.9803086
parallelStop()
#> Stopped parallelization. All cleaned up.
# train the model on the whole dataset using the `s` value from the tuning
lrn_glmnet_tuned <- makeLearner("regr.glmnet",
alpha = 1,
s = 140,
intercept = FALSE)
#lambda = sort(seq(0, 5, length.out = 100), decreasing = T))
glmnet_train_tuned = mlr::train(lrn_glmnet_tuned, bh.task)
Created on 2018-07-03 by the reprex package (v0.2.0).
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz Europe/Berlin
#> date 2018-07-03
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
#> base * 3.5.0 2018-06-04 local
#> BBmisc 1.11 2017-03-10 CRAN (R 3.5.0)
#> bit 1.1-14 2018-05-29 cran (@1.1-14)
#> bit64 0.9-7 2017-05-08 CRAN (R 3.5.0)
#> blob 1.1.1 2018-03-25 CRAN (R 3.5.0)
#> checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
#> codetools 0.2-15 2016-10-05 CRAN (R 3.5.0)
#> colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
#> compiler 3.5.0 2018-06-04 local
#> data.table 1.11.4 2018-05-27 CRAN (R 3.5.0)
#> datasets * 3.5.0 2018-06-04 local
#> DBI 1.0.0 2018-05-02 cran (@1.0.0)
#> devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
#> digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0)
#> fastmatch 1.1-0 2017-01-28 CRAN (R 3.5.0)
#> foreach 1.4.4 2017-12-12 CRAN (R 3.5.0)
#> ggplot2 2.2.1 2016-12-30 CRAN (R 3.5.0)
#> git2r 0.21.0 2018-01-04 CRAN (R 3.5.0)
#> glmnet 2.0-16 2018-04-02 CRAN (R 3.5.0)
#> graphics * 3.5.0 2018-06-04 local
#> grDevices * 3.5.0 2018-06-04 local
#> grid 3.5.0 2018-06-04 local
#> gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> iterators 1.0.9 2017-12-12 CRAN (R 3.5.0)
#> knitr 1.20 2018-02-20 CRAN (R 3.5.0)
#> lattice 0.20-35 2017-03-25 CRAN (R 3.5.0)
#> lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
#> Matrix 1.2-14 2018-04-09 CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
#> memuse 4.0-0 2017-11-10 CRAN (R 3.5.0)
#> methods * 3.5.0 2018-06-04 local
#> mlr * 2.13 2018-07-01 local
#> munsell 0.5.0 2018-06-12 CRAN (R 3.5.0)
#> parallel 3.5.0 2018-06-04 local
#> parallelMap * 1.3 2015-06-10 CRAN (R 3.5.0)
#> ParamHelpers * 1.11 2018-06-25 CRAN (R 3.5.0)
#> pillar 1.2.3 2018-05-25 CRAN (R 3.5.0)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
#> Rcpp 0.12.17 2018-05-18 cran (@0.12.17)
#> rlang 0.2.1 2018-05-30 CRAN (R 3.5.0)
#> rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> RSQLite 2.1.1 2018-05-06 cran (@2.1.1)
#> scales 0.5.0 2017-08-24 CRAN (R 3.5.0)
#> splines 3.5.0 2018-06-04 local
#> stats * 3.5.0 2018-06-04 local
#> stringi 1.2.3 2018-06-12 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
#> survival 2.42-3 2018-04-16 CRAN (R 3.5.0)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
#> tools 3.5.0 2018-06-04 local
#> utils * 3.5.0 2018-06-04 local
#> withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
#> XML 3.98-1.11 2018-04-16 CRAN (R 3.5.0)
#> yaml 2.1.19 2018-05-01 CRAN (R 3.5.0)