Nested resampling + LASSO (regr.cvglmnet) using mlr
I am trying to conduct nested resampling with 10 CV folds in the inner and 10 CV folds in the outer loop using regr.cvglmnet. mlr provides example code for this using a wrapper function (https://mlr-org.github.io/mlr/articles/tutorial/devel/nested_resampling.html).

I changed only two things in the code they provide: 1) "regr.cvglmnet" instead of the support vector machine (ksvm), and 2) the number of iterations for both the inner and the outer loop.

After the makeTuneWrapper call that creates lrn, I get the error shown below. Could someone explain this to me? I am completely new to coding and machine learning, so I might have done something pretty stupid in the code...

library(mlr)

# parameter set copied from the tutorial's SVM example
ps = makeParamSet(
  makeDiscreteParam("C", values = 2^(-12:12)),
  makeDiscreteParam("sigma", values = 2^(-12:12))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Subsample", iters = 10)
lrn = makeTuneWrapper("regr.cvglmnet", resampling = inner, par.set = ps, 
                      control = ctrl, show.info = FALSE)

# Error in checkTunerParset(learner, par.set, measures, control) : 
# Can only tune parameters for which learner parameters exist: C,sigma

### Outer resampling loop
outer = makeResampleDesc("CV", iters = 10) 
r = resample(lrn, iris.task, resampling = outer, extract = getTuneResult, 
             show.info = FALSE)
asked by Scalade, 22 Jun 2018

When using LASSO with glmnet, you only need to tune s. This is the parameter that matters when the model predicts on new data. The lambda parameter has no influence on the prediction, due to the way the package is coded. If you set s to a value different from the lambda values chosen during training, the model is refitted with s as the penalization term.

By default, several models with various lambda values are fitted during the train call. However, for prediction a new model is fitted using the chosen penalty value. So in fact the tuning is done in the prediction step.
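To see this outside of mlr, here is a minimal sketch with plain glmnet (the mtcars data and the s value are illustrative assumptions, not taken from the answer):

library(glmnet)

# fit a LASSO path; glmnet picks a sequence of lambda values itself
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg
fit <- glmnet(x, y, alpha = 1)

range(fit$lambda)  # the lambda sequence chosen during training

# the penalty actually used is the `s` supplied at prediction time;
# an `s` off the trained grid is interpolated by default
# (predict(..., exact = TRUE) refits instead)
predict(fit, newx = x, s = 0.5)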

Good default ranges for s can be chosen by:

  1. Training the model with the defaults from glmnet
  2. Checking the min and max values of lambda
  3. Using these as the lower and upper bounds for s, which is then tuned using mlr

See also this discussion.

library(mlr)
#> Loading required package: ParamHelpers

lrn_glmnet <- makeLearner("regr.glmnet",
                          alpha = 1,
                          intercept = FALSE)

# check lambda
glmnet_train = mlr::train(lrn_glmnet, bh.task)
summary(glmnet_train$learner.model$lambda)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   143.5   157.4   172.8   174.3   189.6   208.1

# set limits
ps_glmnet <- makeParamSet(makeNumericParam("s", lower = 140, upper = 208))

# tune params in parallel using a grid search for simplicity
tune.ctrl = makeTuneControlGrid()
inner <- makeResampleDesc("CV", iters = 2)

configureMlr(on.learner.error = "warn", on.error.dump = TRUE)
library(parallelMap)
parallelStart(mode = "multicore", level = "mlr.tuneParams", cpus = 4,
              mc.set.seed = TRUE) # only parallelize the tuning
#> Starting parallelization in mode=multicore with cpus=4.
set.seed(12345)
params_tuned_glmnet = tuneParams(lrn_glmnet, task = bh.task, resampling = inner,
                                 par.set = ps_glmnet, control = tune.ctrl, 
                                 measures = list(rmse))
#> [Tune] Started tuning learner regr.glmnet for parameter set:
#>      Type len Def     Constr Req Tunable Trafo
#> s numeric   -   - 140 to 208   -    TRUE     -
#> With control class: TuneControlGrid
#> Imputation value: Inf
#> Mapping in parallel: mode = multicore; cpus = 4; elements = 10.
#> [Tune] Result: s=140 : rmse.test.rmse=17.9803086
parallelStop()
#> Stopped parallelization. All cleaned up.

# train the model on the whole dataset using the `s` value from the tuning

lrn_glmnet_tuned <- makeLearner("regr.glmnet",
                                alpha = 1,
                                s = 140,
                                intercept = FALSE)
#lambda = sort(seq(0, 5, length.out = 100), decreasing = T))
glmnet_train_tuned = mlr::train(lrn_glmnet_tuned, bh.task)

Created on 2018-07-03 by the reprex package (v0.2.0).
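
As a follow-up (not part of the reprex output above), a hedged sketch of how the tuned model could then be used for prediction and evaluation with mlr's standard API:

# predict with the tuned model; on the training task just for illustration
pred = predict(glmnet_train_tuned, task = bh.task)

# evaluate with the same measure used during tuning
performance(pred, measures = rmse)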

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.0 (2018-04-23)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2018-07-03
#> Packages -----------------------------------------------------------------
#>  package      * version   date       source         
#>  backports      1.1.2     2017-12-13 CRAN (R 3.5.0) 
#>  base         * 3.5.0     2018-06-04 local          
#>  BBmisc         1.11      2017-03-10 CRAN (R 3.5.0) 
#>  bit            1.1-14    2018-05-29 cran (@1.1-14) 
#>  bit64          0.9-7     2017-05-08 CRAN (R 3.5.0) 
#>  blob           1.1.1     2018-03-25 CRAN (R 3.5.0) 
#>  checkmate      1.8.5     2017-10-24 CRAN (R 3.5.0) 
#>  codetools      0.2-15    2016-10-05 CRAN (R 3.5.0) 
#>  colorspace     1.3-2     2016-12-14 CRAN (R 3.5.0) 
#>  compiler       3.5.0     2018-06-04 local          
#>  data.table     1.11.4    2018-05-27 CRAN (R 3.5.0) 
#>  datasets     * 3.5.0     2018-06-04 local          
#>  DBI            1.0.0     2018-05-02 cran (@1.0.0)  
#>  devtools       1.13.6    2018-06-27 CRAN (R 3.5.0) 
#>  digest         0.6.15    2018-01-28 CRAN (R 3.5.0) 
#>  evaluate       0.10.1    2017-06-24 CRAN (R 3.5.0) 
#>  fastmatch      1.1-0     2017-01-28 CRAN (R 3.5.0) 
#>  foreach        1.4.4     2017-12-12 CRAN (R 3.5.0) 
#>  ggplot2        2.2.1     2016-12-30 CRAN (R 3.5.0) 
#>  git2r          0.21.0    2018-01-04 CRAN (R 3.5.0) 
#>  glmnet         2.0-16    2018-04-02 CRAN (R 3.5.0) 
#>  graphics     * 3.5.0     2018-06-04 local          
#>  grDevices    * 3.5.0     2018-06-04 local          
#>  grid           3.5.0     2018-06-04 local          
#>  gtable         0.2.0     2016-02-26 CRAN (R 3.5.0) 
#>  htmltools      0.3.6     2017-04-28 CRAN (R 3.5.0) 
#>  iterators      1.0.9     2017-12-12 CRAN (R 3.5.0) 
#>  knitr          1.20      2018-02-20 CRAN (R 3.5.0) 
#>  lattice        0.20-35   2017-03-25 CRAN (R 3.5.0) 
#>  lazyeval       0.2.1     2017-10-29 CRAN (R 3.5.0) 
#>  magrittr       1.5       2014-11-22 CRAN (R 3.5.0) 
#>  Matrix         1.2-14    2018-04-09 CRAN (R 3.5.0) 
#>  memoise        1.1.0     2017-04-21 CRAN (R 3.5.0) 
#>  memuse         4.0-0     2017-11-10 CRAN (R 3.5.0) 
#>  methods      * 3.5.0     2018-06-04 local          
#>  mlr          * 2.13      2018-07-01 local          
#>  munsell        0.5.0     2018-06-12 CRAN (R 3.5.0) 
#>  parallel       3.5.0     2018-06-04 local          
#>  parallelMap  * 1.3       2015-06-10 CRAN (R 3.5.0) 
#>  ParamHelpers * 1.11      2018-06-25 CRAN (R 3.5.0) 
#>  pillar         1.2.3     2018-05-25 CRAN (R 3.5.0) 
#>  plyr           1.8.4     2016-06-08 CRAN (R 3.5.0) 
#>  Rcpp           0.12.17   2018-05-18 cran (@0.12.17)
#>  rlang          0.2.1     2018-05-30 CRAN (R 3.5.0) 
#>  rmarkdown      1.10      2018-06-11 CRAN (R 3.5.0) 
#>  rprojroot      1.3-2     2018-01-03 CRAN (R 3.5.0) 
#>  RSQLite        2.1.1     2018-05-06 cran (@2.1.1)  
#>  scales         0.5.0     2017-08-24 CRAN (R 3.5.0) 
#>  splines        3.5.0     2018-06-04 local          
#>  stats        * 3.5.0     2018-06-04 local          
#>  stringi        1.2.3     2018-06-12 CRAN (R 3.5.0) 
#>  stringr        1.3.1     2018-05-10 CRAN (R 3.5.0) 
#>  survival       2.42-3    2018-04-16 CRAN (R 3.5.0) 
#>  tibble         1.4.2     2018-01-22 CRAN (R 3.5.0) 
#>  tools          3.5.0     2018-06-04 local          
#>  utils        * 3.5.0     2018-06-04 local          
#>  withr          2.1.2     2018-03-15 CRAN (R 3.5.0) 
#>  XML            3.98-1.11 2018-04-16 CRAN (R 3.5.0) 
#>  yaml           2.1.19    2018-05-01 CRAN (R 3.5.0)
answered by Mccarley, 2 Jul 2018

The error message is telling you that you can't tune parameters that mlr doesn't know about for this learner: regr.cvglmnet has no C and sigma parameters (those belong to the SVM in the tutorial example). You can list the parameters mlr knows about for a learner with the getLearnerParamSet() function:

> getLearnerParamSet(makeLearner("regr.cvglmnet"))
                          Type  len        Def                Constr Req
family                discrete    -   gaussian      gaussian,poisson   -
alpha                  numeric    -          1                0 to 1   -
nfolds                 integer    -         10              3 to Inf   -
type.measure          discrete    -        mse               mse,mae   -
s                     discrete    - lambda.1se lambda.1se,lambda.min   -
nlambda                integer    -        100              1 to Inf   -
lambda.min.ratio       numeric    -          -                0 to 1   -
standardize            logical    -       TRUE                     -   -
intercept              logical    -       TRUE                     -   -
thresh                 numeric    -      1e-07              0 to Inf   -
dfmax                  integer    -          -              0 to Inf   -
pmax                   integer    -          -              0 to Inf   -
exclude          integervector    -          -              1 to Inf   -
penalty.factor   numericvector    -          -                0 to 1   -
lower.limits     numericvector    -          -             -Inf to 0   -
upper.limits     numericvector    -          -              0 to Inf   -
maxit                  integer    -     100000              1 to Inf   -
type.gaussian         discrete    -          -      covariance,naive   -
fdev                   numeric    -      1e-05                0 to 1   -
devmax                 numeric    -      0.999                0 to 1   -
eps                    numeric    -      1e-06                0 to 1   -
big                    numeric    -    9.9e+35           -Inf to Inf   -
mnlam                  integer    -          5              1 to Inf   -
pmin                   numeric    -      1e-09                0 to 1   -
exmx                   numeric    -        250           -Inf to Inf   -
prec                   numeric    -      1e-10           -Inf to Inf   -
mxit                   integer    -        100              1 to Inf   -
                 Tunable Trafo
family              TRUE     -
alpha               TRUE     -
nfolds              TRUE     -
type.measure        TRUE     -
s                   TRUE     -
nlambda             TRUE     -
lambda.min.ratio    TRUE     -
standardize         TRUE     -
intercept           TRUE     -
thresh              TRUE     -
dfmax               TRUE     -
pmax                TRUE     -
exclude             TRUE     -
penalty.factor      TRUE     -
lower.limits        TRUE     -
upper.limits        TRUE     -
maxit               TRUE     -
type.gaussian       TRUE     -
fdev                TRUE     -
devmax              TRUE     -
eps                 TRUE     -
big                 TRUE     -
mnlam               TRUE     -
pmin                TRUE     -
exmx                TRUE     -
prec                TRUE     -
mxit                TRUE     -

You can use any of those parameters to define a valid parameter set for tuning this particular learner, for example:

ps = makeParamSet(
  makeDiscreteParam("family", values = c("gaussian", "poisson")),
  makeDiscreteParam("alpha", values = 0.1*0:10)
)
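
To tie this back to the question, a hedged sketch of the full nested-resampling setup with a valid parameter set (the alpha range is illustrative only; note also that a regression learner needs a regression task, so bh.task stands in for the iris.task used in the question):

library(mlr)

# a parameter regr.cvglmnet actually has (see the table above)
ps = makeParamSet(makeNumericParam("alpha", lower = 0, upper = 1))

ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Subsample", iters = 10)
lrn = makeTuneWrapper("regr.cvglmnet", resampling = inner, par.set = ps,
                      control = ctrl, show.info = FALSE)

# outer resampling loop on a regression task
outer = makeResampleDesc("CV", iters = 10)
r = resample(lrn, bh.task, resampling = outer, extract = getTuneResult,
             show.info = FALSE)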
answered by Tuxedo, 22 Jun 2018 (3 comments)
Scalade: Thanks, Lars! How do I know which parameters to choose from the list, though? When I do a simple cross-validation for LASSO, I would tune lambda and fix certain parameters (e.g. alpha to 1, so it's LASSO and not ridge; nlambda to 100; family to gaussian; and type.measure to mse). Would it therefore make sense to include those parameters in the makeParamSet function (see below), or only lambda, as that would be the only parameter I am tuning as opposed to fixing a priori?
Scalade: ps = makeParamSet(makeDiscreteParam("family", values = "gaussian"), makeDiscreteParam("alpha", values = 1), makeDiscreteParam("nlambda", values = 100), makeDiscreteParam("type.measure", values = "mse"))
Tuxedo: If you know that a parameter needs to have a particular value, you don't need to include it.
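
Pulling the comment thread together, a hedged sketch: fix the values you know a priori on the learner itself and tune only what is genuinely open (for regr.cvglmnet, s is a discrete choice between the two CV-selected lambdas, per the table above):

library(mlr)

# fixed a-priori choices go on the learner, not in the ParamSet
lrn = makeLearner("regr.cvglmnet", alpha = 1, nlambda = 100,
                  type.measure = "mse")

# only the genuinely open parameter is tuned
ps = makeParamSet(
  makeDiscreteParam("s", values = c("lambda.min", "lambda.1se"))
)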
