How to set the parameter grids correctly when tuning a workflow set with tidymodels?

I am trying to use tidymodels to tune workflows that have both recipe and model parameters. Tuning a single workflow works without problems, but tuning a workflow set with several workflows always fails. Here is my code:

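# load the packages used below
library(tidyverse)   # read_csv(), mutate(), fct_relevel()
library(tidymodels)  # recipes, parsnip, workflowsets, dials, tune
library(themis)      # step_downsample()
library(finetune)    # tune_race_anova(), control_race()
library(doParallel)  # registerDoParallel(); also attaches parallel

# read the training data
train <- read_csv("../../train.csv")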
train <- train %>% 
    mutate(
      id = row_number(),
      across(where(is.double), as.integer),
      across(where(is.character), as.factor),
      r_yn = fct_relevel(r_yn, "yes")) %>% 
  select(id, r_yn, everything())

# setting the recipes

# no preprocessing
rec_no <- recipe(r_yn ~ ., data = train) %>%
  update_role(id, new_role = "ID")

# downsample: tuning the under_ratio
rec_ds_tune <- rec_no %>% 
  step_downsample(r_yn, under_ratio = tune(), skip = TRUE, seed = 100) %>%
  step_nzv(all_predictors(), freq_cut = 100)

# setting the models

# randomforest
spec_rf_tune <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger", seed = 100) %>%
  set_mode("classification")

# xgboost
spec_xgb_tune <- boost_tree(trees = 100, mtry = tune(), tree_depth = tune(), learn_rate = tune(), min_n = tune()) %>% 
   set_engine("xgboost") %>% 
   set_mode("classification")

# setting the workflowsets
wf_tune_list <- workflow_set(
  preproc = list(no = rec_no, ds = rec_ds_tune),
  models = list(rf = spec_rf_tune, xgb = spec_xgb_tune),
  cross = TRUE)

# finalize the parameters; I'm not sure whether this is correct
rf_params <- spec_rf_tune %>% parameters() %>% update(mtry = mtry(c(1, 15)))
xgb_params <- spec_xgb_tune %>% parameters() %>% update(mtry = mtry(c(1, 15)))
ds_params <- rec_ds_tune %>% parameters() %>% update(under_ratio = under_ratio(c(1, 5)))

wf_tune_list_finalize <- wf_tune_list %>% 
  option_add(param = ds_params, id = c("ds_rf", "ds_xgb")) %>% 
  option_add(param = rf_params, id = c("no_rf", "ds_rf")) %>% 
  option_add(param = xgb_params, id = c("no_xgb", "ds_xgb"))

When I check the options in wf_tune_list_finalize, it shows:

> wf_tune_list_finalize$option
[[1]]
a list of options with names:  'param'

[[2]]
a list of options with names:  'param'

[[3]]
a list of options with names:  'param'

[[4]]
a list of options with names:  'param'

Then I tune this workflowset:

# tuning the workflowset
cl <- makeCluster(detectCores())
registerDoParallel(cl)
wf_tune_race <- wf_tune_list_finalize %>%
  workflow_map(fn = "tune_race_anova",
               seed = 100,
               resamples = cv_5,
               grid = 3,
               metrics = metric_auc,
               control = control_race(parallel_over = "everything"), 
               verbose = TRUE)
stopCluster(cl)

The verbose messages show that something is wrong with my parameters in the ds_rf and ds_xgb workflows:

i 1 of 4 tuning:     no_rf
i Creating pre-processing data to finalize unknown parameter: mtry
✔ 1 of 4 tuning:     no_rf (1m 44.4s)
i 2 of 4 tuning:     no_xgb
i Creating pre-processing data to finalize unknown parameter: mtry
✔ 2 of 4 tuning:     no_xgb (28.9s)
i 3 of 4 tuning:     ds_rf
x 3 of 4 tuning:     ds_rf failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.
i 4 of 4 tuning:     ds_xgb
x 4 of 4 tuning:     ds_xgb failed with: Some tuning parameters require finalization but there are recipe parameters that require tuning. Please use `parameters()` to finalize the parameter ranges.

The result is:

> wf_tune_race
# A workflow set/tibble: 4 x 4
  wflow_id info             option      result        
  <chr>    <list>           <list>      <list>        
1 no_rf    <tibble [1 x 4]> <wrkflw__ > <race[+]>     
2 no_xgb   <tibble [1 x 4]> <wrkflw__ > <race[+]>     
3 ds_rf    <tibble [1 x 4]> <wrkflw__ > <try-errr [1]>
4 ds_xgb   <tibble [1 x 4]> <wrkflw__ > <try-errr [1]>

What's more, although no_rf and no_xgb have tuning results, the range of mtry in these two workflows is not the range I set above, which means the parameter range setting step failed entirely. I have followed the tutorials at https://www.tmwr.org/workflow-sets.html and https://workflowsets.tidymodels.org/ but I am still stuck.

So how do I set both the recipe and model parameters correctly when tuning a workflow set?

The train.csv in my code is here: https://github.com/liuyifeikim/Some-data

Deft answered 30/7, 2021 at 10:49 Comment(4)
Following this post: tidyverse.org/blog/2021/03/workflowsets-0-0-1, I replaced param with param_info in option_add(). After that, the range of mtry in no_rf and no_xgb matches my setting (1 to 15), but ds_rf and ds_xgb still fail. Is there something wrong with rec_ds_tune? - Deft
I believe this is a bug that was fixed in the recent CRAN release of finetune. Can you make sure you are using the version that was just released (or install from GitHub) and try again? - Menard
@JuliaSilge Thank you. I have updated the packages and tried again (finetune = 0.10, tune = 0.1.6, workflowsets = 0.1.0), but it may not be a finetune problem. I think something is wrong with how I use option_add(): the order of the option_add() calls affects the result. If I try wf_tune_list %>% option_add(param_info = ds_params, id = "ds_rf") %>% option_add(param_info = rf_params, id = "ds_rf"), rf_params overwrites ds_params. I still have no idea how to add two custom parameter settings to the same workflow in a workflow set. - Deft
Hmmmm, if you can create a small reprex and post this problem on the workflowsets repo, that would be very helpful. - Menard

I have modified the parameter setting step, and the tuning result is correct now:

# setting the parameters on each workflow separately
# (wf_tune_list is the workflow set defined in the question; each workflow
# gets ONE parameter set covering both its recipe and its model parameters,
# because a later option_add() for the same id overwrites the earlier one)
no_rf_params <- wf_tune_list %>% 
  extract_workflow("no_rf") %>% 
  parameters() %>% 
  update(mtry = mtry(c(1, 15)))

no_xgb_params <- wf_tune_list %>% 
  extract_workflow("no_xgb") %>% 
  parameters() %>% 
  update(mtry = mtry(c(1, 15)))

ds_rf_params <- wf_tune_list %>% 
  extract_workflow("ds_rf") %>% 
  parameters() %>% 
  update(mtry = mtry(c(1, 15)), under_ratio = under_ratio(c(1, 5)))

ds_xgb_params <- wf_tune_list %>% 
  extract_workflow("ds_xgb") %>% 
  parameters() %>% 
  update(mtry = mtry(c(1, 15)), under_ratio = under_ratio(c(1, 5)))

# update the workflow set
wf_tune_list_finalize <- wf_tune_list %>% 
  option_add(param_info = no_rf_params, id = "no_rf") %>%
  option_add(param_info = no_xgb_params, id = "no_xgb") %>% 
  option_add(param_info = ds_rf_params, id = "ds_rf") %>% 
  option_add(param_info = ds_xgb_params, id = "ds_xgb")

The rest remains the same. I think there may be more efficient ways to set the parameters.
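
One possibly shorter variant, offered as a sketch rather than a tested solution for the original data: loop over the workflow ids and apply whichever custom ranges each workflow actually tunes. The helper `set_ranges()` and the list `custom_ranges` are hypothetical names introduced here, the built-in `two_class_dat` data set stands in for train.csv, and the mtry range is shrunk to c(1, 2) because the stand-in data has only two predictors. It assumes package versions recent enough to provide `extract_parameter_set_dials()`.

```r
library(tidymodels)  # recipes, parsnip, workflowsets, dials, purrr, ...
library(themis)      # step_downsample()

# Stand-in data (assumption): two_class_dat from modeldata, not the post's train.csv
data(two_class_dat, package = "modeldata")

rec_no <- recipe(Class ~ ., data = two_class_dat)
rec_ds_tune <- rec_no %>%
  step_downsample(Class, under_ratio = tune(), seed = 100)

spec_rf_tune <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger", seed = 100) %>%
  set_mode("classification")

spec_xgb_tune <- boost_tree(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("xgboost") %>%
  set_mode("classification")

wf_tune_list <- workflow_set(
  preproc = list(no = rec_no, ds = rec_ds_tune),
  models = list(rf = spec_rf_tune, xgb = spec_xgb_tune),
  cross = TRUE
)

# Hypothetical list of custom ranges, keyed by parameter id
custom_ranges <- list(
  mtry        = mtry(c(1, 2)),
  under_ratio = under_ratio(c(1, 5))
)

# Hypothetical helper: build one complete parameter set for a single workflow
# and attach it with option_add()
set_ranges <- function(wf_set, wf_id, ranges = custom_ranges) {
  params <- wf_set %>%
    extract_workflow(wf_id) %>%
    extract_parameter_set_dials()
  # keep only the ranges for parameters this workflow marks with tune()
  ranges <- ranges[names(ranges) %in% params$id]
  params <- do.call(update, c(list(params), ranges))
  option_add(wf_set, param_info = params, id = wf_id)
}

# Apply the helper to every workflow id in turn
wf_tune_list_finalize <- purrr::reduce(
  wf_tune_list$wflow_id, set_ranges, .init = wf_tune_list
)
```

Because each workflow receives a single parameter set that already contains both the recipe and the model ranges, there is only one `option_add()` call per id and nothing gets overwritten.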

Deft answered 2/8, 2021 at 10:33 Comment(1)
I tried to use parts of your code but I get: Warning message: parameters.workflow() was deprecated in tune 0.1.6.9003. Please use hardhat::extract_parameter_set_dials() instead. - Pithecanthropus
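
Going by the warning text, swapping `parameters()` for `extract_parameter_set_dials()` should be enough. A minimal sketch on a single workflow follows; it assumes a tidymodels installation recent enough to export that function, and the formula on the built-in iris data and the range c(1, 4) are placeholders rather than the post's data or settings.

```r
library(tidymodels)

spec_rf_tune <- rand_forest(trees = 100, mtry = tune(), min_n = tune()) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Placeholder workflow (assumption): formula preprocessor on iris
wf <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(spec_rf_tune)

# Same pattern as in the answer, with the deprecated call replaced
rf_params <- wf %>%
  extract_parameter_set_dials() %>%
  update(mtry = mtry(c(1, 4)))

rf_params
```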
