Two-level stacked learner (ensemble model) combining elastic net and logistic regression using mlr3

I am trying to solve a common problem in medicine: combining a prediction model with other sources of information, e.g., an expert opinion (sometimes heavily emphasized in medicine), called the superdoc predictor in this post.

This could be solved by stacking the model with a logistic regression (which takes the expert opinion as input), as described on page 26 of this paper:

Afshar P, Mohammadi A, Plataniotis KN, Oikonomou A, Benali H. From Handcrafted to Deep-Learning-Based Cancer Radiomics: Challenges and Opportunities. IEEE Signal Process Mag 2019; 36: 132–60.
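
In other words (my notation, not the paper's): the second-level model is a logistic regression of the form logit P(diabetes = pos) = b0 + b1*superdoc + b2*eta(x), where eta(x) is the linear predictor of the first-level elastic net.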

I have tried this here without considering overfitting (i.e., without using out-of-fold predictions of the lower-level learner):

Example data

# libraries
library(tidyverse)
library(caret)
library(glmnet)
library(mlbench)
library(broom) # for tidy()
library(pROC)  # for roc()

# get example data
data(PimaIndiansDiabetes, package="mlbench")
data <- PimaIndiansDiabetes

# add the superdoc's opinion to the data
set.seed(2323)
data %>% 
  rowwise() %>% 
  mutate(superdoc = case_when(diabetes == "pos" ~ as.numeric(sample(0:2, 1)),
                              TRUE ~ 0)) %>% 
  ungroup() -> data
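
As a quick check (not in the original post) that the simulated opinion tracks the outcome: superdoc is always 0 for controls and 0-2 for cases.

# cross-tabulate the simulated expert opinion against the outcome
table(data$superdoc, data$diabetes)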

# separate the data in a training set and test set
train.data <- data[1:550,]
test.data <- data[551:768,]
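
(Aside, not in the original post: a stratified split would usually be preferable to a fixed index split; the fixed split is kept here as in the question. A sketch with caret:)

# stratified alternative to the fixed index split (sketch, not used below)
set.seed(2323)
train.idx <- createDataPartition(data$diabetes, p = 550/768, list = FALSE)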

Stacked models without considering out of fold predictions:

# elastic net regression (without the superdoc's opinion)
set.seed(2323)
model <- train(
  diabetes ~., data = train.data %>% select(-superdoc), method = "glmnet",
  trControl = trainControl("repeatedcv",
                           number = 10,
                           repeats=10,
                           classProbs = TRUE,
                           savePredictions = TRUE,
                           summaryFunction = twoClassSummary),
  tuneLength = 10,
  metric="ROC" #ROC metric is in twoClassSummary
)
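
For reference (not in the original post), the hyperparameters that caret selected can be inspected:

# best alpha and lambda found by the repeated CV
model$bestTune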


# extract the coefficients for the best alpha and lambda  
coef(model$finalModel, model$finalModel$lambdaOpt) -> coeffs
tidy(coeffs) %>% as_tibble() -> coeffs

coef.interc = coeffs %>% filter(row=="(Intercept)") %>% pull(value)
coef.pregnant = coeffs %>% filter(row=="pregnant") %>% pull(value)
coef.glucose = coeffs %>% filter(row=="glucose") %>% pull(value)
coef.pressure = coeffs %>% filter(row=="pressure") %>% pull(value)
coef.mass = coeffs %>% filter(row=="mass") %>% pull(value)
coef.pedigree = coeffs %>% filter(row=="pedigree") %>% pull(value)
coef.age = coeffs %>% filter(row=="age") %>% pull(value)


# combine the model with the superdoc's opinion in a logistic regression model
finalmodel = glm(diabetes ~ superdoc +
                   I(coef.interc + coef.pregnant*pregnant + coef.glucose*glucose +
                       coef.pressure*pressure + coef.mass*mass +
                       coef.pedigree*pedigree + coef.age*age),
                 family = binomial, data = train.data)
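
Equivalently (a sketch, not part of the original post), the elastic net's linear predictor could be taken directly from the fitted glmnet object instead of copying coefficients by hand; this reproduces the I(...) term above:

# compute the elastic-net linear predictor via predict(type = "link")
X.train <- as.matrix(train.data %>% select(-superdoc, -diabetes))
lp <- predict(model$finalModel, newx = X.train,
              s = model$finalModel$lambdaOpt, type = "link")[, 1]
# glm(diabetes ~ superdoc + lp, family = binomial, data = train.data)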


# make predictions on the test data
predict(finalmodel,test.data, type="response") -> predictions


# check the AUC of the model in the test data
roc(test.data$diabetes,predictions, ci=TRUE) 
#> Setting levels: control = neg, case = pos
#> Setting direction: controls < cases
#> 
#> Call:
#> roc.default(response = test.data$diabetes, predictor = predictions,     ci = TRUE)
#> 
#> Data: predictions in 145 controls (test.data$diabetes neg) < 73 cases (test.data$diabetes pos).
#> Area under the curve: 0.9345
#> 95% CI: 0.8969-0.9721 (DeLong)

Now I would like to use out-of-fold predictions, via the mlr3 package family, following this very helpful post: Tuning a stacked learner

# libraries
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3tuning)
library(paradox)
library(glmnet)

# create the elastic net learner
glmnet_lrn =  lrn("classif.cv_glmnet", predict_type = "prob")

# cross-validate the learner to get out-of-fold predictions
glmnet_cv1 = po("learner_cv", glmnet_lrn, id = "glmnet") # I could not find a setting to filter the predictors (i.e., not send the superdoc predictor here)

# summarize steps 
level0 = gunion(list(
  glmnet_cv1,
  po("nop", id = "only_superdoc_predictor")))  %>>% #I could not find a setting to send only the superdoc predictor to "union1"
  po("featureunion", id = "union1")


# final logistic regression
log_reg_lrn = lrn("classif.log_reg", predict_type = "prob")

# combine ensemble model
ensemble = level0 %>>% log_reg_lrn
ensemble$plot(html = FALSE)

Created on 2021-03-15 by the reprex package (v1.0.0)

My questions (I am rather new to the mlr3 package family):

  1. Is the mlr3 package family well suited for the ensemble model I am trying to build?
  2. If yes, how could I finalize the ensemble model and make predictions on test.data?
Grocer answered 15/3, 2021 at 19:21

I think mlr3 / mlr3pipelines is well suited for your task. What you are missing is mainly PipeOpSelect / po("select"), which lets you extract features based on their name or other properties and makes use of Selector objects. Your code should look something like this:

library("mlr3")
library("mlr3pipelines")
library("mlr3learners")

# create the elastic net learner
glmnet_lrn = lrn("classif.cv_glmnet", predict_type = "prob")

# cross-validate the learner to get out-of-fold predictions
glmnet_cv1 = po("learner_cv", glmnet_lrn, id = "glmnet")

# PipeOp that drops 'superdoc', i.e. selects all except 'superdoc'
# (ID given to avoid ID clash with other selector)
drop_superdoc = po("select", id = "drop.superdoc",
  selector = selector_invert(selector_name("superdoc")))

# PipeOp that selects 'superdoc' (and drops all other columns)
select_superdoc = po("select", id = "select.superdoc",
  selector = selector_name("superdoc"))

# superdoc along one path, the fitted model along the other
stacking_layer = gunion(list(
  select_superdoc,
  drop_superdoc %>>% glmnet_cv1
)) %>>% po("featureunion", id = "union1")

# final logistic regression
log_reg_lrn = lrn("classif.log_reg", predict_type = "prob")

# combine ensemble model
ensemble = stacking_layer %>>% log_reg_lrn

This is what it looks like:

ensemble$plot(html = FALSE)

The stacking graph.

To train and evaluate the model, we need to create Task objects:

train.task <- TaskClassif$new("train.data", train.data, target = "diabetes")
test.task <- TaskClassif$new("test.data", test.data, target = "diabetes")
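
(Aside, not in the original answer: a Selector is simply a function that maps a Task to a character vector of feature names, so its effect can be inspected directly.)

# what each Selector picks from the training task (illustrative)
selector_name("superdoc")(train.task)
#> [1] "superdoc"
selector_invert(selector_name("superdoc"))(train.task)
#> all feature names except "superdoc"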

The model can now be trained, used for prediction, and the quality of the prediction evaluated. This works best if we turn the ensemble into a Learner:

elearner = as_learner(ensemble)
# Train the Learner:
elearner$train(train.task)
# (The training may give a warning because the glm gets collinear features:
# the positive and the negative class probabilities)
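
One way to avoid that warning (a sketch, not in the original answer; it assumes learner_cv names its outputs "glmnet.prob.pos" / "glmnet.prob.neg") is to drop one of the two probability columns before the logistic regression:

# drop the redundant probability column before the level-1 learner
ensemble_dropprob = stacking_layer %>>%
  po("select", id = "drop.prob.neg",
     selector = selector_invert(selector_grep("prob\\.neg"))) %>>%
  log_reg_lrn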

Get the prediction on the test set:

prediction = elearner$predict(test.task)
print(prediction)
#> <PredictionClassif> for 218 observations:
#>     row_ids truth response  prob.neg   prob.pos
#>           1   neg      neg 0.9417067 0.05829330
#>           2   neg      neg 0.9546343 0.04536566
#>           3   neg      neg 0.9152019 0.08479810
#> ---                                            
#>         216   neg      neg 0.9147406 0.08525943
#>         217   pos      neg 0.9078216 0.09217836
#>         218   neg      neg 0.9578515 0.04214854

The prediction was made on a Task, so it can be used directly to measure performance against the ground truth, e.g. using the "classif.auc" Measure:

msr("classif.auc")$score(prediction)
#> [1] 0.9308455
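
Other measures work the same way, and a Prediction can score several at once (illustrative; classif.acc is another built-in measure):

prediction$score(msrs(c("classif.auc", "classif.acc")))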

Two notes here:

  1. You have split your data into a training and a test set manually. mlr3 lets you do resampling automatically, based on a single Task object, which can go beyond simple train-test splits. Using the data from the question, a 10-fold cross-validation would look like this:
    all.task <- TaskClassif$new("all.data", data, target = "diabetes")
    rr = resample(all.task, elearner, rsmp("cv"))  # will take some time
    rr$aggregate(msr("classif.auc"))
    #> classif.auc 
    #>   0.9366438
    
  2. I have shown how to construct the graph using the po("select") PipeOps because it is fully general: by playing around with the Selector values, you can choose to have some features both in the glmnet_lrn Learner and in the log_reg_lrn directly. If all you really want is to "divert" a feature around a single operation, you can instead set affect_columns to a Selector that selects the columns you want. The following creates a (linear) graph that does exactly the same, but is less flexible:
    glmnet_cv1_nosuperdoc = po("learner_cv", glmnet_lrn, id = "glmnet",
      affect_columns = selector_invert(selector_name("superdoc")))
    ensemble2 = glmnet_cv1_nosuperdoc %>>% log_reg_lrn
    e2learner = as_learner(ensemble2)
    # etc.
    
Tetrahedron answered 17/3, 2021 at 0:10

Comments:
Thank you for this marvellous solution including CV. The mlr3 package family is very helpful. I read that classif.cv_glmnet does an internal 10-fold CV optimization. Can this somehow be repeated (as with repeatedcv) within the solution you proposed (as in my caret::trainControl approach) to avoid variability in smaller data sets? I think tuning both alpha and lambda using mlr3 is not (yet) possible? – Grocer
As I understand it, your caret code essentially does hyperparameter tuning integrated into the glmnet learner. With mlr3, you would use AutoTuner, which wraps a learner and tunes it internally. You can specify rsmp("repeated_cv") (10 repeats of 10 folds by default). Caret's default tuner seems to be equivalent to mlr3's tnr("grid_search", resolution = 3) (but consider using random search). Note that po("learner_cv") also does another CV: it gives out-of-fold predictions for unbiased training of the "classif.log_reg" (unrelated to tuning). – Tetrahedron
Thank you for your comments. Pat-s suggested not to tune cv.glmnet in mlr3. Does that hold true for my question? Since this problem is not really part of my initial question, I opened a new question here. – Grocer
It is a complicated story. Pat-s is right in that cv_glmnet tunes the lambda/s parameter internally; AFAIK it does not tune alpha. So I think you could tune alpha and lambda using glmnet (which I think didn't exist in mlr3 one year ago when that question was asked), or you could alternatively tune just alpha on cv_glmnet (which does its own tuning of lambda internally). Neither solution is ideal: cv_glmnet makes some optimizations that glmnet doesn't, but on the other hand this nests CV twice and can't do repeated CV, AFAIK. I don't know glmnet well enough to know if there is a better way. – Tetrahedron
Thanks to your comments, I could solve my question related to variability. – Grocer
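
Following the comment thread, here is a minimal sketch of the second option discussed above: tuning only alpha with an AutoTuner while cv_glmnet keeps selecting lambda internally. It assumes a recent mlr3tuning (auto_tuner(), to_tune()); the grid resolution and resampling settings are illustrative:

library(mlr3tuning)

# wrap cv_glmnet in an AutoTuner that tunes alpha via repeated CV
glmnet_at = auto_tuner(
  tuner = tnr("grid_search", resolution = 3),
  learner = lrn("classif.cv_glmnet", predict_type = "prob",
                alpha = to_tune(0, 1)),
  resampling = rsmp("repeated_cv", folds = 10, repeats = 10),
  measure = msr("classif.auc")
)

# glmnet_at could then replace glmnet_lrn in the stacking graph:
# glmnet_cv1 = po("learner_cv", glmnet_at, id = "glmnet")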
