Plotting a Cox PH model using ggforest in RStudio when a factor is stratified?

Asked 10/9, 2020 at 0:36 Answered 8/9, 2022 at 11:45

I am trying to find a way to make a forest plot of hazard ratios from a Cox-PH model when one of the model variables needs to be stratified. For a non-stratified model, the ggforest() function is excellent. Running some example code

library(survival)
library(survminer)

model <- coxph(Surv(time, status) ~ sex + rx + adhere,
               data = colon )
ggforest(model)

colon <- within(colon, {
  sex <- factor(sex, labels = c("female", "male"))
  differ <- factor(differ, labels = c("well", "moderate", "poor"))
  extent <- factor(extent, labels = c("submuc.", "muscle", "serosa", "contig."))
})
bigmodel <-
  coxph(Surv(time, status) ~ sex + rx + adhere + differ + extent + node4,
        data = colon )
ggforest(bigmodel)

produces this graph

non-stratified figure

However if I have to correct for non-proportionality with stratification

stratamodel <- coxph(Surv(time, status) ~ sex + strata(rx) + adhere + differ + extent + node4,
                     data = colon )
ggforest(stratamodel)

The following error message appears:

"Error in [.data.frame(data, , var) : undefined columns selected

In addition: Warning message:

In .get_data(model, data = data) : The data argument is not provided. Data will be extracted from model fit."

Any suggestions for how to get the information that ggforest needs from the strata model so that it can produce a plot? Thanks for your help!

Hitchcock answered 10/9, 2020 at 0:36 Comment(4)

You can check this: #59887995 – Erdei 10/9, 2020 at 0:50

thanks for the quick reply. The challenge is that answer is not really an option for the real data set I'm working with, where the stratified variable is a treatment with levels that needs to be corrected for in the model but not interpreted, necessitating the strata function. It seems ggforest is pretty much a wrapper for ggplot so I was wondering if there was a way to make it work. Unfortunately I'm more of an ecologist picking up r tricks as I go so am not sure if what I'm asking is feasible – Hitchcock 10/9, 2020 at 1:46

As you noted, ggforest is a wrapper, so coding with the underlying ggplot should certainly work. The issue here is that the pool of SO users who are familiar with both stratified models and ggplot2 may not be very large. If you can include a rough sketch of what the stratified model's plot should look like, your chances of getting help would probably increase. – Jae 10/9, 2020 at 10:38

thanks for the suggestion @Z.Lin, If graphing using the example code, the stratified model would look like the image in the original question, but with slightly different values due to the stratification and not including the rx treatment, which was stratified. – Hitchcock 10/9, 2020 at 17:40

I gather that the desired forest plot is one that simply skips the stratified RX variable in the model's formula. If so, we can simply insert an if clause inside, to ignore formula parts that don't correspond exactly to column names in the data (e.g. "strata(rx)" is not a column name).

Approach 1

If you are comfortable with R sufficiently to modify functions, run trace(ggforest, edit = TRUE) and replace the allTerms <- lapply(...) (around lines 10-25) in the pop-up window with the following version:

allTerms <- lapply(seq_along(terms), function(i) {
  var <- names(terms)[i]
  if(var %in% colnames(data)) {
    if (terms[i] %in% c("factor", "character")) {
      adf <- as.data.frame(table(data[, var]))
      cbind(var = var, adf, pos = 1:nrow(adf))
    }
    else if (terms[i] == "numeric") {
      data.frame(var = var, Var1 = "", Freq = nrow(data), 
                 pos = 1)
    }
    else {
      vars = grep(paste0("^", var, "*."), coef$term, 
                  value = TRUE)
      data.frame(var = vars, Var1 = "", Freq = nrow(data), 
                 pos = seq_along(vars))
    }
  } else {
    message(var, "is not found in data columns, and will be skipped.")
  }
  
})

ggforest(stratamodel) # this should work after the modification

The modification will not persist to subsequent R sessions. If you want to reverse the modification within the current session, simply run untrace(ggforest).

Approach 2

If you prefer to have a permanently modified version of the function for future use / don't want to muck around with a library's function, save the function below:

ggforest2 <- function (model, data = NULL, main = "Hazard ratio", 
                       cpositions = c(0.02, 0.22, 0.4), fontsize = 0.7, 
                       refLabel = "reference", noDigits = 2) {
  conf.high <- conf.low <- estimate <- NULL
  stopifnot(class(model) == "coxph")
  data <- survminer:::.get_data(model, data = data)
  terms <- attr(model$terms, "dataClasses")[-1]
  coef <- as.data.frame(broom::tidy(model))
  gmodel <- broom::glance(model)
  allTerms <- lapply(seq_along(terms), function(i) {
    var <- names(terms)[i]
    if(var %in% colnames(data)) {
      if (terms[i] %in% c("factor", "character")) {
        adf <- as.data.frame(table(data[, var]))
        cbind(var = var, adf, pos = 1:nrow(adf))
      }
      else if (terms[i] == "numeric") {
        data.frame(var = var, Var1 = "", Freq = nrow(data), 
                   pos = 1)
      }
      else {
        vars = grep(paste0("^", var, "*."), coef$term, 
                    value = TRUE)
        data.frame(var = vars, Var1 = "", Freq = nrow(data), 
                   pos = seq_along(vars))
      }
    } else {
      message(var, "is not found in data columns, and will be skipped.")
    }    
  })
  allTermsDF <- do.call(rbind, allTerms)
  colnames(allTermsDF) <- c("var", "level", "N", 
                            "pos")
  inds <- apply(allTermsDF[, 1:2], 1, paste0, collapse = "")
  rownames(coef) <- gsub(coef$term, pattern = "`", replacement = "")
  toShow <- cbind(allTermsDF, coef[inds, ])[, c("var", "level", "N", "p.value", 
                                                "estimate", "conf.low", 
                                                "conf.high", "pos")]
  toShowExp <- toShow[, 5:7]
  toShowExp[is.na(toShowExp)] <- 0
  toShowExp <- format(exp(toShowExp), digits = noDigits)
  toShowExpClean <- data.frame(toShow, pvalue = signif(toShow[, 4], noDigits + 1), 
                               toShowExp)
  toShowExpClean$stars <- paste0(round(toShowExpClean$p.value, noDigits + 1), " ", 
                                 ifelse(toShowExpClean$p.value < 0.05, "*", ""), 
                                 ifelse(toShowExpClean$p.value < 0.01, "*", ""), 
                                 ifelse(toShowExpClean$p.value < 0.001, "*", ""))
  toShowExpClean$ci <- paste0("(", toShowExpClean[, "conf.low.1"], 
                              " - ", toShowExpClean[, "conf.high.1"], ")")
  toShowExpClean$estimate.1[is.na(toShowExpClean$estimate)] = refLabel
  toShowExpClean$stars[which(toShowExpClean$p.value < 0.001)] = "<0.001 ***"
  toShowExpClean$stars[is.na(toShowExpClean$estimate)] = ""
  toShowExpClean$ci[is.na(toShowExpClean$estimate)] = ""
  toShowExpClean$estimate[is.na(toShowExpClean$estimate)] = 0
  toShowExpClean$var = as.character(toShowExpClean$var)
  toShowExpClean$var[duplicated(toShowExpClean$var)] = ""
  toShowExpClean$N <- paste0("(N=", toShowExpClean$N, ")")
  toShowExpClean <- toShowExpClean[nrow(toShowExpClean):1, ]
  rangeb <- range(toShowExpClean$conf.low, toShowExpClean$conf.high, 
                  na.rm = TRUE)
  breaks <- axisTicks(rangeb/2, log = TRUE, nint = 7)
  rangeplot <- rangeb
  rangeplot[1] <- rangeplot[1] - diff(rangeb)
  rangeplot[2] <- rangeplot[2] + 0.15 * diff(rangeb)
  width <- diff(rangeplot)
  y_variable <- rangeplot[1] + cpositions[1] * width
  y_nlevel <- rangeplot[1] + cpositions[2] * width
  y_cistring <- rangeplot[1] + cpositions[3] * width
  y_stars <- rangeb[2]
  x_annotate <- seq_len(nrow(toShowExpClean))
  annot_size_mm <- fontsize * 
    as.numeric(grid::convertX(unit(theme_get()$text$size, "pt"), "mm"))
  p <- ggplot(toShowExpClean, 
              aes(seq_along(var), exp(estimate))) + 
    geom_rect(aes(xmin = seq_along(var) - 0.5, 
                  xmax = seq_along(var) + 0.5, 
                  ymin = exp(rangeplot[1]), 
                  ymax = exp(rangeplot[2]), 
                  fill = ordered(seq_along(var)%%2 + 1))) +
    scale_fill_manual(values = c("#FFFFFF33",  "#00000033"), guide = "none") + 
    geom_point(pch = 15, size = 4) + 
    geom_errorbar(aes(ymin = exp(conf.low), ymax = exp(conf.high)), 
                  width = 0.15) + 
    geom_hline(yintercept = 1, linetype = 3) + 
    coord_flip(ylim = exp(rangeplot)) + 
    ggtitle(main) + 
    scale_y_log10(name = "", labels = sprintf("%g", breaks), 
                  expand = c(0.02, 0.02), breaks = breaks) + 
    theme_light() +
    theme(panel.grid.minor.y = element_blank(), 
          panel.grid.minor.x = element_blank(), 
          panel.grid.major.y = element_blank(), 
          legend.position = "none", 
          panel.border = element_blank(), 
          axis.title.y = element_blank(), 
          axis.text.y = element_blank(), 
          axis.ticks.y = element_blank(), 
          plot.title = element_text(hjust = 0.5)) + 
    xlab("") + 
    annotate(geom = "text", x = x_annotate, y = exp(y_variable), 
             label = toShowExpClean$var, fontface = "bold", 
             hjust = 0, size = annot_size_mm) + 
    annotate(geom = "text", x = x_annotate, y = exp(y_nlevel), hjust = 0, 
             label = toShowExpClean$level, 
             vjust = -0.1, size = annot_size_mm) + 
    annotate(geom = "text", x = x_annotate, y = exp(y_nlevel), 
             label = toShowExpClean$N, fontface = "italic", hjust = 0, 
             vjust = ifelse(toShowExpClean$level == "", 0.5, 1.1),
             size = annot_size_mm) + 
    annotate(geom = "text", x = x_annotate, y = exp(y_cistring), 
             label = toShowExpClean$estimate.1, size = annot_size_mm, 
             vjust = ifelse(toShowExpClean$estimate.1 == "reference", 0.5, -0.1)) + 
    annotate(geom = "text", x = x_annotate, y = exp(y_cistring), 
             label = toShowExpClean$ci, size = annot_size_mm, 
             vjust = 1.1, fontface = "italic") + 
    annotate(geom = "text", x = x_annotate, y = exp(y_stars), 
             label = toShowExpClean$stars, size = annot_size_mm, 
             hjust = -0.2, fontface = "italic") +
    annotate(geom = "text", x = 0.5, y = exp(y_variable), 
             label = paste0("# Events: ", gmodel$nevent, 
                            "; Global p-value (Log-Rank): ", 
                            format.pval(gmodel$p.value.log, eps = ".001"), 
                            " \nAIC: ", round(gmodel$AIC, 2), 
                            "; Concordance Index: ", round(gmodel$concordance, 2)), 
             size = annot_size_mm, hjust = 0, vjust = 1.2, fontface = "italic")
  gt <- ggplot_gtable(ggplot_build(p))
  gt$layout$clip[gt$layout$name == "panel"] <- "off"
  ggpubr::as_ggplot(gt)
}

It's a snapshot of the ggforest function as it currently stands, with the same modification as above. If the package's creator makes modifications to the package in the future, this can break or become outdated. But for now, ggforest2(stratamodel) will yield the same result as Approach 1.

Jae answered 11/9, 2020 at 3:37 Comment(1)

that is exactly what I needed and worked perfectly, thank you so much for the help! – Hitchcock 11/9, 2020 at 4:33

As this answer noted, ggforest looks at attr(model$terms, "dataClasses")[-1] to get the model terms.

So let's just give it what it wants?

library(dplyr)

ggforest(stratamodel, data = colon %>% mutate(`strata(rx)` = rx))

By cross-checking with the summary(stratamodel) output, this seems to be correct.

We are, unfortunately, left with the unnecessary rows for the strata. Perhaps there is a way to deal with this.

EDIT: There is!

We try a different strategy, we change the attributes of the model, so that the ggforest function doesn't search for these terms.

attr(stratamodel$terms, "dataClasses") <- attr(stratamodel$terms, "dataClasses")[-3]
ggforest(stratamodel, data = colon)

We have what we needed.

Discombobulate answered 8/9, 2022 at 11:45 Comment(2)

This answer doesn't work. – Flamingo 4/5, 2023 at 4:35

More information please? Run all the code from the question. – Discombobulate 10/5, 2023 at 13:53

Approach 1

Approach 2

Recommended topics

Hot tags