ggplots stored in plot list to respect variable values at time of plot generation within for loop
I have an elaborate plot routine that generates box plots with additional layers of scatter and adds them to a plot list.

The routine generates correct plots if they are created during the for loop directly via print(current_plot_complete).

However, if they are added to a plot list during the for loop which is printed only at the end, then the plots are incorrect: the final indices are used to generate all plots (instead of the current index at the time the plot is generated). This seems to be default ggplot2 behavior and I am looking for a solution to circumvent it in the current use case.

The issue seems to be within y = eval(parse(text=(paste0(COL_i)))) where the global environment is used (and thus the final index value) instead of the current values at the time of loop execution.

I tried various approaches to make eval() use the correct variable values, e.g. local(…) or specifying the environment – but without success.

A very simplified MWE is provided below.


MWE

The original routine is much more elaborate than this MWE, so the for loop cannot easily be replaced with members of the apply family.

# packages used below: %>%, filter, select (dplyr), gather (tidyr), plotting (ggplot2)
library(dplyr)
library(tidyr)
library(ggplot2)

# create some random data
data_temp <- data.frame(
  "a" = sample(x = 1:100, size  = 50),
  "b" = rnorm(n = 50, mean = 45, sd = 1),
  "c" = sample(x = 20:70, size  = 50),
  "d" = rnorm(n = 50, mean = 40, sd = 15),
  "e" = rnorm(n = 50, mean = 50, sd = 10),
  "f" = rnorm(n = 50, mean = 45, sd = 1),
  "g" = sample(x = 20:70, size  = 50)
)
COLs_current <- c("a", "b", "c", "d", "e") # define COLs of data to include in box plots
choice_COLs <- c("a", "d")      # define COLs of data to add scatter to

plot_list <- list()   # start with an empty list
plot_index <- 1

for (COL_i in choice_COLs) {

  COL_i_index <- which(COL_i == COLs_current)

  # Generate "basis boxplot" (to plot scatterplot on top)
  boxplot_scores <- data_temp %>% 
    gather(COL, score, all_of(COLs_current)) %>%
    ggplot(aes(x = COL, y = score)) +
    geom_boxplot() 

  # Get relevant data of COL_i for scattering: data of 4th quartile
  quartile_values <- quantile(data_temp[[COL_i]])
  threshold <- quartile_values["75%"]           # threshold = 3. quartile value
  data_temp_filtered <- data_temp %>%
    filter(data_temp[[COL_i]] > threshold) %>%  # filter the data of the 4th quartile
    dplyr::select(all_of(COLs_current))         # all_of() avoids the ambiguity of selecting via an external vector

  # Create layer of scatter for 4th quartile of COL_i
  scatter_COL_i <- geom_point(data=data_temp_filtered, mapping = aes(x = COL_i_index, y = eval(parse(text=(paste0(COL_i))))), color= "orange")

  # add geom objects to create final plot for COL_i
  current_plot_complete <- boxplot_scores + scatter_COL_i 

  print(current_plot_complete)

  plot_list[[plot_index]] <- current_plot_complete 
  plot_index <- plot_index + 1
}

plot_list
Xanthin answered 17/6, 2020 at 7:32 Comment(0)

I propose this solution, although it doesn't explain why your approach doesn't work:

# Define the function first, then call it with lapply()
temporary_function <- function(COL_i){
    COL_i_index <- which(COL_i == COLs_current)

    # Generate "basis boxplot" (to plot scatterplot on top)
    boxplot_scores <- data_temp %>% 
        gather(COL, score, all_of(COLs_current)) %>%
        ggplot(aes(x = COL, y = score)) +
        geom_boxplot() 

    # Get relevant data of COL_i for scattering: data of 4th quartile
    quartile_values <- quantile(data_temp[[COL_i]])
    threshold <- quartile_values["75%"]           # threshold = 3. quartile value
    data_temp_filtered <- data_temp %>%
        filter(data_temp[[COL_i]] > threshold) %>%  # filter the data of the 4th quartile
        dplyr::select(all_of(COLs_current))

    # Create layer of scatter for 4th quartile of COL_i
    scatter <- geom_point(data=data_temp_filtered,
                          mapping = aes(x = COL_i_index,
                                        y = eval(parse(text=(paste0(COL_i))))),
                          color= "orange")

    # add geom objects to create final plot for COL_i
    current_plot_complete <-  boxplot_scores + scatter

    return(current_plot_complete)
}

# One plot per chosen column, collected in a list
l <- lapply(choice_COLs, temporary_function)

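With lapply you don't have this problem: each call to temporary_function gets its own environment, so COL_i and COL_i_index keep the values they had when the plot was built. It is inspired by this post.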
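
If the for loop has to stay, one option is to force the loop values into the mapping at the moment the layer is created. The following is only a minimal sketch, assuming ggplot2 >= 3.0.0 (where aes() supports `!!` unquoting) and a cut-down version of the MWE data. The `set.seed(1)` call, the use of `pivot_longer()` in place of `gather()`, and naming the list entries by column are my own choices for the illustration, not part of the original routine:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed only to make the sketch reproducible
data_temp <- data.frame(
  a = sample(1:100, 50),
  b = rnorm(50, mean = 45, sd = 1),
  c = sample(20:70, 50),
  d = rnorm(50, mean = 40, sd = 15),
  e = rnorm(50, mean = 50, sd = 10)
)
COLs_current <- c("a", "b", "c", "d", "e")
choice_COLs  <- c("a", "d")

# The base boxplot does not depend on the loop variable, so build it once.
boxplot_scores <- data_temp %>%
  pivot_longer(all_of(COLs_current), names_to = "COL", values_to = "score") %>%
  ggplot(aes(x = COL, y = score)) +
  geom_boxplot()

plot_list <- list()
for (COL_i in choice_COLs) {
  COL_i_index <- which(COL_i == COLs_current)
  threshold   <- quantile(data_temp[[COL_i]], 0.75)
  data_temp_filtered <- data_temp %>% filter(.data[[COL_i]] > threshold)

  # `!!` inserts the current values of COL_i_index and COL_i into the
  # mapping right now, instead of leaving the variable names to be
  # looked up later when the plot is printed.
  plot_list[[COL_i]] <- boxplot_scores +
    geom_point(
      data    = data_temp_filtered,
      mapping = aes(x = !!COL_i_index, y = .data[[!!COL_i]]),
      color   = "orange"
    )
}
```

Printing `plot_list` after the loop should then show each plot with the scatter layer of its own column. The filtered data frame is already stored in the layer when `geom_point()` is called, so only the mapping needs this treatment.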

Raine answered 17/6, 2020 at 7:58 Comment(2)
Thanks a lot for your fully functioning solution using lapply(...). Although I did not expect it at first, I was able to rework my elaborate version to conform to your proposed solution, which works well! – Xanthin
Arranging the plots stored in the list l in a joint plot requires subsetting l, i.e. ggpubr::ggarrange(l[[1]], l[[2]]). Otherwise, running ggpubr::ggarrange(l) throws the following error: In as_grob.default(plot) : Cannot convert object of class list into a grob. – Xanthin
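
As an alternative to subsetting, ggarrange() also has a `plotlist` argument that accepts a list of plots directly. A minimal sketch, assuming ggpubr is installed; the two mtcars plots are placeholders I made up to stand in for the list l from the answer:

```r
library(ggplot2)
library(ggpubr)

# Placeholder plots (assumption); in the thread, `l` is the list returned by lapply().
l <- list(
  ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point(),
  ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
)

# Pass the whole list through `plotlist` instead of the `...` argument.
combined <- ggarrange(plotlist = l, ncol = 2, nrow = 1)
combined
```

This avoids typing out `l[[1]], l[[2]], ...` when the number of plots is not known in advance.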

I think the problem is that ggplot uses lazy evaluation. When the list is rendered, the loop index has its final value, and that is the one used to render all the plots in the list.
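
To make the lazy evaluation visible, here is a small sketch with made-up data (the data frame `df` and the names `lazy_plots`, `eager_plots` and `y_values` are hypothetical, chosen only for this illustration). It assumes ggplot2 >= 3.0.0. aes() stores the expression `.data[[col]]` rather than the value of `col`, so `col` is only looked up when the plot is built; unquoting with `!!` stores the value instead:

```r
library(ggplot2)

df   <- data.frame(id = 1:3, a = c(1, 2, 3), d = c(10, 20, 30))
cols <- c("a", "d")

lazy_plots  <- list()
eager_plots <- list()

for (col in cols) {
  # Lazy: `col` is looked up when the plot is built, i.e. after the loop.
  lazy_plots[[col]]  <- ggplot(df, aes(x = id, y = .data[[col]])) + geom_point()
  # Eager: `!!col` is replaced by the current string while aes() is called.
  eager_plots[[col]] <- ggplot(df, aes(x = id, y = .data[[!!col]])) + geom_point()
}

# Helper: the y values a plot actually uses in its first layer.
y_values <- function(p) layer_data(p, 1)$y

y_values(lazy_plots[["a"]])   # 10 20 30: column "d", because col is "d" after the loop
y_values(eager_plots[["a"]])  # 1 2 3: column "a", as intended
```

The same reasoning applies to any variable referenced inside aes(), including an index used for x.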

This post is relevant.

Loring answered 17/6, 2020 at 7:41 Comment(2)
As an aside, the complexity of your plotting process suggests that your data might not be tidy in the context of what you are trying to achieve. – Loring
Indeed, the "issue" is a result of ggplot's default behavior, as also noted above. I tried to convince ggplot to store or at least refer to the correct data by adding the data argument to geom_point(data=data_temp_filtered,... as proposed in the post you refer to, but it did not help. – Xanthin
