I have an elaborate plot routine that generates box plots with additional layers of scatter and adds them to a plot list.
The routine generates correct plots if they are printed directly inside the for loop via print(current_plot_complete).
However, if they are added to a plot list during the for loop which is printed only at the end, then the plots are incorrect: the final indices are used to generate all plots (instead of the current index at the time the plot is generated).
This seems to be default ggplot2 behavior, and I am looking for a way to circumvent it in the current use case.
The issue seems to be within y = eval(parse(text=(paste0(COL_i)))), where the global environment is used (and thus the final index value) instead of the values current at the time of loop execution.
I tried various approaches to make eval() use the correct variable values, e.g. local(…) or specifying the environment, but without success.
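To illustrate the suspected mechanism in isolation, here is a minimal sketch, separate from my actual routine. It assumes ggplot2 >= 3.0.0 (where aes() supports the `!!` operator) and the rlang package; the names `col_name`, `lazy_mapping`, `eager_mapping` and `df` are made up for this illustration.

```r
library(ggplot2)
library(rlang)

df <- data.frame(a = 1:3, d = 4:6)  # illustrative data only

col_name <- "a"
# aes() stores the expression unevaluated; col_name is looked up later
lazy_mapping <- aes(y = .data[[col_name]])
# `!!` inlines the current value "a" into the stored expression right away
eager_mapping <- aes(y = .data[[!!col_name]])

col_name <- "d"  # simulate the loop moving on to the next column

# Evaluate both stored expressions against df, as rendering would do later
lazy_result <- eval_tidy(lazy_mapping$y, data = df)    # uses col_name == "d"
eager_result <- eval_tidy(eager_mapping$y, data = df)  # still uses "a"
print(lazy_result)   # 4 5 6
print(eager_result)  # 1 2 3
```

If this reflects what happens in my loop, every stored plot would see only the last value of COL_i and COL_i_index once it is finally printed.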
A very simplified MWE is provided below.
MWE
The original routine is much more elaborate than this MWE, so the for loop cannot easily be replaced with members of the apply family.
library(dplyr)   # %>%, filter(), select()
library(tidyr)   # gather()
library(ggplot2)

# create some random data
data_temp <- data.frame(
  "a" = sample(x = 1:100, size = 50),
  "b" = rnorm(n = 50, mean = 45, sd = 1),
  "c" = sample(x = 20:70, size = 50),
  "d" = rnorm(n = 50, mean = 40, sd = 15),
  "e" = rnorm(n = 50, mean = 50, sd = 10),
  "f" = rnorm(n = 50, mean = 45, sd = 1),
  "g" = sample(x = 20:70, size = 50)
)
COLs_current <- c("a", "b", "c", "d", "e") # define COLs of data to include in box plots
choice_COLs <- c("a", "d") # define COLs of data to add scatter to
plot_list <- list() # filled with one plot per entry of choice_COLs
plot_index <- 1
for (COL_i in choice_COLs) {
  COL_i_index <- which(COL_i == COLs_current)

  # Generate "basis boxplot" (to plot scatterplot on top)
  boxplot_scores <- data_temp %>%
    gather(COL, score, all_of(COLs_current)) %>%
    ggplot(aes(x = COL, y = score)) +
    geom_boxplot()

  # Get relevant data of COL_i for scattering: data of 4th quartile
  quartile_values <- quantile(data_temp[[COL_i]])
  threshold <- quartile_values["75%"] # threshold = 3rd quartile value
  data_temp_filtered <- data_temp %>%
    filter(.data[[COL_i]] > threshold) %>% # keep the rows in the 4th quartile of COL_i
    dplyr::select(all_of(COLs_current))

  # Create layer of scatter for 4th quartile of COL_i
  # (this is the line that misbehaves once the plots are printed from the list)
  scatter_COL_i <- geom_point(
    data = data_temp_filtered,
    mapping = aes(x = COL_i_index, y = eval(parse(text = (paste0(COL_i))))),
    color = "orange"
  )

  # add geom objects to create final plot for COL_i
  current_plot_complete <- boxplot_scores + scatter_COL_i
  print(current_plot_complete) # correct when printed here
  plot_list[[plot_index]] <- current_plot_complete
  plot_index <- plot_index + 1
}
plot_list
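One possible way to keep the for loop is sketched below. It is a stripped-down variant of the MWE, not the full routine, and it assumes ggplot2 >= 3.0.0 so that `!!` can be used inside aes() to inline the current loop values at the time the layer is created. The names `demo_data`, `cols`, `plots_fixed` and the seed are invented for this sketch.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible
demo_data <- data.frame(a = rnorm(20), d = rnorm(20, mean = 5))
cols <- c("a", "d")

plots_fixed <- list()
for (col_i in cols) {
  col_i_index <- which(col_i == cols)
  # `!!` replaces col_i_index and col_i by their current values, so the
  # stored mapping no longer depends on the loop variables at print time
  plots_fixed[[col_i]] <- ggplot(demo_data) +
    geom_point(aes(x = !!col_i_index, y = .data[[!!col_i]]), color = "orange")
}

# Printing after the loop should now show column "a" at x = 1 in the first plot
plots_fixed[["a"]]
```

In the MWE this would correspond to writing the scatter mapping as aes(x = !!COL_i_index, y = .data[[!!COL_i]]) instead of using eval(parse(...)). Moving the loop body into a function and calling it via lapply() should also work, since each function call gets its own environment for the captured variables.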
lapply(...). Though not expected at first, I was able to rework my elaborate version to conform with your proposed solution, which works well! – Xanthin