Storing ggplot objects in a list from within loop in R
My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.

To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.

(I'm using multiplot to make a composite image, but you get the same outcome if you call print(myplots[[1]]) through print(myplots[[4]]) one at a time.)

Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.

(By the way, the column classes are factor in the original dataset I am approximating here, but the same problem occurs if they are integer.)

Here is a reproducible example:

library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function

#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4, 
          2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3, 
          3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4, 
          1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3, 
          3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3, 
          2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3, 
          3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
          2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2, 
          3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_histogram(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)

When I look at a summary of a plot object in the plot list, this is what I see

> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping:  x = data2[, i]
faceting: facet_null() 
-----------------------------------
geom_histogram: fill = lightgreen 
stat_bin:  
position_stack: (width = NULL, height = NULL)

I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.

Thanks!

Kekkonen answered 13/8, 2015 at 16:29 Comment(2)
link to multiplot is dead - Subcontract
The link works for me. I added a post with the graphs. - Mitsue

In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment), we need to use local to wrap the body of the for loop; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name¹:

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        ggplot(data2, aes(x = data2[[i]])) +
            geom_histogram(fill = "lightgreen") +
            xlab(colnames(data2)[i])
    })
}
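
The capture effect can be reproduced without ggplot2. The following is a minimal base-R sketch, not the answer's code: plain closures stand in for the lazily evaluated aes() mapping, and the names shared and captured are made up for illustration.

```r
# Closures created directly in a for loop all look up the same `i`,
# because the loop body runs in the calling environment.
shared <- list()
for (i in 1:3) {
    shared[[i]] <- function() i
}

# Wrapping the body in local() and re-assigning `i` gives each closure
# its own environment holding its own copy of `i`.
captured <- list()
for (i in 1:3) {
    captured[[i]] <- local({
        i <- i
        function() i
    })
}

shared_values <- vapply(shared, function(f) f(), integer(1))
captured_values <- vapply(captured, function(f) f(), integer(1))

print(shared_values)    # 3 3 3: every closure sees the final value of i
print(captured_values)  # 1 2 3: each closure kept its own value
```

The plots behave like the first list: the mapping is only evaluated when a plot is drawn, by which time the loop has finished.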

However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:

plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_histogram(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
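
As a hedged variant for newer ggplot2 releases: aes_string() is deprecated there, and geom_histogram() may reject factor columns, so the sketch below swaps in the .data pronoun and geom_bar(). The small data frame and the name plot_data_column2 are assumptions made up for this illustration; they are not part of the original answer.

```r
library(ggplot2)

# Small made-up data frame of factor columns, standing in for data2.
data2 <- data.frame(
    A = factor(c(1, 2, 2, 3)),
    B = factor(c(0, 0, 1, 3)),
    C = factor(c(2, 2, 2, 1)),
    D = factor(c(3, 1, 0, 0))
)

# Same idea as plot_data_column, but the column is looked up by name
# through the .data pronoun instead of aes_string().
plot_data_column2 <- function(data, column) {
    ggplot(data, aes(x = .data[[column]])) +
        geom_bar(fill = "lightgreen") +  # geom_bar counts discrete values
        xlab(column)
}

# One plot per column; `column` lives in each call's own environment,
# so no loop variable is shared between the plots.
myplots <- lapply(colnames(data2), plot_data_column2, data = data2)
```

Because each call to the function has its own environment, every plot keeps the column name it was created with.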


¹ This might seem confusing: why does i <- i have any effect at all? Because performing the assignment creates a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.

Walking answered 13/8, 2015 at 17:12 Comment(9)
Thank you so much, especially for the lapply version; I wanted to functionalize this but couldn't figure it out, and decided to do (superficially easier, actually horrible) for loop. I figured it was a variable scope problem, I am often fighting them in R! - Kekkonen
Both these solutions are unwieldy. For some reason, myplots burgeons to GB's per iteration in my environment. Using both the local method or function/lapply method. - Kutchins
@Kutchins Well that’s an issue with having many very big plots, not with either of these solutions. A common solution is to subsample the number of data points you plot (often, such big plots won’t reliably display all individual data points anyway), or to compute summary statistics ahead of plotting (and plot these rather than the raw data). But sometimes neither works. In that case, the only solution is to avoid having multiple plots in memory at once. - Walking
Gotcha, thanks for the response.. It's weird, in my environment pane I see the list takes up 118 GB but in my Task Manager, my rstudio session is barely 5Gb. - Kutchins
@Kutchins The estimate in the environment pane is notoriously unreliable. A large part of the reason is that it estimates each object’s size individually but lots of objects in R (particularly data frames) share memory: if you create one data frame from another by modifying one column, then they will share the memory for all remaining columns. - Walking
why do you have 'data' and 'data2' in the function? - Subcontract
@KonradRudolph I tried your recommendation with local(...) but could not get it to work. Would you have any suggestions for my use case here on SO? - Saturnalia
I'd change print(p1) to invisible(p1). Cheers. - Blasted
@Blasted Actually neither really has any place there. I left the print() in from OP’s code (OP seems to want to display the current plot in each loop iteration!), but I think it’s misplaced here (and so is invisible(), which has no effect here). - Walking

Because of all the quoting of expressions that get passed around, the i in the mapping is not evaluated until after the loop has ended, so it takes whatever value i has at that time, which is its final value. You can get around this by using eval(substitute(...)) to substitute in the right value during each iteration.

myplots <- list()  # new empty list
for (i in 1:4) {
    p1 <- eval(substitute(
        ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
          geom_histogram(fill="lightgreen") +
          xlab(colnames(data2)[ i])
    ,list(i = i)))
    print(i)
    print(p1)
    myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
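
What substitute() contributes can be seen in isolation. This is a minimal base-R sketch under the assumption that only the quoted mapping expression matters; the name mappings is made up for illustration, and data2 is never evaluated here.

```r
# substitute() replaces the symbol `i` inside a quoted expression with
# the value supplied in the list, so the expression no longer depends
# on whatever `i` happens to be later.
mappings <- list()
for (i in 1:3) {
    mappings[[i]] <- substitute(data2[, i], list(i = i))
}

# Turn each stored call back into text to inspect it.
mapping_text <- vapply(mappings, deparse, character(1))
print(mapping_text)  # each expression carries its own index, not `i`
```

Evaluating the substituted call, as the loop above does with eval(), then builds each plot with a fixed column index.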
Historiography answered 13/8, 2015 at 16:48 Comment(6)
The diagnosis is correct but the solution is somewhat convoluted. It’s easier to capture i in a local context. The problem is that for loops in R have no scope so you need to use local instead: for (i in 1:4) local({i = i; … rest of the loop … }). The self-assignment i = i isn’t an accident; it is actually needed. A different variable name can also be used. Regardless, all this would be unnecessary by using “proper” list functions instead of for, which is frankly a bad language construct in R. - Walking
@KonradRudolph local is nice - Historiography
Ah, I forgot something: if local is used, the assignment to myplots[[i]] needs to use the <<- operator instead of local assignment. - Walking
@KonradRudolph any chance you want to add a solution using one of the apply functions? It seems that in that case a substitution or local would also be required? Also, is there a reason that local is better than the substitute way? - Historiography
I prefer local because it looks like it’s performing standard evaluation (although that’s not the case of course). It hides the evals and substitutes away. In fact neither lapply nor for really needs to capture the variable i if column names are used in the aesthetics. I’ll add an answer. - Walking
If the number of plots is more than 5-6, you might need to repeat the last line, multiplot(plotlist = myplots, cols = 4), to show all plots. - Athiste

I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.

Here is the code with the visualizations:

Question

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_bar(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Answer

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        p1 <- ggplot(data2, aes(x = data2[[i]])) +
            geom_bar(fill = "lightgreen") +
            xlab(colnames(data2)[i])
        print(p1)
    })
}

multiplot(plotlist = myplots, cols = 4)

Same result using lapply:


plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_bar(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Created on 2021-04-09 by the reprex package (v0.3.0)

Mitsue answered 9/4, 2021 at 14:45 Comment(0)

Using lapply works too as x exists within the anonymous function environment (using mtcars as data):

library(ggplot2)
library(ggthemes)  # provides theme_wsj() and scale_colour_wsj()

# One line plot per column of mtcars, each plotted against mpg
plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
  ggplot(data = mtcars) + 
    geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
    labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
    theme_wsj() +
    scale_colour_wsj("colors6")
})
Gripping answered 19/8, 2020 at 6:40 Comment(0)

Here is another solution:

#generate plots
myplots <- list()  # new empty list
for (col in colnames(data2)) {
  p1 <- ggplot(data=data.frame(data2),aes(x=!!ensym(col)))+ 
    geom_bar(fill="lightgreen") +
    xlab(col)
  myplots[[col]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid
More answered 10/5, 2023 at 9:38 Comment(0)
