How to test that 2 ggplots are functionally the same?

Two ggplot objects that produce the same visual output are not necessarily identical, yet it's sometimes useful to assess their equivalence. How can we do this?
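
To make the first point concrete, here is a minimal sketch (the names `a` and `b` are only for illustration). Layers are ggproto objects, which are environments, and identical() compares environments by reference, so two separately built plots are not expected to pass identical() even when they draw the same thing:

```r
library(ggplot2)

# Two plots built from equivalent code; only the namespace prefix differs
a <- ggplot(cars, aes(dist, speed)) + ggplot2::geom_point()
b <- ggplot(cars, aes(dist, speed)) + geom_point()

# Strict object comparison, which is too strict for this purpose
identical(a, b)
```

So a plain identical() is not a usable test here, which is why something looser is needed.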

It looks like the feature won't be supported by ggplot2 so it'd be nice to have a community solution, and maybe a packaged function somewhere.

Please find my own take below. Feel free to improve on it or contribute a different approach.

Expected:

library(ggplot2)
p1 <- ggplot(cars, aes(dist, speed)) + ggplot2::geom_point() 
p2 <- ggplot(cars, aes(dist, speed)) + geom_point() + scale_x_continuous(breaks = seq(0, 125, 25))
equivalent_ggplot(p1, p2)
#> [1] TRUE

equivalent_ggplot(p1, p2 + geom_blank())
#> [1] TRUE
Incidence answered 9/2 at 9:54 Comment(8)
Just for the sake of learning, may I ask the motivation? - Oldenburg
My use case is for the {constructive} package: github.com/cynkra/constructive. It generates code to construct any R object, including ggplots. We can mostly generate code that reproduces objects exactly, but ggplot objects are tricky because they contain NSE artifacts, so ggplot2::geom_point() and geom_point() don't create the same object, for instance. To test that our generated code constructs functionally identical plots, we need this feature. - Incidence
I'm aware that {patchwork} somehow assesses equivalence of plot legends, e.g. in wrap_plots(), to test whether they should be merged. Possibly a peek under the hood would be informative. Surprised Thomas didn't mention this when replying to your GitHub issue. - Panther
I don't want to say that this is a fool's errand, but there are too many ways in which two plots can be visually identical but with radically different structures. For example, you could have a PNG image of a ggplot drawn as a geom_raster that matches the original exactly; invisible layers of points; gridlines drawn as segments; facets made to look like they aren't facets, etc. I like the goal of reproducing ggplot code from a given plot and that's definitely doable, but I'm not sure how well defined your current quest is. - Monty
I almost feel like answering with counter-examples which wouldn't work with the method of comparing unnamed / unlisted grobs, but that isn't an answer. Great question, by the way! - Monty
Indeed the question is ambiguous to some extent. Maybe we can identify and codify these ambiguities? The ultimate function could have arguments to suit different needs. - Incidence
But if I could, say, make two plots look identical, with one using facets and the other not using facets, surely that would demonstrate that a grob-level approach is doomed to failure? Is the meta goal here not just to reverse engineer the code to allow a ggplot to be recreated? Presumably you then want this equality function for testing whether your reverse-engineering is successful, rather than as a general-purpose equality checker? - Monty
Yes, for my use case examples such as the one you gave are unlikely to happen. I will not feed the function two plots with completely different structures, and if I do it will probably be because the construction is wrong. - Incidence

I'm not sure how robust it is but I've been using this:

equivalent_ggplot <- function(x, y) {
  # ggplot_gtable() triggers a blank plot that can't be silenced so we divert it
  # not sure if pdf() is the most efficient
  pdf(tempfile(fileext = ".pdf"))
  x_tbl <- suppressWarnings(ggplot2::ggplot_gtable(ggplot2::ggplot_build(x)))
  y_tbl <- suppressWarnings(ggplot2::ggplot_gtable(ggplot2::ggplot_build(y)))
  dev.off()
  # flatten the gtables once so values and names come from the same vectors
  x_flat <- unlist(x_tbl)
  y_flat <- unlist(y_tbl)
  # we could probably do a better index equivalency check than just scrubbing
  # them off, but I haven't figured out how it works
  x_unlisted <- gsub("\\d+", "XXX", x_flat)
  y_unlisted <- gsub("\\d+", "XXX", y_flat)
  # scrub the names of the flattened vectors (not the top-level gtable names)
  names(x_unlisted) <- gsub("\\d+", "XXX", names(x_flat))
  names(y_unlisted) <- gsub("\\d+", "XXX", names(y_flat))
  identical(x_unlisted, y_unlisted)
}

library(ggplot2)
p1 <- ggplot(cars, aes(dist, speed)) + ggplot2::geom_point() 
p2 <- ggplot(cars, aes(dist, speed)) + geom_point() + scale_x_continuous(breaks = seq(0, 125, 25))
equivalent_ggplot(p1, p2)
#> [1] TRUE

Created on 2024-02-09 with reprex v2.0.2
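
On the "not sure if pdf() is the most efficient" comment in the function above: one option is a null PDF device, since pdf(file = NULL) is documented to create no external file. Below is a rough sketch under that assumption; `with_null_device()` is a hypothetical helper name, not something from ggplot2 or grDevices, and I haven't benchmarked it against the tempfile version.

```r
# Hypothetical helper: evaluate an expression while a file-less PDF device
# is open, then close that specific device again.
with_null_device <- function(expr) {
  grDevices::pdf(file = NULL)              # no file is written
  dev <- grDevices::dev.cur()              # remember which device we opened
  on.exit(grDevices::dev.off(dev), add = TRUE)
  expr                                     # lazily evaluated here, device active
}

# Base-R usage: the active device inside the call is the null pdf device
names(with_null_device(grDevices::dev.cur()))
```

Inside equivalent_ggplot() this would look like with_null_device(ggplot2::ggplot_gtable(ggplot2::ggplot_build(x))), and the on.exit() call also ensures the device is closed if building the plot throws an error.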

Comparing the SVG output is another option. On this example it takes about twice as long as the above (not that bad, I was expecting slower), but I suspect the relative time difference will grow with complex plots.

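equivalent_ggplot2 <- function(x, y) {
  # writing .svg files with ggsave() requires the {svglite} package
  tmp1 <- tempfile(fileext = ".svg")
  tmp2 <- tempfile(fileext = ".svg")
  suppressMessages(ggplot2::ggsave(tmp1, x))
  suppressMessages(ggplot2::ggsave(tmp2, y))
  # md5sum() returns a vector named by file path, so drop the names
  unname(tools::md5sum(tmp1) == tools::md5sum(tmp2))
}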
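
For what it's worth, here is a variant sketch of the SVG comparison with fixed output dimensions and temp-file cleanup, so the result doesn't depend on the size of the current graphics device. `svg_equal()` and its `width`/`height` defaults are my own hypothetical choices, and it assumes {svglite} is installed and that its output is deterministic for a given plot. I'm deliberately not showing results for the example plots since I haven't checked them across ggplot2 versions.

```r
library(ggplot2)

# Hypothetical variant of equivalent_ggplot2(): fixed size, files cleaned up
svg_equal <- function(x, y, width = 7, height = 7) {
  files <- c(tempfile(fileext = ".svg"), tempfile(fileext = ".svg"))
  on.exit(unlink(files), add = TRUE)
  ggsave(files[1], x, width = width, height = height)
  ggsave(files[2], y, width = width, height = height)
  # compare checksums of the two rendered files, ignoring the path names
  sums <- unname(tools::md5sum(files))
  identical(sums[1], sums[2])
}

p1 <- ggplot(cars, aes(dist, speed)) + ggplot2::geom_point()
p2 <- ggplot(cars, aes(dist, speed)) + geom_point() +
  scale_x_continuous(breaks = seq(0, 125, 25))

svg_equal(p1, p2)
svg_equal(p1, p2 + geom_blank())
```

Because this compares rendered output rather than the grob tree, it may be more tolerant of layers that draw nothing, at the cost of an extra rendering step per plot.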
Incidence answered 9/2 at 9:56 Comment(2)
Yet equivalent_ggplot(p1, p2 + geom_blank()) is FALSE with two pixel-for-pixel identical plots. - Monty
Thanks, I'm adding this to the question. - Incidence
