Escape 'discrete aesthetic implies group' in custom stat
Asked Answered
C

1

6

I'm trying to build a custom stat function with ggplot2 wherein I would like to access a discrete variable to compute a statistic with per group. However, the default behaviour of ggplot layers is to automatically assign implicit groups to any discrete variables (mostly). This means that my data gets split up over an automatic grouping, which I wouldn't want.

I can show this as follows; I have a pretty standard constructor:

library(ggplot2)

stat_example <- function(
  mapping = NULL,
  data = NULL,
  geom = "point",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
) {
  layer(data = data,
        mapping = mapping,
        stat = StatExample,
        geom = geom,
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm))
}

And I have a Stat ggproto object that simply passes along the data, but prints the head of the data for illustration purposes. I've called the bit that I'm interested in for computing an actual stat value here.

StatExample <- ggproto(
  "StatExample",
  Stat,
  required_aes = c("x", "y", "value"),
  default_aes = aes(x = after_stat(x), y = after_stat(y)),
  compute_group = function(data, scales) {
    print(head(data, 2))
    data
  }
)

Now if I construct a plot with this stat, we can see what goes into the compute_group() function as data.

g <- ggplot(iris) +
  stat_example(aes(Sepal.Width, Sepal.Length, value = Species))

# To get only the print side-effect, not the plot (which looks normal)
g <- ggplotGrob(g)
#>     x   y  value PANEL group
#> 1 3.5 5.1 setosa     1     1
#> 2 3.0 4.9 setosa     1     1
#>      x   y      value PANEL group
#> 51 3.2 7.0 versicolor     1     2
#> 52 3.2 6.4 versicolor     1     2
#>       x   y     value PANEL group
#> 101 3.3 6.3 virginica     1     3
#> 102 2.7 5.8 virginica     1     3

Created on 2020-05-28 by the reprex package (v0.3.0)

I would like to have 1 data.frame containing all the data for this case. We see above that we've printed 3 data.frames with different group variables, meaning that the data has been split into 3 groups. What I think it would take to get there, is to have the value variable escape the automatic group detection.

I've considered the following points:

  • I could let the group default to -1, which is the standard 'no group'-group. However, when I do this, the data will not get automatically grouped when for example aes(colour = some_variable). This I definitively want to happen.
  • Looking at ggplot2:::add_group() function, it seems I can escape the autogrouping by naming my value variable label, however this would make the stat incompatible with geom_text() and it doesn't describe the meaning of value naturally.
  • I could replace the layer() call with a variant of this function, that would make a different Layer ggproto object wherein compute_aesthetics() works out groups differently. This however is a lot of work I would rather prevent to be burdened with.
  • I could probably pull a trick along the lines of vctrs::new_vctr(..., class = "not_discrete"), but where is the appropriate place to wrap my value variable in that class?

Helpful suggestions are welcome, or new takes on 'just use label' arguments too.

Ceramic answered 28/5, 2020 at 22:28 Comment(2)
I am actually quite curious about this. (+1) why you would need that? Isn't it the whole point about aes that it creates groups? If this is a value that should be passed to the entire data, why not passing it as a param, rather than as aes?Tenno
Well the short answer is: for this stat: github.com/teunbrand/ggh4x/blob/master/R/stat_rle.R. I don't want to pass it as a param as I want the nonstandard evaluation to happen and I want it to be guaranteed parallel to the data. (but not trigger groups).Ceramic
U
2

If this is an occasional use case, a simple (albeit manual) hack could be running trace(ggplot2:::add_group, edit = TRUE) and add "value" alongside "label", "PANEL" as variable names to be excluded from automatic group detection.

A less manual (but probably more fragile) way to achieve the same effect would involve the following steps:

  1. Define a modified version of the add_group function with the above modification;
  2. Define a modified version of the Layer ggproto object that uses the modified add_group in its compute_aesthetics function;
  3. Point the custom stat to the modified layer.
# define modified add_group function
add_group2 <- function (data) {
  if (ggplot2:::empty(data)) 
    return(data)
  if (is.null(data$group)) {
    disc <- vapply(data, ggplot2:::is.discrete, logical(1))
    disc[names(disc) %in% c("label", "PANEL", "value")] <- FALSE         # change here
    if (any(disc)) {
      data$group <- vctrs::vec_group_id(data[disc])
    }
    else {
      data$group <- ggplot2:::NO_GROUP
    }
  } else {
    data$group <- vctrs::vec_group_id(data["group"])
  }
  data
}

# define modified compute_aesthetics function that uses modified add_group in second last line
compute_aesthetics_alt <- .subset2(ggplot2:::Layer, "compute_aesthetics")
body(compute_aesthetics_alt)[[length(body(compute_aesthetics_alt)) - 1]] <- 
  quote(evaled <- add_group2(evaled))

# define modified Layer ggproto object that uses alternative compute_aesthetics
Layer2 <- ggproto("Layer2",
                  ggplot2:::Layer,
                  compute_aesthetics = compute_aesthetics_alt)

# define modified stat with Layer2 specified as its layer_class
stat_example <- function(
  mapping = NULL,
  data = NULL,
  geom = "point",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
) {
  layer(data = data,
        mapping = mapping,
        stat = StatExample,
        geom = geom,
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm),
        layer_class = Layer2) # change here
}

Usage:

# add new column to simulate different colour
iris$gg <- sample(c("a", "b"), size = nrow(iris), replace = TRUE) 

ggplot(iris) + 
  stat_example(aes(Sepal.Width, Sepal.Length,
                   value = Species))
# prints one data frame, because there's only one group by default

ggplot(iris) + 
  stat_example(aes(Sepal.Width, Sepal.Length,
                   value = Species, colour = gg))
# prints two data frames, because grouping is based on the colour aesthetic,
# which has two possible values
Unisexual answered 30/6, 2020 at 11:20 Comment(1)
Thanks this works great and a lot less code that I would have expected!Ceramic

© 2022 - 2024 — McMap. All rights reserved.