I'm trying to build a custom stat function with ggplot2 wherein I would like to access a discrete variable to compute a statistic with per group. However, the default behaviour of ggplot layers is to automatically assign implicit groups to any discrete variables (mostly). This means that my data gets split up over an automatic grouping, which I wouldn't want.
I can show this as follows; I have a pretty standard constructor:
library(ggplot2)
stat_example <- function(
mapping = NULL,
data = NULL,
geom = "point",
position = "identity",
...,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE
) {
layer(data = data,
mapping = mapping,
stat = StatExample,
geom = geom,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm))
}
And I have a Stat ggproto object that simply passes along the data, but prints the head of the data for illustration purposes. I've called the bit that I'm interested in for computing an actual stat value
here.
StatExample <- ggproto(
"StatExample",
Stat,
required_aes = c("x", "y", "value"),
default_aes = aes(x = after_stat(x), y = after_stat(y)),
compute_group = function(data, scales) {
print(head(data, 2))
data
}
)
Now if I construct a plot with this stat, we can see what goes into the compute_group()
function as data
.
g <- ggplot(iris) +
stat_example(aes(Sepal.Width, Sepal.Length, value = Species))
# To get only the print side-effect, not the plot (which looks normal)
g <- ggplotGrob(g)
#> x y value PANEL group
#> 1 3.5 5.1 setosa 1 1
#> 2 3.0 4.9 setosa 1 1
#> x y value PANEL group
#> 51 3.2 7.0 versicolor 1 2
#> 52 3.2 6.4 versicolor 1 2
#> x y value PANEL group
#> 101 3.3 6.3 virginica 1 3
#> 102 2.7 5.8 virginica 1 3
Created on 2020-05-28 by the reprex package (v0.3.0)
I would like to have 1 data.frame containing all the data for this case. We see above that we've printed 3 data.frames with different group
variables, meaning that the data has been split into 3 groups. What I think it would take to get there, is to have the value
variable escape the automatic group detection.
I've considered the following points:
- I could let the group default to
-1
, which is the standard 'no group'-group. However, when I do this, the data will not get automatically grouped when for exampleaes(colour = some_variable)
. This I definitively want to happen. - Looking at
ggplot2:::add_group()
function, it seems I can escape the autogrouping by naming myvalue
variablelabel
, however this would make the stat incompatible withgeom_text()
and it doesn't describe the meaning ofvalue
naturally. - I could replace the
layer()
call with a variant of this function, that would make a different Layer ggproto object whereincompute_aesthetics()
works out groups differently. This however is a lot of work I would rather prevent to be burdened with. - I could probably pull a trick along the lines of
vctrs::new_vctr(..., class = "not_discrete")
, but where is the appropriate place to wrap myvalue
variable in that class?
Helpful suggestions are welcome, or new takes on 'just use label
' arguments too.
param
, rather than asaes
? – Tenno