How to extend ggplot2 boxplot with ggproto?
Asked Answered
L

1

7

I'm often using boxplots in my work and like ggplot2 aesthetics. But standard geom_boxplot lacks two things important for me: ends of whiskers and median labels. Thanks to information from here I've written a function:

gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){
  require(ggplot2)
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  # A function for medians labelling
  fun_med <- function(x){
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  p <- ggplot(data, aes_string(x = factor, y = response)) +
  stat_boxplot(geom = "errorbar", width = 0.6) +
  geom_boxplot() +
  stat_summary(fun.data = fun_med, geom = "label", family = font, size = fsize/3, 
                                                                         vjust = -0.1) +
  theme_grey(base_size = fsize, base_family = font)
  return(p)
}

There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:

gBoxplot(hwy ~ class, mpg)

plot1

Good enough for me, but there are some restrictictions (cannot use auto-dodging, etc.), and it will be better to make a new geom based on geom_boxplot. I've read the vignette Extending ggplot2, but cannot understand how to implement it. Any help will be appreciated.

Loreeloreen answered 19/1, 2016 at 12:20 Comment(4)
So what do you think?Epos
Great thanks, @MikeWise! See comment for your answer.Loreeloreen
Err, which comment? And if you like it, could you accept the answer? To do that, click on the grey check-mark under then vote-number near the top-left of my answer.Epos
OK, I've accepted!Loreeloreen
E
11

So been thinking about this one for a while. Basically when you create a new primitive, you normally write a combination of:

  1. A layer-function
  2. A stat-ggproto,
  3. A geom-ggproto

Only the layer-function need be visible to the user. You only need to write a stat-ggproto if you need some new way of transforming your data to make your primitive. And you only need write a geom-ggproto if you have some new grid-based graphics to create.

In this case, where we are basically composting layer-function that already exist, we don’t really need to write new ggprotos. It is enough to write a new layer-function. This layer-function will create the three layers that you already are using and map the parameters the way you intend. In this case:

  • Layer1 – uses geom_errorbar and stat_boxplot – to get our errorbars
  • Layer2 – uses geom_boxplot and stat_boxplot - to create the boxplots
  • Layer3 – users geom_label and stat_summary - to create the text labels with the mean value in the center of the boxes.

Of course you could write a new stat-ggproto and a new geom-ggproto that do all of these things at once. Or maybe you compost stat_summary and stat_boxplot into one, and the three geom-protos as well, and this do this with one layer. But there is little point unless we have efficiency problems.

Anyway, here is the code:

geom_myboxplot <- function(formula = NULL, data = NULL,
                           stat = "boxplot", position = "dodge",coef=1.5,
                           font = "sans", fsize = 18, width=0.6,
                           fun.data = NULL, fun.y = NULL, fun.ymax = NULL,
                           fun.ymin = NULL, fun.args = list(),
                           outlier.colour = NULL, outlier.color = NULL,
                           outlier.shape = 19, outlier.size = 1.5,outlier.stroke = 0.5,
                           notch = FALSE,  notchwidth = 0.5,varwidth = FALSE,
                           na.rm = FALSE, show.legend = NA,
                           inherit.aes = TRUE,...) {
    vars <- all.vars(formula)
    response <- vars[1]
    factor <- vars[2]
    mymap <- aes_string(x=factor,y=response)
    fun_med <- function(x) {
        return(data.frame(y = median(x), label = round(median(x), 3)))
    }
    position <- position_dodge(width)
    l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot,
            geom = "errorbar", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(na.rm = na.rm,
                coef = coef, width = width, ...))
    l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot,
            position = position, show.legend = show.legend, inherit.aes = inherit.aes,
            params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape,
                outlier.size = outlier.size, outlier.stroke = outlier.stroke,
                notch = notch, notchwidth = notchwidth, varwidth = varwidth,
                na.rm = na.rm, ...))
    l3 <- layer(data = data, mapping = mymap, stat = StatSummary,
            geom = "label", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(fun.data = fun_med,
                fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin,
                fun.args = fun.args, na.rm=na.rm,family=font,size=fsize/3,vjust=-0.1,...))
    return(list(l1,l2,l3))
}

which allows you to create your customized boxplots it now like this:

ggplot(mpg) +
  geom_myboxplot( hwy ~ class, font = "sans",fsize = 18)+
  theme_grey(base_family = "sans",base_size = 18 )

And they look like this:

enter image description here

Note: we did not actually have to use the layer function, we could have used the orginal stat_boxplot, geom_boxplot, and stat_summary calls in their place. But we still would have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way - at least from the point-of-view of structure as opposed to functionality. Maybe it isn't though, it is a matter of taste...

Also I don't have that font which does look a lot nicer. But I did not feel like tracking it down and installing it.

Epos answered 18/2, 2016 at 15:7 Comment(5)
I've lost hope for any answer, so view yours just now. Really it's the very thing: working example and some theory to think about. Need some time to load the idea in mind (English is not my motherlang). The font I use is Computer Modern Unicode, a unicode version of D. Knuth's classic.Loreeloreen
Great. Glad you like it. Please accept the answer though.Epos
Thanks. If you had two more points you could upvote it too. Maybe next time :)Epos
@MikeWise Do you know how I could adapt this approach to adjust positions for e.g. points or polygons? I'm looking to develop a function where overlapping geometric objects are shifted so they don't overlap, probably using a force-direction approach. Any pointers much appreciatedShoreless
The example above shifts the points to avoid collisions. Are you talking about new gemos that you define, or are you looking to layer other already existing geoms.Epos

© 2022 - 2024 — McMap. All rights reserved.