ggplot2: Multiple color scales or shift colors systematically on different layers?
M

5

34

When I make box plots, I like to also show the raw data in the background, like this:

library(ggplot2)
library(RColorBrewer)

cols = brewer.pal(9, 'Set1')

n=10000
dat = data.frame(value=rnorm(n, 1:4), group=factor(1:4))

ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.1) +
  scale_color_manual(values=cols) +
  geom_boxplot(fill=0, outlier.size=0)

enter image description here

However, I don't like how my box plots completely disappear when the points get too dense. I know I can adjust alpha, which is fine in some cases, but not when my groups have varying densities (for example, the lightest group would completely disappear if I decreased alpha enough to keep the darkest group from obscuring the box plot). What I'm trying to do is systematically shift the colors for the box plots, a bit darker perhaps, so that they show up even when the background points max out the alpha. For example:

plot(1:9, rep(1, 9), pch=19, cex=2, col=cols)
cols_dk = rgb2hsv(col2rgb(brewer.pal(9, 'Set1'))) - c(0, 0, 0.2)
cols_dk = hsv(cols_dk[1,], cols_dk[2,], cols_dk[3,])
points(1:9, rep(1.2, 9), pch=19, cex=2, col=cols_dk)

enter image description here
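
A minimal sketch of the same darkening step wrapped in a reusable function. The name darken_hsv and the clamping with pmax() are assumptions added for illustration and are not part of the snippet above; without the clamp, hsv() would reject any color whose value channel is already smaller than the shift.

```r
# Hypothetical helper (not from the original post): darken colors by lowering
# the HSV "value" channel, clamped so it never drops below 0.
darken_hsv <- function(cols, amount = 0.2) {
  hsv_mat <- rgb2hsv(col2rgb(cols))                   # 3 x n matrix, rows h, s, v
  hsv_mat["v", ] <- pmax(hsv_mat["v", ] - amount, 0)  # shift only the value row
  hsv(hsv_mat["h", ], hsv_mat["s", ], hsv_mat["v", ])
}

darken_hsv("red")  # "#CC0000"
```

With this helper, cols_dk = darken_hsv(cols) should give the same shifted palette as the two lines above for the Set1 colors.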

So far I haven't found a way to fake in a different scale_color for the geom_boxplot layer (which would seem the simplest route if there's a way to do it). Nor have I been able to find a simple syntax to systematically adjust the colors the same way you can easily offset a continuous aesthetic like aes(x=x+1).

The closest thing I've been able to get is to completely duplicate the levels of the factor...

ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.1) +
  scale_color_manual(values=c(cols[1:4], cols_dk[1:4])) +
  geom_boxplot(aes(color=factor(as.numeric(group)+4)), fill=0, outlier.size=0)

enter image description here

but then I have to deal with that ugly legend. Any better ideas?

Melanous answered 10/2, 2012 at 17:47 Comment(8)
How about making the boxes black? – Deflower
What @Deflower said was my first thought, but I think that runs afoul of the alpha value infecting the legend and making the colors unreadable (at least until 0.9.0 is released again in a few weeks, I think). – Tengdin
Yea black/gray is definitely acceptable (see i.imgur.com/7KKg2.png), but I don't like how it can sort of overpower/distract from the factor-coding that I'm usually trying to highlight. I guess I think it would just be even nicer if I could stick to the same color scheme but offset it a bit. – Melanous
I would even be satisfied with my above hack if there is a way to drop the 4 "dummy" levels from the legend. (is that possible?) – Melanous
Come to think about it, this sort of color shift would also be useful, more generally, for a variety of other summary-type layers. Here is one random example of a PCA biplot coded by k-means cluster. i.imgur.com/iN6xh.png I wanted to overlay the cluster center points too, but had to resort to using a different plotting symbol or else the points would be lost in the cloud. It would be even more effective if I could have just offset the values a bit like above. – Melanous
There was a discussion at some point about a hcl colour scale, where you could map independently the three parameters. I think that may be a good option here. – Electrolyte
@Electrolyte Yes, exactly! Here I would keep the hue mapped as an aesthetic to group in both layers, but for the geom_boxplot I would set the lightness (as a constant parameter) to be a little darker. – Melanous
Suggested alternative solution: github.com/hadley/ggplot2/issues/723 – Decrement
T
14

For now, you could define your own version of GeomBoxplot (calling it, say, GeomBoxplotDark), differing from the original only in that it first 'darkens' the colors before plotting them.

With proto, you can do this by creating a proto object, GeomBoxplotDark, that inherits from GeomBoxplot, and differs only in its draw function. Most of the draw function's definition is taken from the GeomBoxplot sources; I have annotated the lines I changed with comments like this # ** ... **:

require(ggplot2)

GeomBoxplotDark <- proto(ggplot2:::GeomBoxplot,
  draw <- function(., data, ..., outlier.colour = "black", outlier.shape = 16, outlier.size = 2) {
    defaults <- with(data, {                               # ** OPENING "{" ADDED **
    cols_dk <- rgb2hsv(col2rgb(colour)) - c(0, 0, 0.2)     # ** LINE ADDED        **
    cols_dk <- hsv(cols_dk[1,], cols_dk[2,], cols_dk[3,])  # ** LINE ADDED        **
    data.frame(x = x, xmin = xmin, xmax = xmax,
      colour = cols_dk,                                    # ** EDITED, PASSING IN cols_dk **
      size = size,
      linetype = 1, group = 1, alpha = 1,
      fill = alpha(fill, alpha),
      stringsAsFactors = FALSE
    )})                                                    # ** CLOSING "}" ADDED **
    defaults2 <- defaults[c(1,1), ]

    if (!is.null(data$outliers) && length(data$outliers[[1]]) >= 1) {
      outliers_grob <- with(data,
        GeomPoint$draw(data.frame(
          y = outliers[[1]], x = x[rep(1, length(outliers[[1]]))],
          colour=I(outlier.colour), shape = outlier.shape, alpha = 1,
          size = outlier.size, fill = NA), ...
        )
      )
    } else {
      outliers_grob <- NULL
    }

    with(data, ggname(.$my_name(), grobTree(
      outliers_grob,
      GeomPath$draw(data.frame(y=c(upper, ymax), defaults2), ...),
      GeomPath$draw(data.frame(y=c(lower, ymin), defaults2), ...),
      GeomRect$draw(data.frame(ymax = upper, ymin = lower, defaults), ...),
      GeomRect$draw(data.frame(ymax = middle, ymin = middle, defaults), ...)
    )))
  }
)

Then create a geom_boxplot_dark() to be called by the user, and which appropriately wraps the call to GeomBoxplotDark$new():

geom_boxplot_dark <- function (mapping = NULL, data = NULL, stat = "boxplot", position = "dodge", 
    outlier.colour = "black", outlier.shape = 16, outlier.size = 2, 
    ...) 
GeomBoxplotDark$new(mapping = mapping, data = data, stat = stat, 
    position = position, outlier.colour = outlier.colour, outlier.shape = outlier.shape, 
    outlier.size = outlier.size, ...)

Finally, try it out with code almost identical to your original call, just substituting a call to geom_boxplot_dark() for the call to geom_boxplot():

library(ggplot2)
library(RColorBrewer)

cols = brewer.pal(9, 'Set1')

n=10000
dat = data.frame(value=rnorm(n, 1:4), group=factor(1:4))

ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.1) +
  scale_color_manual(values=cols) +
  geom_boxplot_dark(fill=0, outlier.size=0)

I think the resulting plot looks pretty nifty. With a bit of tweaking, and viewed directly (not as an uploaded file), it'll look awesome:

enter image description here

Tied answered 10/2, 2012 at 23:19 Comment(6)
I think you could simplify the code by inheriting from GeomBoxplot instead of Geom; in particular you could avoid writing the duplicated .$examples, etc. You just need the .$draw method. – Electrolyte
@Electrolyte -- Thanks so much for your suggestion! I've edited my question to incorporate it. I'm just now learning about proto objects, and you came close to doubling my understanding of what they're capable of, so thanks for that as well ;) – Ferrule
I think I will accept this one. Even though there is more code, I can just toss it in a file of common functions that I source, and then be able to use it flexibly in different plots, with more/less factor levels, etc. The perfect solution would be a new scale like baptiste mentioned, but this is the next best thing. Terrific. Thanks all! – Melanous
FYI ggplot2 is moving away from proto and so this won't work in a future version (maybe ggplot2 1.0?) – Horal
@Horal -- Thanks for that heads-up! Out of curiosity, are you moving away from proto for performance reasons, or because of the type of code it forces you to write, or for some other reason? – Ferrule
Mainly because no one understands it and it makes profiling hard. – Horal
P
23

Late answer added Nov 2012:

Since some of these terrific answers require older ggplot2 versions and people are still referring to this page, I'll update it with the ridiculously simple solution that I've been using with ggplot2 0.9.0+.

We just add a second geom_boxplot layer that is identical to the first one except we assign a constant color using scales::alpha() so the first boxplot shows through.

library(scales) # for alpha function
ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.2) +
  geom_boxplot(size=1.4,fill=0, outlier.size=0)+
  geom_boxplot(size=1.4,fill=0, outlier.size=0, color=alpha("black",0.3))

edit: TobiO points out that fill=0 has stopped working. Instead, fill=NA or alpha=0 can be substituted. This seems to be due to a change in col2rgb() starting in R 3.0.0.

jittered points under darker boxplot
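
To see why the overlay reads as a darker shade of the same hue, here is a rough sketch of the blending arithmetic. It assumes the graphics device composites a semi-transparent stroke over an opaque one with the simple "over" rule in RGB; blend_over is a hypothetical helper name used only for this illustration.

```r
# Hypothetical helper: color that results from drawing `fg` with opacity
# `alpha` on top of an opaque `bg`, using simple per-channel "over" blending.
blend_over <- function(fg, bg, alpha) {
  fg_rgb <- unname(col2rgb(fg)[, 1])
  bg_rgb <- unname(col2rgb(bg)[, 1])
  out <- round(alpha * fg_rgb + (1 - alpha) * bg_rgb)
  rgb(out[1], out[2], out[3], maxColorValue = 255)
}

# 30% black over the first Set1 color keeps the hue and scales each channel by 0.7
blend_over("black", "#E41A1C", 0.3)  # "#A01214"
```

Under that assumption, raising the alpha of the black layer darkens every group by the same factor, which is why one constant color works for all groups.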

Papule answered 22/11, 2012 at 1:18 Comment(0)
E
8

You can hack the legend grob, but it seems difficult to place it.

 g = ggplotGrob(p)
 grid.draw(g)
 legend = editGrob(getGrob(g, gPath("guide-box","guide"), grep=TRUE), vp=viewport())
 new = removeGrob(legend, gPath("-7|-8|-9|-10"), grep=TRUE, glob=T)
 ## grid.set(gPath("guide-box"), legend, grep=TRUE) # fails for some reason
 grid.remove(gPath("guide-box"), grep=TRUE, global=TRUE)
 grid.draw(editGrob(new, vp=viewport(x=unit(1.4,"npc"), y=unit(0.1,"npc"))))

enter image description here

Electrolyte answered 10/2, 2012 at 22:34 Comment(3)
Short and sweet - excellent! This is a great contribution too because it's coming at the problem from the opposite side as Josh. – Melanous
Can't you just specify the breaks? – Horal
@Horal Ah I didn't know that would work either. I figured it would clip the scale somehow. Nice! – Melanous
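
A sketch of the breaks suggestion from the comments, applied to the duplicated-levels hack from the question. The hex codes are the first four Set1 colors written out to avoid the RColorBrewer dependency. Naming the values vector is an assumption made here for safety: newer ggplot2 versions may pair unnamed values with the breaks, which would leave the four dummy levels without a color.

```r
library(ggplot2)

cols <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3")  # first four Set1 colors
hsv_dk <- rgb2hsv(col2rgb(cols)) - c(0, 0, 0.2)        # darken the value channel
cols_dk <- hsv(hsv_dk[1, ], hsv_dk[2, ], hsv_dk[3, ])

set.seed(1)
dat <- data.frame(value = rnorm(400, 1:4), group = factor(1:4))

p <- ggplot(dat, aes(x = group, y = value, colour = group, group = group)) +
  geom_point(position = position_jitter(width = 0.3), alpha = 0.1) +
  # levels 5-8 are the dummy levels that carry the darker colors
  geom_boxplot(aes(colour = factor(as.numeric(group) + 4)),
               fill = NA, outlier.shape = NA) +
  scale_colour_manual(
    values = setNames(c(cols, cols_dk), as.character(1:8)),
    breaks = levels(dat$group)  # only the real levels appear in the legend
  )
```

Printing p should show a legend with the four original groups only, while the boxes still use the darker colors.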
R
3

The ggplot2 syntax seems to have changed, and since it took me a little while to figure out:

fill=0 no longer has any effect (at least for me).

It has to be changed to alpha=0 to make the box transparent:

library(scales) # for alpha function
ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.2) +
  geom_boxplot(size=1.4,alpha=0, outlier.size=0)+
  geom_boxplot(size=1.4,alpha=0, outlier.size=0, color=alpha("black",0.3))

edit: I just found out that changing fill=0 to fill=NA also does the trick.

Randa answered 7/2, 2013 at 23:36 Comment(1)
Thanks for pointing this out. ?col2rgb indicates that it is a change as of R 3.0.0 where fill=0 is no longer valid. I've updated my answer with a note. – Papule
H
1

This has been implemented in ggplot2 3.3.0 (released 2020-03): The new stage function allows you to control aesthetics after mapping of the data by a stat or a scale:

ggplot(dat, aes(x=group, y=value, color=group, group=group)) +
  geom_point(position=position_jitter(width=0.3), alpha=0.1) +
  scale_color_manual(values=cols) +
  geom_boxplot(aes(color=stage(start=group, after_scale = colorspace::darken(color, 0.1))), fill=NA, outlier.size=0)
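
A variant sketch of the same idea without the colorspace dependency, plus a way to check what the layer actually drew. It assumes ggplot2 3.3.0 or later. darken_hsv is a hypothetical base-R helper defined here for illustration, and the default hue palette stands in for the Set1 colors.

```r
library(ggplot2)

# Hypothetical helper: lower the HSV value channel, clamped at 0.
darken_hsv <- function(cols, amount = 0.2) {
  m <- rgb2hsv(col2rgb(cols))
  hsv(m["h", ], m["s", ], pmax(m["v", ] - amount, 0))
}

set.seed(1)
dat <- data.frame(value = rnorm(400, 1:4), group = factor(1:4))

p <- ggplot(dat, aes(x = group, y = value, colour = group)) +
  geom_point(position = position_jitter(width = 0.3), alpha = 0.1) +
  geom_boxplot(
    # map group first, then darken the colors the scale assigned
    aes(colour = stage(start = group, after_scale = darken_hsv(colour))),
    fill = NA, outlier.shape = NA
  )

unique(layer_data(p, 1)$colour)  # colors used by the points
layer_data(p, 2)$colour          # darkened colors used by the boxes
```

Because the adjustment runs after the scale, the legend is built from the single group mapping and no dummy levels are needed.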

Hearing answered 25/6, 2021 at 7:48 Comment(0)
