Shading (or alpha) boxplots by number of datapoints with ggplot2 in R

3

6

I have a columnar data set from which I am plotting a series of box plots, most similar to the setup in this example: Boxplot of table using ggplot2

library(reshape2)
library(ggplot2)
ggplot(data = melt(dd), aes(x=variable, y=value)) + geom_boxplot(aes(fill=variable))

However, in my case, each of the boxplots represents a different number of data points. For example, Column A might have 8000 data points, Column B might have 6000, Column C might have 2500, and Column D might have 800.

To help communicate this, I thought I could alpha the fill color of the box to reflect the number of datapoints. The darker the box, the more datapoints were used in computing the statistics the boxplot represents.

In the ggplot2 help file for geom_histogram, they use aes(fill=..count..) to shade each bin according to the number of observations it contains.

m <- ggplot(movies, aes(x=rating))    
m + geom_histogram(aes(fill=..count..))

(Wanted to include a picture of the example histogram here, but can't because I don't have enough reputation points...sorry)

I tried using this with geom_boxplot, but it doesn't seem to recognize ..count.. there. Here is the line that generates my boxplot:

ggplot(meltedData, aes(x=variable, y=value)) + geom_boxplot(aes(fill=variable), outlier.size = 1) + ylim(-4,3)

Anyone have any pointers? I know I can add the "alpha" property to geom_boxplot, but how can I apply it to each boxplot individually based on the # of datapoints in the boxplot?

Thanks in advance.

Em answered 16/7, 2013 at 17:40 Comment(2)
Could you please provide a reproducible example of the columns you're trying to plot? - Groping
I don't know the whole ..count.. system very well, but I think it works with histograms because of the stat="bin" argument. You may have to just add count to the data itself. - Desiderata
7

stat_boxplot doesn't calculate the count. Just do it outside of ggplot2:

library(plyr)
DF <- ddply(mtcars, .(cyl), transform, myalpha = length(cyl))

library(ggplot2)
ggplot(DF, aes(factor(cyl), mpg)) + 
  geom_boxplot(aes(alpha = myalpha), fill = "blue") 
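
For the setup in the question (melted data where each column contributes a different number of rows), the same idea might look like the sketch below. The data here is simulated, and the names `sizes`, `n_obs`, and `p` are placeholders rather than anything from the original post; the group sizes are taken from the figures mentioned in the question. The per-group count is computed with base R's `ave()` instead of plyr, and `scale_alpha_continuous()` is used so that the smallest group does not fade out completely.

```r
library(ggplot2)

# Simulated long-format data standing in for the asker's melted table:
# four columns with very different numbers of observations.
set.seed(1)
sizes <- c(A = 8000, B = 6000, C = 2500, D = 800)
meltedData <- data.frame(
  variable = factor(rep(names(sizes), times = sizes)),
  value    = rnorm(sum(sizes))
)

# Attach the size of each group to every row of that group.
meltedData$n_obs <- ave(meltedData$value, meltedData$variable, FUN = length)

# Map the count to alpha; the fill colour stays constant.
p <- ggplot(meltedData, aes(x = variable, y = value)) +
  geom_boxplot(aes(alpha = n_obs), fill = "blue", outlier.size = 1) +
  # Keep the sparsest box faintly visible rather than fully transparent.
  scale_alpha_continuous(range = c(0.2, 1), name = "n")

print(p)
```

The `range` values are an arbitrary choice; widen or narrow them depending on how strongly the difference in sample size should stand out.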


Uuge answered 16/7, 2013 at 18:8 Comment(0)
4

My version of Roland's solution using the dplyr package:

library(dplyr)
library(ggplot2)

df <- mtcars %>%
  group_by(cyl) %>%
  mutate(my_alpha = length(cyl))

ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = my_alpha), fill = 'blue')
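
A slightly shorter variant, assuming a dplyr version that provides `add_count()` with the `name` argument, appends the group size in one step and leaves the result ungrouped. This is only a sketch of an alternative, not a correction of the code above:

```r
library(dplyr)
library(ggplot2)

# add_count() adds a column holding the number of rows in each cyl group.
df <- mtcars %>%
  add_count(cyl, name = "my_alpha")

p <- ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = my_alpha), fill = "blue")

print(p)
```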
Decompress answered 17/2, 2019 at 3:45 Comment(0)
1

data.table option:

library(data.table)
# assumes dd is already the melted (long) data with a 'variable' column
dd <- data.table(dd)
dd[, Count := .N, by = variable]
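
To show how the `Count` column could then feed into the plot, here is a minimal end-to-end sketch. The data is simulated, and the group sizes and the name `p` are made up for illustration; only the `:=` / `.N` step comes from the snippet above.

```r
library(data.table)
library(ggplot2)

# Simulated long-format data with a 'variable' and a 'value' column.
set.seed(42)
dd <- data.table(
  variable = rep(c("A", "B", "C"), times = c(300, 120, 40)),
  value    = rnorm(460)
)

# Add the per-group row count by reference.
dd[, Count := .N, by = variable]

# Map the count to the alpha of each box.
p <- ggplot(dd, aes(x = variable, y = value)) +
  geom_boxplot(aes(alpha = Count), fill = "blue")

print(p)
```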
Taxpayer answered 16/7, 2013 at 18:10 Comment(2)
Sure. What do you mean by "at least"? - Desiderata
I just don't see the need to list all possibilities to do this every time split-apply-combine is needed in an answer. We really need a good FAQ giving all possibilities. I chose plyr here because I was already in the hadleyverse. - Uuge
