Percentage histogram with facet_wrap
I am trying to combine a percentage histogram with facet_wrap, but the percentages are calculated from all of the data rather than within each group. I would like each histogram to show the distribution within its own group, not relative to the whole population. I know I could make several plots and combine them with multiplot.

library(ggplot2)
library(scales)
library(dplyr)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))

# Duplicate every row with group = "ALL" so one panel shows the whole population
tmp <- df %>%
  mutate(group = "ALL")

df <- rbind(df, tmp)

ggplot(df, aes(age)) + 
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
  scale_y_continuous(labels = percent ) + 
  facet_wrap(~ group, ncol = 5) 
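One way to confirm what is happening is to inspect the computed layer data with ggplot2's layer_data(). The sketch below uses a small hypothetical data frame (three groups of 100 ages, not the df above) and the after_stat() spelling available from ggplot2 3.3.0. Expressions that refer to computed variables are evaluated on the stat output of the whole layer, so sum(count) runs over every panel, and each panel's bars add up to only a fraction of 100%.

```r
library(ggplot2)

# Hypothetical example data (not the question's df): 3 groups of 100 ages each
set.seed(1)
df_small <- data.frame(
  age   = runif(300, min = 10, max = 100),
  group = rep(c("a", "b", "c"), 100)
)

# Same idea as the question, written with after_stat() (ggplot2 >= 3.3.0)
p_all <- ggplot(df_small, aes(age)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 5) +
  facet_wrap(~ group)

# layer_data() returns the computed data for every panel of the first layer
ld_all <- layer_data(p_all)

# Bar heights summed per panel: about 1/3 each, because sum(count)
# was taken over all 300 rows rather than over each panel's 100 rows
panel_sums_all <- tapply(ld_all$y, ld_all$PANEL, sum)
print(panel_sums_all)
```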


Parisian asked 7/10, 2018 at 16:0

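Try y = stat(density) (or y = ..density.. prior to ggplot2 version 3.0.0) instead of y = (..count..)/sum(..count..). The stat computes density separately for each group within each panel, whereas sum(..count..) is taken over the data of all panels.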

ggplot(df, aes(age, group = group)) + 
  # density * binwidth (5) gives the share of each group's observations per bin
  geom_histogram(aes(y = stat(density) * 5), binwidth = 5) + 
  scale_y_continuous(labels = percent ) +
  facet_wrap(~ group, ncol = 5)


From ?geom_histogram, under "Computed variables":

density : density of points in bin, scaled to integrate to 1

We multiply by 5 (the bin width) because density is scaled so that the total area of the bars integrates to 1, whereas a percentage requires the bar heights themselves to sum to 1; see Hadley's comment (thanks to @MariuszSiatka).
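The bin-width arithmetic can be checked without ggplot2. The sketch below applies base R's hist() to one hypothetical group of 100 values; hist() reports density as count / (n * binwidth), so multiplying by the (constant) bin width recovers the share of observations in each bin, and those shares sum to 1.

```r
# Hypothetical single group of 100 ages, mirroring the question's runif() setup
set.seed(1)
age <- runif(100, min = 10, max = 100)

binwidth <- 5
breaks <- seq(10, 100, by = binwidth)  # equal-width bins spanning the data
h <- hist(age, breaks = breaks, plot = FALSE)

# Bar areas (density * binwidth) sum to 1: this is what "integrates to 1" means
area <- sum(h$density * binwidth)
print(area)

# Each density * binwidth is the share of observations in that bin,
# i.e. the bar height a percentage histogram needs
share <- h$density * binwidth
print(round(share, 3))
```

With unequal bin widths the multiplier would differ per bin, which is why a single hard-coded constant only works when it matches binwidth.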

Redfish answered 7/10, 2018 at 18:45

Comments:
Adding clauswilke's solution, which keeps percentages (not density) on the y axis: geom_histogram(aes(y = stat(width*density))) - Heelpost
@SweepyDodo you should add that as an answer. - Infallibilism
Warning: stat(width * density) was deprecated in ggplot2 3.4.0. ℹ Please use after_stat(width * density) instead. - Leora
Using stat(width*density) or after_stat(width * density) was the solution, and it is much more intelligible code-wise than multiplying by a hard-coded constant. @SweepyDodo, seconding that you make your comment a standalone answer. - Miyokomizar

Since facet_wrap does not appear to run the count/sum(count) percentage calculation within each subset, consider building a separate plot for each group and then arranging them in a grid.

Specifically, call by() to build one ggplot for each subset of group, then call gridExtra::grid.arrange() (a function from an actual package) to roughly mimic facet_wrap:

library(ggplot2)
library(scales)
library(gridExtra)

...

grp_plots <- by(df, df$group, function(sub){
  ggplot(sub, aes(age)) + 
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
    scale_y_continuous(labels = percent ) + ggtitle(sub$group[[1]]) +
    theme(plot.title = element_text(hjust = 0.5))
})

grid.arrange(grobs = grp_plots, ncol=5)



However, to avoid repeating the y- and x-axes on every panel, consider setting the theme conditionally within the by() call, assuming you know your groups ahead of time and there are only a handful of them.

grp_plots <- by(df, df$group, function(sub){

  # BASE GRAPH
  p <- ggplot(sub, aes(age)) + 
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) + 
    scale_y_continuous(labels = percent ) + ggtitle(sub$group[[1]])

  # CONDITIONAL theme() CALLS
  # (assumes panel order a, b, ..., i, ALL laid out in 5 columns)
  if (sub$group[[1]] %in% c("a")) {
    # top-left panel: keep the y axis, drop the x axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.x = element_blank(), 
                  axis.text.x = element_blank(), axis.ticks.x = element_blank())
  }
  else if (sub$group[[1]] %in% c("f")) {
    # bottom-left panel: keep both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5))
  }
  else if (sub$group[[1]] %in% c("b", "c", "d", "e")) {
    # rest of the top row: drop both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(), 
                   axis.text.y = element_blank(), axis.ticks.y = element_blank(),
                   axis.title.x = element_blank(), axis.text.x = element_blank(), 
                   axis.ticks.x = element_blank())
  }
  else {
    # rest of the bottom row: keep the x axis, drop the y axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(), 
                   axis.text.y = element_blank(), axis.ticks.y = element_blank())
  }
  return(p)
})

grid.arrange(grobs=grp_plots, ncol=5)
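The hard-coded group names above only fit this exact layout. As a hypothetical alternative (not part of the original answer), the axis decisions can be derived from a panel's position: with panels filled row by row into ncol columns, keep the y axis in the left-most column and the x axis on any panel that has no panel directly below it. The helper panel_axes() is illustrative only; it is not a ggplot2 or gridExtra function.

```r
# Hypothetical helper (not part of ggplot2 or gridExtra): for panel i of n,
# laid out row by row in `ncol` columns, decide which axes to draw.
panel_axes <- function(i, n, ncol) {
  list(
    # y axis only in the left-most column
    show_y = (i - 1) %% ncol == 0,
    # x axis on the bottom row, and on any panel with no panel directly below it
    show_x = i + ncol > n
  )
}

# Layout used above: 10 panels (a-i plus ALL) in 5 columns
axes <- t(sapply(seq_len(10), function(i) unlist(panel_axes(i, 10, 5))))
print(axes)
```

Inside the by() callback, the panel index could be found with something like match(as.character(sub$group[[1]]), levels(factor(df$group))), since by() orders subsets by the factor levels of the grouping variable; the two flags then select which element_blank() settings to add. For the 10-panel layout this reproduces the four branches of the if/else chain, and with an incomplete last row (say 9 panels) panel 5 keeps its x axis because nothing sits beneath it.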


Schwartz answered 7/10, 2018 at 23:34

After I added a comment to @markus's answer, a couple of comments asked for it to be a stand-alone answer.

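ggplot(df, aes(age)) + 
  # width * density = share of each panel's observations in each bin
  # (ggplot2 >= 3.4.0: write after_stat(width * density) instead of stat())
  geom_histogram(aes(y = stat(width*density)), binwidth = 10) + 
  scale_y_continuous(labels = percent ) +
  facet_wrap(~ group, ncol = 5)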

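Compared to my initial comment, I have added binwidth for flexibility; because width is taken from the computed bins, no hard-coded multiplier has to be kept in sync with it.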
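For ggplot2 3.4.0 and later, where stat() is deprecated in favour of after_stat(), the same idea can be sketched as below on a small hypothetical data set (three groups of 100 values, not the question's df). layer_data() is then used to check that the bar heights in every panel sum to 1, i.e. the percentages are computed per panel. Note that density, and therefore width * density, is normalised per group within a panel, so adding a grouping aesthetic such as fill would normalise each fill group separately.

```r
library(ggplot2)
library(scales)

# Hypothetical example data (not the question's df): 3 groups of 100 ages each
set.seed(1)
df_small <- data.frame(
  age   = runif(300, min = 10, max = 100),
  group = rep(c("a", "b", "c"), 100)
)

p <- ggplot(df_small, aes(age)) +
  # density is normalised within each panel's group, so width * density is
  # the share of that panel's observations falling in each bin
  geom_histogram(aes(y = after_stat(width * density)), binwidth = 10) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 3)

# Sum the plotted bar heights per panel; each should be (approximately) 1
ld <- layer_data(p)
panel_sums <- tapply(ld$y, ld$PANEL, sum)
print(panel_sums)
```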

Credit: clauswilke.

Heelpost answered 8/8 at 12:56
