ggplot geom_bar plot percentages by group and facet_wrap
I want to plot multiple categories on a single graph, with the percentages within each category adding up to 100%. For example, if I were plotting male versus female, the bars for each group (male or female) would add up to 100%. With the code below, the percentages are instead computed across all groups in both facets: all the bars in the left-hand and right-hand panels together total 100%, rather than the yellow bars in the left-hand panel totalling 100%, the purple bars in the left-hand panel totalling 100%, and so on.

I appreciate that this is doable by using stat = 'identity', but is there a way to do this in ggplot without wrangling the dataframe prior to plotting?

library(ggplot2)
library(dplyr)  # provides %>%, filter() and select() used below

tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)

ggplot(data=tmp,
     aes(x=clarity,
         fill=cut)) + 
  geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge") +
  scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))


Haro answered 2/7, 2021 at 15:10 Comment(0)

When you compute percentages inside ggplot2, you have to group the data just as you would when summarizing it before passing it to ggplot. In your case, the PANEL column that ggplot2 adds internally to the data can be used for the grouping.

Using after_stat() and ave() to compute the sum of the counts per panel, this can be achieved like so:

library(ggplot2)  
library(dplyr)

tmp <- diamonds %>% 
    filter(color %in% c("E","I")) %>% 
    select(color, cut, clarity)

ggplot(
  data = tmp,
  aes(
    x = clarity,
    fill = cut
  )
) +
  geom_bar(
    aes(y = after_stat(count / ave(count, PANEL, FUN = sum))),
    position = "dodge"
  ) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(vars(color))
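
To see what the ave() call contributes, here is a minimal sketch on a toy data frame. The `bars` data frame and its values are made up for illustration; it only mimics the `count` and `PANEL` columns that the stat exposes to after_stat(), and is not the real data ggplot2 builds.

```r
# Hypothetical stand-in for the per-bar data available inside after_stat():
# one row per bar, with its count and the facet panel it belongs to.
bars <- data.frame(
  PANEL = c(1, 1, 1, 2, 2),
  count = c(10, 30, 60, 5, 15)
)

# ave() returns a vector as long as `count`; each element holds the
# sum of `count` over all rows that share that row's PANEL.
panel_total <- ave(bars$count, bars$PANEL, FUN = sum)

# Dividing each count by its panel total gives the share within the panel.
bars$pct <- bars$count / panel_total
print(bars)
```

Because ave() keeps the original row order and length, the result lines up with the bars one to one, which is what an aesthetic mapping needs.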

EDIT: If you need to group by more than one variable, I would suggest a helper function. Here I use dplyr for the computations:

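# Share of each bar within its facet panel and fill group.
# `cut` receives the computed `fill` column when called inside after_stat().
comp_pct <- function(count, PANEL, cut) {
  data.frame(count, PANEL, cut) %>% 
    group_by(PANEL, cut) %>% 
    mutate(pct = count / sum(count)) %>% 
    pull(pct)
}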

ggplot(data=tmp,
       aes(x=clarity,
           fill=cut)) + 
  geom_bar(aes(y = after_stat(comp_pct(count, PANEL, fill))), position="dodge") +
  scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
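
As a possible alternative to the dplyr helper, base R's ave() accepts more than one grouping vector, so the two-variable grouping can be sketched without a helper function. The toy `bars` data frame below is invented for illustration and only mimics the `count`, `PANEL` and `fill` columns. The assumption, not confirmed in this thread, is that the same expression would also work in the plot as `after_stat(count / ave(count, PANEL, fill, FUN = sum))`.

```r
# Hypothetical per-bar data: two panels, two fill groups, two bars each.
bars <- data.frame(
  PANEL = c(1, 1, 1, 1, 2, 2, 2, 2),
  fill  = c("Fair", "Fair", "Ideal", "Ideal", "Fair", "Fair", "Ideal", "Ideal"),
  count = c(1, 3, 20, 60, 2, 2, 10, 30)
)

# Passing two grouping vectors makes ave() sum within each
# PANEL-by-fill combination instead of within each panel only.
group_total <- ave(bars$count, bars$PANEL, bars$fill, FUN = sum)

# Each fill group now adds up to 1 (100%) inside each panel.
bars$pct <- bars$count / group_total
print(bars)
```

The dplyr helper and this ave() variant compute the same quantity; the choice between them is mostly a matter of which style reads better in your code.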

Caduceus answered 2/7, 2021 at 15:30 Comment(4)
Thanks Stefan, it's getting close. Adding the yellow bars in the left-hand panel together still gives you a bigger number than the purple bars in the left-hand panel. I'd like them both to equal 100%. Other than PANEL, could you also group by cut? - Haro
Hi pauke. Sure, we could group by more than one variable. See my edit. - Caduceus
Brilliant, that's a pretty neat way of doing this. - Haro
@Caduceus doing god's work bro. Big high five. - Beelzebub
