ggplot replace count with percentage in geom_bar

I have a dataframe d:

> head(d,20)
   groupchange Symscore3
1            4         1
2            4         2
3            4         1
4            4         2
5            5         0
6            5         0
7            5         0
8            4         0
9            2         2
10           5         0
11           5         0
12           5         1
13           5         0
14           4         1
15           5         1
16           1         0
17           4         0
18           1         1
19           5         0
20           4         0

I am plotting it with:

ggplot(d, aes(groupchange, y=..count../sum(..count..),  fill=Symscore3)) +
  geom_bar(position = "dodge") 

This way, each bar shows its share of the whole dataset.

Instead, I would like each bar to represent a relative percentage, i.e. the bars obtained for a given groupchange = k should sum to 1.

Grover answered 16/7, 2014 at 8:45 Comment(3)
Please consider updating the accepted answer to reflect the more accurate and succinct answer below, which uses position = "fill", especially since the question asks specifically about the ggplot package. Otherwise, people rely on manual summarizing when geom_bar itself computes the proportions with position = "fill". Updating the selected answer would avoid a persistence of inefficient approaches across the community. I wanted to bring this to your and the community's attention. - Chammy
@Chammy I disagree that my approach is inefficient. It depends on the circumstances, in my opinion. For this simple use case, you might be right. However, when working with large datasets it is (in my experience) more efficient to summarise first and then plot. Also, when the summarisation is a bit more complex than a straightforward percentage, it is better to summarise first and then plot. - Malachite
The dot-dot notation (..count..) was deprecated in ggplot2 3.4.0; use after_stat(count) instead, i.e. y = after_stat(count)/sum(after_stat(count)). - Osprey

First summarise and transform your data:

library(dplyr)
d2 <- d %>% 
  group_by(groupchange, Symscore3) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count))
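
A quick sanity check may help here: summarise() drops the last grouping variable (Symscore3), so the mutate() step runs within each groupchange level and perc should sum to 1 per group. The sketch below assumes the 20 rows printed in the question are the whole dataset; the name `check` is introduced only for illustration, and `.groups = "drop_last"` just makes the default regrouping explicit (it needs dplyr 1.0 or later).

```r
library(dplyr)

# The 20 rows shown in the question (assumed here to be the full data)
d <- data.frame(
  groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
  Symscore3   = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,0,0)
)

d2 <- d %>%
  group_by(groupchange, Symscore3) %>%
  # drop_last keeps the result grouped by groupchange only
  summarise(count = n(), .groups = "drop_last") %>%
  # so sum(count) is the total within each groupchange level
  mutate(perc = count / sum(count))

# Hypothetical helper table: total share per groupchange level
check <- d2 %>%
  group_by(groupchange) %>%
  summarise(total = sum(perc))

print(check)
```

If any total differs from 1, the grouping was probably dropped or changed before the mutate() step.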

Then you can plot it:

ggplot(d2, aes(x = factor(groupchange), y = perc*100, fill = factor(Symscore3))) +
  geom_bar(stat="identity", width = 0.7) +
  labs(x = "Groupchange", y = "percent", fill = "Symscore") +
  theme_minimal(base_size = 14)

This gives:

[Image: stacked bar chart with one bar per groupchange level, segments coloured by Symscore, each bar summing to 100 percent]


Alternatively, you can use the percent function from the scales package:

brks <- c(0, 0.25, 0.5, 0.75, 1)

ggplot(d2, aes(x = factor(groupchange), y = perc, fill = factor(Symscore3))) +
  geom_bar(stat="identity", width = 0.7) +
  scale_y_continuous(breaks = brks, labels = scales::percent(brks)) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore") +
  theme_minimal(base_size = 14)

which gives:

[Image: the same stacked bar chart with the y-axis labelled from 0% to 100%]

Malachite answered 16/7, 2014 at 9:45 Comment(4)
Given the much more accurate answer below, which uses position = "fill", and especially since the question asks specifically about the ggplot package, I believe this answer may be leading to a persistence of inefficient approaches across the community. I wanted to bring this to your and the community's attention. - Chammy
@Chammy I did use ggplot2, as the OP wanted. That doesn't mean I'm not allowed to use other tools or packages. With regard to inefficiency, see my comment under the question. - Malachite
Sorry, I was not trying to suggest that you did not use ggplot2. Perhaps you could edit your answer to at least include the position = "fill" option, since most people only see the top accepted answer and might miss that very simple solution, which is likely to be helpful to many new R users. I just wanted to suggest that as a middle ground. If you do that, please let me know so I can remove these comments. - Chammy
@Chammy I doubt that most people only look at the accepted answer: I've posted quite a few answers that received at least a couple of upvotes (some of them even outperforming the accepted answer). Furthermore, editing in the position = "fill" option would feel like stealing to me. It is also regarded as unfair behavior by most people on SO. - Malachite

If your goal is visualization in minimal code, use position = "fill" as an argument in geom_bar().

If you want within-group percentages, @Jaap's dplyr answer is the way to go.

Here is a reproducible example using the above dataset to copy/paste:

library(tidyverse)

d <- tibble(groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
            Symscore3 = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,0,0))

ggplot(d, aes(x = factor(groupchange), fill = factor(Symscore3))) +
  geom_bar(position="fill")
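
Since the question asks for percentages rather than proportions, the y-axis of the filled chart can also be relabelled. This is a minimal sketch, assuming the scales package is installed (it is a ggplot2 dependency) and using the 20 rows printed in the question; the object name `p` and the axis titles are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# The 20 rows shown in the question
d <- data.frame(
  groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
  Symscore3   = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,0,0)
)

p <- ggplot(d, aes(x = factor(groupchange), fill = factor(Symscore3))) +
  # position = "fill" rescales every stack to a total height of 1
  geom_bar(position = "fill") +
  # show the 0-1 scale as percentages on the axis
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore")

p
```

The bar heights are unchanged by the scale; only the axis labels are formatted as percentages.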

[Image: stacked bars filled to a height of 1 for each groupchange level, coloured by Symscore3]

Gauge answered 3/2, 2018 at 21:34 Comment(2)
For people working with small datasets, this option is likely to be superior to the accepted answer in terms of code clarity and efficiency of approach. - Chammy
This is an excellent way to quickly convert between counts and proportions with geom_bar(). - Mcgowen

We can also add labels to the proportions without computing them explicitly in the source data frame.

library(tidyverse)

d <- tibble(groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
            Symscore3 = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,1,0)) %>%
  mutate(across(everything(), as.character))  # treat the numbers as categories

ggplot(d, aes(x=groupchange, fill=Symscore3)) +
  geom_bar(position="fill") +
  geom_text(
    aes(label=after_stat(signif(count / tapply(count, x, sum)[as.character(x)], digits=3))),
    stat="count",
    position=position_fill(vjust=0.5)) +
  labs(y="Proportion")

[Image: filled stacked bars with the within-group proportion printed in the middle of each segment]

The geom_text label in this solution is adapted from here.
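
The same per-position computation can, as far as I can tell, also drive the bar heights directly, which keeps the dodged layout from the question while making the bars of each groupchange level sum to 1. This is a sketch under assumptions: it uses the 20 rows printed in the question, relies on after_stat() (ggplot2 3.3.0 or later), and replaces the tapply() lookup with base R's ave(); the name `p_dodge` is illustrative.

```r
library(ggplot2)

# The 20 rows shown in the question
d <- data.frame(
  groupchange = c(4,4,4,4,5,5,5,4,2,5,5,5,5,4,5,1,4,1,5,4),
  Symscore3   = c(1,2,1,2,0,0,0,0,2,0,0,1,0,1,1,0,0,1,0,0)
)

p_dodge <- ggplot(d, aes(x = factor(groupchange), fill = factor(Symscore3))) +
  geom_bar(
    # count divided by the total count at the same x position,
    # i.e. the share within each groupchange level
    aes(y = after_stat(count / ave(count, x, FUN = sum))),
    position = "dodge"
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Groupchange", y = NULL, fill = "Symscore")

p_dodge
```

With faceting, the denominator would also need to be split by panel; that case is not covered here.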

Mcgowen answered 19/10, 2021 at 15:40 Comment(0)
