Connect stacked bar charts with multiple groups with lines or segments using ggplot2

I am conducting a study of patients with a disease, using an ordinal scale to assess functional status at 3 different time points. I want to connect the score groups in stacked bar charts across these time points.

I looked at these topics but haven't gotten it to work using their suggestions:

How to position lines at the edges of stacked bar charts

Is there an efficient way to draw lines between different elements in a stacked bar plot using ggplot2?

Draw lines between different elements in a stacked bar plot

Below is a mock-up (generated in PRISM) of how I ultimately want the figure from R to look. It shows the frequency of each ordinal value (scores 0 to 6) across the three time points (the top group has no patients with ordinal score 3, 5 or 6):

[Figure: intended figure, generated in PRISM]

Data:

library(tidyverse)

mrs <-tibble(
  Score = c(0,1,2,3,4,5,6),
  pMRS = c(17,  2,   1,  0,  1,  0,   0),
  dMRS = c(2,  3,   2,  6,  4,  2,  2),
  fMRS = c(4,  4,  5,  4,  1,  1,  2)
)

And this is the code that I've tried so far, before running into issues with either geom_line or geom_segment (I left out those lines because they currently just distort the figure):

mrs <- mrs %>% mutate(across(-Score,~paste(round(prop.table(.) * 100, 2)))) %>%
   pivot_longer(cols = c("pMRS", "dMRS", "fMRS"), names_to = "timepoint") %>% 
   mutate(Score=as.character(Score),
          value=as.numeric(value)) %>% 
   mutate(timepoint = factor(timepoint, 
                             levels= c("fMRS", 
                              "dMRS",
                              "pMRS"))) %>% 
   mutate(Score = factor(Score,
                         levels = c("6","5","4","3","2","1","0")))
mrs %>% ggplot(aes(y= timepoint, x= value, fill= Score))+
  geom_bar(color= "black", width = 0.6, stat= "identity") +
  scale_fill_manual(name= NULL,
                    breaks = c("6","5","4","3","2","1","0"), values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_discrete(breaks=c("pMRS",
                            "dMRS",
                            "fMRS"),
                   labels=c("Pre-mRS,  (N=21)",
                            "Discharge mRS,  (N=21)",
                            "Followup mRS,  (N=21)"))+
  theme_classic()
Gatefold answered 24/1, 2022 at 16:42 Comment(0)

I don't think there is an easy way of doing this; you'd have to (semi-)manually add these lines yourself. What I'm proposing below is adapted from this answer and applied to your case. In essence, it exploits the fact that geom_area() is stackable, just like the bar chart. The downside is that you have to manually punch in the coordinates where the bars start and end, and you have to do so for each pair of stacked bars.

library(tidyverse)

# mrs <- tibble(...) %>% mutate(...) # omitted for brevity, same as question

mrs %>% ggplot(aes(x= value, y= timepoint, fill= Score))+
  geom_bar(color= "black", width = 0.6, stat= "identity") +
  geom_area(
    # Last two stacked bars
    data = ~ subset(.x, timepoint %in% c("pMRS", "dMRS")),
    # These exact values depend on the 'width' of the bars
    aes(y = c("pMRS" = 2.7, "dMRS" = 2.3)[as.character(timepoint)]),
    position = "stack", outline.type = "both", 
    # Alpha set to 0 to hide the fill colour
    alpha = 0, colour = "black",
    orientation = "y"
  ) +
  geom_area(
    # First two stacked bars
    data = ~ subset(.x, timepoint %in% c("dMRS", "fMRS")),
    aes(y = c("dMRS" = 1.7, "fMRS" = 1.3)[as.character(timepoint)]),
    position = "stack", outline.type = "both", alpha = 0, colour = "black",
    orientation = "y"
  ) +
  scale_fill_manual(name= NULL,
                    breaks = c("6","5","4","3","2","1","0"),
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_discrete(breaks=c("pMRS",
                            "dMRS",
                            "fMRS"),
                   labels=c("Pre-mRS,  (N=21)",
                            "Discharge mRS,  (N=21)",
                            "Followup mRS,  (N=21)"))+
  theme_classic()

Arguably, making a separate data.frame for the lines is more straightforward, but also a bit messier.
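
As a rough sketch of that separate-data.frame route (not the approach used above): the connecting lines can be computed as cumulative percentages and drawn with geom_segment(). The names `counts`, `cum_pct`, `segments`, `long` and `half_width` are made up for illustration, and the sketch assumes the same bar width (0.6) and the same level order as in the question. The plotting part only runs if ggplot2 is installed.

```r
# Counts from the question; all object names here are illustrative assumptions.
counts <- data.frame(
  Score = 0:6,
  pMRS  = c(17, 2, 1, 0, 1, 0, 0),
  dMRS  = c(2, 3, 2, 6, 4, 2, 2),
  fMRS  = c(4, 4, 5, 4, 1, 1, 2)
)

# Bottom-to-top order on the discrete y axis (positions 1, 2, 3)
timepoints <- c("fMRS", "dMRS", "pMRS")
half_width <- 0.3  # half of the bar width (0.6)

# Cumulative percentages give the right edge of each score block; the extra
# first row is the left edge at 0. Score 0 sits nearest zero because it is
# the last factor level and stacking runs in reverse level order.
cum_pct <- rbind(0, sapply(counts[timepoints],
                           function(n) cumsum(n) / sum(n) * 100))

# One row per connecting line, for each pair of neighbouring bars
segments <- do.call(rbind, lapply(seq_len(length(timepoints) - 1), function(i) {
  data.frame(
    x    = cum_pct[, i + 1],      # edge on the upper bar
    xend = cum_pct[, i],          # edge on the lower bar
    y    = (i + 1) - half_width,  # bottom of the upper bar
    yend = i + half_width         # top of the lower bar
  )
}))

# Long-format percentages for the bars themselves
long <- data.frame(
  Score     = factor(rep(counts$Score, times = length(timepoints)), levels = 6:0),
  timepoint = factor(rep(timepoints, each = nrow(counts)), levels = timepoints),
  value     = unlist(lapply(counts[timepoints], function(n) n / sum(n) * 100),
                     use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = value, y = timepoint, fill = Score)) +
    geom_col(colour = "black", width = 2 * half_width) +
    # Numeric y values are placed on the discrete scale trained by the bars
    geom_segment(data = segments, inherit.aes = FALSE, linetype = "dashed",
                 aes(x = x, xend = xend, y = y, yend = yend)) +
    scale_fill_manual(name = NULL,
                      values = setNames(c("#000000", "#294e63", "#496a80", "#7c98ac",
                                          "#b3c4d2", "#d9e0e6", "#ffffff"), 6:0)) +
    theme_classic()
  print(p)
}
```

Because every connecting line is drawn exactly once here, a dashed linetype should look the same for all of them.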

Footbridge answered 24/1, 2022 at 17:8 Comment(5)
This is exactly what I was looking for. I had seen that previous post using geom_area as well, but ran into issues similar to those I was having with geom_line. I am relatively new to R, and subsetting the data inside the geom is something I haven't seen before. Thank you so much! – Gatefold
One thing I'm noticing is that when I change the linetype to dashed in both geom_area calls (linetype = "dashed"), the line connecting the far-right group 6 is faint compared to the other lines. Is there any reason why it should be treated differently by these geoms? – Gatefold
Technically, each of the lines is part of a rectangle-ish polygon whose outlines are drawn on the left and right sides. The middle lines thus have one polygon on the left and one on the right, so two lines are drawn on top of each other, making dashed lines appear extra heavy. You can omit one of the two lines by setting outline.type = "upper" or outline.type = "lower", but that leaves a gap at the start or end, which you could fill in with a manual annotate(). – Footbridge
teunbrand, I believe this is nothing but a peculiar alluvial chart; see below for an option that is fully automated. – Antonyantonym
Yeah, that is a fair point. – Footbridge

You're essentially creating an alluvial diagram, so you could make use of the ggalluvial package. The code below gives the desired look. I kept the time points on the horizontal axis because it's more natural to read them from left to right (at least in Western societies), but you can simply add coord_flip() if you really want to.

Also - please see below a suggestion of what I personally find a more compelling visualisation.

Check the following sources for more info on alluvial charts

library(tidyverse)
library(ggalluvial)

# I personally prefer to create a new object when you do data modifications
mrs_long <- 
  mrs %>% mutate(across(-Score,~paste(round(prop.table(.) * 100, 2)))) %>%
  pivot_longer(cols = c("pMRS", "dMRS", "fMRS"), names_to = "timepoint") %>% 
  mutate(Score=as.character(Score),
         value=as.numeric(value),
         ## I've reversed the level order
         timepoint = factor(timepoint, levels= rev(c("fMRS", "dMRS", "pMRS"))),
         Score = factor(Score, levels = 6:0))

ggplot(mrs_long,
       aes(y = value, x = timepoint)) +
  geom_flow(aes(alluvium = Score), alpha= .9, 
            lty = 2, fill = "white", color = "black",
            curve_type = "linear", 
            width = .5) +
  geom_col(aes(fill = Score), width = .5, color = "black") +
  scale_fill_manual(NULL, breaks = 6:0,
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_continuous(expand = c(0,0)) +
  cowplot::theme_minimal_hgrid()
#> Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Arguably more compelling: I find the message gets across better when you make full use of the "alluvial look". For example, it could look like this:

ggplot(mrs_long,
       aes(y = value, x = timepoint, fill = Score)) +
  geom_alluvium(aes(alluvium = Score), alpha= .9, color = "black") +
  scale_fill_manual(NULL, breaks = 6:0,
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_continuous(expand = c(0,0)) +
  cowplot::theme_minimal_hgrid()
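
For the horizontal bars that the question asks for, the coord_flip() remark could be sketched roughly as below. This is an untested adaptation of the first plot, not a definitive version: `counts` and `mrs_flip` are illustrative names, the data are rebuilt in base R so the block stands alone, theme_classic() is swapped in for the cowplot theme, and the time point levels are put back in the question's order on the assumption that the first level ends up at the bottom after flipping. The plot is only drawn if ggplot2 and ggalluvial are installed.

```r
# Counts from the question; object names are illustrative assumptions.
counts <- data.frame(
  Score = 0:6,
  pMRS  = c(17, 2, 1, 0, 1, 0, 0),
  dMRS  = c(2, 3, 2, 6, 4, 2, 2),
  fMRS  = c(4, 4, 5, 4, 1, 1, 2)
)

# After coord_flip() the first level is drawn at the bottom,
# so this order should put the pre-morbid bar on top.
timepoints <- c("fMRS", "dMRS", "pMRS")

mrs_flip <- data.frame(
  Score     = factor(rep(counts$Score, times = length(timepoints)), levels = 6:0),
  timepoint = factor(rep(timepoints, each = nrow(counts)), levels = timepoints),
  value     = unlist(lapply(counts[timepoints], function(n) n / sum(n) * 100),
                     use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(mrs_flip, aes(y = value, x = timepoint)) +
    # Same flow layer as in the first plot above
    geom_flow(aes(alluvium = Score), alpha = .9,
              lty = 2, fill = "white", color = "black",
              curve_type = "linear", width = .5) +
    geom_col(aes(fill = Score), width = .5, color = "black") +
    scale_fill_manual(NULL,
                      values = setNames(c("#000000", "#294e63", "#496a80", "#7c98ac",
                                          "#b3c4d2", "#d9e0e6", "#ffffff"), 6:0)) +
    scale_y_continuous(expand = c(0, 0)) +
    coord_flip() +  # turn the vertical bars into horizontal ones
    theme_classic()
  print(p)
}
```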

Antonyantonym answered 24/1, 2022 at 17:40 Comment(1)
This is great, and another excellent way of visualizing the data! I really like the alluvial look. The horizontal stacked bar chart is the more traditional way of reporting this outcome in previous related clinical trials, so I will likely use those suggestions for now, but I would like to use the alluvial chart in future work. – Gatefold
