Split violin plot with ggplot2
Asked Answered
M

5

65

I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.

enter image description here

Here is some data:

set.seed(20160229)

my_data = data.frame(
    y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
    x=c(rep('a', 2000), rep('b', 2000)),
    m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

I can plot dodged violins like this:

library('ggplot2')

ggplot(my_data, aes(x, y, fill=m)) +
  geom_violin()

enter image description here

But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot - is it possible?

I found a base R graphics solution but the function is quite long and I want to highlight distribution modes, which are easy to add as additional layers in ggplot but will be harder to do if I need to figure out how to edit that function.

Mellisamellisent answered 1/3, 2016 at 7:49 Comment(0)
R
87

Or, to avoid fiddling with the densities, you could extend ggplot2's GeomViolin like this:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, 
                           draw_group = function(self, data, ..., draw_quantiles = NULL) {
  data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
  grp <- data[1, "group"]
  newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y)
  newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
  newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

  if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
    stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
      1))
    quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
    aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
    aesthetics$alpha <- rep(1, nrow(quantiles))
    both <- cbind(quantiles, aesthetics)
    quantile_grob <- GeomPath$draw_panel(both, ...)
    ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
  }
  else {
    ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
  }
})

geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., 
                              draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, 
                              show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

And use the new geom_split_violin like this:

ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()

enter image description here

Racine answered 10/8, 2017 at 13:5 Comment(10)
What if I want different colors for groups "a" and "b"? THanks!Ow
@Ow Not sure in which case this was desirable, but as it's implemented with modulus it might already work? Did you try to use 4 levels in the factor m ? If you only have two levels you could use: ggplot(my_data, aes(x, y, fill=interaction(x,m))) + geom_split_violin() to get different colors, I think.Racine
Yes, indeed, this works! Thanks. Useful when the distributions for a and b are of different things, and distributions are standardized, perhaps.Ow
Also see here for some mostly working code about plotting quantiles on split violins based on this function.Parette
I think this is a fantastic function. However, I prefer using @Parette 's solution, because it returns a continuous x-axis. I am sure there is a way to use the underlying (continuous) density distribution in your geom too, but it's not as straight forward to me.Ophthalmoscopy
@Parette Hi. How to adjust the gap between two half-violins using this function?Mccaslin
@LinCaijin as a workaround you might be able to use ´position_nudge´ and call it twice (separately for both groups)Racine
@Racine and what if one would like to plot normalized distributions with this geom of yours? I tried to switch ydensity to ..ndensity.. with no avail.Subsellium
@Subsellium the densities should already be normalized (in that they integrate to 1) - if you want something else try calling geom_split_violin with scale="width" or scale="count".Racine
This is awesome! Thank you @jan-glx! If anyone wants to put a little space between the two halves of the violin, scroll down this page and see a tiny modification with a "nudge" param (borrowing from gghaves). @LinCaijin, looks like that's what you are looking for?Prenatal
P
57

Note: I think the answer by jan-glx is much better, and most people should use that instead. But sometimes, the manual approach is still helpful to do weird things.


You can achieve this by calculating the densities yourself beforehand, and then plotting polygons. See below for a rough idea.

Get densities

library(dplyr)
pdat <- my_data %>%
  group_by(x, m) %>%
  do(data.frame(loc = density(.$y)$x,
                dens = density(.$y)$y))

Flip and offset densities for the groups

pdat$dens <- ifelse(pdat$m == 'i', pdat$dens * -1, pdat$dens)
pdat$dens <- ifelse(pdat$x == 'b', pdat$dens + 1, pdat$dens)

Plot

ggplot(pdat, aes(dens, loc, fill = m, group = interaction(m, x))) + 
  geom_polygon() +
  scale_x_continuous(breaks = 0:1, labels = c('a', 'b')) +
  ylab('density') +
  theme_minimal() +
  theme(axis.title.x = element_blank())

Result

enter image description here

Parette answered 1/3, 2016 at 9:28 Comment(4)
How would you calculate densities if there are thee groups (e.g. i, j and x)Surakarta
What should the three-group plot look like? It might be hard to visualize if you want to show density curves for all three groups within each violin.Mellisamellisent
that's a great option for cases where the original data is huge. Pre-calculating densities make the plot a lot more lightweight!Archiplasm
Tremendous!! I managed to get your method working with plotnine. I wanted to use plotnine rather than seaborn to give a consistent feel with other charts and the first solution looked too difficult to implement. Your was easy. Fantastic solution!Ticktock
C
4

It is now possible to do this with the introdataviz package using the geom_split_violin function, which makes it really easy to create these plots. Here is a reproducible example:

set.seed(20160229)
my_data = data.frame(
  y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
  x=c(rep('a', 2000), rep('b', 2000)),
  m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

library(ggplot2)
# devtools::install_github("psyteachr/introdataviz")
library(introdataviz)

ggplot(my_data, aes(x = x, y = y, fill = m)) +
  geom_split_violin()

Created on 2022-08-24 with reprex v2.0.2

As you can see, it creates a split violin plot. If you want more information and a tutorial of this package, check the link above.

Clements answered 24/8, 2022 at 11:0 Comment(1)
@Axeman, You are right. Fix it thanks!Clements
P
2

@jan-jlx's solution is wonderful. For densities with thin tails, I'd like to insert a little space between the two halves of the violin so the tails are easier to tell apart. Here's a slight modification of @jan-jlx's code to do this, borrowing the nudge parameter from the gghalves package:

GeomSplitViolin <- ggplot2::ggproto(
    "GeomSplitViolin",
    ggplot2::GeomViolin,
    draw_group = function(self,
                          data,
                          ...,
                          # add the nudge here
                          nudge = 0,
                          draw_quantiles = NULL) {
        data <- transform(data,
                          xminv = x - violinwidth * (x - xmin),
                          xmaxv = x + violinwidth * (xmax - x))
        grp <- data[1, "group"]
        newdata <- plyr::arrange(transform(data,
                                           x = if (grp %% 2 == 1) xminv else xmaxv),
                                 if (grp %% 2 == 1) y else -y)
        newdata <- rbind(newdata[1, ],
                         newdata,
                         newdata[nrow(newdata), ],
                         newdata[1, ])
        newdata[c(1, nrow(newdata)-1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

        # now nudge them apart
        newdata$x <- ifelse(newdata$group %% 2 == 1,
                            newdata$x - nudge,
                            newdata$x + nudge)

        if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {

            stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))

            quantiles <- ggplot2:::create_quantile_segment_frame(data,
                                                             draw_quantiles)
            aesthetics <- data[rep(1, nrow(quantiles)),
                               setdiff(names(data), c("x", "y")),
                               drop = FALSE]
            aesthetics$alpha <- rep(1, nrow(quantiles))
            both <- cbind(quantiles, aesthetics)
            quantile_grob <- ggplot2::GeomPath$draw_panel(both, ...)
            ggplot2:::ggname("geom_split_violin",
                             grid::grobTree(ggplot2::GeomPolygon$draw_panel(newdata, ...),
                                            quantile_grob))
        }
    else {
            ggplot2:::ggname("geom_split_violin",
                             ggplot2::GeomPolygon$draw_panel(newdata, ...))
        }
    }
)

geom_split_violin <- function(mapping = NULL,
                              data = NULL,
                              stat = "ydensity",
                              position = "identity",
                              # nudge param here
                              nudge = 0,
                              ...,
                              draw_quantiles = NULL,
                              trim = TRUE,
                              scale = "area",
                              na.rm = FALSE,
                              show.legend = NA,
                              inherit.aes = TRUE) {

    ggplot2::layer(data = data,
                   mapping = mapping,
                   stat = stat,
                   geom = GeomSplitViolin,
                   position = position,
                   show.legend = show.legend,
                   inherit.aes = inherit.aes,
                   params = list(trim = trim,
                                 scale = scale,
                                 # don't forget the nudge
                                 nudge = nudge,
                                 draw_quantiles = draw_quantiles,
                                 na.rm = na.rm,
                                 ...))
}

Here's a plot I get with geom_split_violin(nudge = 0.02).

enter image description here

Prenatal answered 23/12, 2022 at 3:29 Comment(1)
This is an excellent answer, as it allows for placing of geom_jitter() in the space between the two violin halves by matching nudge in geom_split_violin() with width in geom_jitter(), effectively creating a mirrored, vertical raincloud plot. I am however having difficulty with the KDE for long-tailed distributions, as outlined in one of my queries (#76088876). Perhaps you have an idea how to add this to your excellent code @Trang Q. Nguyen and @jan-glx?Horatio
C
1

I would also like to share my humble contribution to this post. I have been working for quite some time already on a similar plot that I called a craviola plot (in reference to the asymmetrical "Giannini Craviola" guitar).
So a craviola plot is in essence a splitted violin plot.
By default, it looks like this: enter image description here Distributions are shown by the splitted violins. They are accompanied by boxplots, 1 box per distribution. And by a red dot symbolizing the mean of each distribution (to be clear, red dots do not always superimpose with the median line of boxplots, it just happen to be the case in my example).

This function is part of my visualization package BiocompR which you can conveniently install from the following Github repository: https://github.com/YoannPa/BiocompR
To install the package in R do: devtools::install_github("YoannPa/BiocompR")

In order to generate such plot you must use the ggcraviola() function. The documentation I wrote for the ggcraviola() function already provide some examples that you can run yourself in RStudio to see its full potential:

library(BiocompR)
?ggcraviola

To reproduce the plot above you can run the following code:

library(BiocompR)
    
df.complete = data.frame(
        Groups = rep(c('A', 'B', 'C'), each = 2000),
        Conditions = rep(c('I', 'J'), each = 1000,3),
        Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
                   rnorm(1000, 3), rnorm(1000, 3.5),
                   rnorm(1000, -3), rnorm(1000, -3.5)))

ggcraviola(data = df.complete, lines.col = "black")

Of course you have the freedom to show/hide the boxplots and the mean (red dots) for all distributions. ggcraviola() works with ggplot2, so you can also add more component of customization after using the ggcraviola() function like this:

library(BiocompR)

df.complete = data.frame(
  Groups = rep(c('A', 'B', 'C'), each = 2000),
  Conditions = rep(c('I', 'J'), each = 1000,3),
  Values = c(rnorm(1000, 0), rnorm(1000, 0.5),
             rnorm(1000, 3), rnorm(1000, 3.5),
             rnorm(1000, -3), rnorm(1000, -3.5)))

ggcraviola(data = df.complete, lines.col = "black") +
  ggtitle("This is a Craviola plot!") + # Add title
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 14, color = "black"),# Custom axis text
        axis.title = element_text(size = 15),
        legend.title = element_text(size = 13), # Change legend font size
        legend.text = element_text(size = 12),
        panel.background = element_blank(), # Change panel appearance
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey"),
        panel.grid.minor.y = element_line(color = "grey")) +
  scale_y_continuous(expand = c(0, 0)) + #Expand fully plot panel on Y-axis
  scale_fill_manual(
    labels = c("Control", "Case"), # Rename conditions
    values = biopalette(name = "BiocompR_cond3", mute = TRUE))

Which gives: enter image description here Hopefully someone will find the ggcraviola() function useful!
Knowing how related this topic is to the ggcraviola() function, I decided to reference it in the documentation of my package.

Cristycriswell answered 1/3, 2024 at 18:29 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.