How to show whiskers and points on violin plots?

I have a data frame df with the following data. I want to plot the logCPM expression of GeneA for two groups, A and B.

Samples  Type   GeneA
Sample1    B    14.82995162
Sample2    B    12.90512275
Sample3    B    9.196524783
Sample4    A    19.42866012
Sample5    A    19.70386922
Sample6    A    16.22906914
Sample7    A    12.48966785
Sample8    B    15.53280377
Sample9    A    9.345795955
Sample10    B   9.196524783
Sample11    B   9.196524783
Sample12    B   9.196524783
Sample13    A   9.434355615
Sample14    A   15.27604692
Sample15    A   18.90867329
Sample16    B   11.71503095
Sample17    B   13.7632545
Sample18    A   9.793864295
Sample19    B   9.196524783
Sample20    A   14.52562066
Sample21    A   13.85116605
Sample22    A   9.958492229
Sample23    A   17.57075876
Sample24    B   13.04499079
Sample25    B   15.33577937
Sample26    A   13.95849295
Sample27    B   9.196524783
Sample28    A   18.20524388
Sample29    B   17.7058873
Sample30    B   14.0199393
Sample31    A   16.21499069
Sample32    A   14.171432
Sample33    B   9.196524783
Sample34    B   9.196524783
Sample35    B   15.16648035
Sample36    B   12.9435081
Sample37    B   13.81971106
Sample38    B   15.82901231

I tried making a violin plot using ggviolin.

library("ggpubr")
pdf("eg.pdf", width = 5, height = 5)
p <- ggviolin(df, x = "Type", y = "GeneA", fill = "Type",
          color = "Type", palette = c("#00AFBB", "#FC4E07"),
          add="boxplot",add.params = list(fill="white"),
          order = c("A", "B"),
          ylab = "GeneA (logCPM)", xlab = "Groups")
ggpar(p, ylim = c(5,25))
dev.off()

I got a violin plot like this: [image]

1) I don't see any whiskers or any points on the violins.

2) Is there a way to show which point is which sample, for example by giving one point a different color? I'm interested in Sample10 and would like to color that point differently so I can see its expression.

Thank you

Idalia answered 3/10, 2018 at 15:13 Comment(7)
To your second question: you'll need to add the points individually. Box plots and violin plots are not intended to highlight individual points; the closest I've seen is that some box plots (base R, notably) optionally show points for outliers, but they don't do anything other than show the dot. I think for all other box/violin functions you will need to draw the points explicitly yourself (e.g., geom_point). - Apprehension
Can you give any example of a violin plot showing points? I've never seen it as a default, only as a manual after-market addition. - Apprehension
Not sure how to do this in ggpubr, but the ggbetweenstats function from ggstatsplot has this as a default behavior: cran.r-project.org/web/packages/ggstatsplot/vignettes/… - Editorial
It seems like the whiskers may be "missing" from the boxplot because they are the same color as the fill of the violin plot. Can you see them if you set the color to a constant, like you did for fill in add.params? - Brunn
@Apprehension Could you please help me with some code using the data above? Thank you. - Idalia
OK, I removed color = "Type" and now I see the whiskers. But why don't I see the lower whisker for group "B"? [imgur.com/Qz1NU1e] - Idalia
A missing whisker is a feature of the data, meaning it is likely a right-tailed distribution. (It is not a "failure" of the plotting mechanism.) - Apprehension
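
To make the last comment concrete, here is a minimal base-R sketch using the group B values from the question. Eight of the 21 B samples are tied at the minimum (9.196524783), so the first quartile equals the minimum and the lower whisker has zero length. The vector name `b` is only for illustration.

```r
# GeneA values for group B, copied from the question's table
b <- c(14.82995162, 12.90512275, 9.196524783, 15.53280377, 9.196524783,
       9.196524783, 9.196524783, 11.71503095, 13.7632545, 9.196524783,
       13.04499079, 15.33577937, 9.196524783, 17.7058873, 14.0199393,
       9.196524783, 9.196524783, 15.16648035, 12.9435081, 13.81971106,
       15.82901231)

# How many samples sit exactly at the minimum?
n_tied <- sum(b == min(b))
n_tied

# boxplot.stats() returns: lower whisker, lower hinge, median,
# upper hinge, upper whisker
stats <- boxplot.stats(b)$stats
stats

# The lower whisker end and the lower hinge coincide, so there is
# nothing to draw below the box
whisker_collapsed <- stats[1] == stats[2]
whisker_collapsed
```

The same check on group A gives a lower hinge above the minimum, which is why its lower whisker is visible.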

May I suggest using an elephant/raincloud plot or a hybrid boxplot instead?

From the blog post linked above:

Violin plots mirror the data density in a totally uninteresting/uninformative way, simply repeating the same exact information for the sake of visual aesthetic.

In raincloud plot, we get basically everything we need: eyeballed statistical inference, assessment of data distributions (useful to check assumptions), and the raw data itself showing outliers and underlying patterns.

library(tidyverse)
library(ggrepel)

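# `txt` holds the question's table as a single string;
# read_table() replaces read_table2(), deprecated since readr 2.0.0
df <- read_table(txt)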

# create new variable for coloring & labeling `Sample10` pts
df <- df %>% 
  mutate(colSel = ifelse(Samples == 'Sample10', '#10', 'dummy'),
         labSel = ifelse(Samples == 'Sample10', '#10', ''))

# create summary statistics
sumld <- df %>%
  group_by(Type) %>%
  summarise(
    mean     = mean(GeneA, na.rm = TRUE),
    median   = median(GeneA, na.rm = TRUE),
    sd       = sd(GeneA, na.rm = TRUE),
    N        = n(),
    ci       = 1.96 * sd/sqrt(N),
    lower95  = mean - ci,
    upper95  = mean + ci,
    lower    = mean - sd,
    upper    = mean + sd) %>% 
  ungroup()
sumld
#> # A tibble: 2 x 10
#>   Type   mean median    sd     N    ci lower95 upper95 lower upper
#>   <chr> <dbl>  <dbl> <dbl> <int> <dbl>   <dbl>   <dbl> <dbl> <dbl>
#> 1 A      14.7   14.5  3.54    17  1.68    13.0    16.3 11.1   18.2
#> 2 B      12.4   12.9  2.85    21  1.22    11.2    13.6  9.54  15.2
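
A side note on the `ci` column: the 1.96 multiplier is the normal approximation. With group sizes of 17 and 21, a t-based interval is slightly wider. The sketch below is one way to compute it in base R; the helper `ci_t` is a hypothetical name, not part of any package, and the vector `a` is group A's GeneA values copied from the question.

```r
# Hypothetical helper: t-based confidence interval for the mean
ci_t <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  # t quantile with n - 1 degrees of freedom instead of the fixed 1.96
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half, upper = mean(x) + half)
}

# GeneA values for group A, copied from the question's table
a <- c(19.42866012, 19.70386922, 16.22906914, 12.48966785, 9.345795955,
       9.434355615, 15.27604692, 18.90867329, 9.793864295, 14.52562066,
       13.85116605, 9.958492229, 17.57075876, 13.95849295, 18.20524388,
       16.21499069, 14.171432)

ci_a <- ci_t(a)
ci_a

# Multipliers used for N = 17 and N = 21 (both larger than 1.96)
qt(0.975, df = c(16, 20))
```

To use it in the pipeline above, one could replace `1.96` in `summarise()` with `qt(0.975, df = N - 1)`.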

raincloud plot

## get geom_flat_violin function
## https://gist.github.com/benmarwick/b7dc863d53e0eabc272f4aad909773d2
## mirror: https://pastebin.com/J9AzSxtF 
devtools::source_gist("2a1bb0133ff568cbe28d", filename = "geom_flat_violin.R")

pos <- position_jitter(width = 0.15, seed = 1)

p0 <- ggplot(data = df, aes(x = Type, y = GeneA, fill = Type)) +
  geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .8) +
  # "none" replaces FALSE, which is deprecated since ggplot2 3.3.4
  guides(fill = "none", color = "none") +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  theme_classic()

# raincloud plot
p1 <- p0 + 
  geom_point(aes(color = Type), 
             position = pos, size = 3, alpha = 0.8) +
  geom_boxplot(width = .1, show.legend = FALSE, outlier.shape = NA, alpha = 0.5)
p1

# coloring Sample10
p0 +
  geom_point(aes(color = colSel), 
             position = pos, size = 3, alpha = 0.8) +
  geom_text_repel(aes(label = labSel),
                  point.padding = 0.25,
                  direction = 'y',
                  position = pos) +
  geom_boxplot(width = .1, show.legend = FALSE, outlier.shape = NA, alpha = 0.5) +
  scale_color_manual(values = c('dummy' = 'grey50', '#10' = 'red')) 

# errorbar instead of boxplot
p0 + 
  geom_point(aes(color = colSel), 
             position = pos, size = 3, alpha = 0.8) +
  geom_point(data = sumld, aes(x = Type, y = mean), 
             position = position_nudge(x = 0.3), size = 3.5) +
  geom_text_repel(aes(label = labSel),
                  point.padding = 0.25,
                  direction = 'y',
                  position = pos) +
  geom_errorbar(data = sumld, aes(ymin = lower95, ymax = upper95, y = mean), 
                position = position_nudge(x = 0.3), width = 0) +
  guides(fill = "none", color = "none") +
  scale_color_manual(values = c('dummy' = 'grey50', '#10' = 'red')) +
  scale_fill_brewer(palette = "Dark2") +
  theme_classic()

hybrid boxplot using geom_boxjitter() from the ggpol package

## https://stackoverflow.com/a/49338481/ 
library(ggpol)

half_box <- ggplot(df) + geom_boxjitter(aes(x = Type, y = GeneA, 
                                            fill = Type, color = Type),
                                        jitter.shape = 21, jitter.color = NA, 
                                        jitter.height = 0, jitter.width = 0.04,
                                        outlier.color = NA, errorbar.draw = TRUE) +
  scale_color_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  theme_classic()
half_box

Bonus: you can also replace geom_point() with geom_quasirandom() from the ggbeeswarm package. Here is one example.
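
In case the linked example is unavailable, a minimal sketch of the geom_quasirandom() variant might look like the following. It assumes ggplot2 and ggbeeswarm are installed and, to stay self-contained, uses only the first ten samples from the question; `df_small` and `p_bee` are illustrative names.

```r
library(ggplot2)
library(ggbeeswarm)

# First ten samples from the question's table (subset for brevity)
df_small <- data.frame(
  Samples = paste0("Sample", 1:10),
  Type    = c("B", "B", "B", "A", "A", "A", "A", "B", "A", "B"),
  GeneA   = c(14.82995162, 12.90512275, 9.196524783, 19.42866012,
              19.70386922, 16.22906914, 12.48966785, 15.53280377,
              9.345795955, 9.196524783)
)

# Flag Sample10 so it can be colored differently
df_small$colSel <- ifelse(df_small$Samples == "Sample10", "#10", "dummy")

p_bee <- ggplot(df_small, aes(x = Type, y = GeneA)) +
  geom_boxplot(width = 0.1, outlier.shape = NA, alpha = 0.5) +
  # quasirandom spreads tied points sideways instead of random jitter
  geom_quasirandom(aes(color = colSel), width = 0.15, size = 3, alpha = 0.8) +
  scale_color_manual(values = c("dummy" = "grey50", "#10" = "red"),
                     guide = "none") +
  theme_classic()
p_bee
```

Compared with position_jitter(), the quasirandom layout is deterministic and keeps the many tied B values at 9.196524783 from overlapping.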

Created on 2018-10-03 by the reprex package (v0.2.1.9000)

Cattleman answered 4/10, 2018 at 16:19 Comment(0)
