Half violin plot with ignoring some factors in R

I have a question very similar to the one asked in "Half violin plot with different factors in R", which was perfectly answered by @AllanCameron. In addition to the data from that question, I also have different grades:

student_id  group       grade    test_id  Score
145         Treatment   2        pre      0.12
145         Treatment   3        post     0.78
109         Treatment   5        pre      0.45
109         Treatment   5        post     0.99
195         Treatment   4        pre      0.22
195         Treatment   4        post     0.75
119         Treatment   6        pre      0.15
119         Treatment   6        post     0.59

I would like to draw a half-violin plot where one factor is the test (pre or post) and the two halves of each violin consist of the 3rd/4th grade for the posttest and the 5th/6th grade for the pretest:

[image: violin]

I've played around with the code provided in the previous answer, but the only thing I came up with is plotting the two grades separately as factors and then cutting and pasting them together. Not very elegant! I hope someone has a better way of achieving this.

Here is a MWE:

set.seed(1)
data <- data.frame(
                 group = rep(sample(c('Treatment', 'Control'), 50, TRUE), 
                             each = 2),
                 test_id = rep(c('pre', 'post'), 50),
                 grade = sample(3:6, 100, replace = TRUE),
                 Score = runif(100)
                 )

library(ggplot2)
library(see)

ggplot(data, aes(test_id, Score, fill = grade)) +
  geom_boxplot(width = 0.1, position = position_dodge(0.2)) +
  geom_violinhalf(aes(group = interaction(test_id, grade)), fill = 'gray',
                  trim = FALSE, flip = c(1, 2)) +
  theme_classic(16)

This produces the undesired plot:

[image: violin_wrong]

Belga answered 15/9 at 18:15 Comment(3)
Have you tried something? – Intense
I think you need to edit the second paragraph so it consists of several sentences. It currently expects us to understand what the data is and what is desired for different variables. – Hoffman
Thanks for your remarks! See my edit. Is it clearer now? – Belga

I'm not 100% sure whether you want separate boxplots for the groups or just one boxplot. To fix the issue with the violin plots, though, you can filter the data used for geom_violinhalf so that it includes only grades 3 and 4 for the pre-test data and grades 5 and 6 for the post-test data. Additionally, as there are now four violin groups, you have to set flip = c(1, 3) to flip the left-hand violins.

Note: For the reprex I mapped grade to fill in geom_violinhalf to check and show that the right grades are displayed.

library(ggplot2)
library(see)

ggplot(data, aes(test_id, Score, fill = group)) +
  geom_boxplot(width = 0.1, position = position_dodge(0.2)) +
  geom_violinhalf(
    data = ~ subset(
      .x,
      (test_id %in% "pre" & grade %in% c(3, 4)) |
        (test_id %in% "post" & grade %in% c(5, 6))
    ),
    aes(group = interaction(test_id, grade), fill = factor(grade)),
    #fill = "gray",
    trim = FALSE, 
    flip = c(1, 3)
  ) +
  scale_x_discrete(limits = c("pre", "post")) +
  theme_classic()
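
To see why the filter and flip = c(1, 3) fit together, it can help to look at the filtered data without plotting. The following is only a sketch in base R: it rebuilds the example data from the question, applies the same subset condition, and lists the remaining test/grade groups in factor-level order. The names violin_data and group_levels are made up for this illustration, and the sketch assumes that ggplot2 numbers the groups in that level order and that flip refers to those group numbers.

```r
# Example data from the question
set.seed(1)
data <- data.frame(
  group   = rep(sample(c("Treatment", "Control"), 50, TRUE), each = 2),
  test_id = rep(c("pre", "post"), 50),
  grade   = sample(3:6, 100, replace = TRUE),
  Score   = runif(100)
)

# Same condition as in the geom_violinhalf layer:
# keep grades 3/4 for the pre-test and grades 5/6 for the post-test
violin_data <- subset(
  data,
  (test_id %in% "pre" & grade %in% c(3, 4)) |
    (test_id %in% "post" & grade %in% c(5, 6))
)

# Cross-tabulate to confirm that only the intended combinations remain
print(table(violin_data$test_id, violin_data$grade))

# The four groups that remain, in factor-level order
# (hypothetical helper variable, not part of the plot code)
group_levels <- levels(
  droplevels(interaction(violin_data$test_id, violin_data$grade))
)
print(group_levels)
```

Under those assumptions, the first and third groups are the lower grade of each test (grade 3 for pre, grade 5 for post), which are the halves that flip = c(1, 3) turns to the left.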

Houlihan answered 16/9 at 6:0 Comment(1)
Thanks! The filtering was exactly what I needed. – Belga
