For geom_violin, how is the total area of all violins specified?
In a call to geom_violin within ggplot2, you can specify that the area of each violin should be proportional to the number of observations making up that violin by specifying scale="count".

I assume this operates internally by taking some total amount of area (let's call this amount X) and dividing it proportionally among all violins to be plotted. This is what I want, except that this can result in pretty narrow violins if there is substantial enough disparity in N between groups such that some groups have relatively low N. In my case, this just makes the fill color kind of hard to see.

I think this can be largely solved, in my case at least, by simply expanding X a little bit so that the really small violins get just enough area to still be readable. In other words, I want to retain variation in area between violins according to the number of observations but increase the "pool" of total area being divided amongst violins, so that every one gets slightly bigger.

Anyone have any idea how one might accomplish this? There's gotta be a toggle for this. I've tried fussing with arguments to geom_violin such as width, size, violinwidth, and such, but no luck so far...

EDIT: Code for a boring but reproducible "sample" data set that one can experiment with.

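library(ggplot2)

set.seed(1)  # arbitrary seed, added so the example is reproducible
y <- runif(100, 1, 10)
x <- as.factor(rep(c(1, 2), times = 50))
z <- as.factor(c(rep(1, 10), rep(2, 90)))  # unbalanced groups: 10 vs. 90 observations
df <- data.frame(x, y, z)
ggplot(df, aes(x = x, y = y, fill = z)) + geom_violin(scale = "count")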
Manvell answered 3/8, 2016 at 15:1 Comment(6)
Please provide a small reproducible example to facilitate testing of potential solutions. – Pandybat
Added something boring but hopefully exemplary enough to be useful. – Manvell
@Manvell Have you found any solution yet? – Galinagalindo
@MarkSeygan No, I haven't, but maybe I will try to poke around in the geom_violin code this week and see what I can figure out. – Manvell
I found I can do it through width inside the parentheses of geom_violin. – Galinagalindo
Can you expand on this? width is marked as a computed variable in the help; others on this list, like count and violinwidth, are ignored when I include them in the geom_violin call. So I'm not sure why width is not being ignored, since it is not a function argument. Also, it's not clear to me what width is doing... for me, values above ~1.5 change not only the shape of my violins but also their position and orientation on the graph. Any ideas what's going on there? – Manvell

You can do this by adjusting the width parameter inside geom_violin. Make sure to also use position_dodge to keep the violins from overlapping.

Using your data

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2)

will give the following plot (image not included).

You can allow some gap between the plots by using position_dodge:

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2, position=position_dodge(width=0.5))

This will give you the following non-overlapping plot (image not included).
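To see what width is changing, one option is to compare the data that each layer is drawn from. The sketch below is only a rough check, not a statement about how ggplot2 is implemented: it assumes a ggplot2 3.x install, reuses the question's sample data with an arbitrary seed, and the names base, p_default, p_wide, d_default and d_wide are made up for illustration. As far as I can tell, violinwidth is a relative value between 0 and 1 that scale="count" shrinks for smaller groups, while width sets how much horizontal room each violin is allowed to fill.

```r
library(ggplot2)

# Same structure as the sample data in the question (the seed is arbitrary).
set.seed(1)
df <- data.frame(
  x = as.factor(rep(c(1, 2), times = 50)),
  y = runif(100, 1, 10),
  z = as.factor(c(rep(1, 10), rep(2, 90)))
)

base <- ggplot(df, aes(x = x, y = y, fill = z))
p_default <- base + geom_violin(scale = "count")
p_wide    <- base + geom_violin(scale = "count", width = 2)

# layer_data() returns the built data frame for a layer of the plot.
d_default <- layer_data(p_default)
d_wide    <- layer_data(p_wide)

# Relative shape of each violin: expected to be identical in both plots.
all.equal(d_default$violinwidth, d_wide$violinwidth)

# Horizontal room given to each violin: expected to be larger with width = 2.
summary(d_default$xmax - d_default$xmin)
summary(d_wide$xmax - d_wide$xmin)
```

If that holds on your version, raising width enlarges every violin by the same factor, so the count-based proportions between violins are kept while the small groups become easier to read.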

Keratose answered 5/9, 2017 at 14:6 Comment(1)
Great answer, thanks! Can you explain, if you can, why width is a computed variable that the model accepts as an argument when other such variables like count and violinwidth are ignored? I think that's why I was assuming width couldn't be the right thing to be manipulating... – Manvell