Is it possible to annotate a seaborn violin plot with number of observations in each group?

I would like to annotate my violin plot with the number of observations in each group. So the question is essentially the same as this one, except:

  • Python instead of R,
  • seaborn instead of ggplot, and
  • violin plots instead of boxplots

Let's take this example from the seaborn API documentation:

import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

I'd like to have n=62, n=19, n=87, and n=76 on top of the violins. Is this doable?

Hairline answered 16/10, 2017 at 14:0 Comment(0)

In this situation, I like to precompute the annotation text and build it into the categorical axis itself. In other words, precompute labels such as "Thurs, N = xxx" and use those as the x categories.

That looks like this:

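import seaborn as sns
sns.set_style("whitegrid")
ax = (
    sns.load_dataset("tips")
       # number of observations per day, mapped back onto every row
       .assign(count=lambda df: df['day'].map(df.groupby(by=['day'])['total_bill'].count()))
       # axis label made of the day name, a line break and the count
       .assign(grouper=lambda df: df['day'].astype(str) + '\nN = ' + df['count'].astype(str))
       .sort_values(by='day')
       .pipe((sns.violinplot, 'data'), x="grouper", y="total_bill")
)
# Axes.set does not return the Axes, so call it separately to keep `ax` usable
ax.set(xlabel='Day of the Week', ylabel='Total Bill (USD)')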


Arabinose answered 16/10, 2017 at 16:7 Comment(8)
I like the chaining approach; I must admit I haven't seen it in any of the tutorials. Sure, it's a bit more difficult to read (and likely hell to debug), but it's reminiscent of ggplot and d3.js. - Hairline
What if I am also using hue to split the categories? - Hawfinch
@Hawfinch did you try that yet? - Arabinose
@PaulH try what? I don't see how this approach can work if there are two categorical columns used for splitting data... - Hawfinch
You'd keep the hue column separate and pass it by itself. - Arabinose
(and annotate the hues with a legend -- basic stuff) - Arabinose
But what if there are different numbers of observations for different hues in the same group? - Hawfinch
e.g., Thurs, N = (40, 22). - Arabinose
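
A rough sketch of how per-hue labels of that "N = (a, b)" form could be built. The helper name `add_hue_count_labels`, the `label_col` argument and the tiny `toy` frame are made up for illustration; they are not part of seaborn or of the answer above. Both columns are converted to strings first, so the counts are listed in alphabetical hue order, and that order is returned so it can be reused for the plot.

```python
import pandas as pd


def add_hue_count_labels(df, group_col, hue_col, label_col="grouper"):
    """Return (labelled copy of df, hue order used for the counts).

    Hypothetical helper: builds labels made of the group name, a line
    break and the per-hue observation counts in parentheses.
    """
    groups = df[group_col].astype(str)
    hues = df[hue_col].astype(str)
    # Rows are groups, columns are hue levels, cells are observation counts
    counts = pd.crosstab(groups, hues)
    hue_order = list(counts.columns)
    # One string per group, with the counts listed in hue_order
    per_group = counts.apply(
        lambda row: "(" + ", ".join(map(str, row)) + ")", axis=1
    )
    labelled = df.assign(**{label_col: groups + "\nN = " + groups.map(per_group)})
    return labelled, hue_order


# Tiny made-up data set standing in for `tips`
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri"],
    "smoker": ["No", "No", "Yes", "Yes", "Yes"],
    "total_bill": [10.0, 12.5, 20.0, 8.0, 15.0],
})
labelled, hue_order = add_hue_count_labels(toy, "day", "smoker")
print(hue_order)
print(labelled["grouper"].unique())
```

The labelled frame can then be passed to `sns.violinplot(x="grouper", y="total_bill", hue="smoker", hue_order=hue_order, data=labelled)`, so that the order of the violins within each group matches the order of the counts in the label. Sorting by the original group column beforehand, as in the answer, keeps the categories in a sensible order.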

You first need to work out the x and y position of each label from your dataset, since ax.text needs explicit coordinates. A simple for loop can then write each label at the desired position:

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

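# y positions: the median bill of each day; x positions: 0, 1, 2, ... in plotting order
yposlist = tips.groupby(['day'])['total_bill'].median().tolist()
xposlist = range(len(yposlist))
# observation counts for Thur, Fri, Sat and Sun, worked out in advance
stringlist = ['n = 62','n = 19','n = 87','n = 76']

for x, y, s in zip(xposlist, yposlist, stringlist):
    ax.text(x, y, s)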

plt.show()

Nd answered 16/10, 2017 at 14:28 Comment(2)
So the idea is to pre-calculate the x,y coordinates and the number of observations in advance, then just annotate them using ax.text? What if one would prefer to annotate above the plots? There's no guarantee there will be enough space within the violin to accommodate the text, especially if the number is large. - Hairline
Other than labeling the axis or adding a legend to the plot, I think .text or .annotate are the only ways to do this. Of course I'm using a sample dataset here, but with another dataset in hand I don't think it would be hard to get "the x,y coordinates, and the number of observations". If you wish to write the text above the plots, you would need the violins' max values and use them in yposlist instead, like this: yposlist = tips.groupby(['day'])['total_bill'].max().tolist(). Then fine-tune the y position to best fit the figure, since this returns the dataset's max values rather than the tops of the drawn violins. - Jakie
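
A minimal sketch of that "annotate above the violins" variant, with the counts and positions computed from the data instead of typed in. The names `count_annotations` and `plot_with_counts`, the `pad` fraction and the `toy` frame are assumptions made for this example. The category order is passed explicitly so the x positions are guaranteed to match the order in which seaborn draws the violins.

```python
import pandas as pd


def count_annotations(df, group_col, value_col, order, pad=0.05):
    """Return (x, y, text) triples placing 'n = ...' above each group's maximum."""
    grouped = df.groupby(group_col)[value_col]
    counts, tops = grouped.count(), grouped.max()
    # Vertical offset as a fraction of the overall data range (arbitrary choice)
    offset = pad * (df[value_col].max() - df[value_col].min())
    return [
        (x, tops[g] + offset, "n = {}".format(counts[g]))
        for x, g in enumerate(order)
    ]


def plot_with_counts(df, group_col, value_col, order):
    """Draw the violins and write the counts above them; returns the Axes."""
    # Imported here so the helper above can be used without the plotting stack
    import seaborn as sns

    ax = sns.violinplot(x=group_col, y=value_col, data=df, order=order)
    for x, y, text in count_annotations(df, group_col, value_col, order):
        ax.text(x, y, text, ha="center", va="bottom")
    return ax


# Tiny made-up data set standing in for `tips`
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri"],
    "total_bill": [10.0, 12.5, 20.0, 8.0, 15.0],
})
for x, y, text in count_annotations(toy, "day", "total_bill", ["Thur", "Fri"]):
    print(x, y, text)
```

With the real data this would be called as `plot_with_counts(tips, "day", "total_bill", ["Thur", "Fri", "Sat", "Sun"])` followed by `plt.show()`. By default the drawn violin extends past the largest observation, so `pad` may need tuning, or `cut=0` can be passed to `sns.violinplot` to stop each violin at the data range.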
