How to plot two violin plot series on the same graph using seaborn?

Looking at the documentation about violin plots in seaborn, I would like to know how to plot two series of violin plots on the same axis (point 1) so that they are comparable (point 2).

For point 1, I would like to reproduce this plot for each sex:

import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
fig, ax = plt.subplots()
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips, split=True, ax=ax)

I could do it on two subplots:

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(211)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Female"], split=True, ax=ax)

ax = fig.add_subplot(212)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Male"], split=True, ax=ax)

I would like to plot the two violin plot series on the same matplotlib axes.

Another point concerns the width of the violins. I do not clearly understand whether the violins are normalized, and if so, how. I assume that the width is computed for each plot: in the example above, it is computed for Female in the first subplot and for Male in the second subplot. So can I compare the densities directly? I suppose that I can compare the shapes, but, for example, I cannot compare the number of male smokers on Thursday with the number of female ones? Is there a way to control the normalization of the violins?

Cockatiel answered 21/11, 2017 at 9:3 Comment(2)
For point 1, there is no built-in way of doing this with seaborn, but you might be able to place the violins manually next to each other. This will probably be more work than it is worth, and the result might look cluttered. Are you envisioning this organization on the x-axis: Thu{M, F}, Fri{M, F}, etc.? - Calculable
Actually, since violinplot no longer takes the positions argument, this becomes quite convoluted. You could add new categorical day values that are unique for each sex (e.g. females get Sun and males get Sun_). Then you can order these along the x-axis and change the xticklabels manually. As I said, not very convenient. - Calculable
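The positions argument mentioned in the comment still exists in matplotlib's own Axes.violinplot, so one workaround (a swapped-in technique, not a seaborn feature) is to draw each series with matplotlib and offset it by hand around each day's slot. This is a minimal sketch under stated assumptions: the data are synthetic normal samples standing in for the tips dataset, and the offset, widths and colours are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]

# Synthetic stand-in data (NOT the tips dataset): one sample array per (sex, day)
data = {
    sex: [rng.normal(loc=15 + 3 * i + shift, scale=5, size=30) for i in range(len(days))]
    for sex, shift in (("Female", 0.0), ("Male", 2.0))
}

fig, ax = plt.subplots()
base = np.arange(len(days))  # one x slot per day
offset = 0.2                 # shift each series left/right of the slot centre
colors = {"Female": "tab:red", "Male": "tab:blue"}

parts = {}
for sex, dx in (("Female", -offset), ("Male", offset)):
    # matplotlib's violinplot accepts explicit x positions, unlike seaborn's
    parts[sex] = ax.violinplot(data[sex], positions=base + dx,
                               widths=0.35, showmedians=True)
    for body in parts[sex]["bodies"]:
        body.set_facecolor(colors[sex])
        body.set_alpha(0.5)

ax.set_xticks(base)
ax.set_xticklabels(days)
ax.set_ylabel("total_bill")
# Violin bodies carry no labels, so build the legend from proxy patches
ax.legend(handles=[Patch(facecolor=c, alpha=0.5, label=s) for s, c in colors.items()])
```

Because matplotlib scales every violin to the same maximum width, this only addresses point 1; the widths are not comparable in the sense of point 2.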

For your first point, there is no built-in way of doing this in seaborn. Check out my comments for possible workarounds, but in brief, I don't think the time spent would be worthwhile.

For your second question, the scale and scale_hue parameters of violinplot control how the violin patches are normalized/scaled:

scale : {“area”, “count”, “width”}, optional

The method used to scale the width of each violin. If area, each violin will have the same area. If count, the width of the violins will be scaled by the number of observations in that bin. If width, each violin will have the same width.

scale_hue : bool, optional

When nesting violins using a hue variable, this parameter determines whether the scaling is computed within each level of the major grouping variable (scale_hue=True) or across all the violins on the plot (scale_hue=False).
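To see what these options mean numerically, here is a small NumPy sketch that re-applies the documented rules to a few density curves. It is an illustrative re-implementation, not seaborn's source: the Gaussian KDE with a fixed bandwidth of 0.5, the helper names gaussian_kde_curve and scale_densities, and the counts are all hypothetical. Note also that seaborn 0.13 deprecated scale and scale_hue in favour of density_norm and common_norm.

```python
import numpy as np


def gaussian_kde_curve(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))


def scale_densities(densities, counts, scale="area", scale_hue=True):
    """Rescale densities[group][hue] into violin half-widths (largest is 1).

    Illustrative re-implementation of the documented scale/scale_hue rules,
    not seaborn's actual source code.
    """
    counts = np.asarray(counts, dtype=float)
    global_peak = max(d.max() for group in densities for d in group)
    scaled = []
    for i, group in enumerate(densities):
        group_peak = max(d.max() for d in group)
        row = []
        for j, d in enumerate(group):
            if scale == "area":
                # one shared divisor keeps the areas under the curves equal
                row.append(d / (group_peak if scale_hue else global_peak))
            elif scale == "count":
                # peak width proportional to the number of observations
                ref = counts[i].max() if scale_hue else counts.max()
                row.append(d / d.max() * counts[i, j] / ref)
            elif scale == "width":
                # every violin gets the same peak width
                row.append(d / d.max())
            else:
                raise ValueError(f"unknown scale: {scale!r}")
        scaled.append(row)
    return scaled


rng = np.random.default_rng(0)
grid = np.linspace(-8, 8, 801)
# Hypothetical counts: rows are days, columns are smoker = Yes / No
counts = [[30, 10], [20, 5]]
densities = [[gaussian_kde_curve(rng.normal(size=n), grid, 0.5) for n in row]
             for row in counts]

for scale in ("area", "count", "width"):
    for scale_hue in (True, False):
        widths = scale_densities(densities, counts, scale, scale_hue)
        peaks = [[round(float(w.max()), 2) for w in row] for row in widths]
        print(f"scale={scale!r:8} scale_hue={scale_hue!s:5} peaks={peaks}")
```

With scale="count" and scale_hue=False the peaks come out as 1.0, 0.33, 0.67 and 0.17: each violin's width is its count divided by the largest count in the plot (30), which is what makes widths comparable across hue levels and groups.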

The defaults are 'area' and True. You can see below how varying these parameters affects the violins. For example, if you want to compare between plots and represent the absolute counts truthfully, you could set scale='count' and scale_hue=False. Note that the violins would still be scaled to the maximum count within each plot (and not in the whole dataset), so in our case, the biggest violin for females would represent ~25 observations while the biggest violin for males would represent ~40 observations.

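import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# 4 rows x 2 columns: one column per sex, the first row holds the count plots
fig, axes = plt.subplots(4, 2, figsize=(10, 16), sharey='row')
axes_cols = (axes.flatten()[::2], axes.flatten()[1::2])

for (sex_name, sex), axes_col in zip(tips.groupby('sex'), axes_cols):
    # Number of observations per day and smoker status, for reference
    sns.countplot(x="day", hue="smoker", data=sex, ax=axes_col[0])
    # One row of violin plots per scaling method
    for scale, ax in zip(['area', 'count', 'width'], axes_col[1:]):
        sns.violinplot(x="day", y="total_bill", hue="smoker",
                       data=sex, split=True, ax=ax, scale=scale)
        ax.set_title('scale = {}'.format(scale), y=0.95)
sns.despine()
fig.tight_layout()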

[Figure: count plots (top row) and violin plots with scale = area, count and width, one column per sex]

Adding scale_hue=False:

[Figure: the same grid with scale_hue=False]

Calculable answered 25/11, 2017 at 15:13 Comment(1)
Really nice! The count plot on top of the violin plots is really helpful. - Cockatiel

To complement the answer by @joelostblom, I want to share a DIY solution for plotting two series of violin plots on the same axis.

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Load the dataset
tips = sns.load_dataset("tips")

# Configure the coloring
color = {"Male": {"Yes": "orange", "No": "blue"}, "Female": {"Yes": "red", "No": "green"}}

# create figure and axes
fig, ax = plt.subplots()

# draw violins for each sex
sex_types = list(tips.sex.unique())  # stable order, unlike a set of strings
for sex in sex_types:
    sns.violinplot(
        x="day", 
        y="total_bill", 
        hue="smoker",
        data=tips[tips.sex == sex],
        palette=color[sex],
        split=True,
        ax=ax,
        scale="count",
        scale_hue=False,
        saturation=0.75,
        inner=None
    )

# Set transparency for all violins so the overlapping series stay visible
for violin in ax.collections:
    violin.set_alpha(0.25)

# Compose a custom legend
custom_lines = [
    Line2D([0], [0], color=color[sex][smoker], lw=4, alpha=0.25) 
    for smoker in ["Yes", "No"] 
    for sex in sex_types
]
ax.legend(
    custom_lines, 
    [f"{sex} : {smoker}" for smoker in ["Yes", "No"] for sex in sex_types], 
    title="Gender : Smoker"
)

[Figure: overlaid split violin plots for both sexes on a single axes, with the custom legend]

Thermochemistry answered 14/1, 2022 at 16:47

As Joel Ostblom said in his answer, there is no native way in seaborn to plot several series of violin plots on the same axes. Here is a solution following his hints. I also add count plots on top, as he did in his answer.

I do not really know the tips dataset, so the plot below may not be very meaningful.

import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid", font_scale=1.5)

tips = sns.load_dataset("tips")
# add a new column combining the two categorical variables you want on the x-axis
tips["sex_time"] = tips[["sex", "time"]].apply(lambda x: "_".join(x), axis=1)

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 10), 
                         sharex=True, gridspec_kw=dict(height_ratios=(1, 3), hspace=0))

# select the order you want:
order=["Female_Lunch", "Male_Lunch", "Female_Dinner", "Male_Dinner"]

sns.countplot(
    data=tips, x="sex_time", hue="smoker", 
    order=order,
    ax=axes[0]
)

sns.violinplot(
    x="sex_time", y="total_bill", hue="smoker", data=tips, 
    split=True, scale="count", scale_hue=False, inner="stick",
    order=order,
    ax=axes[1]
)
axes[1].set_xticklabels(["Lunch (Female)", "Lunch (Male)", "Dinner (Female)", "Dinner (Male)"])
axes[1].set_xlabel("Time (Sex)")
axes[1].get_legend().remove()  # hide the duplicate legend; the count plot keeps its own

[Figure: count plot above split violin plots, x-axis Lunch/Dinner by sex]
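The combined sex_time column and the order list in this answer can also be built without apply(..., axis=1) or hand-typed strings. A minimal pandas sketch, assuming a hypothetical four-row frame in place of tips:

```python
import pandas as pd

# Toy frame standing in for the tips columns used above (hypothetical rows)
df = pd.DataFrame({
    "sex": pd.Categorical(["Male", "Female", "Female", "Male"]),
    "time": pd.Categorical(["Lunch", "Dinner", "Lunch", "Dinner"]),
})

# Vectorized alternative to tips[["sex", "time"]].apply(lambda x: "_".join(x), axis=1)
df["sex_time"] = df["sex"].astype(str) + "_" + df["time"].astype(str)

# Build the x-axis order and tick labels from the level lists instead of by hand
times = ["Lunch", "Dinner"]
sexes = ["Female", "Male"]
order = [f"{s}_{t}" for t in times for s in sexes]
labels = [f"{t} ({s})" for t in times for s in sexes]
print(order)
print(labels)
```

The two lists can then be passed as order=order to countplot and violinplot, and to set_xticklabels(labels), so that adding a level only means extending times or sexes.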

Cockatiel answered 28/11, 2017 at 9:35
