Boxplot : custom width in seaborn

I am trying to draw boxplots in seaborn whose widths depend on the log of the x-axis value. I build a list of widths and pass it to seaborn.boxplot via the widths=widths parameter.

However, I get the following error:

raise ValueError(datashape_message.format("widths"))
ValueError: List of boxplot statistics and `widths` values must have same the length

When I debugged, I found just one dict in the boxplot statistics, even though I have 8 boxplots. I cannot figure out exactly where the problem lies.

Here is the image of the Boxplot

I am using pandas data frame and seaborn for plotting.

Equimolecular answered 8/9, 2020 at 11:0 Comment(0)

Seaborn's boxplot doesn't seem to understand the widths= parameter.
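The error message itself comes from matplotlib, which raises it whenever the length of widths differs from the number of boxes drawn in one call. The sketch below reproduces it in plain matplotlib with a single dataset and eight widths. That seaborn draws each box with its own matplotlib call is an assumption here; it would match the single statistics dict seen in the debugger, but it is not verified against seaborn's source.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
fig, ax = plt.subplots()

message = None
try:
    # One dataset (so one statistics dict) but eight widths: the lengths differ.
    ax.boxplot([rng.normal(size=50)],
               widths=[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
except ValueError as err:
    message = str(err)
    print(message)
```

Passing one width per dataset in a single matplotlib call avoids the mismatch.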

Here is a way to create one boxplot per x value via matplotlib's boxplot, which does accept a list for the widths= parameter. The code below assumes the data is organized in a pandas DataFrame.

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'x': np.random.choice([1, 3, 5, 8, 10, 30, 50, 100], 500),
                   'y': np.random.normal(750, 20, 500)})
xvals = np.unique(df.x)
positions = range(len(xvals))
plt.boxplot([df[df.x == xi].y for xi in xvals],
            positions=positions, showfliers=False,
            boxprops={'facecolor': 'none'}, medianprops={'color': 'black'}, patch_artist=True,
            widths=[0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
means = [np.mean(df[df.x == xi].y) for xi in xvals]
plt.plot(positions, means, '--k*', lw=2)
# plt.xticks(positions, xvals) # not needed anymore, as the xticks are set by the swarmplot
sns.swarmplot(x='x', y='y', data=df)  # keyword arguments; recent seaborn versions reject positional x and y
plt.show()

example plot
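The widths above are hard-coded. To make them follow the log of the x value, as the question asks, they can be computed instead. The following is a minimal sketch, not a seaborn feature: the helper log_scaled_widths and its min_width and max_width parameters are hypothetical names introduced for illustration, and the linear mapping of log10(x) onto a width range is one possible choice among many.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def log_scaled_widths(xvals, min_width=0.2, max_width=0.9):
    """Map positive x values to box widths that grow linearly with log10(x)."""
    logs = np.log10(np.asarray(xvals, dtype=float))
    span = logs.max() - logs.min()
    if span == 0:  # only one distinct x value: fall back to the maximum width
        return np.full(logs.shape, max_width)
    return min_width + (max_width - min_width) * (logs - logs.min()) / span


rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.choice([1, 3, 5, 8, 10, 30, 50, 100], 500),
                   'y': rng.normal(750, 20, 500)})
xvals = np.unique(df.x)
positions = range(len(xvals))
widths = log_scaled_widths(xvals)  # one width per distinct x value

fig, ax = plt.subplots()
ax.boxplot([df[df.x == xi].y for xi in xvals],
           positions=positions, widths=widths, showfliers=False)
ax.set_xticks(list(positions))
ax.set_xticklabels(xvals)
```

Call plt.show() afterwards to display the figure. The x values must be strictly positive for the logarithm to be defined.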

A related question asked how to set the box widths depending on group size. Each width can be calculated as a maximum width multiplied by the ratio of that group's size to the size of the largest group.

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

y_true = np.random.normal(size=100)
y_pred = y_true + np.random.normal(size=100)
df = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
df['y_true_bin'] = pd.cut(df['y_true'], range(-3, 4))

sns.set()
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 5))
sns.boxplot(x='y_true_bin', y='y_pred', data=df, color='lightblue', ax=ax1)

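# observed=False keeps empty bins; stating it explicitly avoids a FutureWarning in recent pandas
bins, groups = zip(*df.groupby('y_true_bin', observed=False)['y_pred'])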
lengths = np.array([len(group) for group in groups])
max_width = 0.8
ax2.boxplot(groups, widths=max_width * lengths / lengths.max(),
            patch_artist=True, boxprops={'facecolor': 'lightblue'})
ax2.set_xticklabels(bins)
ax2.set_xlabel('y_true_bin')
ax2.set_ylabel('y_pred')
plt.tight_layout()
plt.show()

boxplot with widths depending on subset size

That answered 8/9, 2020 at 17:17 Comment(6)
This solves my problem somewhat, but it creates another one. Now when I add a seaborn.swarmplot or stripplot, the entire figure is shifted by one boxplot. - Equimolecular
You can remove plt.xticks(positions, xvals) if the ticks are set via swarmplot. Maybe you didn't change the old plt.xticks(range(1, len(xvals)+1), xvals), as that would shift the values. The means need to be plotted using the same positions as the boxplot. - That
The boxplot is at its position, but the mean line and x-ticks are still shifted (new plot). - Equimolecular
Yes, now it plots exactly as expected. :) - Equimolecular
This is IMHO not an exact answer to the question, which was about seaborn. You draw the boxplot via pyplot, which is kind of a hack, and you don't use seaborn features here. The widths parameter doesn't work when I want to use seaborn's x, y, hue arguments. - Bary
@Bary Indeed, a more "exact" answer would be: the widths parameter is not supported in seaborn. If you also want to include hue, it would be even less obvious how to support hue dodging and still get a nice plot, avoiding both overlapping boxes and boxes that are too far apart. - That
