Emulating deprecated seaborn distplots
Seaborn's distplot is deprecated and will be removed in a future version. The suggested alternative is histplot (or displot as a figure-level plot). However, the defaults differ between distplot and histplot:

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
df = pd.DataFrame({"X": x_list, "Y": range(len(x_list))})

f, (ax_dist, ax_hist) = plt.subplots(2, sharex=True)

sns.distplot(df["X"], ax=ax_dist)
ax_dist.set_title("old distplot")
sns.histplot(data=df, x="X", ax=ax_hist)
ax_hist.set_title("new histplot")

plt.show()


So, how does histplot have to be configured to replicate the output of the deprecated distplot?

Label answered 21/5, 2021 at 14:13 Comment(0)

Since I spent some time on this, I thought I'd share it so that others can easily adapt this approach:

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
df = pd.DataFrame({"X": x_list, "Y": range(len(x_list))})

f, (ax_dist, ax_hist) = plt.subplots(2, sharex=True)

sns.distplot(df["X"], ax=ax_dist)
ax_dist.set_title("old distplot")
_, FD_bins = np.histogram(x_list, bins="fd")
bin_nr = min(len(FD_bins)-1, 50)
sns.histplot(data=df, x="X", ax=ax_hist, bins=bin_nr, stat="density", alpha=0.4, kde=True, kde_kws={"cut": 3})
ax_hist.set_title("new histplot")

plt.show()

Sample output: (image not included)

The main changes are:

  • bins=bin_nr - determine the number of histogram bins with the Freedman-Diaconis estimator and cap it at 50
  • stat="density" - show density instead of count in the histogram
  • alpha=0.4 - for the same transparency
  • kde=True - add a kernel density plot
  • kde_kws={"cut": 3} - extend the kernel density plot beyond the histogram limits

Regarding the bin estimation with bins="fd", I am not sure that this is indeed the method used by distplot. Comments and corrections are more than welcome.
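
For anyone who wants the bin count in one place, here is a minimal sketch of the capped Freedman-Diaconis computation used above. The helper name capped_fd_bin_count and its max_bins parameter are my own and not part of seaborn; the sketch relies on numpy's bins="fd" rule, which may not match distplot's internal calculation in every edge case.

```python
import numpy as np


def capped_fd_bin_count(values, max_bins=50):
    """Return the Freedman-Diaconis bin count, limited to max_bins.

    Hypothetical helper: it wraps numpy's "fd" rule and applies the
    upper limit of 50 that distplot is said to use by default.
    """
    edges = np.histogram_bin_edges(values, bins="fd")
    # n bins are described by n + 1 edges
    return min(len(edges) - 1, max_bins)


x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
print(capped_fd_bin_count(x_list))
```

The result can then be passed to histplot as bins=capped_fd_bin_count(df["X"]).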

I removed **{"linewidth": 0} from the histplot call because, as @mwaskom pointed out in a comment, distplot does draw an edge around the histogram bars; with the matplotlib defaults, that edgecolor is simply set to the face color, so the edges do not stand out. You will have to sort this out according to your style preferences.

Label answered 21/5, 2021 at 14:13 Comment(6)
You could also add alpha=0.4 to the histplot. - Pagel
Well spotted. I had the impression that the color was slightly off but didn't follow through with this thought. - Label
"Regarding the bin estimation with bins="fd", I am not sure that this is indeed the method used by distplot. Comments and corrections are more than welcome." This is basically correct; histplot uses numpy's "auto" mode, which takes the max of the FD and Sturges estimators. The only thing that will be tricky to fully replicate is that distplot used min(FD_bins, 50) by default. So if you really want exactly the same behavior, you'll need to do that externally. - Sile
Oh, also linewidth=0 is wrong; distplot bars have visible edges, but with the matplotlib defaults the bar edgecolor is set to "face". You'll see the difference if you activate one of the seaborn themes. - Sile
@Sile Thanks for your input, I was hoping for your comments. I assumed that auto was the preset for distplot because the documentation mentions something about an optimized approach. However, this was obviously wrong. I'm not sure, though, how to implement the different edgecolor settings, but I guess people can figure this out based on their specific stylesheets. - Label
@Mr.T distplot uses an "optimized" approach (relative to a fixed number of bins), but there are lots of different reference rules that work better or worse for different sorts of data ... numpy's "auto" tries to balance two of the most common ones. Actually, when distplot was written, numpy didn't have any of the reference rules implemented, so distplot implements the FD rule internally. It became possible to pass bins="auto" once numpy added it and matplotlib hooked into numpy for computation, but the default remained to use the internal FD computation, with the upper limit. - Sile
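
To see how the rules discussed in these comments differ for the sample data, the following sketch counts the bins numpy produces for each of them. It only uses np.histogram_bin_edges; the helper name bin_count is made up for illustration, and the exact numbers may depend on the numpy version.

```python
import numpy as np

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]


def bin_count(values, rule):
    """Number of bins numpy creates for the given reference rule."""
    return len(np.histogram_bin_edges(values, bins=rule)) - 1


# Compare the Freedman-Diaconis rule, Sturges' rule and numpy's "auto" mode
counts = {rule: bin_count(x_list, rule) for rule in ("fd", "sturges", "auto")}

# distplot-like default as described in the comments: FD rule, at most 50 bins
counts["fd capped at 50"] = min(counts["fd"], 50)

for rule, n in counts.items():
    print(f"{rule}: {n} bins")
```

If "fd" and "auto" report different counts for your data, that difference is one reason why a default histplot does not look like the old distplot.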

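# Use histplot(); it plots the distribution of a single (univariate) variable.

import seaborn as sns
import matplotlib.pyplot as plt

# `data` is your DataFrame; replace each 'variable name' placeholder with one of its column names
fig = sns.FacetGrid(data=data, col='variable name', hue='variable name', height=9, palette='Set1')

# the column to plot is passed to map() as a string
fig = fig.map(sns.histplot, 'variable name', kde=True).add_legend()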
Flow answered 5/8, 2023 at 21:38 Comment(0)
