Since I spent some time on this, I thought I'd share it so that others can easily adapt the approach:
```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
df = pd.DataFrame({"X": x_list, "Y": range(len(x_list))})

f, (ax_dist, ax_hist) = plt.subplots(2, sharex=True)

# Old API: distplot (deprecated since seaborn 0.11)
sns.distplot(df["X"], ax=ax_dist)
ax_dist.set_title("old distplot")

# New API: Freedman-Diaconis bin edges from NumPy, capped at 50 bins
_, FD_bins = np.histogram(x_list, bins="fd")
bin_nr = min(len(FD_bins) - 1, 50)
sns.histplot(data=df, x="X", ax=ax_hist, bins=bin_nr, stat="density",
             alpha=0.4, kde=True, kde_kws={"cut": 3})
ax_hist.set_title("new histplot")

plt.show()
```
Sample output: [figure with two stacked panels, "old distplot" above and "new histplot" below]
The main changes are:
- `bins=bin_nr`: determine the histogram bins using the Freedman-Diaconis estimator and restrict the upper limit to 50
- `stat="density"`: show density instead of count in the histogram
- `alpha=0.4`: use the same transparency as distplot
- `kde=True`: add a kernel density plot
- `kde_kws={"cut": 3}`: extend the kernel density plot beyond the histogram limits
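The bin-count step from the list above can be wrapped in a small helper. This is only a sketch: `capped_fd_bin_count` and its `max_bins` parameter are names made up here, not part of seaborn or NumPy. It relies on `np.histogram_bin_edges`, which returns the same edges as `np.histogram` without computing the counts.

```python
import numpy as np


def capped_fd_bin_count(values, max_bins=50):
    """Hypothetical helper: Freedman-Diaconis bin count, capped at max_bins."""
    # bins="fd" asks NumPy for Freedman-Diaconis bin edges
    edges = np.histogram_bin_edges(values, bins="fd")
    # n edges delimit n - 1 bins; never return more than max_bins
    return min(len(edges) - 1, max_bins)


x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
bin_nr = capped_fd_bin_count(x_list)
print(bin_nr)
```

The result can be passed straight to `sns.histplot(..., bins=bin_nr)`.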
Regarding the bin estimation with `bins="fd"`, I am not sure that this is indeed the method used by `distplot`. Comments and corrections are more than welcome.
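One way to see what `bins="fd"` computes is to apply the Freedman-Diaconis rule by hand: bin width h = 2 * IQR / n^(1/3), and bin count = ceil(range / h). The sketch below does that and compares the result with NumPy. The name `fd_bins_by_hand` is hypothetical, and the zero-width fallback is a choice made for this sketch; whether `distplot` computes its bins exactly this way is not verified here.

```python
import numpy as np


def fd_bins_by_hand(values):
    """Hypothetical helper: Freedman-Diaconis bin count from the textbook formula."""
    a = np.asarray(values, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    # h = 2 * IQR / n^(1/3)
    width = 2 * (q75 - q25) / len(a) ** (1 / 3)
    if width == 0:
        # Degenerate case (IQR of zero): use a single bin in this sketch
        return 1
    return int(np.ceil((a.max() - a.min()) / width))


x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
by_hand = fd_bins_by_hand(x_list)
from_numpy = len(np.histogram_bin_edges(x_list, bins="fd")) - 1
print(by_hand, from_numpy)
```

If the two numbers agree for your data, the manual formula can be used to compare against whatever bin count an old `distplot` figure shows.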
I removed `**{"linewidth": 0}` because `distplot` has, as pointed out by @mwaskom in a comment, an `edgecolor` line around the histogram bars that matplotlib can set to the default `facecolor`. So you have to sort this out according to your style preferences.