Your data does not appear to be gamma-distributed, but assuming it is, you could fit it like this:
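import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

gamma = stats.gamma
# true parameters used to generate the sample data
a, loc, scale = 3, 0, 2
size = 20000
y = gamma.rvs(a, loc, scale, size=size)

# evaluate the fitted pdf over the same range as the data
x = np.linspace(0, y.max(), 100)
# fit, holding loc fixed at 0
param = gamma.fit(y, floc=0)
pdf_fitted = gamma.pdf(x, *param)
plt.plot(x, pdf_fitted, color='r')

# plot the histogram, normalized to unit area
# (density=True replaces normed=True, which newer Matplotlib versions no longer accept)
plt.hist(y, density=True, bins=30)
plt.show()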
The area under the pdf (over the entire domain) equals 1. The area under the histogram equals 1 if you use density=True (normed=True in older Matplotlib versions).
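As a rough numerical check of both statements, here is a minimal sketch. It uses np.histogram as a stand-in for plt.hist so that nothing needs to be drawn; the seed and the variable names hist_area and pdf_area are my own choices for illustration, not part of the original code.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
y = stats.gamma.rvs(3, loc=0, scale=2, size=20000, random_state=rng)

# Histogram normalized to unit area: sum of (bar height * bar width)
heights, edges = np.histogram(y, bins=30, density=True)
hist_area = np.sum(heights * np.diff(edges))

# Area under the gamma pdf (a=3, loc=0, scale=2) over its whole domain [0, inf)
pdf_area, _ = integrate.quad(stats.gamma.pdf, 0, np.inf, args=(3, 0, 2))

print(hist_area, pdf_area)  # both should be very close to 1
```

Because both curves enclose unit area, they can be overlaid on the same axes without any rescaling.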
If x has length size (i.e. 20000), then pdf_fitted has the same shape as x. If we call plot and specify only the y-values, e.g. plt.plot(pdf_fitted), the values are plotted against their array indices, i.e. over the x-range [0, size - 1].
That is much too large an x-range. Since the histogram is going to use an x-range of [min(y), max(y)], we must choose x to span a similar range, e.g. x = np.linspace(0, y.max(), 100), and call plot with both the x- and y-values specified, e.g. plt.plot(x, pdf_fitted).
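To see the difference between the two plot calls without opening a window, you can inspect the x-data that Matplotlib stores on each line. This is only a sketch: the Agg backend, the grid endpoint of 20, and the names line_no_x and line_with_x are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x = np.linspace(0, 20, 100)
pdf_values = stats.gamma.pdf(x, 3, loc=0, scale=2)

fig, ax = plt.subplots()
(line_no_x,) = ax.plot(pdf_values)       # x defaults to the indices 0..99
(line_with_x,) = ax.plot(x, pdf_values)  # x is the grid we chose

print(line_no_x.get_xdata()[[0, -1]])    # first and last index
print(line_with_x.get_xdata()[[0, -1]])  # first and last grid value
```

With 20000 points instead of 100, the first line would stretch out to index 19999, which is why it does not line up with the histogram.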
As Warren Weckesser points out in the comments, for most applications you know the gamma distribution's domain begins at 0. If that is the case, use floc=0 to hold the loc parameter to 0. Without floc=0, gamma.fit will try to find the best-fit value for the loc parameter too, which given the vagaries of data will generally not be exactly zero.
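A small sketch comparing the two fits on synthetic data. The seed, the sample size of 5000, and the variable names are assumptions for illustration. The exact values in the free fit depend on the sample, so no expected output is shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
y = stats.gamma.rvs(3, loc=0, scale=2, size=5000, random_state=rng)

# loc held fixed at 0: only the shape a and the scale are estimated
a_fixed, loc_fixed, scale_fixed = stats.gamma.fit(y, floc=0)

# loc treated as a free parameter: all three values are estimated
a_free, loc_free, scale_free = stats.gamma.fit(y)

print("floc=0  :", a_fixed, loc_fixed, scale_fixed)
print("free loc:", a_free, loc_free, scale_free)
```

With floc=0 the returned loc is exactly 0, and the shape and scale estimates should land close to the true values of 3 and 2.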
loc parameter of the gamma distribution is not used (i.e. the PDF should not be shifted), and the value is fixed at 0. By default, the fit method treats loc as a fitting parameter, so you might get a small nonzero shift; check the parameters returned by fit. You can tell fit to not include loc as a fitting parameter by using the argument floc=0. – Coed