scipy, lognormal distribution - parameters

I want to fit a lognormal distribution to my data using Python's scipy.stats.lognorm.fit. According to the manual, fit returns shape, loc, and scale parameters. But a lognormal distribution normally needs only two parameters: mean and standard deviation.

How do I interpret the results from scipy's fit function? How do I get the mean and standard deviation?

Infanta asked 5/1, 2012 at 18:29 Comment(0)

The distributions in scipy are coded in a generic way with respect to two parameters, location and scale, so that location is the parameter (loc) which shifts the distribution to the left or right, while scale is the parameter which compresses or stretches the distribution.

For the two-parameter lognormal distribution, the "mean" and "std dev" correspond to log(scale) and shape, respectively (you can let loc=0).

The following illustrates how to fit a lognormal distribution to find the two parameters of interest:

In [56]: import numpy as np

In [57]: from scipy import stats

In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3)

In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3)

In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting

In [61]: shape, loc, scale
Out[61]: (2.9212650122639419, 0, 21318.029350592606)

In [62]: np.log(scale), shape  # mu, sigma
Out[62]: (9.9673084420467362, 2.9212650122639419)
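
If what you actually need are the mean and standard deviation of the lognormal variable itself (rather than of its log), here is a minimal sketch continuing from the fitted values above; it uses the standard closed-form expressions mean = exp(mu + sigma**2/2) and var = (exp(sigma**2) - 1)*exp(2*mu + sigma**2), which should agree with the frozen distribution's own methods:

import numpy as np
from scipy import stats

mu, sigma = np.log(scale), shape                        # parameters recovered from the fit above
dist = stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

mean_closed = np.exp(mu + sigma**2 / 2)
std_closed = np.sqrt((np.exp(sigma**2) - 1) * np.exp(2*mu + sigma**2))

print(mean_closed, dist.mean())   # the two values should match
print(std_closed, dist.std())     # likewise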
Vanhomrigh answered 5/1, 2012 at 19:45 Comment(5)
Two comments: first, there was a bug in scipy 0.9 with floc that has been fixed in scipy 0.10 (projects.scipy.org/scipy/ticket/1536); second, because of the generic parameterization, the lognormal distribution in scipy doesn't use the usual parameterization, see for example projects.scipy.org/scipy/ticket/1502. – Panthea
Thanks for the patches. I didn't get the second one: from the comments in the link, it seems there is no bug there, actually? – Infanta
@JakubM. Yes, if you're using the latest scipy (0.10), the answer/example given above is not contradicted by any of the tickets mentioned in the comment by user33700. – Vanhomrigh
@ars: by the way, I didn't know that mu = log(scale) (which is obviously right). – Infanta
My second comment wasn't about a bug, just about your question of how to interpret the results of lognorm.fit, that is, the parameterization of lognorm, since it's a bit tricky, as in the example from ars. – Panthea

I just spent some time working this out and wanted to document it here: if you want to get the probability density (at point x) from the three return values of lognorm.fit (let's call them shape, loc, and scale), you need to use this formula:

pdf = 1 / (shape*((x-loc)/scale)*sqrt(2*pi)) * exp(-1/2*(log((x-loc)/scale)/shape)**2) / scale

So as an equation, that is (loc is μ, shape is σ, and scale is α):

f(x) = \frac{1}{(x-\mu)\cdot\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{\left(\ln\frac{x-\mu}{\alpha}\right)^2}{2\sigma^2}}
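
As a sanity check, here is a small sketch (with arbitrary example parameter values) that evaluates this formula directly and compares it against scipy.stats.lognorm.pdf; the two should agree to floating-point precision:

import numpy as np
from scipy import stats

shape, loc, scale = 0.8, 0.0, 3.0        # arbitrary example values
x = np.linspace(0.5, 20, 50)

# simplified form of the formula above (the two factors of scale cancel)
manual = (np.exp(-0.5 * (np.log((x - loc) / scale) / shape) ** 2)
          / (shape * (x - loc) * np.sqrt(2 * np.pi)))
library = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)

print(np.allclose(manual, library))      # expected: True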

Lymphatic answered 9/7, 2013 at 19:41 Comment(3)
Why do we divide by (1/alpha) at the end of your formula? – Bernadinebernadotte
This is not my formula, but the way scipy works. I don't see any division by (1/α) at the end, so I will assume you are talking about the division by scale (correct me if I misunderstood). I looked at this code a while ago, but if I remember correctly, that is what is done for every kind of distribution. Note, though, that there is another scale at the beginning and the two cancel each other out (as you can see in the equation). – Lymphatic
The two scales cancel each other out, so it's better to use this: (np.exp((-1/2)*(np.log((x-loc)/scale)/shape)**2))/(shape*(x-loc)*np.sqrt(2*np.pi)) – Pregnable

I think this will help. I was looking into the same issue for a long time and finally found a solution for my problem. In my case, I was trying to fit some data to a lognormal distribution using the scipy.stats.lognorm module. However, once I finally had the model parameters, I could not find a way to replicate my results using the mean and std of the data.

In the code below, I show how to produce a normally distributed sample from given mean and std parameters using the scipy.stats.norm module. Using that data, I fit a normal model (norm_dist_fitted) and also build a normal model directly from the mean and standard deviation (mu, sigma) extracted from the data.

The original model that produced the data, the fitted model, and the model built from the (mu, sigma) pair are compared in a graph.

[Figure 1: comparison of the original, fitted, and (mu, sigma)-based normal models]


In the next section of the code, I use the normal data to produce a lognormally distributed sample: the lognormal samples are simply the exponential of the original sample. Note that exp(mu) then plays the role of the lognormal's scale parameter (its median), not the mean of the exponentiated sample.

I fit the produced data to a lognormal distribution, since the log of my sample exp(x) is normally distributed and the sample therefore satisfies the lognormal model assumptions.

To produce a lognormal model from the mean and standard deviation of your original data (x) the code will be:

lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

However, if your data is already in the exponential space (call it x = exp(original)), then you have to use:

muX = np.mean(np.log(x))
sigmaX = np.std(np.log(x))
scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))
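
As a quick check of that reconstruction, here is a sketch with made-up data (the parameters 2.0, 0.5 and the sample size are arbitrary); the reconstructed model and a fit with floc=0 should give nearly identical parameters:

import numpy as np
import scipy.stats

data = np.exp(np.random.normal(loc=2.0, scale=0.5, size=5000))  # data already in exponential space

muX = np.mean(np.log(data))
sigmaX = np.std(np.log(data))
reconstructed = scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))

shape, loc, scale = scipy.stats.lognorm.fit(data, floc=0)
print(shape, sigmaX)              # should be close
print(scale, np.exp(muX))         # should be close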

[Figure 2: comparison of the original, fitted, and (mu, sigma)-based lognormal models]

import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

mu = 10 # Mean of sample !!! Make sure your data is positive for the lognormal example 
sigma = 1.5 # Standard deviation of sample
N = 2000 # Number of samples

norm_dist = scipy.stats.norm(loc=mu, scale=sigma) # Create Random Process
x = norm_dist.rvs(size=N) # Generate samples

# Fit normal
fitting_params = scipy.stats.norm.fit(x)
norm_dist_fitted = scipy.stats.norm(*fitting_params)
t = np.linspace(np.min(x), np.max(x), 100)

# Plot normals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x, ax=ax, norm_hist=True, kde=False, label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
ax.legend(loc='lower right')
plt.show()


# The lognormal model fits to a variable whose log is normal
# We create our variable whose log is normal 'exponenciating' the previous variable

x_exp = np.exp(x)
mu_exp = np.exp(mu)
sigma_exp = np.exp(sigma)

fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
t = np.linspace(np.min(x_exp), np.max(x_exp), 100)

# Here is the magic I was looking for a long long time
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

# The trick is to understand these two things:
# 1. If a variable X is NORMAL with mean MU and std SIGMA -> EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# 2. If your variable (x) already HAS THE FORM of a LOGNORMAL, the model is scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))
# with:
#    - muX = np.mean(np.log(x))
#    - sigmaX = np.std(np.log(x))


# Plot lognormals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x_exp, ax=ax, norm_hist=True, kde=False,
             label='Data exp(X)~N(mu={0:.1f}, sigma={1:.1f})\n X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist_fitted.mean(), lognorm_dist_fitted.std()))
ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist.mean(), lognorm_dist.std()))
ax.legend(loc='lower right')
plt.show()
Agglutinative answered 9/2, 2018 at 21:49 Comment(2)
Please provide a bit of an explanation to accompany your answer. – Hospitalization
@CollinM.Barrett Done! :D I hope it is ok :D – Agglutinative

First, loc is not just a numerical shift of the distribution; it has a statistical meaning of its own: subtracting loc from the samples gives a "standardized" lognormal whose lower bound is zero, and this is quite important.

Hence, when you specify loc or floc, you impose a very strong hypothesis: you assume the samples have a lower bound, and that this lower bound is exactly the loc value. SciPy then uses a different fitting algorithm: if you fix loc, it uses the explicit maximum-likelihood formulas to calculate the fitted parameters; if not, it uses the generic numerical solver.
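
A small sketch illustrating the difference (the sample below is synthetic and the exact numbers will vary from run to run):

import numpy as np
from scipy import stats

sample = np.exp(stats.norm.rvs(loc=1.0, scale=0.4, size=2000))  # lognormal sample, true lower bound 0

# loc fixed: scipy uses the closed-form maximum-likelihood formulas for shape and scale
print(stats.lognorm.fit(sample, floc=0))

# loc free: the generic numerical optimizer estimates loc as well, and may land elsewhere
print(stats.lognorm.fit(sample))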

You can also read the code in scipy's stats/_continuous_distns.py, around line 3889:

    def fit(self, data, *args, **kwds):
        floc = kwds.get('floc', None)
        if floc is None:
            # loc is not fixed.  Use the default fit method.
            return super(lognorm_gen, self).fit(data, *args, **kwds)

        f0 = (kwds.get('f0', None) or kwds.get('fs', None) or
              kwds.get('fix_s', None))
        fscale = kwds.get('fscale', None)

        if len(args) > 1:
            raise TypeError("Too many input arguments.")
        for name in ['f0', 'fs', 'fix_s', 'floc', 'fscale', 'loc', 'scale',
                     'optimizer']:
            kwds.pop(name, None)
        if kwds:
            raise TypeError("Unknown arguments: %s." % kwds)

        # Special case: loc is fixed.  Use the maximum likelihood formulas
        # instead of the numerical solver.

Furthermore, someone from the R community might wonder why the output from Python differs from R's. I don't agree with using R as a "reference": it is just one piece of software, and different software implements different flavors of the algorithms.

For example, the following output from R is not an error; Python, Fortran and other software may simply use different rounding conventions:

round(3.5)
[1] 4
round(2.5)
[1] 2
Downswing answered 3/12, 2019 at 15:22 Comment(0)

Something that helped me was to think of location and scale as a general parametrisation.

Instead of using x as in the standard log-normal distribution, you substitute x' = (x - location)/scale.

Then the probability density function becomes f(x) = (1/scale) · f_std((x - location)/scale), where f_std is the standard log-normal density.

More information: https://en.wikipedia.org/wiki/Location%E2%80%93scale_family
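
A minimal sketch of that relationship using scipy (the parameter values here are arbitrary):

import numpy as np
from scipy import stats

s, loc, scale = 0.9, 2.0, 5.0
x = 7.5

shifted = stats.lognorm.pdf(x, s, loc=loc, scale=scale)
standard = stats.lognorm.pdf((x - loc) / scale, s) / scale   # (1/scale) * f_std((x - loc)/scale)

print(np.isclose(shifted, standard))   # expected: True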

Colorific answered 2/5, 2022 at 20:32 Comment(0)
