If I have a random number Z
that is defined as the sum of two other random numbers, X
and Y
, then the probability distribution of Z
is the convolution of the probability distributions for X
and Y
. Convolution is basically to an integral of the product of the distribution functions. Often there is no analytic solution to the integral in the convolution, so it must be calculated with a basic quadrature algorithm. In pseudocode:
prob_z(z) = integrate(lambda t: prob_x(t) * prob_y(z-t), -inf, inf)
For a concrete example, the sum Z
of a normally distributed variable X
and a log normally distributed variable Y
can be calculated with the following Python/Scipy code:
from scipy.integrate import quad
from scipy.stats import norm, lognorm
from scipy import log
prob_x = lambda x: norm.pdf(x, 0, 1) # N(mu=0, sigma=1)
prob_y = lambda y: lognorm.pdf(y, 0.1, scale=10) # LogN(mu=log(10), sigma=0.1)
def prob_z(z):
return quad(lambda t: prob_x(t)*prob_y(z-t), -inf, inf)
Now I want to calculate the log probability. The naive solution is to simple do:
def log_prob_z(z):
return log(prob_z(z))
However, this is numerically unstable. After about 39 standard deviations, probability distributions are numerically 0.0, so even if the log probability has some finite value, it can't be calculated outside this by simply taking the log of the probability. Compare norm.pdf(39, 1, 0)
which is 0.0 to norm.logpdf(39, 1, 0)
which is about -761. Clearly, Scipy doesn't compute logpdf
as log(pdf)
—it finds some other way—because otherwise it would return -inf
, an inferior response. In the same way, I want to find another way for my problem.
(You may wonder why I care about the log likehood of values so far from the mean. The answer is parameter fitting. Fitting algorithms can get closer when the log likelihood is some hugely negative number, but nothing can be done when it is -inf
or nan
.)
The question is: does anyone know how I can rearrange log(quad(...))
so I don't compute quad(...)
and thereby avoiding creating a 0.0 in the log?
prob_z
, which has a long tail. Ultimately, it will suffer from the same problem as any probability distribution—the log probability should have a finite value, but the probability will be floating point zero and therefore unsuitable for calculating the log. – Arbitral