Numerically stable evaluation of log of integral of a function with extremely small values
If I have a random number Z that is defined as the sum of two other random numbers, X and Y, then the probability distribution of Z is the convolution of the probability distributions of X and Y. Convolution is essentially an integral of the product of the two density functions. Often there is no analytic solution to the integral in the convolution, so it must be calculated with a basic quadrature algorithm. In pseudocode:

prob_z(z) = integrate(lambda t: prob_x(t) * prob_y(z-t), -inf, inf)

For a concrete example, the sum Z of a normally distributed variable X and a log normally distributed variable Y can be calculated with the following Python/Scipy code:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, lognorm

prob_x = lambda x: norm.pdf(x, 0, 1)  # N(mu=0, sigma=1)
prob_y = lambda y: lognorm.pdf(y, 0.1, scale=10)  # LogN(mu=log(10), sigma=0.1)
def prob_z(z):
    return quad(lambda t: prob_x(t)*prob_y(z-t), -np.inf, np.inf)[0]  # quad returns (value, error)

Now I want to calculate the log probability. The naive solution is to simply do:

def log_prob_z(z):
    return np.log(prob_z(z))

However, this is numerically unstable. Beyond about 39 standard deviations, a normal density is numerically 0.0, so even though the log probability has a finite value there, it can't be obtained by simply taking the log of the probability. Compare norm.pdf(39, 0, 1), which is 0.0, with norm.logpdf(39, 0, 1), which is about -761. Clearly, SciPy doesn't compute logpdf as log(pdf); it must find some other way, because otherwise it would return -inf, an inferior answer. In the same way, I want to find another way for my problem.
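(For intuition: the normal log-density has a closed form that never passes through the underflowing pdf, which is presumably how SciPy keeps logpdf finite. A minimal sketch, with a hand-rolled helper for illustration:)

```python
import numpy as np
from scipy.stats import norm

# The normal log-density written out directly from its closed form,
# so it stays finite even where the pdf itself underflows to 0.0.
def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

print(norm.pdf(39, 0, 1))       # 0.0: smaller than the smallest subnormal double
print(normal_logpdf(39, 0, 1))  # about -761.4, still finite
print(norm.logpdf(39, 0, 1))    # agrees with the direct formula
```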

(You may wonder why I care about the log likelihood of values so far from the mean. The answer is parameter fitting. Fitting algorithms can get closer when the log likelihood is some hugely negative number, but nothing can be done when it is -inf or nan.)

The question is: does anyone know how I can rearrange log(quad(...)) so that I don't evaluate quad(...) directly, and thereby avoid producing a 0.0 inside the log?

Arbitral answered 29/11, 2017 at 21:54 Comment(8)
This might be better suited as a general mathematics question.Bop
If I thought solving the integral analytically was tractable, I would agree. But I really only care about the solution from a numeric perspective. In my experience, numeric questions are best asked here. Though I agree that figuring this out will require a bit of math.Arbitral
A normal distribution is never 0.0 - it extends to infinity and always has a finite value. In the wings the probabilities become too small to represent on a computer in a simple way. Even if you came up with a way to calculate those values, you would still be unable to represent them with standard floating point numbers. I think what you're looking for is not a solution to this particular math problem, but a means of representing floating point numbers with increased precision.Eckman
I'd possibly take a look at SymPy, as it supports arbitrary precision.Bop
@PaulCornelius In the areas where the likelihood is too small to be represented by floating point numbers, the log likelihood can be represented accurately. I am looking for a way to calculate the log likelihood that doesn't involve just naively taking the log of the unrepresentable likelihood.Arbitral
@6'whitemale I edited into the question a clarification that the 39 standard deviations fact is about the normal distribution. It is a little trickier to measure standard deviations for prob_z, which has a long tail. Ultimately, it will suffer from the same problem as any probability distribution—the log probability should have a finite value, but the probability will be floating point zero and therefore unsuitable for calculating the log.Arbitral
@Sebastian More specifically, mpmath supports arbitrary precision. SymPy does symbolic calculations; whatever it does numerically is via calling mpmath under the hood.Alcoran
Side comment: no, there is no magic math trick to somehow apply the logarithm before the integral. The only transformations that play well with integration are linear (and I used one in the 2nd part of my answer), and logarithm is not a linear function.Alcoran

The problem is that the values of the function you are integrating are too small to be represented in double precision, which bottoms out around 1e-308 (or roughly 5e-324 counting subnormals).

mpmath to the rescue

When double-precision is not enough for numerical computations, mpmath, a library for arbitrary-precision floating point operations, is called for. It has its own quad routine, but you'll need to implement your pdf functions so they work at mpmath level (otherwise there won't be anything to integrate). There are many built-in functions, including normal pdf, so I'm going to use that for illustration.

Here I convolve two normal pdfs at distance 70 apart with SciPy:

z = 70
p = quad(lambda t: norm.pdf(t, 0, 1)*norm.pdf(z-t, 0, 1), -np.inf, np.inf)[0]

Sadly, p is exactly 0.0.

And here I do the same with mpmath, after import mpmath as mp:

z = 70
p = mp.quad(lambda t: mp.npdf(t, 0, 1)*mp.npdf(z-t, 0, 1), [-mp.inf, mp.inf])

Now p is an mpmath object that prints as 2.95304756048889e-543, well beyond double-precision scale. And its logarithm, mp.log(p), is -1249.22086778731.
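For the question's actual normal + log-normal convolution, mpmath ships a normal pdf (mp.npdf) but no log-normal one, so the log-normal density has to be written out from its formula at mpmath level. A sketch under that assumption (helper names are mine; mu = log(10), sigma = 0.1 match the question's lognorm(0.1, scale=10)):

```python
import mpmath as mp

# Log-normal density written out from its formula so it is evaluated
# entirely with mpmath arbitrary-precision numbers.
def lognorm_pdf(y, mu, sigma):
    if y <= 0:
        return mp.mpf(0)  # log-normal support is y > 0
    return mp.exp(-(mp.log(y) - mu) ** 2 / (2 * sigma ** 2)) / (y * sigma * mp.sqrt(2 * mp.pi))

def log_prob_z(z):
    # split the integration interval at t = z, where the log-normal
    # factor's support ends, to help the quadrature
    p = mp.quad(lambda t: mp.npdf(t, 0, 1) * lognorm_pdf(z - t, mp.log(10), 0.1),
                [-mp.inf, z, mp.inf])
    return mp.log(p)
```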

SciPy-based alternative: logarithmic offset

If for some reason you can't use mpmath, you can at least try to "normalize" the function by moving its values into the double precision range. Here is an example:

z = 70
offset = 2*norm.logpdf(z/2, 0, 1)
logp = offset + np.log(quad(lambda t: np.exp(norm.logpdf(t, 0, 1) + norm.logpdf(z-t, 0, 1) - offset), -np.inf, np.inf)[0])

Here logp prints -1264.66566393, which is not as good as the mpmath result (some of the integrand was lost to underflow), but it's reasonable. What I did was:

  • calculate the logarithm of the maximum value of our integrand (this is the variable offset); since the integrand here is a product of two equal normal pdfs, its maximum is at t = z/2, giving offset = 2*norm.logpdf(z/2, 0, 1)
  • subtract this offset from the logarithm of pdf; this is the part norm.logpdf(t, 0, 1) + norm.logpdf(z-t, 0, 1) - offset
  • exponentiate the result, since we can't just move the logarithm inside the integral. Algebraically, this is the same as the product of pdfs times exp(-offset). But numerically, it is a number that is far less likely to underflow; indeed, at t = z/2 it is exp(0)=1.
  • integrate normally; take the logarithm, add offset to the logarithm. Algebraically, the result is just the logarithm of the integral we wanted to take.
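The same trick can be wrapped up for the question's normal + log-normal convolution. A sketch, where I replace the analytic z/2 peak (valid only for two equal normals) with a crude grid search for the maximum; the grid bounds are my own guess and would need adjusting for other distributions:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, lognorm

def log_prob_z(z):
    # log of the integrand: log N(t; 0, 1) + log LogN(z - t; log 10, 0.1)
    log_f = lambda t: norm.logpdf(t, 0, 1) + lognorm.logpdf(z - t, 0.1, scale=10)
    # crude grid search for the integrand's peak; unlike the equal-normals
    # case, this integrand's maximizer has no simple closed form
    grid = np.linspace(-10, z - 1e-6, 1001)
    offset = np.max(log_f(grid))
    # shift so the integrand is ~1 at its peak, integrate, then shift back
    val, _ = quad(lambda t: np.exp(log_f(t) - offset), -np.inf, np.inf)
    return offset + np.log(val)
```

Near the bulk of the distribution this agrees with the naive log(quad(...)), and it keeps working far into the tail where the naive version returns -inf.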
Alcoran answered 30/11, 2017 at 2:57 Comment(1)
The mpmath version is easy to implement and surprisingly fast. I just went with that. It works at truly colossal ranges. It breaks down at about 1e20 standard deviations for my problem, and that's at the default mp precision while relying on SciPy for the log-normal log-pdf.Arbitral

© 2022 - 2024 — McMap. All rights reserved.