Generating Poisson variables in C++
I implemented this function to generate a Poisson random variable:

typedef long unsigned int luint;
luint poisson(luint lambda) {
    double L = exp(-double(lambda));
    luint k = 0;
    double p = 1;
    do {
        k++;
        p *= mrand.rand();
    } while( p > L);
    return (k-1);
}

where mrand is the MersenneTwister random number generator. I find that as I increase lambda, the resulting distribution is wrong, with a mean that saturates at around 750. Is this due to numerical approximation, or did I make a mistake?

Colossal answered 14/4, 2011 at 2:18 Comment(3)
IIRC, a Poisson variable has an exponential distribution. Therefore this is a precise duplicate of #2107003. But even if I'm mistaken, the method given there should work.Mayonnaise
@MSalters: The Poisson distribution is discrete - it takes only integer values. The exponential distribution is continuous. So they are not the same (although they are related).Hodges
Right, from Wikipedia: "If the number of arrivals in a given time interval [0,t] follows the Poisson distribution, with mean = λt, then the lengths of the inter-arrival times follow the Exponential distribution, with mean 1 / λ.". That's an effective transformation between the two, structurally similar to the algorithm I proposed below.Mayonnaise

From another question I asked earlier, it seems you could also compute poisson(750) as poisson(375) + poisson(375): the sum of two independent Poisson variables is itself Poisson, with the means added, and exp(-375) is still representable as a double.

Mayonnaise answered 14/4, 2011 at 12:6 Comment(0)

If you go the "existing library" route, your compiler may already support the C++11 std::random package. Here is how you use it:

#include <random>
#include <ctime>
#include <iostream>

std::mt19937 mrand(std::time(0));  // seed however you want

typedef long unsigned int luint;

luint poisson(luint lambda)
{
    std::poisson_distribution<luint> d(lambda);
    return d(mrand);
}

int main()
{
    std::cout << poisson(750) << '\n';
    std::poisson_distribution<luint> d(750);
    std::cout << d(mrand) << '\n';
    std::cout << d(mrand) << '\n';
}

I've used it two ways above:

  1. I tried to imitate your existing interface.

  2. If you create a std::poisson_distribution with a mean, it is more efficient to use that distribution over and over for the same mean (as done in main()).

Here is sample output for me:

751
730
779
Halmahera answered 14/4, 2011 at 13:45 Comment(0)

exp(-750) is below the smallest positive double, so L underflows to exactly zero; the loop then only exits once the running product p itself underflows to zero, which takes about 745 iterations on average, hence the saturation you observe. In any case, the complexity is linear in lambda, so the algorithm isn't efficient for large lambda. Unless you have a good reason to code this yourself, use an existing library implementation: these numerical algorithms tend to be touchy precisely because of the precision issues you're encountering.

Hautesavoie answered 14/4, 2011 at 2:39 Comment(1)
I guess I will use the normal approximation, since in my case lambda is always a big number.Colossal

Since you only use L in the test (p > L), you're essentially testing for (log(p) > -lambda). By itself that's not a helpful transformation: sure, you no longer need exp(-750), but you'd just underflow p instead.

Now, p is just Π(mrand.rand()), so log(p) = log(Π(mrand.rand())) = Σ(log(mrand.rand())). That gives you the necessary transformation:

luint poisson(luint lambda) {
    luint k = 0;
    double logp = 0;
    do {
        k++;
        logp += log(mrand.rand());
    } while (logp > -double(lambda));
    return k - 1;
}

A double has only 11 bits of exponent but a 52-bit mantissa, so working in the log domain is a massive increase in numerical range. The price is a log on every iteration instead of a single exp up front.

Mayonnaise answered 14/4, 2011 at 9:21 Comment(0)

In situations like these, you don't need to invoke the random number generator more than once. All you need is a table of cumulative probabilities:

double c[k] = // the probability that X <= k (k = 0,...)

Then generate a random number 0 <= r < 1, and take the first integer X such that c[X] > r. You can find this X with a binary search.

To generate this table, we need the individual probabilities

p[k] = lambda^k / (k! e^lambda) // the probability that X = k

If lambda is large, computing these directly becomes wildly inaccurate, as you have found. But we can use a trick here: start at (or near) the most probable value, k = floor(lambda), and pretend for the moment that p[k] is equal to 1. Then calculate p[i] for i > k using the recurrence relation

p[i+1] = (p[i]*lambda) / (i+1)

and for i < k using

p[i-1] = (p[i]*i)/lambda

This ensures that the largest probabilities have the greatest possible precision.

Now just calculate c[i] using c[i+1] = c[i] + p[i+1], up to the point where c[i+1] is the same as c[i]. Then you can normalise the array by dividing by this limiting value c[i]; or you can leave the array as it is, and use a random number 0 <= r < c[i].

See: http://en.wikipedia.org/wiki/Inverse_transform_sampling

Hodges answered 14/4, 2011 at 11:50 Comment(2)
Couldn't you store log(p[k]) instead? That's just k·log(λ) − λ − log(k!), and calculating that isn't hard (see en.wikipedia.org/wiki/Factorial#Rate_of_growth for log(k!))Mayonnaise
That's a backward step. The precision of log(k!) degrades as k increases, whereas we want the most accurate values to be around the mean, where k ~ lambda. Also, there's no need for log or exp here at all.Hodges
