Calculating Probability of a Random Variable in a Distribution in Python

Given a mean and standard-deviation defining a normal distribution, how would you calculate the following probabilities in pure-Python (i.e. no Numpy/Scipy or other packages not in the standard library)?

  1. The probability of a random variable r where r < x or r <= x.
  2. The probability of a random variable r where r > x or r >= x.
  3. The probability of a random variable r where x > r > y.

I've found some libraries, like Pgnumerics, that provide functions for calculating these, but the underlying math is unclear to me.

Edit: To show this isn't homework, below is my working code for Python <= 2.6, although I'm not sure it handles the boundary conditions correctly.

from math import exp, pi, sqrt

def erfcc(x):
    """
    Complementary error function.
    """
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

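def normcdf(x, mu, sigma):
    # CDF of Normal(mu, sigma**2) via the complementary error function.
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    # Clamp in case the approximation overshoots 1.
    if y > 1.0:
        y = 1.0
    return y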

def normpdf(x, mu, sigma):
    u = (x-mu)/abs(sigma)
    y = (1/(sqrt(2*pi)*abs(sigma)))*exp(-u*u/2)
    return y

def normdist(x, mu, sigma, f):
    if f:
        y = normcdf(x,mu,sigma)
    else:
        y = normpdf(x,mu,sigma)
    return y

def normrange(x1, x2, mu, sigma, f=True):
    """
    Calculates probability of random variable falling between two points.
    Only meaningful with f=True (CDF); a difference of PDF values is not a probability.
    """
    p1 = normdist(x1, mu, sigma, f)
    p2 = normdist(x2, mu, sigma, f)
    return abs(p1-p2)
Betake answered 25/2, 2012 at 21:29 Comment(1)
That's what the cumulative distribution function for the distribution gives you. The article you linked to gives this for normal distributions. Historied

All these are very similar: If you can compute #1 using a function cdf(x), then the solution to #2 is simply 1 - cdf(x), and for #3 it's cdf(x) - cdf(y).

Python has included the (Gauss) error function in the standard library as math.erf since version 2.7, so you can compute the CDF of the normal distribution using the equation from the article you linked to:

import math
# x is the point of interest; the print() form works on Python 2.7 and 3
print(0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * standard_dev**2))))

where mean is the mean and standard_dev is the standard deviation.
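The three cases from the question can be wrapped in small helpers built on that same formula. This is only a minimal sketch, not code from the original answer: the function names (normal_cdf, prob_below, prob_above, prob_between) are made up for illustration, and it assumes Python 2.7+ or 3 (for math.erf) and standard_dev > 0.

```python
import math

def normal_cdf(x, mean, standard_dev):
    """P(X <= x) for X ~ Normal(mean, standard_dev**2); assumes standard_dev > 0."""
    z = (x - mean) / (standard_dev * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def prob_below(x, mean, standard_dev):
    # Case 1: P(r < x), which equals P(r <= x) for a continuous distribution.
    return normal_cdf(x, mean, standard_dev)

def prob_above(x, mean, standard_dev):
    # Case 2: P(r > x) = 1 - P(r <= x).
    return 1.0 - normal_cdf(x, mean, standard_dev)

def prob_between(lower, upper, mean, standard_dev):
    # Case 3: P(lower < r < upper) = cdf(upper) - cdf(lower).
    return (normal_cdf(upper, mean, standard_dev)
            - normal_cdf(lower, mean, standard_dev))

if __name__ == "__main__":
    print(prob_below(0.0, 0.0, 1.0))
    print(prob_above(1.0, 0.0, 1.0))
    print(prob_between(-1.0, 1.0, 0.0, 1.0))
```

For a standard normal, prob_below(0.0, 0.0, 1.0) is one half, and prob_between(-1.0, 1.0, 0.0, 1.0) comes out at roughly 0.6827, the familiar "68% within one standard deviation" figure.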

Some notes, since what you asked is relatively straightforward given the information in the article:

  • The CDF of a random variable X is the probability that X lies between -infinity and some limit x (lower case). For continuous distributions, the CDF is the integral of the pdf. This is exactly what you described for #1: you want a normally distributed random variable to fall between -infinity and x (that is, <= x).
  • < and <= (as well as > and >=) are equivalent for continuous random variables, because the probability that the random variable equals any single point is 0. So whether or not x itself is included doesn't matter when calculating probabilities for continuous distributions.
  • Probabilities sum to 1: if X is not < x, then it is >= x. So if you have cdf(x), then 1 - cdf(x) is the probability that X >= x. Since >= is equivalent to > for continuous random variables, this is also the probability that X > x.
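The first note (the CDF is the integral of the pdf) can be checked numerically with nothing but the standard library. The sketch below is an illustration rather than part of the answer: the helper names are hypothetical, the integration is a plain trapezoidal rule, and starting 10 standard deviations below the mean is an assumed stand-in for -infinity.

```python
import math

def normal_pdf(x, mean, standard_dev):
    # Density of Normal(mean, standard_dev**2); assumes standard_dev > 0.
    u = (x - mean) / standard_dev
    return math.exp(-0.5 * u * u) / (standard_dev * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, standard_dev):
    # Closed form via the error function, as in the answer.
    return 0.5 * (1.0 + math.erf((x - mean) / (standard_dev * math.sqrt(2.0))))

def integrate_pdf(upper, mean, standard_dev, steps=20000):
    # Trapezoidal rule from 10 standard deviations below the mean,
    # used here in place of -infinity (the tail beyond that is negligible).
    lower = mean - 10.0 * standard_dev
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mean, standard_dev)
                   + normal_pdf(upper, mean, standard_dev))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mean, standard_dev)
    return total * h

if __name__ == "__main__":
    # The two values should agree to several decimal places.
    print(integrate_pdf(1.0, 0.0, 1.0))
    print(normal_cdf(1.0, 0.0, 1.0))
```

The agreement between the two numbers is the point here; for real use the erf-based formula is both faster and more accurate than numerical integration.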
Devastate answered 25/2, 2012 at 21:40 Comment(5)
How are the bounds interpreted? You say cdf(x) solves #1, but I have two separate cases for #1: less than, and less than or equal to. Which does cdf(x) solve, and how would I find the other case? Betake
Hi, for the normal distribution, which is continuous, less than and less than or equal to are equivalent, so this is just one case. I've added some notes. Devastate
1 - cdf(x) could be expressed via math.erfc(). It might improve precision for cdf(x) near 1. Grozny
The notes at the bottom beat my university notes: small, clean, elegant. Cheesecloth
The question refers to the normal distribution and the solution uses the error function. If you'd like to see the relationship between the CDF of the normal distribution and the error function, please see johndcook.com/erf_and_normal_cdf.pdf. The CDF of the normal distribution is closely related to the error function, but they are not the same. Whole
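Following up on the math.erfc() comment above: a rough sketch of what that looks like, assuming Python 2.7+ or 3. The name normal_sf (for the upper tail P(X > x)) is chosen here for illustration and is not from the thread. Far out in the tail, cdf(x) rounds to exactly 1.0, so 1 - cdf(x) loses everything, while the erfc form keeps a small nonzero value.

```python
import math

def normal_cdf(x, mean, standard_dev):
    # P(X <= x) via the error function, as in the answer.
    return 0.5 * (1.0 + math.erf((x - mean) / (standard_dev * math.sqrt(2.0))))

def normal_sf(x, mean, standard_dev):
    # Upper tail P(X > x), computed directly with erfc instead of 1 - cdf(x).
    return 0.5 * math.erfc((x - mean) / (standard_dev * math.sqrt(2.0)))

if __name__ == "__main__":
    x = 10.0  # ten standard deviations above the mean of a standard normal
    print(1.0 - normal_cdf(x, 0.0, 1.0))  # cancels to 0.0 in double precision
    print(normal_sf(x, 0.0, 1.0))         # tiny, but still positive
```

Near the mean the two forms agree closely, so the choice only matters for extreme tail probabilities.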
