Analytic Highest Density Interval in Python (preferably for Beta distributions)

I was wondering if anybody knows of a reliable and fast analytic HDI calculation, preferably for beta functions.

The definition of the HDI is in this question called "Highest Posterior Density Region".

I am looking for a function that has the following I/O:

Input

credMass - credible interval mass (e.g, 0.95 for 95% credible interval)
a - shape parameter (e.g, number of HEADS coin tosses)
b - shape parameter (e.g, number of TAILS coin tosses)

output

ci_min - minimum bound of mass (value between 0. to ci_max)
ci_max - maximum bound of mass (value between ci_min to 1.)

One way to go about it is to speed up this script that I found in the same said question (adapted from R to Python) and taken from the book Doing bayesian data analysis by John K. Kruschke). I have used this solution, which is quite reliable, but it's a bit too slow for multiple calls. A speedup by 100X or even 10X would be very helpful!

from scipy.optimize import fmin
from scipy.stats import *

def HDIofICDF(dist_name, credMass=0.95, **args):
    # freeze distribution with given arguments
    distri = dist_name(**args)
    # initial guess for HDIlowTailPr
    incredMass =  1.0 - credMass

    def intervalWidth(lowTailPr):
        return distri.ppf(credMass + lowTailPr) - distri.ppf(lowTailPr)

    # find lowTailPr that minimizes intervalWidth
    HDIlowTailPr = fmin(intervalWidth, incredMass, ftol=1e-8, disp=False)[0]
    # return interval as array([low, high])
    return distri.ppf([HDIlowTailPr, credMass + HDIlowTailPr])

usage

print HDIofICDF(beta, credMass=0.95, a=5, b=4)

Word of caution! Some solutions confuse this HDI with the Equal Tail Interval solution (which the said question calls "Central Credible Region") which is far easier to compute, but does not answer the same question. (E.g, see Kruschke's Why HDI and not equal-tailed interval?)

Also this question does not concern the MCMC approaches that I have seen in PyMC3 (pymc3.stats.hpd(a), where a is a random variable sample), but rather regards an analytical solution.

Thank you!

pdf = dist.pdf(x_vals) pdf = pdf / pdf.sum() idx = np.argsort(pdf)[::-1] mass_cum = 0 indices = [] for i in idx: mass_cum += pdf[i] indices.append(i) if mass_cum >= mass: break return x_vals[np.sort(indices)[[0, -1]]]

Recommended topics

Hot tags