I was wondering if anybody knows of a reliable and fast analytic HDI calculation, preferably for beta functions.
The definition of the HDI is in this question called "Highest Posterior Density Region".
I am looking for a function that has the following I/O:
Input
credMass
- credible interval mass (e.g,0.95
for 95% credible interval)a
- shape parameter (e.g, number of HEADS coin tosses)b
- shape parameter (e.g, number of TAILS coin tosses)
output
ci_min
- minimum bound of mass (value between0.
toci_max
)ci_max
- maximum bound of mass (value betweenci_min
to1.
)
One way to go about it is to speed up this script that I found in the same said question (adapted from R to Python) and taken from the book Doing bayesian data analysis by John K. Kruschke). I have used this solution, which is quite reliable, but it's a bit too slow for multiple calls. A speedup by 100X or even 10X would be very helpful!
from scipy.optimize import fmin
from scipy.stats import *
def HDIofICDF(dist_name, credMass=0.95, **args):
# freeze distribution with given arguments
distri = dist_name(**args)
# initial guess for HDIlowTailPr
incredMass = 1.0 - credMass
def intervalWidth(lowTailPr):
return distri.ppf(credMass + lowTailPr) - distri.ppf(lowTailPr)
# find lowTailPr that minimizes intervalWidth
HDIlowTailPr = fmin(intervalWidth, incredMass, ftol=1e-8, disp=False)[0]
# return interval as array([low, high])
return distri.ppf([HDIlowTailPr, credMass + HDIlowTailPr])
usage
print HDIofICDF(beta, credMass=0.95, a=5, b=4)
Word of caution! Some solutions confuse this HDI with the Equal Tail Interval solution (which the said question calls "Central Credible Region") which is far easier to compute, but does not answer the same question. (E.g, see Kruschke's Why HDI and not equal-tailed interval?)
Also this question does not concern the MCMC approaches that I have seen in PyMC3 (pymc3.stats.hpd(a)
, where a
is a random variable sample), but rather regards an analytical solution.
Thank you!