Analytic Highest Density Interval in Python (preferably for Beta distributions)
Asked Answered
A

1

2

I was wondering if anybody knows of a reliable and fast analytic HDI calculation, preferably for beta functions.

The definition of the HDI is in this question called "Highest Posterior Density Region".

I am looking for a function that has the following I/O:

Input

  • credMass - credible interval mass (e.g, 0.95 for 95% credible interval)
  • a - shape parameter (e.g, number of HEADS coin tosses)
  • b - shape parameter (e.g, number of TAILS coin tosses)

output

  • ci_min - minimum bound of mass (value between 0. to ci_max)
  • ci_max - maximum bound of mass (value between ci_min to 1.)

One way to go about it is to speed up this script that I found in the same said question (adapted from R to Python) and taken from the book Doing bayesian data analysis by John K. Kruschke). I have used this solution, which is quite reliable, but it's a bit too slow for multiple calls. A speedup by 100X or even 10X would be very helpful!

from scipy.optimize import fmin
from scipy.stats import *

def HDIofICDF(dist_name, credMass=0.95, **args):
    # freeze distribution with given arguments
    distri = dist_name(**args)
    # initial guess for HDIlowTailPr
    incredMass =  1.0 - credMass

    def intervalWidth(lowTailPr):
        return distri.ppf(credMass + lowTailPr) - distri.ppf(lowTailPr)

    # find lowTailPr that minimizes intervalWidth
    HDIlowTailPr = fmin(intervalWidth, incredMass, ftol=1e-8, disp=False)[0]
    # return interval as array([low, high])
    return distri.ppf([HDIlowTailPr, credMass + HDIlowTailPr])

usage

print HDIofICDF(beta, credMass=0.95, a=5, b=4)

Word of caution! Some solutions confuse this HDI with the Equal Tail Interval solution (which the said question calls "Central Credible Region") which is far easier to compute, but does not answer the same question. (E.g, see Kruschke's Why HDI and not equal-tailed interval?)

Also this question does not concern the MCMC approaches that I have seen in PyMC3 (pymc3.stats.hpd(a), where a is a random variable sample), but rather regards an analytical solution.

Thank you!

Angellaangelle answered 16/6, 2021 at 18:39 Comment(2)
aakinshin.net/posts/beta-hdi the article provides everything you need to get an analytical solution for a beta distribution. If I had more time, I'd write an answer with the code needed to run it. As I don't, I'll leave this here for others with more time than I.Rittenhouse
I was incorrect. The link I referenced does not have all the info necessary, because the interval they recommend has a fix width, not a fixed mass. However, I'm confident if you know how to find the 2.5 and 5 percentiles, you can make the necessary changes to their method.Rittenhouse
F
1

Not analytical, but just in case you still find this useful. Instead of evaluating the ppf (that can be very slow for many distributions) you can find an approximate solution by evaluating the pdf, something like this.

pdf = dist.pdf(x_vals)
pdf = pdf / pdf.sum()
idx = np.argsort(pdf)[::-1]
mass_cum = 0
indices = []
for i in idx:
    mass_cum += pdf[i]
    indices.append(i)
    if mass_cum >= mass:
        break
return x_vals[np.sort(indices)[[0, -1]]]

Where mass is a number between 0 and 1. And x_vals is a vector of equally spaced values in the support of the distribution. So for the beta x_vals = np.linspace(0, 1, size) should do the work. You can control the tradeoff of accuracy and speed by setting a proper value of size for your problem. How much faster this solution is compared with the other one (using the ppf and optimization) will depend on the details of the SciPy's ppf and pdf method (that will be different for different distributions). And could even depend on the parameters of the distribution.

Feriga answered 2/8, 2022 at 15:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.