Finding alpha and beta of beta-binomial distribution with scipy.optimize and loglikelihood
Asked Answered
F

1

18

A distribution is beta-binomial if p, the probability of success, in a binomial distribution has a beta distribution with shape parameters α > 0 and β > 0. The shape parameters define the probability of success. I want to find the values for α and β that best describe my data from the perspective of a beta-binomial distribution. My dataset players consist of data about the number of hits (H), the number of at-bats (AB) and the conversion (H / AB) of a lot of baseball players. I estimate the PDF with the help of the answer of JulienD in Beta Binomial Function in Python

from scipy.special import beta
from scipy.misc import comb

pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

Next, I write a loglikelihood function that we will minimize.

def loglike_betabinom(params, *args):
   """
   Negative log likelihood function for betabinomial distribution
   :param params: list for parameters to be fitted.
   :param args:  2-element array containing the sample data.
   :return: negative log-likelihood to be minimized.
   """

   a, b = params[0], params[1]
   k = args[0] # the conversion rate
   n = args[1] # the number of at-bats (AE)

   pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

   return -1 * np.log(pdf).sum()   

Now, I want to write a function that minimizes loglike_betabinom

 from scipy.optimize import minimize
 init_params = [1, 10]
 res = minimize(loglike_betabinom, x0=init_params,
                args=(players['H'] / players['AB'], players['AB']),
                bounds=bounds,
                method='L-BFGS-B',
                options={'disp': True, 'maxiter': 250})
 print(res.x)

The result is [-6.04544138 2.03984464], which implies that α is negative which is not possible. I based my script on the following R-snippet. They get [101.359, 287.318]..

 ll <- function(alpha, beta) { 
    x <- career_filtered$H
    total <- career_filtered$AB
    -sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log=True))
 }

 m <- mle(ll, start = list(alpha = 1, beta = 10), 
 method = "L-BFGS-B", lower = c(0.0001, 0.1))

 ab <- coef(m)

Can someone tell me what I am doing wrong? Help is much appreciated!!

Favata answered 3/2, 2019 at 16:46 Comment(6)
How are you minimizing your loss function now? Did you write your own method, or are you using something from a package? Either way, what are the details?Pueblo
I use from scipy.optimize import minimizeFavata
@Favata What is players? Can you share that data please? That is the missing part for me to test my answer.Bierce
Where do you specify your bounds that you are passing in to minimize?Pueblo
I removed the bounds-partFavata
drive.google.com/open?id=1vFYzyJcPbDYRLP_AaWGQ5QFw9KCAcUgPFavata
S
9

One thing to pay attention to is that comb(n, k) in your log-likelihood might not be well-behaved numerically for the values of n and k in your dataset. You can verify this by applying comb to your data and see if infs appear.

One way to amend things could be to rewrite the negative log-likelihood as suggested in https://mcmap.net/q/742199/-beta-binomial-function-in-python, i.e. as a function of logarithms of Gamma functions as in

from scipy.special import gammaln
import numpy as np

def loglike_betabinom(params, *args):

    a, b = params[0], params[1]
    k = args[0] # the OVERALL conversions
    n = args[1] # the number of at-bats (AE)

    logpdf = gammaln(n+1) + gammaln(k+a) + gammaln(n-k+b) + gammaln(a+b) - \
     (gammaln(k+1) + gammaln(n-k+1) + gammaln(a) + gammaln(b) + gammaln(n+a+b))

    return -np.sum(logpdf) 

You can then minimize the log-likelihood with

from scipy.optimize import minimize

init_params = [1, 10]
# note that I am putting 'H' in the args
res = minimize(loglike_betabinom, x0=init_params,
            args=(players['H'], players['AB']),
            method='L-BFGS-B', options={'disp': True, 'maxiter': 250})
print(res)

and that should give reasonable results.

You could check How to properly fit a beta distribution in python? for inspiration if you want to rework further your code.

Selfanalysis answered 9/2, 2019 at 16:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.