Fitting negative binomial in python
In scipy there is no support for fitting a negative binomial distribution to data (perhaps because the negative binomial in scipy is discrete-only).

For a normal distribution I would just do:

from scipy.stats import norm
param = norm.fit(samp)

Is there a similar ready-to-use function in any other library?
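Note for later readers: SciPy 1.9 added a generic `scipy.stats.fit` function that also handles discrete distributions such as `nbinom`; you supply search bounds for the shape parameters and it runs a global optimizer. A minimal sketch (the sample parameters here are my own choice):

```python
import numpy as np
from scipy import stats

# draw a sample from a known negative binomial (n=5, p=0.5)
rng = np.random.default_rng(1)
samp = stats.nbinom.rvs(5, 0.5, size=1000, random_state=rng)

# fit via scipy.stats.fit (SciPy >= 1.9); bounds are the (n, p) search ranges,
# loc is held fixed at 0 when no bounds are given for it
res = stats.fit(stats.nbinom, samp, bounds=[(0, 30), (0, 1)])
n_est, p_est = res.params.n, res.params.p
```

`res.params` is a named tuple, so the fitted shapes are available by name.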

Serriform answered 22/5, 2014 at 20:31 Comment(2)
Have you found a solution to this question? – Lavona
Please see the answer from @Eran: statsmodels.discrete.discrete_model.NegativeBinomial – Racy
It is not only because the distribution is discrete: a maximum likelihood fit of the negative binomial can be quite involved, especially with an additional location parameter. That is probably why a .fit() method is not provided for it (and for the other discrete distributions in SciPy). Here is an example:

In [163]:

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as ss
import scipy.optimize as so
In [164]:
#define the log-likelihood; call with neg=-1 to get the negative log-likelihood for minimization
def likelihood_f(P, x, neg=1):
    n = np.round(P[0])  #by definition, n should be an integer
    p = P[1]
    loc = np.round(P[2])
    #logpmf avoids the log-of-zero warnings that np.log(pmf) can produce
    return neg * ss.nbinom.logpmf(x, n, p, loc).sum()
In [165]:
#generate a random variable
X=ss.nbinom.rvs(n=100, p=0.4, loc=0, size=1000)
In [166]:
#The likelihood
likelihood_f([100,0.4,0], X)
Out[166]:
-4400.3696690513316
In [167]:
#A simple fit, the fit is not good and the parameter estimate is way off
result=so.fmin(likelihood_f, [50, 1, 1], args=(X,-1), full_output=True, disp=False)
P1=result[0]
(result[1], result[0])
Out[167]:
(4418.599495886474, array([ 59.61196161,   0.28650831,   1.15141838]))
In [168]:
#Try a different set of start parameters; the fit is still not good and the parameter estimate is still way off
result=so.fmin(likelihood_f, [50, 0.5, 0], args=(X,-1), full_output=True, disp=False)
P1=result[0]
(result[1], result[0])
Out[168]:
(4417.1495981801972,
 array([  6.24809397e+01,   2.91877405e-01,   6.63343536e-04]))
In [169]:
#In this case we need a loop to get it right
result=[]
for i in range(40, 120): #in fact (80, 120) should probably be enough
    _=so.fmin(likelihood_f, [i, 0.5, 0], args=(X,-1), full_output=True, disp=False)
    result.append((_[1], _[0]))
In [170]:
#get the MLE
P2=sorted(result, key=lambda x: x[0])[0][1]
sorted(result, key=lambda x: x[0])[0]
Out[170]:
(4399.780263084549,
 array([  9.37289361e+01,   3.84587087e-01,   3.36856705e-04]))
In [171]:
#Which one is visually better?
plt.hist(X, bins=20, density=True)  #`normed` was removed in matplotlib 3.x
plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P1[0]), P1[1], np.round(P1[2])), 'g-')
plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P2[0]), P2[1], np.round(P2[2])), 'r-')
Out[171]:
[<matplotlib.lines.Line2D at 0x109776c10>]

[plot: histogram of X with the two fitted pmfs overlaid, P1 in green and P2 in red]
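A variation on the above (my own sketch, not part of the original answer): scipy's `nbinom` accepts non-integer `n`, so instead of looping over a grid you can treat `n` as continuous, fix `loc` at 0, and hand the negative log-likelihood to a bounded optimizer:

```python
import numpy as np
import scipy.stats as ss
import scipy.optimize as so

rng = np.random.default_rng(0)
X = ss.nbinom.rvs(n=100, p=0.4, size=1000, random_state=rng)

def neg_loglik(params, x):
    n, p = params
    # logpmf avoids the log(0) underflow that plain log(pmf) can hit
    return -ss.nbinom.logpmf(x, n, p).sum()

# bounded minimization keeps n positive and p strictly inside (0, 1)
res = so.minimize(neg_loglik, x0=[50, 0.5], args=(X,),
                  bounds=[(1e-6, None), (1e-6, 1 - 1e-6)],
                  method="L-BFGS-B")
n_hat, p_hat = res.x
```

The continuous optimum can then be rounded to the nearest integer `n` if an integer count is required.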

Hartley answered 23/5, 2014 at 16:19 Comment(1)
I realized it is a very ad-hoc solution; the optimization function does not always work. – Serriform
I know this thread is quite old, but current readers may want to look at this repo which is made for this purpose: https://github.com/gokceneraslan/fit_nbinom

There's also an implementation here, though part of a larger package: https://github.com/ernstlab/ChromTime/blob/master/optimize.py

Autobiography answered 1/8, 2018 at 17:42 Comment(0)
I stumbled across this thread, and found an answer for anyone else wondering.

If you simply need the n, p parameterisation used by scipy.stats.nbinom you can convert the mean and variance estimates:

mu = np.mean(sample)
sigma_sqr = np.var(sample)

n = mu**2 / (sigma_sqr - mu)
p = mu / sigma_sqr
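As a self-contained check of these moment-based estimates (the true parameters below are my own choice):

```python
import numpy as np
from scipy.stats import nbinom

# sample from a known negative binomial (n=8, p=0.3)
rng = np.random.default_rng(42)
sample = nbinom.rvs(8, 0.3, size=5000, random_state=rng)

mu = np.mean(sample)
sigma_sqr = np.var(sample)

# method-of-moments inversion; requires overdispersion (sigma_sqr > mu),
# otherwise the implied n would be negative
n_est = mu**2 / (sigma_sqr - mu)
p_est = mu / sigma_sqr
```

Note the overdispersion requirement: for a sample whose variance does not exceed its mean, a negative binomial is not an appropriate model and these formulas break down.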

If you need the dispersion parameter, you can fit a negative binomial regression model from statsmodels with just an intercept. This finds the dispersion parameter alpha by maximum likelihood.

# Data processing
import pandas as pd
import numpy as np

# Analysis models
import statsmodels.formula.api as smf
from scipy.stats import nbinom


def convert_params(mu, alpha):
    """
    Convert mean/dispersion parameterization of a negative binomial to the ones scipy supports

    Parameters
    ----------
    mu : float
       Mean of NB distribution.
    alpha : float
       Overdispersion parameter used for variance calculation.

    See https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations
    """
    var = mu + alpha * mu ** 2
    p = mu / var
    r = mu ** 2 / (var - mu)
    return r, p


# Generate sample data
n = 2
p = 0.9
sample = nbinom.rvs(n=n, p=p, size=10000)


# Estimate parameters
## Mean estimates expectation parameter for negative binomial distribution
mu = np.mean(sample) 

## Dispersion parameter from nb model with only an intercept
nbfit = smf.negativebinomial("nbdata ~ 1", data=pd.DataFrame({"nbdata": sample})).fit()

alpha = nbfit.params["alpha"]  # Dispersion parameter, accessed by name


# Convert parameters to n, p parameterization
n_est, p_est = convert_params(mu, alpha)


# Check that estimates are close to the true values:
print("""
                   {:<3}   {:<3}
True parameters:  {:<3}   {:<3}
Estimates      :  {:<3}   {:<3}""".format('n', 'p', n, p,
                                  np.round(n_est, 2), np.round(p_est, 2)))


Conroy answered 3/11, 2022 at 16:9 Comment(1)
It appears that in statsmodels>=0.14.0 the NegativeBinomial model includes a method that returns a scipy.stats.nbinom distribution object without any custom additional code: statsmodels.org/stable/generated/… – Operand
