Zero inflated poisson model fails to fit
Asked Answered
G

1

6

Here's the data and set up:

library(fitdistrplus)
library(gamlss)

finalVector <- dput(finalVector)
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 
1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1, 1, 1, 1, 1, 1, 2, 
2, 1, 4, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 1, 2, 2, 1, 1, 4, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 1, 
2, 1, 1, 4, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1)

countFitPoisson <- fitdist(finalVector,"pois",  method = "mle", lower = 0)
countFitZeroPoisson <- fitdist(finalVector, 'ZIP', start = list( ##mu  = mean of poisson, sigma = prob(x = 0))
                                                           mu = as.numeric(countFitPoisson$estimate), 
                                                            sigma = as.numeric(as.numeric(countFitPoisson$estimate))
                                                          ), method = "mle", lower= 0) 

The first call works successfully. The final says it failed to estimate and I'm not sure why. Thanks!

Edit:

Assuming I did the code correctly (not sure), then the only thing I can think of is that there are not enough zeros to fit the model?

Godhead answered 24/6, 2015 at 18:45 Comment(1)
It seems to work when you pad a few 0's (5) and change the starting parameters to something closer to the mle value (0.7, 0.05).Camel
D
8

Your data is not really zero-inflated, hence fitting the model does not lead to an improvement. Rather than using the fitdistr approach, I'm using glm and extended regression models below. All regression (sub-)models just use a constant (or intercept) without any real regressors, though. For visualization I use rootograms via the countreg package available from R-Forge (which contains the successor functions to the pscl count data regressions).

First, let's look at the Poisson fit:

(mp <- glm(finalVector ~ 1, family = poisson))
## Call:  glm(formula = finalVector ~ 1, family = poisson)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      200.2 
## Residual Deviance: 200.2        AIC: 418.3

This corresponds to a fitted mean of exp(-0.284), i.e., about 0.753. This fits the data very well if you compare observed and fitted frequencies:

library("countreg")
rootogram(mp)

This shows that the fit for the counts 0, 1, 2 is essentially perfect and there are only small deviations for 3, 4, 5 but these frequencies are extremely low anyway. So judging from this it appears that no extension of the model is necessary.

rootogram

But to formally compare the model with others, one could consider a zero-inflated Poisson (as you tried) a hurdle Poisson or a negative binomial. The zero-inflated model yields:

(mzip <- zeroinfl(finalVector ~ 1 | 1, dist = "poisson"))
## Call:
## zeroinfl(formula = finalVector ~ 1 | 1, dist = "poisson")
## 
## Count model coefficients (poisson with log link):
## (Intercept)  
##     -0.2839  
## 
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept)  
##      -9.151  

Thus, the count mean is essentially identical to before and the probability of zero-inflation is essentially zero (plogis(-9.151) is about 0.01%).

The hurdle model works similarly but can use a zero-censored Poisson model for 0-vs-greater and a truncated Poisson for the positive counts. This is then also nested within the Poisson model and hence a Wald test can be carried easily out:

mhp <- hurdle(finalVector ~ 1 | 1, dist = "poisson", zero.dist = "poisson")
hurdletest(mhp)
## Wald test for hurdle models
## 
## Restrictions:
## count_((Intercept) - zero_(Intercept) = 0
## 
## Model 1: restricted model
## Model 2: finalVector ~ 1 | 1
## 
##   Res.Df Df Chisq Pr(>Chisq)
## 1    181                    
## 2    180  1 0.036     0.8495

This also clearly shows that there are no excess zeros and a simple Poisson model suffices.

As a final check one could also consider a negative binomial model:

(mnb <- glm.nb(finalVector ~ 1))
## Call:  glm.nb(formula = finalVector ~ 1, init.theta = 125.8922776, link = log)
## 
## Coefficients:
## (Intercept)  
##      -0.284  
## 
## Degrees of Freedom: 181 Total (i.e. Null);  181 Residual
## Null Deviance:      199.1 
## Residual Deviance: 199.1        AIC: 420.3

This has again virtually the same mean and a huge theta parameter that is close enough to infinity (= Poisson). Thus, overall the Poisson model is simply sufficient and there is no need for any of the extensions considered. The likelihoods are almost unchanged and the additional parameters (zero-inflation, zero-hurdle, theta-dispersion) do not yield any improvement:

AIC(mp, mzip, mhp, mnb)
##      df      AIC
## mp    1 418.2993
## mzip  2 420.2996
## mhp   2 420.2631
## mnb   2 420.2959
Decrescent answered 25/6, 2015 at 7:30 Comment(2)
I completely agree with your analysis -- but I wonder should the package not return a regular poisson model then rather than crash?Godhead
It does not "crash" but terminates with a sufficiently informative error: sigma becomes 0 but must be between 0 and 1. This indicates a degenerate model and hence an error is in order. The zeroinfl function avoids this by using a logit link so that the probability has to be strictly between 0 and 1. But you still see the degeneration because an intercept of -9.2 on the logit scale is already pretty close to -Infinity. Using reltol = 1e-12 would lead to -11.7 already and lower reltol values also lead to a failure to converge...Decrescent

© 2022 - 2024 — McMap. All rights reserved.