Errors while trying to fit gamma distribution with R fitdistr{MASS}

Asked 12/4, 2013 at 15:3 Answered 4/11, 2015 at 17:50

I have a problem with fitdistr{MASS} function in R. I have this vector:

a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)

and I want to fit a gamma distribution to the data with a command:

fitted.gamma <- fitdistr(a, "gamma")

but I have such error:

Error in optim(x = c(26, 73, 84, 115, 123, 132, 159, 207, 240, 241, 254,  : 
non-finite finite-difference value [1]
In addition: Warning messages:
1: In densfun(x, parm[1], parm[2], ...) : NaNs produced
2: In densfun(x, parm[1], parm[2], ...) : NaNs produced
3: In densfun(x, parm[1], parm[2], ...) : NaNs produced
4: In densfun(x, parm[1], parm[2], ...) : NaNs produced

So I tried with initializing the parameters:

(fitted.gamma <- fitdistr(a, "gamma", start=list(1,1)))

The object fitted.gamma is created but when printed, creates an error:

Error in dn[[2L]] : subscript out of bounds

Do you know what is happening or maybe know some other R functions to fit univariate distributions by MLE?

Thanks in advance for any help or response.

Kuba

Corticosteroid answered 12/4, 2013 at 15:3 Comment(0)

Always plot your stuff first, you scaling is far offfffffff.

library(MASS)
a <- c(26,73,84,115,123,132,159,207,240,241,254,268,272,282,300,302,329,346,359,367,375,378, 384,452,475,495,503,531,543,563,594,609,671,687,691,716,757,821,829,885,893,968,1053,1081,1083,1150,1205,1262,1270,1351,1385,1498,1546,1565,1635,1671,1706,1820,1829,1855,1873,1914,2030,2066,2240,2413,2421,2521,2586,2727,2797,2850,2989,3110,3166,3383,3443,3512,3515,3531,4068,4527,5006,5065,5481,6046,7003,7245,7477,8738,9197,16370,17605,25318,58524)
## Ooops, rater wide
plot(hist(a))
fitdistr(a/10000,"gamma") # gives warnings
# No warnings
fitted.gamma <- fitdistr(a/10000, dgamma,  start=list(shape = 1, rate = 0.1),lower=0.001)

Now you can decide what to do with the scaling

Osteophyte answered 12/4, 2013 at 15:14 Comment(2)

Thanks for your reply. I see that adding "lower" argument with scaling made the trick. Does that mean that while optimizing gamma parameters took at some point negative values? When it comes to scaling, why is it necessary to scale the values (rate parameter is to low)? Kuba – Corticosteroid 12/4, 2013 at 15:49

Yes, during gradient optimization we easily run into bad gradient regions for some samples. Gamma might not be the right distribution, just try to plot a few examples. However, log(a) looks almost normal... – Osteophyte 12/4, 2013 at 16:1

For data that clearly fits the gamma distribution, but is on the wrong scale (i.e., as if it had been multiplied/divided by a large number), here's an alternative approach to fitting the gamma distribution:

fitgamma <- function(x) {
  # Equivalent to `MASS::fitdistr(x, densfun = "gamma")`, where x are first rescaled to 
  # the appropriate scale for a gamma distribution. Useful for fitting the gamma distribution to 
  # data which, when multiplied by a constant, follows this distribution
  if (!requireNamespace("MASS")) stop("Requires MASS package.")

  fit <- glm(formula = x ~ 1, family = Gamma)
  out <- MASS::fitdistr(x * coef(fit), "gamma")
  out$scaling_multiplier <- unname(coef(fit))
  out
}

Usage:

set.seed(40)
test <- rgamma(n = 100, shape = 2, rate = 2)*50000
fitdistr(test, "gamma") # fails
dens_fit <- fitgamma(test) # successs
curve(dgamma(x, 2, 2), to = 5) # true distribution
curve(dgamma(x, dens_fit$estimate['shape'], dens_fit$estimate['rate']), add=TRUE, col=2) # best guess
lines(density(test * dens_fit$scaling_multiplier), col = 3)

Onshore answered 4/11, 2015 at 17:50 Comment(0)

Recommended topics

Hot tags