Gamma GLM: NaN production and divergence errors

Intro

I'm trying to build a GLM that models the total mass of eggs laid by each fish in a population as a function of the fish's size and age.

Thus, the variables are:

  • eggW: the total mass of laid eggs, a continuous and positive variable ranging between 300 and 30000.

  • fishW: mass of the fish, continuous and positive, ranging between 3 and 55.

  • age: either 1 or 2 years.

No 0's, no NA's.

After checking the data and realising that assuming a normal distribution was probably not appropriate, I decided to use a Gamma distribution. I chose Gamma because the variable is positive and continuous, its variance increases at higher values, and it appears to be skewed, as you can see in the image below.

Frequency distribution of eggW values: [image]

fishW vs eggW: [image]

The code

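# Assumes the variables live in the data frame `data`, as the start= line implies
myglm <- glm(eggW ~ fishW * age,
             family = Gamma(link = "identity"),
             data   = data,
             start  = c(mean(data$eggW), 1, 1, 1),  # intercept, fishW, age, fishW:age
             maxit  = 100)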

I added the maxit argument after seeing it suggested in a post on this site as a solution to the "glm.fit: algorithm did not converge" error, and it worked.

I chose to work with link=identity because its results are more straightforward to interpret in biological terms than those from an inverse or log link.
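One way to get less arbitrary starting values than c(mean(eggW), 1, 1, 1) is to borrow them from an ordinary least-squares fit of the same formula. The following is only a minimal sketch on simulated data: the data frame sim, the coefficients used to generate it and the shape value are made up for illustration and are not the data from this question. Whether this removes the warnings on the real data is not guaranteed.

```r
set.seed(1)  # simulated data only; NOT the data set from the question

n     <- 200
fishW <- runif(n, 3, 55)
age   <- sample(1:2, n, replace = TRUE)
# Hypothetical true mean on the identity scale, positive over the whole range
mu    <- 500 + 300 * fishW + 200 * age + 100 * fishW * age
shape <- 20
sim   <- data.frame(fishW, age,
                    eggW = rgamma(n, shape = shape, rate = shape / mu))

# Ordinary least-squares fit, used only to obtain starting values
ols <- lm(eggW ~ fishW * age, data = sim)

fit <- glm(eggW ~ fishW * age,
           family  = Gamma(link = "identity"),
           data    = sim,
           start   = coef(ols),
           control = glm.control(maxit = 100))

# The identity link does not constrain the mean, so check its range explicitly
print(range(fitted(fit)))
```

If the smallest fitted value is close to zero, the identity link is operating near the edge of what the Gamma family allows.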

The code above produces the following warnings:

Warning messages:
1: In log(ifelse(y == 0, 1, y/mu)) : NaNs produced
2: step size truncated due to divergence

Importantly, no warnings are shown if the variable fishW is dropped and only age is kept. No warnings are reported either if a log link is used.
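A likely reading of the first warning: the expression log(ifelse(y == 0, 1, y/mu)) belongs to the deviance of the Gamma family in base R, and with an identity link nothing forces the fitted mean mu to stay positive. If an iteration produces a negative mu for some observation, y/mu is negative and its log is NaN. A log link gives mu = exp(eta), which is always positive, which would be consistent with the warnings disappearing there. The sketch below calls the family's dev.resids function directly with made-up values of y and mu; it is an illustration, not a diagnosis of the actual fit.

```r
# Deviance residual function of the Gamma family (stats package, base R)
dev_resids <- Gamma(link = "identity")$dev.resids

y      <- c(300, 5000, 30000)   # hypothetical positive responses
mu_ok  <- c(400, 4500, 28000)   # all fitted means positive
mu_bad <- c(-50, 4500, 28000)   # first fitted mean is negative

dev_ok  <- dev_resids(y, mu_ok,  wt = rep(1, 3))
# suppressWarnings() hides the same "NaNs produced" warning seen in the question
dev_bad <- suppressWarnings(dev_resids(y, mu_bad, wt = rep(1, 3)))

print(dev_ok)
print(dev_bad)  # first element is NaN because y/mu < 0
```

On a real fit, looking at the range of the fitted values, or of the linear predictor at the starting values, shows whether the mean gets close to zero or below it.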

Questions

If the rationale behind the design of my model is acceptable, I would like to understand why these errors are reported and how to solve or avoid them. In any case, I would appreciate any criticism or suggestions.

Cuspidate answered 21/6, 2017 at 10:1 Comment(3)
Could you please post the correlation between eggW and fishW? - Concordat
Is that second plot enough? - Cuspidate
Gamma GLM with log link is quite interpretable. - Errancy
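As a rough illustration of the last comment: with a log link, exponentiated coefficients can be read as multiplicative changes in the expected eggW per unit increase of a predictor. The sketch uses simulated data with arbitrary coefficients (not the data from the question) and an additive model for simplicity; with an interaction term the ratio for fishW would depend on age.

```r
set.seed(2)  # simulated data only; NOT the data set from the question

n     <- 200
fishW <- runif(n, 3, 55)
age   <- sample(1:2, n, replace = TRUE)
# Hypothetical true mean on the log scale (illustration values)
mu    <- exp(6 + 0.05 * fishW + 0.3 * age)
shape <- 20
sim   <- data.frame(fishW, age,
                    eggW = rgamma(n, shape = shape, rate = shape / mu))

fit_log <- glm(eggW ~ fishW + age, family = Gamma(link = "log"), data = sim)

# exp(coefficient) = multiplicative change in expected eggW per unit increase
ratios <- exp(coef(fit_log))
print(round(ratios, 3))
```

A ratio of about 1.05 for fishW would read as roughly 5% more expected egg mass per extra unit of fish mass, at fixed age.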

You are looking to determine the weight of the eggs based on the age and weight of the fish, correct? I think you need to use:

glm(eggW ~ fishW + age, family=Gamma(link=identity))

Instead of

glm(eggW ~ fishW * age, family=Gamma(link=identity))
Lenhard answered 21/6, 2017 at 14:50 Comment(1)
At least initially I would rather model the full interaction. However, even if I apply this change, I get even more warnings (the two messages repeated 25 times and an "algorithm did not converge" error). - Cuspidate
  1. Does your dataset have missing values?
  2. Are your variables highly correlated?
  3. Turn fishW * age into a separate column and just pass that to the algorithm
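The three checks above can be sketched as follows. The data frame dat is a simulated stand-in with made-up values, not the data from the question. Note that a hand-made product column yields the same design matrix as the fishW * age formula, so by itself it would not be expected to change the fit.

```r
set.seed(3)  # simulated stand-in; NOT the data set from the question

n   <- 100
dat <- data.frame(fishW = runif(n, 3, 55),
                  age   = sample(1:2, n, replace = TRUE))
dat$eggW <- rgamma(n, shape = 20, rate = 20 / (500 + 400 * dat$fishW))

# 1. Count missing values per column
n_missing <- colSums(is.na(dat))

# 2. Pairwise Spearman rank correlations
rho <- cor(dat, method = "spearman")

# 3. Interaction as an explicit column, compared with the formula version
dat$fishW_age <- dat$fishW * dat$age
X_manual    <- model.matrix(~ fishW + age + fishW_age, data = dat)
X_formula   <- model.matrix(~ fishW * age, data = dat)
same_design <- max(abs(X_manual - X_formula)) < 1e-12

print(n_missing)
print(round(rho, 2))
print(same_design)
```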
Centi answered 26/6, 2017 at 7:36 Comment(1)
1. No missing values. 2. Quite some correlation (Spearman index): fishW vs eggW 0.88, eggW vs age 0.58, fishW vs age 0.62. 3. The same errors are reported if I do this separately. - Cuspidate
