Intro
I'm trying to construct a GLM that models the total mass of eggs laid by the individuals of a fish population as a function of their size and age.
Thus, the variables are:
eggW: the total mass of laid eggs, a continuous and positive variable ranging between 300 and 30000.
fishW: mass of the fish, continuous and positive, ranging between 3 and 55.
age: either 1 or 2 years.
No 0's, no NA's.
After checking the data, I realised that assuming a normal distribution was probably not appropriate, so I decided to use a Gamma distribution. I chose Gamma because the variable is positive and continuous, its variance increases with higher values, and it appears to be skewed, as you can see in the image below.
Frequency distribution of eggW values:
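A Gamma GLM assumes more than "variance increases with the mean": the standard deviation should grow roughly in proportion to the mean, so the coefficient of variation stays about constant. Below is a minimal sketch of that check, run on simulated values rather than the real data. The mean levels and the shape parameter `k` are hypothetical and were chosen only for illustration.

```r
# Simulated data only: three hypothetical mean levels, Gamma-distributed
# with shape k, so the theoretical CV is 1 / sqrt(k) at every level.
set.seed(1)
k  <- 4                                    # assumed shape parameter
mu <- rep(c(1000, 5000, 20000), each = 5000)
y  <- rgamma(length(mu), shape = k, rate = k / mu)

# Per-level mean, standard deviation and coefficient of variation
group_mean <- tapply(y, mu, mean)
group_sd   <- tapply(y, mu, sd)
cv         <- group_sd / group_mean        # expected to be near 0.5 here

print(round(cbind(mean = group_mean, sd = group_sd, cv = cv), 3))
```

With real data the same summary could be computed within bins of fishW or within each age class. A CV that drifts strongly across bins would suggest a different variance function.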
The code
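# The data frame is assumed to be called `data`, as in the start values below
myglm <- glm(eggW ~ fishW * age, family = Gamma(link = "identity"),
             data = data,
             start = c(mean(data$eggW), 1, 1, 1),  # intercept, fishW, age, fishW:age
             maxit = 100)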
I added the maxit argument after seeing it suggested in a post on this site as a solution to the glm.fit: algorithm did not converge warning, and it worked.
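A fit can finish without that message and still be worth checking, so it may help to inspect the convergence flag directly. Here is a minimal sketch, assuming simulated data: the data frame `sim` and the coefficients used to generate it are hypothetical. It uses a log link only to keep the example well behaved.

```r
# Simulated stand-in for the real data (all values are hypothetical)
set.seed(42)
n   <- 200
sim <- data.frame(fishW = runif(n, 3, 55),
                  age   = sample(1:2, n, replace = TRUE))
mu  <- exp(6 + 0.05 * sim$fishW + 0.3 * (sim$age - 1))
sim$eggW <- rgamma(n, shape = 5, rate = 5 / mu)

# maxit is passed explicitly through glm.control()
fit <- glm(eggW ~ fishW * age, family = Gamma(link = "log"), data = sim,
           control = glm.control(maxit = 100))

fit$converged   # TRUE if the IRLS iterations met the tolerance
fit$iter        # number of iterations actually used
```

If `fit$iter` equals the maxit value, the fit probably hit the limit rather than settling.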
I chose to work with link=identity because the results have a more obvious and straightforward interpretation in biological terms than with an inverse or log link.
The code above produces the following warnings:
Warning messages:
1: In log(ifelse(y == 0, 1, y/mu)) : NaNs produced
2: step size truncated due to divergence
Importantly, no warnings are shown if the variable fishW is dropped and only age is kept. No warnings are reported either if a log link is used.
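One plausible reading of the first warning is that, with an identity link, the linear predictor is the fitted mean itself, so an intermediate iteration can produce a non-positive mean. The Gamma deviance takes log(y/mu), which is then undefined. The sketch below only illustrates that mechanism using the family object from base R. The values of `y` and `mu` are made up, and it does not show that this is what happens with the real data.

```r
fam <- Gamma(link = "identity")

# Deviance contribution for a valid (positive) fitted mean
ok <- fam$dev.resids(y = 1000, mu = 800, wt = 1)

# Same observation with a hypothetical negative fitted mean; the call is
# wrapped in suppressWarnings() because log() of a negative ratio warns
bad <- suppressWarnings(fam$dev.resids(y = 1000, mu = -50, wt = 1))

is.finite(ok)    # finite deviance contribution
is.nan(bad)      # NaN, matching the "NaNs produced" message

# The family's own check for admissible means rejects the negative value
fam$validmu(c(800, -50))
```

A log link maps any linear predictor to a positive mean, which would be consistent with the warnings disappearing under that link.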
Questions
If the rationale behind the design of my model is acceptable, I would like to understand why these warnings are reported and how to solve or avoid them. In any case, I would appreciate any criticism or suggestions.
eggW and fishW? – Concordat