How to fit autoregressive poisson mixed model (count time series) in R?
My task is to assess how various environmental variables affect annual population fluctuations. For this, I need to fit a Poisson autoregressive mixed model to time series of counts:

$$N_{i,j} \sim \text{Poisson}(\mu_{i,j})$$

$$\log(\mu_{i,j}) = \log(\mu_{i,j-1}) + \alpha + \beta x_{i,j} + \gamma_j, \qquad \gamma_j \sim \text{Normal}(0, \sigma^2)$$

where $N_{i,j}$ is the count of observed individuals at site $i$ in year $j$, and $x_{i,j}$ is the environmental variable at site $i$ in year $j$ (these are the input data). The rest are parameters: $\mu_{i,j}$ is the expected number of individuals at site $i$ in year $j$, and $\gamma_j$ is a random effect for each year.

Is it possible to fit such a model in R? I want to avoid fitting it in a Bayesian framework, since the computation takes way too long (I have to process 5000 such models). I tried to transform the model into a GLM, but once I had to add the random effect (gamma) this was no longer possible.

Rugg answered 19/3, 2014 at 11:40 Comment(1)
I'm very skeptical that a non-Bayesian solution exists for this precise model formulation, as $\mu_{i,j}$ is a quantity estimated by the model. It is highly unusual for estimated quantities to appear as offsets on the right-hand side of a model. If you had $\log(N_{i,j})$, i.e. the realized values, instead of the expected value $\log(\mu_{i,j})$ as the offset on the right-hand side, this would be very easy to fit with standard software for GLMMs: simply use N as an offset variable. Let me know if this is possible for you, then I will add a more detailed answer.Economizer
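For illustration, the offset approach suggested in that comment might look roughly like this. This is a minimal sketch only: the data frame d and its columns (count, prev_count, env, year) are hypothetical, and it fits the realized-value model, not the one with $\log(\mu_{i,j})$ on the right-hand side.

library(lme4)

# d: long-format data with one row per site-year transition (hypothetical)
#    count = N_{i,j}, prev_count = N_{i,j-1}, env = x_{i,j}, year = factor for year j
fit <- glmer(count ~ env + offset(log(prev_count)) + (1 | year),
             data = d, family = poisson)
summary(fit)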

First, let's create some simulated data (all the ad hoc functions are at the end of the answer):

set.seed(12345) # updated to T=20 and L=40 for comparative purposes.

T = 20 # number of years
L = 40  # number of sites
N0 = 100 # average initial pop (to simulate data)
sd_env = 0.8 # to simulate the env (assumed mean 0)
env  = matrix(rnorm(T*L, mean=0, sd=sd_env), nrow=T, ncol=L)

# 'real' parameters
alpha  = 0.1
beta   = 0.05
sd     = 0.4
gamma  = rnorm(T-1, mean=0, sd=sd)
mu_ini = log(rpois(n=L, lambda=N0)) #initial means

par_real = list(alpha=alpha, beta=beta, gamma=gamma, 
               sd=sd, mu_ini=mu_ini)

mu = dynamics(par=par_real, x=env, T=T, L=L)

# observed abundances
n = matrix(rpois(length(mu), lambda=mu), nrow=T, ncol=L)

Now, for a given set of parameters, we can simulate the expected number of individuals at each site and year. Since we also have the observed numbers of individuals, we can write the likelihood of the observations (which are Poisson distributed) and add a penalty on the annual deviates in the growth rate (so that they are normally distributed). The function dynamics simulates the model and the function .getLogLike calculates the objective function. We then need to optimize the objective function. The parameters to estimate are alpha, beta, the annual deviates (gamma), the initial expected numbers of individuals (mu_ini), and possibly sigma.
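Written out, .getLogLike returns the negative of this penalized Poisson log-likelihood (the constant $\log(N_{i,j}!)$ terms are dropped):

$$\ell = \sum_{i,j}\left[N_{i,j}\log(\mu_{i,j}) - \mu_{i,j}\right] \;-\; \sum_{j}\frac{\gamma_j^{2}}{2\sigma^{2}}$$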

For a first try, we can provide 0 as the initial guess for all parameters, except for the initial expected numbers, for which we can use the initial observed abundances (that's the MLE anyway).

fit0 = fitModel0(obs=n, env=env, T=T, L=L)

Optimal parameters: 
      alpha        beta      gamma1      gamma2      gamma3 
 0.28018842  0.05464360 -0.12904373 -0.15795001 -0.04502903 
     gamma4      gamma5      gamma6      gamma7      gamma8 
 0.05045117  0.08435066  0.28864816  0.24111786 -0.80569709 
     gamma9     gamma10     gamma11     gamma12     gamma13 
 0.22786951  0.10326086 -0.50096088 -0.08880594 -0.33392310 
    gamma14     gamma15     gamma16     gamma17     gamma18 
 0.22664634 -0.47028311  0.11782381 -0.16328820  0.04208037 
    gamma19     mu_ini1     mu_ini2     mu_ini3     mu_ini4 
 0.17648808  4.14267523  4.19187205  4.05573114  3.90406443 
    mu_ini5     mu_ini6     mu_ini7     mu_ini8     mu_ini9 
 4.08975038  4.17185883  4.03679049  4.23091760  4.04940333 
   mu_ini10    mu_ini11    mu_ini12    mu_ini13    mu_ini14 
 4.19355333  4.05543081  4.15598515  4.18266682  4.09095730 
   mu_ini15    mu_ini16    mu_ini17    mu_ini18    mu_ini19 
 4.17922360  3.87211968  4.04509178  4.19385641  3.98403521 
   mu_ini20    mu_ini21    mu_ini22    mu_ini23    mu_ini24 
 4.08531659  4.19294203  4.29891769  4.21025211  4.16297457 
   mu_ini25    mu_ini26    mu_ini27    mu_ini28    mu_ini29 
 4.19265543  4.28925869  4.10752810  4.10957212  4.14953247 
   mu_ini30    mu_ini31    mu_ini32    mu_ini33    mu_ini34 
 4.09690570  4.34234547  4.18169575  4.01663339  4.32713905 
   mu_ini35    mu_ini36    mu_ini37    mu_ini38    mu_ini39 
 4.08121891  3.98256819  4.08658375  4.05942834  4.06988174 
   mu_ini40 
 4.05655031

This works, but normally some parameters are correlated and harder to identify from the data, so a sequential approach can be used (see Bolker et al. 2013 for more details). In this case, we progressively increase the number of parameters, improving the initial guess for the optimization at each phase of the calibration. For this example, the first phase estimates only alpha and beta, using as initial guesses the coefficients of a linear model of the growth rate against the environment. Then, in the second phase, we use the estimates from the first optimization and add the annual deviates (gamma) as parameters. Finally, we use the estimates of the second optimization and add the initial expected values as parameters. We add the initial expected values last because we assume the initial observed values are already very close and a good starting guess, whereas we have no clear idea about the values of the remaining parameters.

fit  = fitModel(obs=n, env=env, T=T, L=L)


Phase 1: alpha and beta only
Optimal parameters: 
     alpha       beta 
0.18172961 0.06323379 

neg-LogLikelihood:  -5023687 
Phase 2: alpha, beta and gamma
Optimal parameters: 
      alpha        beta      gamma1      gamma2      gamma3 
 0.20519928  0.06238850 -0.35908716 -0.21453015 -0.05661066 
     gamma4      gamma5      gamma6      gamma7      gamma8 
 0.18963170  0.17800563  0.34303170  0.28960181 -0.72374927 
     gamma9     gamma10     gamma11     gamma12     gamma13 
 0.28464203  0.16900331 -0.40719047 -0.01292168 -0.25535610 
    gamma14     gamma15     gamma16     gamma17     gamma18 
 0.28806711 -0.38924648  0.19224527 -0.07875934  0.10880154 
    gamma19 
 0.24518786 

neg-LogLikelihood:  -5041345 
Phase 3: alpha, beta, gamma and mu_ini
Optimal parameters: 
        alpha          beta        gamma1        gamma2 
 0.1962334008  0.0545361273 -0.4298024242 -0.1984379386 
       gamma3        gamma4        gamma5        gamma6 
 0.0240318556  0.1909639571  0.1116636126  0.3465693397 
       gamma7        gamma8        gamma9       gamma10 
 0.3478695629 -0.7500599493  0.3600551021  0.1361405398 
      gamma11       gamma12       gamma13       gamma14 
-0.3874453347 -0.0005839263 -0.2305008546  0.2819608670 
      gamma15       gamma16       gamma17       gamma18 
-0.3615273177  0.1792020332 -0.0685681922  0.1203006457 
      gamma19       mu_ini1       mu_ini2       mu_ini3 
 0.2506129351  4.6639314468  4.7205977429  4.5802529032 
      mu_ini4       mu_ini5       mu_ini6       mu_ini7 
 4.4293994068  4.6182382472  4.7039110982  4.5668031666 
      mu_ini8       mu_ini9      mu_ini10      mu_ini11 
 4.7610910879  4.5844180026  4.7226353021  4.5823048717 
     mu_ini12      mu_ini13      mu_ini14      mu_ini15 
 4.6814189824  4.7130039559  4.6135420745  4.7100006841 
     mu_ini16      mu_ini17      mu_ini18      mu_ini19 
 4.4080115751  4.5758092977  4.7209394881  4.5150790425 
     mu_ini20      mu_ini21      mu_ini22      mu_ini23 
 4.6171948847  4.7141188899  4.8303375556  4.7392110431 
     mu_ini24      mu_ini25      mu_ini26      mu_ini27 
 4.6893526309  4.7237687961  4.8234804183  4.6333012324 
     mu_ini28      mu_ini29      mu_ini30      mu_ini31 
 4.6392335265  4.6817044754  4.6260620666  4.8713345071 
     mu_ini32      mu_ini33      mu_ini34      mu_ini35 
 4.7107116580  4.5471434622  4.8540773708  4.6129553933 
     mu_ini36      mu_ini37      mu_ini38      mu_ini39 
 4.5134108799  4.6231016082  4.5823454113  4.5969785420 
     mu_ini40 
 4.5835763300 

neg-LogLikelihood:  -5047251 

Comparing both calibrations of the model, we can see that the second one provides a lower value for the objective function. Also, comparing the correlation between the 'real' annual deviates and the estimated ones, the second calibration achieves a higher correlation:

cor(gamma, fit0$par$gamma)
[1] 0.8708379
cor(gamma, fit$par$gamma)
[1] 0.9871758

And looking at the outputs, we can see there are some problems with the estimation of the initial expected values in the first calibration (they are underestimated for all sites). With real data, a multi-phase calibration normally works much better:

par(mfrow=c(3,2), mar=c(3,5,1,1), oma=c(1,1,1,1))
for(i in 1:4) {
  ylim=c(0, 1.1*log(max(fit$fitted, n)))
  plot(log(fit$fitted[,i]), type="l", col="blue", ylim=ylim,
       ylab="mu (log)")
  lines(log(fit0$fitted[,i]), col="green")
  points(log(mu[,i]), col="red")
  mtext(paste("Site ", i), 3, adj=0.05, line=-2)
  if(i==3) {
    mtext(c("observed", "fitModel0", "fitModel1"), 1, adj=0.95, 
          line=-1.5:-3.5, col=c("red", "green", "blue"), cex=0.8)
  }
}

mus = rbind(mu_ini, fit$par$mu_ini, fit0$par$mu_ini)
barplot(mus, beside=TRUE, col=c("red", "blue", "green"),
        ylab="Initial expected population",
        xlab="Sites", border=NA)

gam = rbind(gamma, fit$par$gamma, fit0$par$gamma)
barplot(gam, beside=TRUE, col=c("red", "blue", "green"),
        ylab="Annual deviates", border=NA)

(plot: fitted vs. 'real' expected log-abundances for the first four sites, plus barplots of the initial expected populations and the annual deviates)

Finally,

system.time(fitModel(obs=n, env=env, T=T, L=L))

   user  system elapsed 
   9.85    0.00    9.85 

This is around four times slower than the solution proposed by @Thierry using INLA (from summary(model)):

Time used:
 Pre-processing    Running inla Post-processing           Total 
         0.1070          2.3131          0.0460          2.4661

However, after byte compiling my functions, we get:

   user  system elapsed 
   7.53    0.00    7.53

That is 24% faster, and now only about 3 times slower than the INLA method. Still, I think this is reasonable even for thousands of experiments (my own model takes 5 days for a single optimization, so maybe I am biased here), and since we are using simulated data, we can compare the reliability of the parameter estimates in addition to the computing time.
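For example, a quick check of alpha and beta against the 'real' values used in the simulation (assuming the objects created above are still in the workspace):

# compare estimated alpha and beta with the simulated 'real' values
rbind(real = c(alpha = alpha, beta = beta),
      fit0 = c(fit0$par$alpha, fit0$par$beta),
      fit  = c(fit$par$alpha,  fit$par$beta))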

# The functions -----------------------------------------------------------

require(compiler)

dynamics = function(par, obs, x, T, L) {

  alpha  = par$alpha
  beta   = par$beta
  gamma  = if(!is.null((par$gamma))) par$gamma else rep(0, T-1)
  mu_ini = if(!is.null(par$mu_ini)) exp(par$mu_ini) else obs[1,]

  mu = matrix(nrow=T, ncol=L)

  mu[1,] = mu_ini

  for(t in seq_len(T-1)) {
    log_mu_new = log(mu[t,]) + alpha + beta*x[t,] + gamma[t]
    mu[t+1, ] = exp(log_mu_new)
  }
  return(mu)
}

dynamics = cmpfun(dynamics)

reListPars = function(par) {
  out = list()
  out$alpha = as.numeric(par["alpha"])
  out$beta  = as.numeric(par["beta"])
  if(!is.na(par["sd"])) out$sd = as.numeric(par["sd"])
  gammas =  as.numeric(par[grepl("gamma", names(par))])
  if(length(gammas)>0) out$gamma = gammas
  mu_inis = as.numeric(par[grepl("mu_ini", names(par))])
  if(length(mu_inis)>0) out$mu_ini = mu_inis
  return(out)
}

reListPars = cmpfun(reListPars)

.getLogLike = function(par, obs, env, T, L) {
  par = reListPars(par)
  if(is.null(par$sd)) {
    par$sd = if(!is.null(par$gamma)) sd(par$gamma)+0.01 else 1
  }
  mu = dynamics(par=par, obs=obs, x=env, T=T, L=L)
  # Poisson log-likelihood (constant terms dropped) plus a Gaussian penalty on the annual deviates
  logLike = sum(obs*log(mu) - mu) - sum(par$gamma^2/(2*par$sd^2))
  return(-logLike)
}

.getLogLike = cmpfun(.getLogLike)

.getUpper = function(par) {
  par$alpha = 10*par$alpha + 1
  par$beta  = 10*abs(par$beta) + 1
  if(!is.null(par$gamma)) {
    if(!is.null(par$sd)) sd = par$sd else sd=sd(par$gamma)
    if(sd==0) sd=1
    par$gamma = rep(qnorm(0.999, sd=sd), length(par$gamma))
  }
  if(!is.null(par$mu_ini)) par$mu_ini = 5*par$mu_ini
  if(!is.null(par$sd)) par$sd = 10*par$sd
  par = unlist(par)
  return(par)
}

.getUpper = cmpfun(.getUpper)

.getLower = function(par) {
  par$alpha = 0 # alpha>0?
  par$beta  = -10*abs(par$beta) - 1
  if(!is.null(par$gamma)) {
    if(!is.null(par$sd)) sd = par$sd else sd=sd(par$gamma)
    if(sd==0) sd=1
      par$gamma = rep(qnorm(1-0.999, sd=sd), length(par$gamma))
  }
  if(!is.null(par$mu_ini)) par$mu_ini = 0.2*par$mu_ini
  if(!is.null(par$sd)) par$sd = 0.0001*par$sd
  par = unlist(par)
  return(par)
}

.getLower = cmpfun(.getLower)

fitModel = function(obs, env, T, L) {

  r = log(obs[-1,]/obs[-T,])
  guess = data.frame(rate=as.numeric(r), env=as.numeric(env[-T,]))
  mod1 = lm(rate ~ env, data=guess)

  output = list()
  output$par = NULL

  # Phase 1: alpha and beta only
  cat("Phase 1: alpha and beta only\n")

  par = list()
  par$alpha = as.numeric(coef(mod1)[1])
  par$beta  = as.numeric(coef(mod1)[2])

  opt = optim(par=unlist(par), fn=.getLogLike, gr=NULL, 
              obs=obs, env=env, T=T, L=L, method="L-BFGS-B",
              upper=.getUpper(par), lower=.getLower(par))
  opt$bound = data.frame(par=unlist(par), low=.getLower(par), 
                         upp=.getUpper(par))

  output$phase1 = opt

  cat("Optimal parameters: \n")
  print(opt$par)
  cat("\nneg-LogLikelihood: ", opt$value, "\n")

  # phase 2: alpha, beta and gamma
  cat("Phase 2: alpha, beta and gamma\n")

  optpar = reListPars(opt$par)
  par$alpha = optpar$alpha
  par$beta  = optpar$beta
  par$gamma = rep(0, T-1)

  opt = optim(par=unlist(par), fn=.getLogLike, gr=NULL, 
              obs=obs, env=env, T=T, L=L, method="L-BFGS-B",
              upper=.getUpper(par), lower=.getLower(par))
  opt$bound = data.frame(par=unlist(par), low=.getLower(par), 
                           upp=.getUpper(par))

  output$phase2 = opt

  cat("Optimal parameters: \n")
  print(opt$par)
  cat("\nneg-LogLikelihood: ", opt$value, "\n")

  # phase 3: alpha, beta, gamma and mu_ini
  cat("Phase 3: alpha, beta, gamma and mu_ini\n")

  optpar = reListPars(opt$par)
  par$alpha = optpar$alpha
  par$beta  = optpar$beta
  par$gamma = optpar$gamma
  par$mu_ini = log(obs[1,])

  opt = optim(par=unlist(par), fn=.getLogLike, gr=NULL, 
              obs=obs, env=env, T=T, L=L, method="L-BFGS-B",
              upper=.getUpper(par), lower=.getLower(par),
              control=list(maxit=1000))
  opt$bound = data.frame(par=unlist(par), low=.getLower(par), 
                         upp=.getUpper(par))

  output$phase3 = opt

  cat("Optimal parameters: \n")
  print(opt$par)
  cat("\nneg-LogLikelihood: ", opt$value, "\n")

  output$par = reListPars(opt$par)

  output$fitted = dynamics(par=output$par, obs=obs, x=env, T=T, L=L)
  output$observed = obs
  output$env = env

  return(output)

}

fitModel = cmpfun(fitModel)

fitModel0 = function(obs, env, T, L) {

  output = list()
  output$par = NULL

  par = list()
  par$alpha = 0
  par$beta  = 0
  par$gamma = rep(0, T-1)
  par$mu_ini = log(obs[1,])

  opt = optim(par=unlist(par), fn=.getLogLike, gr=NULL, 
              obs=obs, env=env, T=T, L=L, method="L-BFGS-B",
              upper=.getUpper(par), lower=.getLower(par))
  opt$bound = data.frame(par=unlist(par), low=.getLower(par), 
                         upp=.getUpper(par))

  output$phase1 = opt

  cat("Optimal parameters: \n")
  print(opt$par)
  cat("\nneg-LogLikelihood: ", opt$value, "\n")

  output$par = reListPars(opt$par)

  output$fitted = dynamics(par=output$par, obs=obs, x=env, T=T, L=L)
  output$observed = obs
  output$env = env

  return(output)

}

fitModel0 = cmpfun(fitModel0)
Linet answered 24/3, 2014 at 2:49 Comment(12)
How is the object n defined?Mucky
Uff, did you just write your own optimizer based on optim? Is this a "clean" frequentist approach to modelling, or at least as clean as glm? I mean, this approach is completely new to me. Is it documented somewhere, with things like model validation, precision, etc.? I am a bit conservative about completely new methods and how they have been tested. I also need to cite the method somehow in an article. Anyway, I will try your script, compare it to my Bayesian analysis, and come back to you.Rugg
@Thierry: I missed one line: # observed abundances n = matrix(rpois(length(mu), lambda=mu), nrow=T, ncol=L) added to the code.Linet
Somebody downvoted, so maybe there is a mistake or something wrong, but I am not sure which part is "new". The idea is: the model has some parameters. We use the parameters to simulate the model, then compare the observations to the model outputs given the assumed distribution of the observations (Poisson) and calculate the likelihood as a function of the parameters. Then we minimize the negative log-likelihood function to get the "optimal" parameters. I think you can do the same for GLM or AR models, even if other alternatives are available for parameter estimation (e.g. Bayesian).Linet
Doing it in several steps improves the estimates of the annual deviates, which are one of the focuses of the study, right? When using local optimization methods, you can get stuck in a local minimum, so it is useful to start from better initial estimates for your parameters. I do this all the time, so I am very interested in getting feedback.Linet
Ricardo, my concern is whether the method you use is already described somewhere and can be cited, or whether it is just your own idea. I need a tested solution that can be cited. Please see my previous comment.Rugg
I think this is mainstream maximum likelihood estimation. We are trying to find the parameters which make the observations most probable, given the model. When I saw your question I read "how to fit a biomass dynamics model with observation error only", hehe. You can check "The Ecological Detective", chapter 7; the exercise is almost exactly your question. So you can see my solution as an implementation of pseudocode 7.2 in the book. You can also cite my paper on the sequential calibration approach, but it is still not accepted. I am also working on environment and population fluctuations...Linet
I see 4 of your papers on WoS, looks interesting. I have looked at the book you recommended, pseudocode 7.2, but I do not see any random effect in the model in the book. Did you not forget to take into account that gamma_j in my model is a random, not a fixed, effect? In R, REML (restricted maximum likelihood) is used instead of ML for mixed models, so are you sure this will work for a mixed model too?Rugg
From the optimization point of view, there is no real difference between fixed-effect and random-effect parameters. It matters if you want an analytical solution, but not for a numerical one. In stock assessment, random effects are used for recruitment deviates; I will try to look for a reference. And, yes, I am using REML. The variance of the random effects is a parameter, but it normally complicates things, so it is "estimated" from the observed variance of the random-effect parameters. Still, the code allows estimating the variance if you want.Linet
There is a difference between fitting an effect as random instead of fixed. There are basically two differences: 1) theRugg
@TMS, your comment is trimmed. I know there are differences, but to get the parameters you only need to build your objective function properly (taking into account that some parameters are "random effects") and then optimize it.Linet
Ricardo, I will award the bounty to you for the effort, and I appreciate it. However, I am not sure I will use your approach. I am a bit afraid of how to describe and cite the method in the paper, and of what happens if there is a mistake, or if I do not handle the convergence correctly, etc. I am a bit afraid to use solutions that are neither tested packages nor ones I fully understand.Rugg

Have a look at the INLA package http://www.r-inla.org

It is Bayesian, but it uses integrated nested Laplace approximation, which makes its speed comparable to that of frequentist models (glm, glmm).

Starting from mu and env from Ricardo Oliveros-Ramos, with L = 40. First prepare the dataset:

dataset <- data.frame(
  count = rpois(length(mu), lambda = mu),
  year = rep(seq_len(T), L),
  site = rep(seq_len(L), each = T),
  env = as.vector(env)
)
library(reshape2)
n <- as.matrix(dcast(year ~ site, data = dataset, value.var = "count")[, -1])
dataset$year2 <- dataset$year

Run the model

library(INLA)
system.time(
  model <- inla(
    count ~ 
      env +
      f(year, model = "ar1", replicate = site) + 
      f(year2, model = "iid"), 
    data = dataset,
    family = "poisson"
  )
)
   user  system elapsed 
   0.18    0.14    3.77
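If the parameter estimates are needed for comparison, they can be read from the fitted object; summary.fixed is where inla() normally stores the posterior summaries of the fixed effects:

# posterior mean, sd and quantiles of the intercept and the env coefficient
model$summary.fixed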

Compare the speed with the solution from Ricardo

system.time(test <- fitModel(obs=n, env=env, T=T, L=L))
   user  system elapsed 
  11.06    0.00   11.06 

Compare the speed with a frequentist glmm (without autocorrelation)

library(lme4)
system.time(
  m <- glmer(
    count ~ env + (1|site) + (1|year), 
    data = dataset, 
    family = poisson
  )
)
   user  system elapsed 
   0.44    0.00    0.44 

The speed of inla without autocorrelation

system.time(
  model <- inla(
    count ~ 
      env +
      f(site, model = "iid") + 
      f(year, model = "iid"), 
    data = dataset,
    family = "poisson"
  )
)
   user  system elapsed 
   0.19    0.11    2.09
Mucky answered 24/3, 2014 at 0:2 Comment(4)
I didn't know about this package; it looks useful. I'm updating my answer with L=40. Would you mind adding the estimated parameters for comparison purposes? Also, you missed the 'env' variable in your data.Linet
I have updated the code. The INLA model will have different parameters because the parametrisation is different. mu_ij = site_ij + \alpha + \beta * env + \gamma_j with site_ij = \rho * site_i(j-1) + \epsilon_ijMucky
But, in that case, that's not the model. log(mu_ij/mu_i(j-1)) is the growth rate of the population, and that's what we want to model in the end: constant (alpha, species specific), varying as a function of the environment (at each site), and with a random annual fluctuation (for every year).Linet
Thierry, it seems you completely missed the autoregression part - the log(mu_i,j) on the right side of the equation?Rugg
C
0

The model formula is not the same as the one you have given, but from the title of your question it seems like the hhh4 function in the surveillance package on CRAN might be of interest. It allows you to fit Poisson autoregressive models with random effects. There are some examples at the bottom of the documentation for that function. I believe that currently the fixed effects must be limited to an intercept, a long-term time trend, and a seasonal component for each site, but perhaps that will work for you.
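As a very rough sketch of the interface only (this is not the model from the question, and the sts() construction below is an assumption about how to wrap a years-by-sites count matrix):

library(surveillance)

# n: T x L matrix of counts (rows = years, columns = sites), as simulated above
counts <- sts(observed = n, start = c(1, 1), frequency = 1)

fit_hhh4 <- hhh4(counts,
                 control = list(ar  = list(f = ~ 1),      # autoregression on past counts
                                end = list(f = ~ 1 + t),  # endemic intercept plus time trend
                                family = "Poisson"))
summary(fit_hhh4)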

Cornellcornelle answered 22/3, 2014 at 18:17 Comment(4)
This doesn't look bad. Can you please update your answer so that it is apparent that the requested model can actually be fitted with this function, and how?Rugg
You have a chance to win the bounty if you answer fast.Rugg
I read cran.r-project.org/web/packages/surveillance/vignettes/hhh4.pdf and I don't think my model can be fit with hhh4. There is no trend component in my model.Rugg
I realize I missed the chance for the bounty, but I'll see if I can answer your question anyway. If your x_{i,j} is a scalar, you could treat it as time and then \beta could be estimated as a time trend. But I think the appearance of log(\mu_{i,j}) on the right-hand side and a random effect for each year do put your model outside the scope of hhh4. To use that function, you could possibly use a negative binomial response in place of the Poisson with a random effect and then put N_{i,j} in place of log(\mu_{i,j}) on the right-hand side. Of course, you could then also use MASS::glm.nb to fit it.Cornellcornelle
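For illustration, that last suggestion might look roughly like this (a minimal sketch; the long-format data frame d with columns count, prev_count and env is hypothetical, as in the earlier glmer sketch):

library(MASS)

# negative binomial response, with the realized previous-year count as the offset
fit_nb <- glm.nb(count ~ env + offset(log(prev_count)), data = d)
summary(fit_nb)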
