How to interpret some syntax (n.adapt, update..) in jags?

This answer is based on the package rjags, which takes an n.adapt argument. First I will discuss the meanings of adaptation, burn-in, and thinning, and then I will discuss the syntax (I sense that you are well aware of the meaning of burn-in and thinning, but not of adaptation; a full explanation may make this answer more useful to future readers).

Burn-in As you probably understand from introductions to MCMC sampling, some number of iterations from the MCMC chain must be discarded as burn-in. This is because prior to fitting the model, you don't know whether you have initialized the MCMC chain within the characteristic set, the region of reasonable posterior probability. Chains initialized outside this region take a finite (sometimes large) number of iterations to find the region and begin exploring it. MCMC samples from this period of exploration are not random draws from the posterior distribution. Therefore, it is standard to discard the first portion of each MCMC chain as "burn-in". There are several post-hoc techniques to determine how much of the chain must be discarded.

Thinning A separate problem arises because in all but the simplest models, MCMC sampling algorithms produce chains in which successive draws are substantially autocorrelated. Thus, summarizing the posterior based on all iterations of the MCMC chain (post burn-in) may be inadvisable, as the effective posterior sample size can be much smaller than the analyst realizes (note that STAN's implementation of Hamiltonian Monte-Carlo sampling dramatically reduces this problem in some situations). Therefore, it is standard to make inference on "thinned" chains where only a fraction of the MCMC iterations are used in inference (e.g. only every fifth, tenth, or hundredth iteration, depending on the severity of the autocorrelation).

Adaptation The MCMC samplers that JAGS uses to sample the posterior are governed by tunable parameters that affect their precise behavior. Proper tuning of these parameters can produce gains in the speed or de-correlation of the sampling. JAGS contains machinery to tune these parameters automatically, and does so as it draws posterior samples. This process is called adaptation, but it is non-Markovian; the resulting samples do not constitute a Markov chain. Therefore, burn-in must be performed separately after adaptation. It is incorrect to substitute the adaptation period for the burn-in. However, sometimes only relatively short burn-in is necessary post-adaptation.

Syntax Let's look at a highly specific example (the code in the OP doesn't actually show where parameters like n.adapt or thin get used). We'll ask rjags to fit the model in such a way that each step will be clear.

 n.chains = 3
 n.adapt = 1000
 n.burn = 10000
 n.iter = 20000
 thin = 50
 my.model <- jags.model(mymodel.txt, data=X, inits=Y, n.adapt=n.adapt) # X is a list pointing JAGS to where the data are, Y is a vector or function giving initial values
 update(my.model, n.burn)
 my.samples <- coda.samples(my.model, params, n.iter=n.iter, thin=thin) # params is a list of parameters for which to set trace monitors (i.e. we want posterior inference on these parameters)

jags.model() builds the directed acyclic graph and then performs the adaptation phase for a number of iterations given by n.adapt. update() performs the burn-in on each chain by running the MCMC for n.burn iterations without saving any of the posterior samples (skip this step if you want to examine the full chains and discard a burn-in period post-hoc). coda.samples() (from the coda package) runs the each MCMC chain for the number of iterations specified by n.iter, but it does not save every iteration. Instead, it saves only ever nth iteration, where n is given by thin. Again, if you want to determine your thinning interval post-hoc, there is no need to thin at this stage. One advantage of thinning at this stage is that the coda syntax makes it simple to do so; you don't have to understand the structure of the MCMC object returned by coda.samples() and thin it yourself. The bigger advantage to thinning at this stage is realized if n.iter is very large. For example, if autocorrelation is really bad, you might run 2 million iterations and save only every thousandth (thin=1000). If you didn't thin at this stage, you (and your RAM) would need to manipulate an object with three chains of two million numbers each. But by thinning as you go, the final object only has 2 thousand numbers in each chain.

Recommended topics

Hot tags