Do I need to set refit=FALSE when testing for random effects in lmer() models with anova()?

I am currently testing whether I should include certain random effects in my lmer model or not. I use the anova function for that. My procedure so far is to fit the model with a function call to lmer() with REML=TRUE (the default option). Then I call anova() on the two models where one of them does include the random effect to be tested for and the other one doees not. However, it is well known that the anova() function refits the model with ML but in the new version of anova() you can prevent anova() from doing so by setting the option refit=FALSE. In order to test for random effects should I set refit=FALSE in my call to anova() or not? (If I do set refit=FALSE the p-values tend to be lower. Are the p-values anti-conservative when I set refit=FALSE?)

Method 1:

    mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
    mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
    anova(mod0_reml, mod1_reml)

This will result in anova() refitting the models with ML instead of REML. (Newer versions of the anova() function will also output an info about this.)

Method 2:

    mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
    mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
    anova(mod0_reml, mod1_reml, refit=FALSE)

This will result in anova() performing its calculations on the original models, i.e. with REML=TRUE.

Which of the two methods is correct in order to test whether I should include a random effect or not?

Thanks for any help

In general I would say that it would be appropriate to use refit=FALSE in this case, but let's go ahead and try a simulation experiment.

First fit a model without a random slope to the sleepstudy data set, then simulate data from this model:

library(lme4)
mod0 <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
## also fit the full model for later use
mod1 <- lmer(Reaction ~ Days + (Days|Subject), data=sleepstudy)
set.seed(101)
simdat <- simulate(mod0,1000)

Now refit the null data with the full and the reduced model, and save the distribution of p-values generated by anova() with and without refit=FALSE. This is essentially a parametric bootstrap test of the null hypothesis; we want to see if it has the appropriate characteristics (i.e., uniform distribution of p-values).

sumfun <- function(x) {
    m0 <- refit(mod0,x)
    m1 <- refit(mod1,x)
    a_refit <- suppressMessages(anova(m0,m1)["m1","Pr(>Chisq)"])
    a_no_refit <- anova(m0,m1,refit=FALSE)["m1","Pr(>Chisq)"]
    c(refit=a_refit,no_refit=a_no_refit)
}

I like plyr::laply for its convenience, although you could just as easily use a for loop or one of the other *apply approaches.

library(plyr)
pdist <- laply(simdat,sumfun,.progress="text")

library(ggplot2); theme_set(theme_bw())
library(reshape2)
ggplot(melt(pdist),aes(x=value,fill=Var2))+
     geom_histogram(aes(y=..density..),
        alpha=0.5,position="identity",binwidth=0.02)+
     geom_hline(yintercept=1,lty=2)
ggsave("nullhist.png",height=4,width=5)

histogram of null distributions

Type I error rate for alpha=0.05:

colMeans(pdist<0.05)
##   refit no_refit 
##   0.021    0.026

You can see that in this case the two procedures give practically the same answer and both procedures are strongly conservative, for well-known reasons having to do with the fact that the null value of the hypothesis test is on the boundary of its feasible space. For the specific case of testing a single simple random effect, halving the p-value gives an appropriate answer (see Pinheiro and Bates 2000 and others); this actually appears to give reasonable answers here, although it is not really justified because here we are dropping two random-effects parameters (the random effect of slope and the correlation between the slope and intercept random effects):

colMeans(pdist/2<0.05)
##   refit no_refit 
##   0.051    0.055

Other points:

You might be able to do a similar exercise with the PBmodcomp function from the pbkrtest package.
The RLRsim package is designed precisely for fast randomization (parameteric bootstrap) tests of null hypotheses about random effects terms, but doesn't appear to work in this slightly more complex situation
see the relevant GLMM faq section for similar information, including arguments for why you might not want to test the significance of random effects at all ...
for extra credit you could redo the parametric bootstrap runs using the deviance (-2 log likelihood) differences rather than the p-values as output and check whether the results conformed to a mixture between a chi^2_0 (point mass at 0) and a chi^2_n distribution (where n is probably 2, but I wouldn't be sure for this geometry)

Recommended topics

Hot tags