lme4::glmer vs. Stata's melogit command
Lately I have been trying to fit a lot of random effects models to relatively big datasets. Let’s say about 50,000 people (or more) observed at up to 25 time points. With such a large sample size, we include a lot of predictors that we’re adjusting for – maybe 50 or so fixed effects. I’m fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into specifics on the data, but the basic format of the glmer command I used was:

fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
             family = "binomial", data = dat)

where both study_quarter and dd_quarter are factors with about 20 levels each.

When I try to fit this model in R, it runs for about 12-15 hours and returns an error saying it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines) with no improvement. And the fit isn't even close to convergence in the end (max gradient around 5-10, whereas the convergence criterion is 0.001, I think).

I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 mins, with no convergence issues. The corresponding Stata command is

melogit outcome treatment i.study_quarter i.dd_quarter || id:

What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.

Donte answered 21/6, 2017 at 13:25 Comment(5)
Here (statalist.org/forums/forum/general-stata-discussion/general/…) is an example of the opposite case - R deals quickly with an apparently identical SEM model that is impossible to estimate in Stata.Priebe
I am not sure, but could it be that the default link in glmer for the binomial family is probit rather than logit? Maybe you could add family = binomial(link = "logit") and try it then?Silicone
@radek - Thank you for sharing, but I am referring specifically to mixed effects modeling, not SEMs. I know there are plenty of cases where R "beats" Stata.Donte
@EndreBorbáth - no, the default is logitDonte
I'm having exactly the same experience as you describe - days of running time and convergence warnings in R with a binomial glmer - Stata melogit fits within minutes, getting close if not identical results for the same specificationMutism
The glmer fit will probably be much faster with the optional argument nAGQ=0L. You have many fixed-effects parameters (20 levels for each of study_quarter and dd_quarter generate a total of 38 contrasts) and the default optimization method (corresponding to nAGQ=1L) puts all of those coefficients into the general nonlinear optimization call. With nAGQ=0L these coefficients are instead optimized within the much faster penalized iteratively reweighted least squares (PIRLS) algorithm. The default will generally provide a better estimate in the sense that the deviance at the estimate is lower, but the difference is usually very small and the time difference is enormous.
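For reference, a sketch of what that looks like with the model from the question (dat and the variable names are the questioner's; the only change is the added nAGQ argument):

```r
library(lme4)

# Same model as in the question; nAGQ = 0L moves estimation of the
# fixed-effects coefficients into the PIRLS step instead of passing
# them to the general nonlinear optimizer.
fit0 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1 | id),
              family = binomial, data = dat, nAGQ = 0L)
```

The trade-off is a slightly cruder approximation to the deviance at the optimum, as described above.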

I have a write-up of the differences in these algorithms as a Jupyter notebook nAGQ.ipynb. That writeup uses the MixedModels package for Julia instead of lme4 but the methods are similar. (I am one of the authors of lme4 and the author of MixedModels.)

If you are going to be doing a lot of GLMM fits I would consider doing so in Julia with MixedModels. It is often much faster than R, even with all the complicated code in lme4.

Wendolyn answered 23/6, 2017 at 19:13 Comment(7)
I tried setting nAGQ to 0, and the model fit in about 5 mins, with no convergence problems. Still a bit slower than Stata, but more than acceptable.Donte
I wonder if a model with the default nAGQ=1L would converge faster if glmer first fit the model with PIRLS (corresponding to nAGQ=0) to get better starting values, and then refined the model with the Laplace approximation (nAGQ=1)? Maybe that's already being done internally, but if so I'm surprised that after 15 hours of run time, the model was nowhere close to convergence (max gradient around 5-10).Donte
An initial nAGQ=0L fit is used to create starting estimates for nAGQ=1L. It helps some but not as much as might be expected. The problem is the number of parameters to optimize in the nonlinear optimization routine. You can also try changing the optimizer to NLOPT_LN_BOBYQA from the nloptr package.Wendolyn
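A sketch of that optimizer switch, assuming a recent lme4 whose nloptwrap optimizer is built on nloptr (check ?glmerControl for the exact option names in your version; fit_bobyqa is a hypothetical name):

```r
library(lme4)

# Hypothetical refit of the question's model using lme4's nloptr-based
# wrapper; NLOPT_LN_BOBYQA is, I believe, nloptwrap's default algorithm,
# so the optCtrl entry here just makes the choice explicit.
fit_bobyqa <- glmer(
  outcome ~ treatment + study_quarter + dd_quarter + (1 | id),
  family = binomial, data = dat,
  control = glmerControl(optimizer = "nloptwrap",
                         optCtrl = list(algorithm = "NLOPT_LN_BOBYQA")))
```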
I gave the wrong link for the nAGQ.ipynb, which is corrected here.Wendolyn
One follow-up question: when you say the estimate is better for nAGQ=1L than for PIRLS, are you referring to the fixed effects estimates, the random effects estimates, or both? If my parameters of interest are all fixed effect parameters, am I in any way "safer" in using the PIRLS approximation than if I was interested in the random effects?Donte
By "better" I simply mean a lower value of the Laplace approximation to the deviance. That is, the estimates are closer to the maximum likelihood estimates.Wendolyn
Game changer - I've actually managed to run the model in R since starting to read this post!Mutism
Are you sure that Stata is reading in the whole file?

http://www.stata.com/manuals13/rlimits.pdf

The reason I ask is because it sounds to me like you've got 50k persons observed 25 times (1,250,000 rows); depending on the version of Stata you're using you could be getting truncated results.

EDIT Since it's not a file-length issue, have you tried other packages for mixed-effects models, like nlme? I suspect that a nonlinear mixed-effects fit might handle your data somewhat faster.

EDIT This resource may be more helpful than anything about the different models: https://stats.stackexchange.com/questions/173813/r-mixed-models-lme-lmer-or-both-which-one-is-relevant-for-my-data-and-why

Selie answered 21/6, 2017 at 13:31 Comment(4)
Source is 2 versions out of date; the Stata 15 document is at stata.com/manuals/rlimits.pdf. Either way, 1 million observations (rows, in your terms) is not itself an issue in any serious Stata.Plat
I have not tried nlme, and I agree it's worth a shot. But I doubt that "somewhat faster" would be the difference between 15 hrs (without convergence) and 2 mins (with convergence). I imagine the fitting algorithms would share a lot in common, due to the overlap in authors, but that is just an assumption.Donte
And to confirm what @NickCox said, the model output says that it is using all the observations in the dataset, so I don't think this is an issue.Donte
@Selie - Regarding the link in your second edit, this thread focuses on differences between an lmer fit using REML and a lme fit using ML. Once both models were fit with REML, the two approaches gave near-identical results. Since I have a binomial outcome, I am using glmer, which uses ML (see bbolker.github.io/mixedmodels-misc/glmmFAQ.html#reml-for-glmms). Stata also uses ML, so this cannot be the difference in their fits.Donte
