Lately I have been trying to fit a lot of random-effects models to relatively big datasets: say, 50,000 people (or more) observed at up to 25 time points. With such a large sample size we also adjust for a lot of predictors, maybe 50 or so fixed effects. I'm fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into specifics on the data, but the basic format of the glmer command I used was:
fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
             family = "binomial", data = dat)
where both study_quarter and dd_quarter are factors with about 20 levels each.
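For what it's worth, here is a rough sketch of simulated data with the same structure (the variable names match the real ones, but the sizes and values are made up, since I can't share the actual data):

library(lme4)

set.seed(1)
n_id   <- 50000   # subjects
n_time <- 25      # time points per subject

dat <- data.frame(
  id            = rep(seq_len(n_id), each = n_time),
  treatment     = rep(rbinom(n_id, 1, 0.5), each = n_time),
  study_quarter = factor(sample(1:20, n_id * n_time, replace = TRUE)),
  dd_quarter    = factor(sample(1:20, n_id * n_time, replace = TRUE)),
  outcome       = rbinom(n_id * n_time, 1, 0.3)
)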
When I try to fit this model in R, it runs for about 12-15 hours and then returns an error that it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines), with no improvement, and the convergence isn't even close at the end: the maximum absolute gradient is around 5-10, whereas the convergence criterion is 0.001, I think.
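Concretely, the troubleshooting amounted to variations like the following (a sketch of roughly what I tried, not an exact record):

# 1. Switch to the bobyqa optimizer and allow many more function evaluations
fit2 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat,
              control = glmerControl(optimizer = "bobyqa",
                                     optCtrl = list(maxfun = 2e5)))

# 2. Faster but cruder: estimate the fixed effects within the PIRLS step
#    instead of in the nonlinear optimizer
fit3 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat, nAGQ = 0)

# 3. Restart the failed fit from its own estimates as starting values
fit4 <- update(fit, start = list(theta = getME(fit, "theta"),
                                 fixef = getME(fit, "fixef")))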
I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 minutes, with no convergence issues. The corresponding Stata command is:
melogit outcome treatment i.study_quarter i.dd_quarter || id:
What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.
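One thing I wondered about, in case it matters for the comparison: my understanding (which may be wrong) is that melogit defaults to adaptive Gauss-Hermite quadrature with 7 integration points, while glmer defaults to the Laplace approximation (nAGQ = 1). Since my model has only a scalar random intercept, glmer does allow more quadrature points, e.g.:

# Sketch: matching Stata's (assumed) default of 7 quadrature points;
# this is only allowed because the model has a single scalar random effect.
fit_agq <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
                 family = "binomial", data = dat, nAGQ = 7)

But more quadrature points should make glmer slower if anything, not faster, so that doesn't seem to explain the gap.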
Comments:
- Can you change it to family = binomial(link = "logit") and try it then? – Silicone
- Stata's melogit fits within minutes, getting close if not identical results for the same specification as glmer. – Mutism