I'm looking for suggestions for a strategy for fitting generalized linear mixed-effects models to a relatively large data set.
Suppose I have data on 8 million US basketball passes from about 300 teams over 10 years. The data look something like this:
data <- data.frame(count = c(1,1,2,1,1,5),
length_pass= c(1,2,5,7,1,3),
year= c(1,1,1,2,2,2),
mean_length_pass_team= c(15,15,9,14,14,8),
team= c('A', 'A', 'B', 'A', 'A', 'B'))
data
  count length_pass year mean_length_pass_team team
1     1           1    1                    15    A
2     1           2    1                    15    A
3     2           5    1                     9    B
4     1           7    2                    14    A
5     1           1    2                    14    A
6     5           3    2                     8    B
I want to explain the count of steps a player takes before passing the ball. I have theoretical motivations to assume there are team-level differences between count and length_pass, so a multilevel (i.e. mixed-effects) model seems appropriate.
My individual-level control variables are length_pass and year.
On the team level I have mean_length_pass_team. This should help me avoid ecological fallacies, according to Snijders, 2011.
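For anyone who wants to reproduce a team-level covariate like this, here is a minimal sketch. It assumes mean_length_pass_team is the mean of length_pass within each team and year; the post does not say how it was computed, and the toy values shown above are not actual means. The names toy, team_mean and length_within are hypothetical.

```r
# Toy data with the same columns as in the question (values are illustrative).
toy <- data.frame(
  count       = c(1, 1, 2, 1, 1, 5),
  length_pass = c(1, 2, 5, 7, 1, 3),
  year        = c(1, 1, 1, 2, 2, 2),
  team        = c("A", "A", "B", "A", "A", "B")
)

# Assumption: the team-level covariate is the team-by-year mean of length_pass.
# ave() returns the group mean repeated on every row of that group.
toy$team_mean <- ave(toy$length_pass, toy$team, toy$year, FUN = mean)

# Deviation of each pass from its team-by-year mean. Using this next to
# team_mean separates the within-team effect from the between-team effect.
toy$length_within <- toy$length_pass - toy$team_mean

toy
```

Whether to enter the raw length_pass or the centered length_within alongside the team mean is a modelling choice; the sketch only shows how both columns can be built.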
I have been using the lme4 and brms packages to estimate these models, but fitting them takes days or weeks on my local machine (12 cores, 128 GB RAM).
library(lme4)
model_a <- glmer(count ~ length_pass + year + mean_length_pass_team + (1 | team),
data=data,
family= "poisson",
control=glmerControl(optCtrl=list(maxfun=2e8)))
library(brms)
options(mc.cores = parallel::detectCores())
model_b <- brm(count ~ length_pass + year + mean_length_pass_team + (1 | team),
data=data,
family= "poisson")
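For reference, both calls specify the same random-intercept Poisson model with the default log link (brms additionally places priors on the parameters). Written out as a sketch, with i indexing passes and j indexing teams; the symbols are notation introduced here, not taken from the packages:

```latex
\begin{aligned}
\text{count}_{ij} \mid u_j &\sim \operatorname{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \beta_0 + \beta_1\,\text{length\_pass}_{ij} + \beta_2\,\text{year}_{ij}
               + \beta_3\,\text{mean\_length\_pass\_team}_{ij} + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Only the intercept varies by team here; the slope of length_pass is common to all teams.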
I am looking for suggestions to speed up the fitting process or new techniques to fit a generalized linear mixed-effects model:
- (How) Can I improve the speed of the lme4 and brms fits?
- Are there other packages to consider?
- Are there step-wise procedures that can help increase the speed of fitting models?
- Are there interesting options outside the R environment that can help me fit this?
Any pointers are much appreciated!
biglm package doesn't accept a multilevel formula - that is, the | is problematic. But thanks for your thoughts! – Chalaza
nAGQ=0 for speed up or try Julia – Incorrect
nAGQ = 0 command helps to speed up significantly. Julia also seems a good option! Thanks – Chalaza
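The nAGQ = 0 suggestion from the comments can be tried along these lines. This is a minimal sketch on simulated data, not the real data set: the data frame sim, its sizes and the true coefficients are made up for illustration. In lme4, nAGQ = 0 estimates the fixed effects inside the penalized iteratively reweighted least squares step, which is faster but less exact than the default nAGQ = 1 (Laplace approximation), so it is worth comparing the estimates on a subsample before relying on it.

```r
library(lme4)

set.seed(42)
n_teams  <- 50    # hypothetical number of teams
n_passes <- 200   # hypothetical number of passes per team

# Simulate a Poisson outcome with a random intercept per team.
sim <- data.frame(
  team        = factor(rep(seq_len(n_teams), each = n_passes)),
  length_pass = runif(n_teams * n_passes, min = 1, max = 10)
)
team_effect <- rnorm(n_teams, sd = 0.3)
sim$count <- rpois(
  nrow(sim),
  lambda = exp(0.2 + 0.05 * sim$length_pass + team_effect[as.integer(sim$team)])
)

# Default fit: Laplace approximation (nAGQ = 1).
time_default <- system.time(
  fit_default <- glmer(count ~ length_pass + (1 | team),
                       data = sim, family = poisson)
)

# Faster, less exact fit: nAGQ = 0.
time_fast <- system.time(
  fit_fast <- glmer(count ~ length_pass + (1 | team),
                    data = sim, family = poisson, nAGQ = 0)
)

# Compare fixed-effect estimates and elapsed time (in seconds).
rbind(default = fixef(fit_default), fast = fixef(fit_fast))
c(default = time_default[["elapsed"]], fast = time_fast[["elapsed"]])
```

On a data set this small the timing difference is minor; the point is the pattern of fitting both versions and checking that the fixed effects agree.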