I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this.
Here's the example. I have data that looks like the following (taken from here):
subject gender scenario attitude frequency
F1 F 1 pol 213.3
F1 F 1 inf 204.5
F1 F 2 pol 285.1
F1 F 2 inf 259.7
F1 F 3 pol 203.9
F1 F 3 inf 286.9
F1 F 4 pol 250.8
F1 F 4 inf 276.8
I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am trying to do this:
import statsmodels.api as sm
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data[['subject', 'scenario']])
result = model.fit()
print result.summary()
I keep getting this error:
LinAlgError: Singular matrix
It works fine in R. When I use lme4
in R with the formula-based rendering it fits just fine:
politeness.model = lmer(frequency ~ attitude + gender +
(1|subject) + (1|scenario), data=politeness)
I don't understand why this is happening. It works when I use any one random effect/group, e.g.
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])
Then I get:
Mixed Linear Model Regression Results
===============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 6 Scale: 850.9456
Min. group size: 13 Likelihood: -393.3720
Max. group size: 14 Converged: Yes
Mean group size: 13.8
---------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
---------------------------------------------------------------
Intercept 256.785 15.226 16.864 0.000 226.942 286.629
attitude[T.pol] -19.415 6.407 -3.030 0.002 -31.972 -6.858
gender[T.M] -108.325 21.064 -5.143 0.000 -149.610 -67.041
Intercept RE 603.948 23.995
===============================================================
Alternatively, if I do:
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])
This is the result I get:
Mixed Linear Model Regression Results
================================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 7 Scale: 1110.3788
Min. group size: 11 Likelihood: -402.5003
Max. group size: 12 Converged: Yes
Mean group size: 11.9
----------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------------
Intercept 256.892 8.120 31.637 0.000 240.977 272.807
attitude[T.pol] -19.807 7.319 -2.706 0.007 -34.153 -5.462
gender[T.M] -108.603 7.319 -14.838 0.000 -122.948 -94.257
Intercept RE 182.718 5.502
================================================================
I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.