How to have multiple groups in Python statsmodels linear mixed effects model?
Asked Answered
S

4

13

I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, e.g. two groups. I cannot figure out how to initialize the model so that I can do this.

Here's the example. I have data that looks like the following (taken from here):

subject gender  scenario    attitude    frequency
F1  F   1   pol 213.3
F1  F   1   inf 204.5
F1  F   2   pol 285.1
F1  F   2   inf 259.7
F1  F   3   pol 203.9
F1  F   3   inf 286.9
F1  F   4   pol 250.8
F1  F   4   inf 276.8

I want to make a linear mixed effects model with two random effects -- one for the subject group and one for the scenario group. I am trying to do this:

import statsmodels.api as sm
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data[['subject', 'scenario']])
result = model.fit()
print result.summary()

I keep getting this error:

LinAlgError: Singular matrix

It works fine in R. When I use lme4 in R with the formula-based rendering it fits just fine:

politeness.model = lmer(frequency ~ attitude + gender + 
        (1|subject)  + (1|scenario), data=politeness)

I don't understand why this is happening. It works when I use any one random effect/group, e.g.

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])

Then I get:

                 Mixed Linear Model Regression Results
===============================================================
Model:                MixedLM   Dependent Variable:   frequency
No. Observations:     83        Method:               REML     
No. Groups:           6         Scale:                850.9456 
Min. group size:      13        Likelihood:           -393.3720
Max. group size:      14        Converged:            Yes      
Mean group size:      13.8                                     
---------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025   0.975]
---------------------------------------------------------------
Intercept        256.785   15.226 16.864 0.000  226.942 286.629
attitude[T.pol]  -19.415    6.407 -3.030 0.002  -31.972  -6.858
gender[T.M]     -108.325   21.064 -5.143 0.000 -149.610 -67.041
Intercept RE     603.948   23.995                              
===============================================================

Alternatively, if I do:

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])

This is the result I get:

              Mixed Linear Model Regression Results
================================================================
Model:               MixedLM    Dependent Variable:    frequency
No. Observations:    83         Method:                REML     
No. Groups:          7          Scale:                 1110.3788
Min. group size:     11         Likelihood:            -402.5003
Max. group size:     12         Converged:             Yes      
Mean group size:     11.9                                       
----------------------------------------------------------------
                 Coef.   Std.Err.    z    P>|z|  [0.025   0.975]
----------------------------------------------------------------
Intercept        256.892    8.120  31.637 0.000  240.977 272.807
attitude[T.pol]  -19.807    7.319  -2.706 0.007  -34.153  -5.462
gender[T.M]     -108.603    7.319 -14.838 0.000 -122.948 -94.257
Intercept RE     182.718    5.502                               
================================================================

I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.

Sheree answered 25/8, 2016 at 18:50 Comment(7)
Please do not include images of your data, but instead include text. Better yet, include data in a manner than allows for a reproducible example. While you do this, take some time to read through your question and fix some of the missing elements. For example, "Alternatively, if I do: " is followed by white space.Tumular
@Tumular Thanks for the feedback. I'm not sure how to format the table as text -- like code? Also fixed up the errors in the original post.Sheree
It might be worth looking through the python questions with statmodel tag to see how others do it.Tumular
... also ... I would (strongly) recommend that you delete the last three sentences of your question. It's understandable that you're frustrated, but complaining about free software often fails to inspire the people who write it to help you out ... (perhaps you could volunteer/make some suggestions to improve the documentation?)Grilse
format the table as code, or in some other way that makes it easy for others to import the data into their own Python sessions ...Grilse
@BenBolker thanks for the feedback. I will totally edit the question to be more constructive :) and thanks a lot for your answer. Reading it, I realize I don't fully understand what is happening under the hood and I need to figure out what is going on. I would love to help out on the documentation -- how can I help?Sheree
I don't know the statsmodels platform well, but I would to to github.com/statsmodels/statsmodels and open a new issue asking how/where documentation help would be most useful. Or, if you feel bold, read up on how Github works (if necessary), fork the repository, make some useful documentation changes, and submit a pull request.Grilse
G
16

You are trying to fit a model with crossed random effects, i.e., you want to allow for consistent variation among subjects across scenarios as well as consistent variation among scenarios across subjects. You can use multiple random-effects terms in statsmodels, but they must be nested. Fitting crossed (as opposed to nested) random effects requires more sophisticated algorithms, and indeed the statsmodels documentation says (as of 25 Aug 2016, emphasis added):

Some limitations of the current implementation are that it does not support structure more complex on the residual errors (they are always homoscedastic), and it does not support crossed random effects. We hope to implement these features for the next release.

As far as I can see, your choices are (1) fall back to a nested model (i.e. fit the model as though either scenario is nested within subject or vice versa - or try both and see if the difference matters); (2) fall back to lme4, either within R or via rpy2.

As always, you're entitled to a full refund of the money you paid to use statsmodels ...

Grilse answered 26/8, 2016 at 0:23 Comment(0)
R
5

Multiple or crossed random intercepts crossed effects can be fit using variance components, which are implemented in a different way from the one-group mixed effects.

I don't find an example, and the documentation seems to be only partially updated.

The unit tests contain an example using the MixedLM formula interface:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/tests/test_lme.py#L284

Radiopaque answered 26/8, 2016 at 16:37 Comment(0)
O
0

You can use Pymer4 to do this in Python with more than 1 group. Here is the documentation. https://eshinjolly.com/pymer4/

Here is an example of my code to use it:

from pymer4.models import Lmer
selected_columns = ['subid', 'accuracy', 'rt', 'response_switch', 'classwitch', 'Experiment']
df = df[selected_columns]
model = Lmer('accuracy ~ classwitch * response_switch + (1|subid) + (1|Experiment)', data=df, family='binomial')
result = model.fit()
result
Oman answered 19/10, 2023 at 22:38 Comment(0)
D
0

You can use below approach of (vc_formula) in statsmodel Mixedlm, to similar result to R code (politeness.model = lmer(frequency ~ attitude + gender + (1|subject) + (1|scenario), data=politeness))

import pandas as pd
import numpy as np
import statsmodels.api as sm
import random
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.DataFrame(politeness)
fixed_formula='frequency ~ attitude + gender'
vc_formula={'g1': "0 + C(subject)",
'g2': "0 + C(scenario)"
 }
oo=np.ones(df.shape[0])
model=sm.MixedLM.from_formula(fixed_formula, data=df, groups=oo, 
 vc_formula=vc_formula)
results=model.fit()
print(results.summary())
random_effects_by_channel = results.random_effects
print(random_effects_by_channel)
Dubbing answered 1/11, 2023 at 16:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.