Pymer4 for logistic mixed effects regression. The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have one year of individual-level data for all employees (e.g. age, gender, promotions, etc.). Each employee belongs to the team of a particular manager. I also have some team- and manager-level variables (e.g. manager's tenure, team diversity, etc.). I want to explain employee termination (binary: left the company or not). I am running a multilevel logistic regression in which employees are grouped by manager, so employees under the same manager share the same team- and manager-level characteristics.

So, my model looks like:

"Termination ~ Age + Time in company + Promotions + Manager tenure + Manager age + Average age in team + % of women in team", data, groups=data["Manager_ID"]

Dataset example:

import pandas as pd

data = {'Employee': ['ID1', 'ID2','ID3','ID4','ID5','ID6','ID7', 'ID8'],
'Manager_ID': ['MID1', 'MID2','MID2','MID1','MID3','MID3','MID3', 'MID1'],
'Termination': ['0', '0', '0', '0', '1', '1', '1', '0'],
'Age': ['35', '40','50','24','33','46','44', '31'],
'TimeinCompany': ['1', '3', '10', '20', '4', '0', '4', '9'],
'Promotions': ['1', '0', '0', '0', '1', '1', '1', '0'],
'Manager_Tenure': ['10', '5', '5', '10', '8', '8', '8', '10'],
'Manager_Age': ['40', '45', '45', '40', '38', '38', '38', '40'],
'AverageAgeTeam': ['33', '30', '30', '33', '44', '44', '44', '33'],
'PercentWomenTeam': ['40', '20', '20', '40', '49', '49', '49', '40']}

columns = ['Employee','Manager_ID','Termination','Age', 'TimeinCompany', 'Promotions', 'Manager_Tenure', 'Manager_Age', 'AverageAgeTeam', 'PercentWomenTeam']

data = pd.DataFrame(data, columns=columns)

I am using the pymer4 package, which wraps lme4 from R, to run a logistic mixed effects regression in Python.

from pymer4.models import Lmer

# Build the model

model = Lmer("Termination  ~ Age  + TimeinCompany + Promotions + Manager_Tenure + Manager_Age + AverageAgeTeam + PercentWomenTeam + (1|Manager_ID)",
             data=data, family = 'binomial')

print(model.fit())

However, I receive an error "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".

I thought it was due to some managers having only 1 employee in the dataset. I excluded managers who have fewer than 5 / 20 / 50 employees, e.g.:

data['Count'] = data.groupby('Manager_ID')["Employee"].transform("count")

data1 = data[data['Count']>=50]

but the error message is the same.

I also tried transforming all variables into numeric:

import numpy as np

all_columns = list(data)

data[all_columns] = data[all_columns].astype(np.int64, errors='ignore')

Some variables are now int64, while others are float64. The error message is still the same.
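For reference, a more explicit version of this conversion might look like the sketch below. It assumes the ID columns should stay as strings and every other column holds clean numeric text; the toy frame and the names `df`, `id_columns` and `numeric_columns` are illustrative only. This only tidies the dtypes and is not claimed to resolve the error.

```python
import pandas as pd

# Toy frame mirroring the question: every value is stored as a string.
df = pd.DataFrame({
    "Employee": ["ID1", "ID2", "ID3"],
    "Manager_ID": ["MID1", "MID2", "MID2"],
    "Termination": ["0", "0", "1"],
    "Age": ["35", "40", "50"],
})

# Assumption: these identifier columns are labels, not numbers, so they are skipped.
id_columns = ["Employee", "Manager_ID"]
numeric_columns = [c for c in df.columns if c not in id_columns]

# pd.to_numeric raises on values it cannot parse, so bad cells are not silently kept.
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric)

print(df.dtypes)
```

Unlike `astype(..., errors='ignore')`, this fails loudly if a column that is expected to be numeric contains something unparseable.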

The dataset is also biased towards employees who did not leave the company, so the Termination variable has more 0s than 1s. The model also runs for a long time on the full sample before showing the error message.

Lacerated answered 2/8, 2022 at 9:23 Comment(0)

I ran into the same error and resolved it by updating my pymer4 version (see also this issue on GitHub):

conda install -c ejolly -c conda-forge -c defaults pymer4=0.7.8

Hope this helps!

Edit: In case you have a pandas version >= 1.2 in your environment, you'll need to install pymer4 0.8.0 from the pre-release channel:

conda install -c ejolly/label/pre-release -c conda-forge -c defaults pymer4=0.8.0
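To decide which of the two commands applies, it may help to check what is currently installed. The following is a rough sketch using only the standard library; the helper names (`installed_version`, `version_tuple`, `suggested_pymer4`) are made up for illustration, and the 1.2 cut-off simply encodes the rule of thumb stated above.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn a version string such as '1.2.4' into (major, minor), here (1, 2)."""
    parts = []
    for piece in text.split(".")[:2]:
        # Keep only the leading digits, so a piece like '0rc1' becomes 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def suggested_pymer4(pandas_version):
    """Apply the rule from this answer: pandas >= 1.2 needs pymer4 0.8.0, else 0.7.8."""
    return "0.8.0" if version_tuple(pandas_version) >= (1, 2) else "0.7.8"


pandas_version = installed_version("pandas")
print("pandas:", pandas_version)
print("pymer4:", installed_version("pymer4"))
if pandas_version is not None:
    print("suggested pymer4:", suggested_pymer4(pandas_version))
```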
Lifetime answered 19/12, 2022 at 8:53 Comment(0)
