I have a dataset for one year for all employees with individual-level data (e.g. age, gender, promotions, etc.). Each employee is in a team of a certain manager. I have some variables on the team- and manager-levels as well (e.g. manager's tenure, team diversity, etc.). I want to explain the termination of employees (binary: left the company or not). I am running a multilevel logistic regression, where employees are grouped by their managers, therefore they share the same team- and manager-level characteristics.
So, my model looks like:
Termination ~ Age + Time in company + Promotions + Manager tenure + Manager age + Average age in team + % of women in team", data, groups=data[Manager_ID]
Dataset example:
data = {'Employee': ['ID1', 'ID2','ID3','ID4','ID5','ID6','ID7', 'ID8'],
'Manager_ID': ['MID1', 'MID2','MID2','MID1','MID3','MID3','MID3', 'MID1'],
'Termination': ['0', '0', '0', '0', '1', '1', '1', '0'],
'Age': ['35', '40','50','24','33','46','44', '31'],
'TimeinCompany': ['1', '3', '10', '20', '4', '0', '4', '9'],
'Promotions': ['1', '0', '0', '0', '1', '1', '1', '0'],
'Manager_Tenure': ['10', '5', '5', '10', '8', '8', '8', '10'],
'Manager_Age': ['40', '45', '45', '40', '38', '38', '38', '40'],
'AverageAgeTeam': ['33', '30', '30', '33', '44', '44', '44', '33'],
'PercentWomenTeam': ['40', '20', '20', '40', '49', '49', '49', '40']}
columns = ['Employee','Manager_ID','Age', 'TimeinCompany', 'Promotions', 'Manager_Tenure', 'Manager_Age', 'AverageAgeTeam', 'PercentWomenTeam']
data = pd.DataFrame(data, columns=columns)
I am using pymer4 package to run logistic mixed effects regression (lmer from R) in Python.
from pymer4.models import Lmer
building model
model = Lmer("Termination ~ Age + TimeinCompany + Promotions + Manager_Tenure + Manager_Age + AverageAgeTeam + PercentWomenTeam + (1|Manager_ID)",
data=data, family = 'binomial')
print(model.fit())
However, I receive an error "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
I thought it is due to some managers only having 1 employee in the dataset. I excluded managers who have less than 5 / 20 / 50 employees, e.g.:
data['Count'] = data.groupby('Manager_ID')["Employee"].transform("count")
data1 = data[data['Count']>=50]
but the error message is the same.
I also tried transforming all variables into numeric:
all_columns = list(data)
data[all_columns] = data[all_columns].astype(np.int64, errors='ignore')
Some variables are now int64, while others are float64. The error message is still the same.
The dataset is also biased towards employees who did not leave the company, so Termination variable has more 0 than 1. Model also runs for a long time on the full sample before showing the error message.