I am trying to run a simple logistic regression. I have four columns named x1, x2, x3, and x4. x4 contains only zeros and ones, so I am using it as my dependent variable, with x1, x2, and x3 as the independent variables. Is my syntax off? How can I properly fit a logistic regression on my data while keeping the R-style formula syntax that statsmodels.formula.api provides?
The following is my code:
import pandas as pd
import statsmodels.formula.api as smf
df = pd.DataFrame({'x1': [10, 11, 0, 14],
                   'x2': [12, 0, 1, 24],
                   'x3': [0, 65, 3, 2],
                   'x4': [0, 0, 1, 0]})
model = smf.logit(formula='x4 ~ x1 + x2 + x3', data=df).fit()
print(model)
The following is my error:
statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available
I understand what the error means, but I do not understand how to avoid it. What kind of values does the data need for the logistic regression to fit successfully? Is my syntax correct, and is there a better way to do this while keeping the R-style formula syntax?
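For reference, here is a minimal sketch of one way to see why this particular data triggers the error. It checks whether any single predictor splits the zeros from the ones with one threshold. The helper `separating_columns` is a hypothetical name made up for this illustration, not part of statsmodels, and it assumes both classes appear in the target column. It only catches separation by a single column, not by a combination of columns.

```python
# Same data as in the question, kept as plain lists so the check needs no libraries.
data = {
    'x1': [10, 11, 0, 14],
    'x2': [12, 0, 1, 24],
    'x3': [0, 65, 3, 2],
    'x4': [0, 0, 1, 0],
}


def separating_columns(data, target):
    """Hypothetical helper: list predictors where one threshold splits the 0s from the 1s."""
    found = []
    for name, values in data.items():
        if name == target:
            continue
        ones = [v for v, y in zip(values, data[target]) if y == 1]
        zeros = [v for v, y in zip(values, data[target]) if y == 0]
        # Separated if every value of one class lies strictly below every value of the other.
        if max(ones) < min(zeros) or max(zeros) < min(ones):
            found.append(name)
    return found


print(separating_columns(data, 'x4'))  # ['x1']
```

In this data x1 is 0 for the only row where x4 is 1 and at least 10 everywhere else, so the fit can always improve by pushing the x1 coefficient further out and no finite maximum likelihood estimate exists. With only four rows and four parameters (intercept plus three slopes), separation would be likely even without such a column. Collecting more rows in which the classes overlap, or trying a penalized fit such as `fit_regularized`, are common options, though which one suits the real data is a judgment call.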