I'm trying to run the following code (credit goes to Greg):
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
quality = pd.read_csv("https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv")
train, test = train_test_split(quality, train_size=0.75, random_state=1)
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)
qualityTrain['PoorCare'] = qualityTrain['PoorCare'].astype(int)
cols = ['OfficeVisits', 'Narcotics']
x = qualityTrain[cols]
x = sm.add_constant(x)
y = qualityTrain['PoorCare']
model = sm.Logit(y, x).fit()
model.summary()
But I'm getting:
AttributeError: 'int' object has no attribute 'exp'
on the second-to-last line, the call to sm.Logit(y, x).fit(). The error seems to be introduced by sampling the data with train_test_split, because the model fits just fine on the whole, unmodified dataset.
How can I fix this?
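One way to narrow this down is sketched below. It is a guess, not a confirmed diagnosis: it assumes train_test_split returns plain NumPy arrays here (as older scikit-learn versions appear to do for DataFrame input), so that rebuilding a DataFrame from a mixed-type array leaves every column with dtype object. The small frame and its Label column are made up for illustration and are not quality.csv; only numpy and pandas are used, so the failure is reproduced with np.exp directly instead of through sm.Logit.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for quality.csv (values are illustrative, not real data).
quality = pd.DataFrame({
    "OfficeVisits": [18, 6, 5, 19],
    "Narcotics": [1, 1, 3, 0],
    "Label": ["a", "b", "c", "d"],  # hypothetical non-numeric column
    "PoorCare": [0, 0, 1, 0],
})
cols = ["OfficeVisits", "Narcotics"]

# Assumption: the split hands back a plain NumPy array. With mixed column
# types that array has dtype=object, and so does every rebuilt column.
rebuilt = pd.DataFrame(quality.to_numpy(), columns=quality.columns)
print(rebuilt.dtypes)

# np.exp cannot operate element-wise on generic Python objects.
try:
    np.exp(rebuilt[cols].to_numpy())
    exp_failed = False
except (AttributeError, TypeError) as err:
    exp_failed = True
    print(type(err).__name__, err)

# Casting the predictors back to a numeric dtype lets np.exp work again.
x_numeric = rebuilt[cols].astype(float)
exp_values = np.exp(x_numeric.to_numpy())
print(exp_values)
```

If qualityTrain.dtypes shows object for OfficeVisits and Narcotics, the analogous change in the code above would be something like x = qualityTrain[cols].astype(float) before calling sm.add_constant. Whether that is the actual cause in this setup is untested.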