Splitting data into train and test sets and fitting a logistic regression model in pandas

Asked 23/3, 2015 at 22:46 Answered 23/3, 2015 at 23:47

Tags: python, pandas, scikit-learn, logistic-regression, statsmodels

I'm trying to run this code (credit goes to Greg):

import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

quality = pd.read_csv("https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv")
train, test = train_test_split(quality, train_size=0.75, random_state=1)

qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)

qualityTrain['PoorCare'] = qualityTrain['PoorCare'].astype(int)

cols = ['OfficeVisits', 'Narcotics']
x = qualityTrain[cols]
x = sm.add_constant(x)
y = qualityTrain['PoorCare']

model = sm.Logit(y, x).fit()
model.summary()

But I'm getting:

AttributeError: 'int' object has no attribute 'exp'

on the second-to-last line. The problem is clearly introduced by splitting the data with train_test_split, because the model fits fine on the whole, unmodified dataset.

How can I fix this?

Maurine asked 23/3, 2015 at 22:46

Just convert the x variable to floats:

model = sm.Logit(y, x.astype(float)).fit()
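A quick way to spot this problem is to print x.dtypes before fitting: any column listed as object will break the model's numeric code. The minimal pandas-only sketch below uses a hypothetical object-typed design matrix (the values are made up) and compares the answer's astype(float) with infer_objects(), which re-infers a concrete dtype for each column instead of forcing everything to float.

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix whose columns came back as object dtype.
x = pd.DataFrame(
    np.array([[1.0, 18, 1], [1.0, 6, 1], [1.0, 5, 3]], dtype=object),
    columns=["const", "OfficeVisits", "Narcotics"],
)
print(x.dtypes)  # every column is object

# Option 1 (the answer's fix): force every column to float64.
x_float = x.astype(float)

# Option 2: let pandas re-infer a concrete dtype column by column.
x_inferred = x.infer_objects()

print(x_float.dtypes)     # every column is float64
print(x_inferred.dtypes)  # const is float64; the two count columns are int64
```

Either frame should then be usable as the exog argument to sm.Logit. astype(float) is the more predictable choice because it guarantees a single float dtype for the whole matrix.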

I get the following result:

<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results                           
==============================================================================
Dep. Variable:               PoorCare   No. Observations:                   98
Model:                          Logit   Df Residuals:                       95
Method:                           MLE   Df Model:                            2
Date:                Mon, 23 Mar 2015   Pseudo R-squ.:                  0.2390
Time:                        16:45:51   Log-Likelihood:                -39.714
converged:                       True   LL-Null:                       -52.188
                                        LLR p-value:                 3.823e-06
================================================================================
                   coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------
const           -2.7718      0.561     -4.940      0.000        -3.872    -1.672
OfficeVisits     0.0680      0.031      2.211      0.027         0.008     0.128
Narcotics        0.1223      0.041      2.991      0.003         0.042     0.203
================================================================================
"""

Draggle answered 23/3, 2015 at 23:47

Thanks. But it's strange that it can't fit integer data, isn't it? – Maurine 24/3, 2015 at 10:52
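Integer data as such are not the problem; object dtype is. When NumPy applies a ufunc such as np.exp to an object array, it falls back to calling an .exp() method on each element, and a plain Python int has no such method. The small NumPy-only sketch below uses arbitrary values. Newer NumPy releases report this as a TypeError chained from the AttributeError, rather than the bare AttributeError seen in the question, so the sketch catches both.

```python
import numpy as np

values = [18, 6, 5, 19]

# A real integer array works: np.exp upcasts int64 to float64.
ints = np.array(values, dtype=np.int64)
print(np.exp(ints).dtype)  # float64

# The same numbers stored as Python objects make np.exp look for int.exp(), which does not exist.
objs = np.array(values, dtype=object)
failed = False
try:
    np.exp(objs)
except (AttributeError, TypeError) as err:
    # Older NumPy raises AttributeError: 'int' object has no attribute 'exp'.
    # Newer NumPy raises a TypeError chained from that AttributeError.
    failed = True
    print(type(err).__name__, err)

# Converting to float first, as the accepted answer does, avoids the object loop entirely.
print(np.exp(objs.astype(float)).dtype)  # float64
```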

Running the example: train_test_split returns an array of dtype object. The master version of statsmodels now raises an exception if one of the arrays has object dtype. – Weekday 24/3, 2015 at 14:35
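The dtype loss described in this comment can be reproduced without the dataset. The minimal sketch below assumes a hypothetical frame in which integer columns sit next to a boolean column. That is one way to make pandas fall back to object when the whole frame becomes a single NumPy array, as the old split function returned. The array comes back as object, and rebuilding a DataFrame from it, as the question does, leaves every column as object too.

```python
import pandas as pd

# Hypothetical stand-in for quality.csv: integer columns plus one boolean column.
df = pd.DataFrame({
    "OfficeVisits": [18, 6, 5, 19, 8],
    "Narcotics": [1, 1, 3, 0, 2],
    "StartedOnCombination": [False, False, True, False, False],
    "PoorCare": [0, 0, 1, 0, 1],
})

# Mimic an old-style split that hands back a bare NumPy array instead of a DataFrame.
arr = df.to_numpy()
print(arr.dtype)  # object: pandas has no common numeric dtype for int and bool

# Rebuilding the frame, as the question does, does not re-infer numeric dtypes.
rebuilt = pd.DataFrame(arr, columns=df.columns)
print(rebuilt.dtypes)  # every column is object
```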

Thanks for the answers here. Quick question, @josef: is there now a native statsmodels (or pandas) train/test split function? It's easy enough to write my own; I'm just curious whether there's an "official" one. Thanks! – Gant 26/9, 2023 at 18:41

statsmodels does not have a train/test split function. AFAIK, neither does pandas. – Weekday 28/9, 2023 at 2:27
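A hand-rolled split in pandas is short. Because it only selects rows, it keeps every column's dtype, which sidesteps the problem in this question. The minimal sketch below uses a function named split_frame, which is hypothetical and not a pandas or statsmodels API. It assumes the frame has a unique index, because the test rows are found by dropping the sampled labels. Current scikit-learn's train_test_split also returns DataFrames when given a DataFrame.

```python
import pandas as pd


def split_frame(df: pd.DataFrame, train_size: float = 0.75, random_state: int = 1):
    """Hypothetical helper: randomly split rows into train and test DataFrames.

    Assumes df has a unique index.
    """
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    # Sample the training rows; row selection preserves each column's dtype.
    train = df.sample(frac=train_size, random_state=random_state)
    # Every row that was not sampled goes to the test set.
    test = df.drop(train.index)
    return train, test


# Usage on a small made-up frame.
df = pd.DataFrame({
    "OfficeVisits": [18, 6, 5, 19, 8, 3, 12, 7],
    "Narcotics": [1, 1, 3, 0, 2, 0, 4, 1],
    "PoorCare": [0, 0, 1, 0, 1, 0, 1, 0],
})
train, test = split_frame(df)
print(len(train), len(test))  # 6 2
print(train.dtypes)  # int64 columns, not object
```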
