While using statsmodels, I am getting this weird error: ValueError: endog must be in the unit interval.
Can someone give me more information on this error? Google is not helping.
Code that produced the error:
"""
Multiple regression with dummy variables.
"""
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:, 'days_2':])
data['intercept'] = 1.0
print(data)
train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])
result = logit.fit()
print(result.summary())
And the traceback:
Traceback (most recent call last):
  File "multiple_regression_dummy.py", line 20, in <module>
    logit = sm.Logit(data['Cost'], data[train_cols])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/statsmodels/discrete/discrete_model.py", line 404, in __init__
    raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.
What is in your Cost data? Logit requires that the dependent variable (endog) is in the unit interval. If you want logistic regression with values in another interval, then you need to transform your values so that they are in the unit interval. However, Logit does not require that the endog values are 0/1 integers, so it can also be used for proportions. – Hawkinson
Cost is not in the unit interval. Any idea why Logit requires this? – Kirakiran