Getting ValueError: The indices for endog and exog are not aligned
P

7

11

I get the error above while running a for loop that builds several models. The first two models, which use similar data sets, build fine; the third raises this error. It is thrown when I call sm.Logit() from the statsmodels package:

y = pd.to_numeric(y_mort, errors='coerce')  # convert_objects(convert_numeric=True) was removed in pandas 1.0

#Building Logistic model_LSVC
print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
print("y values:",y.head())
logit = sm.Logit(y,X,missing='drop') 

The error that appears:

Shape of y: (9018,)  &&Shape of X_selected_lsvc: (9018, 59)
y values: 0    0
1    1
2    0
3    0
4    0
Name: mort, dtype: int64
ValueError                                Traceback (most recent call last)
<ipython-input-8-fec746e2ee99> in <module>()
    160     print("Shape of y:", y.shape, " &&Shape of X_selected_lsvc:", X.shape)
    161     print("y values:",y.head())
--> 162     logit = sm.Logit(y,X,missing='drop')
    163     # fit the model
    164     est = logit.fit(method='cg')

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    399 
    400     def __init__(self, endog, exog, **kwargs):
--> 401         super(BinaryModel, self).__init__(endog, exog, **kwargs)
    402         if (self.__class__.__name__ != 'MNLogit' and
    403                 not np.all((self.endog >= 0) & (self.endog <= 1))):

D:\Anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py in __init__(self, endog, exog, **kwargs)
    152     """
    153     def __init__(self, endog, exog, **kwargs):
--> 154         super(DiscreteModel, self).__init__(endog, exog, **kwargs)
    155         self.raise_on_perfect_prediction = True
    156 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
    184 
    185     def __init__(self, endog, exog=None, **kwargs):
--> 186         super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
    187         self.initialize()
    188 

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in __init__(self, endog, exog, **kwargs)
     58         hasconst = kwargs.pop('hasconst', None)
     59         self.data = self._handle_data(endog, exog, missing, hasconst,
---> 60                                       **kwargs)
     61         self.k_constant = self.data.k_constant
     62         self.exog = self.data.exog

D:\Anaconda3\lib\site-packages\statsmodels\base\model.py in _handle_data(self, endog, exog, missing, hasconst, **kwargs)
     82 
     83     def _handle_data(self, endog, exog, missing, hasconst, **kwargs):
---> 84         data = handle_data(endog, exog, missing, hasconst, **kwargs)
     85         # kwargs arrays could have changed, easier to just attach here
     86         for key in kwargs:

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in handle_data(endog, exog, missing, hasconst, **kwargs)
    564     klass = handle_data_class_factory(endog, exog)
    565     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,
--> 566                  **kwargs)

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in __init__(self, endog, exog, missing, hasconst, **kwargs)
     74         # this has side-effects, attaches k_constant and const_idx
     75         self._handle_constant(hasconst)
---> 76         self._check_integrity()
     77         self._cache = resettable_cache()
     78 

D:\Anaconda3\lib\site-packages\statsmodels\base\data.py in _check_integrity(self)
    450                 (hasattr(endog, 'index') and hasattr(exog, 'index')) and
    451                 not self.orig_endog.index.equals(self.orig_exog.index)):
--> 452             raise ValueError("The indices for endog and exog are not aligned")
    453         super(PandasData, self)._check_integrity()
    454 

ValueError: The indices for endog and exog are not aligned

y and X have shapes (9018,) and (9018, 59), so the number of rows in the dependent and independent variables matches. Any idea what is wrong?

Procathedral answered 10/5, 2016 at 17:12 Comment(4)
The error message indicates that endog and exog, a pandas Series and a DataFrame, have different indices, so it is complaining that the row indices are not compatible. As a check, you can replace y and X with numpy arrays in the Logit call and see whether the data itself is fine. - Whitefly
@user333700: I have the same number of rows in y and X, as printed at the top of the log above. So what do you mean by row indices not being compatible? How can I feed y and X as numpy arrays? I used y.as_matrix() and X.as_matrix() and fed them to the Logit method. No error, but no data came in the output. - Procathedral
Based on the exception, your y is a pandas Series that has an index, and X is a pandas DataFrame that has an index. Those two indexes don't match up, that's my guess. I don't think it's a shape/number of rows issue. You can convert a pandas object to a numpy array with asarray: Logit(np.asarray(y), np.asarray(X), ...). (I guess as_matrix converts to numpy matrix, not array. IIRC, numpy matrix is not supported by statsmodels, and its use is strongly discouraged.) - Whitefly
Make y_train a 2-D array using the command y_train = y_train.values.reshape(-1,1) - Denney
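
The check that raises this error (visible in the traceback at data.py, line 451) compares the two indexes with Index.equals. A minimal sketch of running the same comparison by hand, using small made-up data as stand-ins for the asker's y and X:

```python
import pandas as pd

# Made-up stand-ins: same number of rows, but different row labels.
X = pd.DataFrame({"a": [1, 0, 1], "b": [0, 0, 1]}, index=[0, 1, 2])
y = pd.Series([0, 1, 0], index=[0, 2, 5], name="mort")

same_shape = len(y) == len(X)          # row counts agree
same_index = y.index.equals(X.index)   # row labels do not

# Labels that appear in one index but not in the other.
only_in_y = y.index.difference(X.index)
only_in_X = X.index.difference(y.index)

print("same number of rows:", same_shape)
print("same index:", same_index)
print("labels only in y:", list(only_in_y))
print("labels only in X:", list(only_in_X))
```

Matching shapes with a False result from equals would be consistent with what the question describes.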
U
21

Try converting y into a list before the sm.Logit() line.

y = list(y)
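
This likely works because a plain list carries no pandas index, so there is nothing left to compare against the index of X. np.asarray(y), suggested in the comments, has the same effect. A small sketch with made-up data; note that after either conversion the rows are paired with X purely by position, which is only correct if y and X are already in the same row order:

```python
import numpy as np
import pandas as pd

# Made-up Series whose labels would not match a default 0..n-1 index.
y = pd.Series([0, 1, 0], index=[10, 20, 30], name="mort")

y_list = list(y)         # values only, labels discarded
y_array = np.asarray(y)  # same idea, as a 1-D NumPy array

print(y_list)
print(type(y_array).__name__, y_array.shape)
```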
Unriddle answered 18/8, 2016 at 15:11 Comment(0)
O
4

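The error message indicates that endog and exog are pandas objects whose row indices differ; it is not a shape problem as such. A common workaround is to convert the dependent variable to a plain NumPy array with .values, which drops the index, and to use reshape so that it becomes a single column with the same number of rows as the independent variables.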

y_train.values.reshape(-1,1)

In reshape(-1, 1), the 1 fixes the number of columns and the -1 tells NumPy to infer the number of rows, so we get a single column with as many rows as y_train has values (and therefore as many rows as X).

Let's take an example:

import numpy as np

z = np.array([[1, 2], [3, 4]])
print(z.shape)    # (2, 2)

Now we apply reshape(-1, 1) to this array. The new array has 4 rows and 1 column.

new_z = z.reshape(-1, 1)
print(new_z)        # [[1] [2] [3] [4]], printed one row per line
print(new_z.shape)  # (4, 1)
Openwork answered 12/5, 2020 at 8:30 Comment(0)
E
3

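This error can also come from unpacking the return values of train_test_split in the wrong order. It returns X_train, X_test, y_train, y_test, in that order.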

Correct:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100
) 

Incorrect:

X_train, y_train, X_test, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100
)
Epa answered 29/4, 2021 at 16:12 Comment(0)
S
2

It may be due to different indices in x and y. This can happen when some rows were removed from the dataframe earlier and some operation that renumbers the rows is then performed on x after separating x and y. The index of y will still have gaps where the rows were removed from the original dataframe, while x will have a continuous index. It's best to call dataframe.reset_index(drop=True) before separating x and y.
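
A minimal sketch of that scenario with a made-up four-row dataframe (the column names x1 and mort are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"x1": [1, 0, 1, 1], "mort": [0, 1, 0, 1]})
df = df[df["x1"] == 1]   # drops row 1, so the index is now [0, 2, 3]

# Separate first, then renumber only X: the two indexes drift apart.
y = df["mort"]
X = df[["x1"]].reset_index(drop=True)
mismatch = not y.index.equals(X.index)

# Reset the whole dataframe first, then separate: both share one index.
df = df.reset_index(drop=True)
y_fixed = df["mort"]
X_fixed = df[["x1"]]
aligned = y_fixed.index.equals(X_fixed.index)

print(list(y.index), list(X.index), mismatch)
print(list(y_fixed.index), list(X_fixed.index), aligned)
```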

Sutherland answered 10/11, 2020 at 6:12 Comment(0)
H
0

Have you checked whether you have NaN in your data? You can use np.isnan(X) and np.isnan(y). I saw you turned on the missing='drop' option, so I suspect that if there are NaN values in your data, dropping them will change the shape of your input.
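
One way to run that check on pandas objects is with isna(), sketched here on made-up data that deliberately contains missing values:

```python
import numpy as np
import pandas as pd

# Made-up data with one NaN in X and one NaN in y.
X = pd.DataFrame({"a": [1.0, np.nan, 0.0], "b": [0.0, 1.0, 1.0]})
y = pd.Series([0.0, 1.0, np.nan])

nan_in_X = int(X.isna().sum().sum())   # total missing cells in X
nan_in_y = int(y.isna().sum())         # missing values in y

# Rows that have a missing value in either X or y.
rows_with_nan = X.isna().any(axis=1) | y.isna()

print(nan_in_X, nan_in_y, int(rows_with_nan.sum()))
```

If all three counts are zero, missing values can be ruled out as the cause.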

Harvin answered 10/5, 2016 at 18:30 Comment(1)
There is no NaN. In the dataframe we have all binary values, 0 or 1, in both the dependent and the independent variables. - Procathedral
G
0

Try y_train.values.ravel(). If y_train is a 2-D array, this converts it into the 1-D array that is needed. Hope it works for you.
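
A small sketch of what that call does, assuming y_train is a single-column DataFrame (made-up data here):

```python
import pandas as pd

# Made-up single-column DataFrame, so its shape is 2-D: (4, 1).
y_train = pd.DataFrame({"mort": [0, 1, 0, 1]}, index=[3, 5, 8, 9])

flat = y_train.values.ravel()   # plain 1-D NumPy array, index discarded

print(y_train.shape, flat.shape)
```

As with the other array conversions in this thread, the result no longer has an index, so rows are matched to X by position only.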

Goldarned answered 27/6, 2021 at 11:3 Comment(0)
I
0

ValueError: The indices for endog and exog are not aligned

The above error is basically due to an index mismatch between the X and y datasets, introduced during cleaning and preparation.

I removed this error by resetting the indices of both the X and y datasets:

y_train = y_train.reset_index(drop=True)
X_train = X_train.reset_index(drop=True)

Please provide your valuable feedback.

Inject answered 7/9, 2022 at 6:42 Comment(0)
