How to resolve "IndexError: too many indices for array"

My code below gives me the error "IndexError: too many indices for array". I am quite new to machine learning, so I have no idea how to solve this. Any help would be appreciated.

import pandas
from sklearn.cross_validation import cross_val_score
from sklearn.linear_model import LogisticRegression

train = pandas.read_csv("D:/...input/train.csv")

xTrain = train.iloc[:,0:54]   # first 54 columns: features
yTrain = train.iloc[:,54:]    # last column: target

clf = LogisticRegression(multi_class='multinomial')
scores = cross_val_score(clf, xTrain, yTrain, cv=10, scoring='accuracy')
print('****Results****')
print(scores.mean())

Stamin answered 31/10, 2016 at 11:44 Comment(4)
Are you sure train looks like what you think it should? - Hessney
@Hessney train.shape gives me 15120 x 55, and 55 columns is what I expect. - Stamin
Are you getting the error with the yTrain part or the xTrain part? - Hessney
@Hessney Thanks. I was able to solve it by using yTrain = train.target instead of yTrain = train.iloc[:,54:]. - Stamin
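A minimal sketch of why that fix works, using a small hypothetical DataFrame in place of train.csv (the column names f0, f1, f2 and target are assumptions). Slicing with iloc[:, 3:] keeps a one-column DataFrame, which is 2-D, while selecting a single column, by position or by attribute, returns a 1-D Series:

```python
import pandas as pd

# Hypothetical stand-in for train.csv: three feature columns plus a target column.
train = pd.DataFrame({
    "f0": [0.1, 0.2, 0.3, 0.4],
    "f1": [1.0, 0.0, 1.0, 0.0],
    "f2": [5, 6, 7, 8],
    "target": [0, 1, 0, 1],
})

x_train = train.iloc[:, 0:3]   # all feature columns: DataFrame, 2-D

y_slice = train.iloc[:, 3:]    # a slice keeps a DataFrame: shape (4, 1), 2-D
y_column = train.iloc[:, 3]    # a single position returns a Series: shape (4,), 1-D
y_attr = train.target          # attribute access gives the same 1-D Series

print(x_train.shape)   # (4, 3)
print(y_slice.shape)   # (4, 1)
print(y_column.shape)  # (4,)
print(y_attr.shape)    # (4,)
```

In the question's layout, train.iloc[:,54] (no colon after 54) or train.target would play the role of y_column.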

The error you're getting means you are indexing an array with more indices than it has dimensions. I can't see how your array is created, but I'm assuming it's one-dimensional and the program is objecting to you treating it like a two-dimensional one.
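A minimal NumPy reproduction (the array y is hypothetical): one index on a 1-D array works, but two indices raise this exact IndexError. The full message text varies between NumPy versions, but it starts with "too many indices for array".

```python
import numpy as np

y = np.array([0, 1, 0, 1])   # 1-D array, shape (4,)
print(y.ndim, y.shape)       # 1 (4,)

print(y[2])                  # one index for one dimension: prints 0

message = None
try:
    y[2, 0]                  # two indices for a 1-D array
except IndexError as exc:
    message = str(exc)
    print("IndexError:", message)
```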

Check that your declarations are correct, and test the code by printing the values after you've set them, to double-check that they are what you intend.
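One way to follow that advice, sketched under assumptions: print the shape and number of dimensions of X and y before handing them to a model. The helper name report_shapes and the sample data are hypothetical.

```python
import numpy as np

def report_shapes(X, y):
    """Print the shape and ndim of X and y (hypothetical helper) and return the shapes."""
    X = np.asarray(X)
    y = np.asarray(y)
    print("X:", X.shape, "ndim =", X.ndim)
    print("y:", y.shape, "ndim =", y.ndim)
    if y.ndim != 1:
        # A one-column 2-D target can be flattened with ravel().
        print("y is not 1-D; y.ravel() would have shape", y.ravel().shape)
    return X.shape, y.shape

# Hypothetical data: 4 samples, 2 features, target stored as a column vector.
X = np.arange(8).reshape(4, 2)
y_col = np.array([[0], [1], [0], [1]])

shapes = report_shapes(X, y_col)
```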

There are a few existing questions on this subject already, so I'll just link one that might be helpful: IndexError: too many indices. Numpy Array with 1 row and 2 columns

Scarborough answered 31/10, 2016 at 12:1 Comment(1)
I understood what the problem is. The number of columns is getting mismatched somewhere, but train.shape gives me 15120 x 55, and 55 columns is what I expect. - Stamin

Step-by-step explanation of ML (machine learning) code with a pandas DataFrame:

  1. Separating the predictor and target columns into X and y respectively.

  2. Splitting the training data (X_train, y_train) and testing data (X_test, y_test).

  3. Calculating the cross-validated AUC (area under the ROC curve). This raised “IndexError: too many indices for array” because of y_train: the function expected a 1-D array but received a 2-D one. After replacing y_train with y_train['y'], the code worked like a charm.


   # Importing packages:

   import pandas as pd

   from sklearn.model_selection import cross_val_score

   from sklearn.model_selection import StratifiedShuffleSplit

   # Separating predictor and target columns into X and y respectively.
   # df -> DataFrame read from the CSV file (not shown in this answer).
   # classify -> the estimator being evaluated (also defined elsewhere, not shown).

   data_X = df.drop(['y'], axis=1)
   data_y = pd.DataFrame(df['y'])   # one-column DataFrame: 2-D, shape (n, 1)

   # Making a stratified shuffle split of train and test data
   # (test_size=0.3 means 30% test data and the remaining 70% train data):

   rs = StratifiedShuffleSplit(n_splits=2, test_size=0.3, random_state=2)
   rs.get_n_splits(data_X, data_y)

   for train_index, test_index in rs.split(data_X, data_y):

       # Splitting training and testing data based on index values:

       X_train, X_test = data_X.iloc[train_index], data_X.iloc[test_index]
       y_train, y_test = data_y.iloc[train_index], data_y.iloc[test_index]

       # Calculating 5-fold cross-validated AUC (cv=5). Passing the 2-D y_train
       # in this line raised "IndexError: too many indices for array",
       # so it is commented out here:

       # classify_cross_val_score = cross_val_score(classify, X_train, y_train, cv=5, scoring='roc_auc').mean()

       # It worked after replacing y_train with y_train['y'], where 'y' is the
       # ONLY column in the DataFrame (i.e. the target variable for prediction),
       # so y_train['y'] is a 1-D Series:

       classify_cross_val_score = cross_val_score(classify, X_train, y_train['y'], cv=5, scoring='roc_auc').mean()

       print("Classify_Cross_Val_Score ", classify_cross_val_score)

       print(y_train.shape)

       print(y_train['y'].shape)

Output:

    Classify_Cross_Val_Score  0.7021433588790991
    (31647, 1) # 2-D
    (31647,)   # 1-D

Note: cross_val_score is imported from sklearn.model_selection, NOT from sklearn.cross_validation, which is deprecated (and was removed in scikit-learn 0.20).
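The root cause in this answer is data_y = pd.DataFrame(df['y']), which wraps the target Series in a one-column DataFrame. The sketch below uses a small hypothetical df (the predictor columns a and b are assumptions) and only pandas, so the split and the classifier are left out; it shows the shapes involved and two ways to get a 1-D target directly.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: two predictors and a binary target 'y'.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2],
    "y": [0, 1, 0, 1, 0, 1],
})

data_X = df.drop(['y'], axis=1)       # predictors, shape (6, 2)

data_y_2d = pd.DataFrame(df['y'])     # as in the answer: shape (6, 1), 2-D
data_y_1d = df['y']                   # select the column directly: shape (6,), 1-D
squeezed = data_y_2d.squeeze(axis=1)  # or collapse the one-column DataFrame to a Series

print(data_y_2d.shape, data_y_1d.shape, squeezed.shape)
```

If data_y is taken as df['y'] from the start, y_train = data_y.iloc[train_index] is already a 1-D Series and the extra ['y'] lookup is not needed.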

Sulphathiazole answered 21/12, 2018 at 16:41 Comment(1)
Hi @MatthewStrawbridge, thanks for mentioning it; I have edited my answer. I hope it is clear and helpful to those who face the same error while building ML models in Python with a pandas DataFrame and cross-validating the training set :-) - Sulphathiazole

You are getting this error because you are making the target array y 2-D, whereas it needs to be 1-D to be passed to the cross-validation function.

These two cases are different:

1. y=numpy.zeros(shape=(len(list),1))
2. y=numpy.zeros(shape=(len(list))) 

If you declare y as in case 1, y becomes 2-D with shape (len(list), 1). You need a 1-D array, so use case 2.
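The two declarations can be compared directly. In this sketch a hypothetical list named items stands in for list (naming a variable list would shadow Python's built-in list type). It also shows ravel(), a standard NumPy way to flatten an existing column vector instead of re-declaring it.

```python
import numpy as np

items = [3, 1, 4, 1, 5]                    # hypothetical data

y_case1 = np.zeros(shape=(len(items), 1))  # case 1: column vector, 2-D
y_case2 = np.zeros(shape=(len(items)))     # case 2: flat vector, 1-D

print(y_case1.shape, y_case1.ndim)         # (5, 1) 2
print(y_case2.shape, y_case2.ndim)         # (5,) 1

# An existing 2-D column vector can be flattened instead of re-declared.
y_flat = y_case1.ravel()
print(y_flat.shape)                        # (5,)
```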

Scapula answered 13/10, 2017 at 22:29 Comment(0)

While importing a dataset and plotting it with Matplotlib, I could preview an image with images[5540,:], where 5540 is the index of the image. But printing the label for that image with labels[5540,:] threw a "too many indices" error.

I found out that labels is only a 1-D array, while my indexing treated it as a 2-D array. There were more indices than dimensions, so it threw the error.

The solution that worked for me was labels[5540,], which is equivalent to labels[5540].
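A small sketch of this situation with hypothetical stand-in arrays (6 "images" of 4 pixels each, and one label per image), not the actual dataset from the answer. Two indices work on the 2-D images array, but the 1-D labels array only accepts one:

```python
import numpy as np

# Hypothetical stand-ins: 6 "images" of 4 pixels each, and one label per image.
images = np.arange(24).reshape(6, 4)   # 2-D, shape (6, 4)
labels = np.array([7, 2, 1, 0, 4, 1])  # 1-D, shape (6,)

row = images[5, :]   # two indices for a 2-D array: the pixels of image 5
label = labels[5,]   # the trailing comma still means a single index
same = labels[5]     # identical to labels[5,]

error = None
try:
    labels[5, :]     # two indices for a 1-D array
except IndexError as exc:
    error = str(exc)

print(row, label, same)  # [20 21 22 23] 1 1
```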

Praiseworthy answered 13/10, 2019 at 20:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.