Step by Step Explanation of ML (Machine Learning) Code with Pandas Dataframe :
Seperating Predictor and Target Columns into X and y Respectively.
Splitting Training data (X_train,y_train) and Testing Data (X_test,y_test).
Calculating Cross-Validated AUC (Area Under the Curve). Got an Error “IndexError: too many indices for array” due to y_train since it was expecting a 1-D Array but Fetched 2-D Array which is a Mismatch. After Replacing the code 'y_train' with y_train['y'] code worked like a Charm.
# Importing Packages :
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedShuffleSplit
# Seperating Predictor and Target Columns into X and y Respectively :
# df -> Dataframe extracted from CSV File
data_X = df.drop(['y'], axis=1)
data_y = pd.DataFrame(df['y'])
# Making a Stratified Shuffle Split of Train and Test Data (test_size=0.3 Denotes 30 % Test Data and Remaining 70% Train Data) :
rs = StratifiedShuffleSplit(n_splits=2, test_size=0.3,random_state=2)
rs.get_n_splits(data_X,data_y)
for train_index, test_index in rs.split(data_X,data_y):
# Splitting Training and Testing Data based on Index Values :
X_train,X_test = data_X.iloc[train_index], data_X.iloc[test_index]
y_train,y_test = data_y.iloc[train_index], data_y.iloc[test_index]
# Calculating 5-Fold Cross-Validated AUC (cv=5) - Error occurs due to Dimension of **y_train** in this Line :
classify_cross_val_score = cross_val_score(classify, X_train, y_train, cv=5, scoring='roc_auc').mean()
print("Classify_Cross_Val_Score ",classify_cross_val_score) # Error at Previous Line.
# Worked after Replacing 'y_train' with y_train['y'] in above Line
# where y is the ONLY Column (or) Series Present in the Pandas Data frame
# (i.e) Target variable for Prediction :
classify_cross_val_score = cross_val_score(classify, X_train, y_train['y'], cv=5, scoring='roc_auc').mean()
print("Classify_Cross_Val_Score ",classify_cross_val_score)
print(y_train.shape)
print(y_train['y'].shape)
Output :
Classify_Cross_Val_Score 0.7021433588790991
(31647, 1) # 2-D
(31647,) # 1-D
Note : from sklearn.model_selection import cross_val_score.
cross_val_score has been imported
from sklearn.model_selection and
NOT from sklearn.cross_validation which is Deprecated.
train
looks like what you think it should? – HessneyyTrain
or thexTrain
part? – Hessney