How to measure xgboost regressor accuracy using accuracy_score (or other suggested function)

I'm writing code to solve a simple problem: predicting the probability of an item missing from an inventory.

I'm using the XGBoost prediction model to do this.

I have the data split into two .csv files, one with the train data and the other with the test data.

Here is the code:

    import pandas as pd
    import numpy as np


    train = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/train.csv', index_col='sku').fillna(-1)
    test = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/test.csv', index_col='sku').fillna(-1)


    X_train, y_train = train.drop('isBackorder', axis=1), train['isBackorder']

    import xgboost as xgb
    xg_reg = xgb.XGBRegressor(objective ='reg:linear', colsample_bytree = 0.3, learning_rate = 0.1,
                    max_depth = 10, alpha = 10, n_estimators = 10)
    xg_reg.fit(X_train,y_train)


    y_pred = xg_reg.predict(test)

    # Create file for the competition submission
    test['isBackorder'] = y_pred
    pred = test['isBackorder'].reset_index()
    pred.to_csv('competitionsubmission.csv',index=False)

And here are the functions where I try to measure the accuracy (using RMSE, the accuracy_score function, and a KFold cross-validation):

    #RMSE
    from sklearn.metrics import mean_squared_error

    rmse = np.sqrt(mean_squared_error(y_train, y_pred))
    print("RMSE: %f" % (rmse))


    #Accuracy
    from sklearn.metrics import accuracy_score

    # make predictions for test data
    predictions = [round(value) for value in y_pred]

    # evaluate predictions
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy: %.2f%%" % (accuracy * 100.0))


    #KFold
    from sklearn.model_selection import KFold
    from sklearn.model_selection import cross_val_score

    # CV model
    kfold = KFold(n_splits=10, random_state=7)
    results = cross_val_score(xg_reg, X_train, y_train, cv=kfold)
    print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

But I'm having some problems.

None of the accuracy tests above work.

When using the RMSE function and the Accuracy function, the following error appears: ValueError: Found input variables with inconsistent numbers of samples: [1350955, 578982]

I guess that the train/test split structure that I'm using is not correct.

Since I don't have a y_test (and I don't know how to create one for my problem), I can't pass it to the functions above.

The KFold validation isn't working either.

Can someone help me PLEASE?
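
The ValueError appears to come from comparing arrays of different lengths: `y_train` has one entry per training row, while `y_pred` has one entry per test row. Separately, `accuracy_score` expects class labels, so continuous regressor output has to be turned into labels first. Here is a minimal sketch with made-up numbers; the toy arrays and the 0.5 cutoff are assumptions for illustration, not values from the data above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Toy stand-ins: labels for 6 "training" rows, predictions for 4 "test" rows.
y_train = np.array([0, 1, 0, 0, 1, 0])
y_pred_test = np.array([0.1, 0.8, 0.3, 0.6])

# Comparing arrays of different lengths reproduces the kind of error in the question.
try:
    mean_squared_error(y_train, y_pred_test)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# Metrics need the true labels of the same rows that were predicted.
y_holdout = np.array([0, 1, 0, 1])
rmse = np.sqrt(mean_squared_error(y_holdout, y_pred_test))

# accuracy_score expects class labels, so threshold the scores first
# (0.5 is an assumed cutoff, not something taken from the original data).
labels = (y_pred_test >= 0.5).astype(int)
accuracy = accuracy_score(y_holdout, labels)
print("RMSE: %.4f, accuracy: %.2f" % (rmse, accuracy))
```

In practice the true labels for the predicted rows would come from holding out part of the labeled training file, since the competition test file has no `isBackorder` column to compare against.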

    import pandas as pd
    import numpy as np
    import xgboost as xgb
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold, cross_val_score, train_test_split

    train = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/train.csv', index_col='sku').fillna(-1)
    test_data = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/test.csv', index_col='sku').fillna(-1)

    # Hold out part of the labeled data so that a y_test exists for evaluation
    x, y = train.drop('isBackorder', axis=1), train['isBackorder']
    X_train, X_test, y_train, y_test = train_test_split(x, y)

    xg_reg = xgb.XGBRegressor(objective ='reg:linear', colsample_bytree = 0.3, learning_rate = 0.1,
                    max_depth = 10, alpha = 10, n_estimators = 10)
    xg_reg.fit(X_train,y_train)

    # shuffle=True is needed for random_state to have any effect
    kfold = KFold(n_splits=10, shuffle=True, random_state=7)
    results = cross_val_score(xg_reg, X_train, y_train, cv=kfold)

    # Evaluate on the held-out rows, where the true labels are known
    y_test_pred = xg_reg.predict(X_test)
    mse = mean_squared_error(y_test, y_test_pred)

    # Predict the unlabeled competition data for the submission file
    y_pred = xg_reg.predict(test_data)
    pd.DataFrame(y_pred).to_csv('competitionsubmission.csv',index=False)
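
The code above computes `mse` and `results` but never prints them. Also, when `cross_val_score` is called without a `scoring` argument it falls back to the estimator's own `score` method, which for regressors is R², not accuracy. The sketch below shows one way to report RMSE for both the cross-validation and the hold-out set. It uses synthetic data and scikit-learn's `GradientBoostingRegressor` as a stand-in so that it runs without the original CSV files; both are assumptions, and an `XGBRegressor` should be usable in its place since it follows the same estimator interface:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data standing in for the inventory features and target (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Stand-in regressor (assumption); an XGBRegressor could be dropped in here instead.
model = GradientBoostingRegressor(n_estimators=20, random_state=7)

# Recent scikit-learn versions require shuffle=True when random_state is set.
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

# The scorer returns negated RMSE, so flip the sign to report a positive error.
scores = cross_val_score(model, X_train, y_train, cv=kfold,
                         scoring="neg_root_mean_squared_error")
cv_rmse = -scores
print("CV RMSE: %.3f (+/- %.3f)" % (cv_rmse.mean(), cv_rmse.std()))

# Hold-out RMSE on rows the model never saw during fitting.
model.fit(X_train, y_train)
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("Hold-out RMSE: %.3f" % test_rmse)
```

Since the target here is a 0/1 flag and the goal is a probability, a classifier with `predict_proba` may be a better fit than a regressor, but that is a separate design choice from how the metrics are computed.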
