I'm using scikit to perform a logistic regression on spam/ham data. X_train is my training data and y_train the labels('spam' or 'ham') and I trained my LogisticRegression this way:
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
If I want to get the accuracies for a 10 fold cross validation, I just write:
accuracy = cross_val_score(classifier, X_train, y_train, cv=10)
I thought it was possible to calculate also the precisions and recalls by simply adding one parameter this way:
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring='recall')
But it results in a ValueError
:
ValueError: pos_label=1 is not a valid label: array(['ham', 'spam'], dtype='|S4')
Is it related to the data (should I binarize the labels ?) or do they change the cross_val_score
function ?
Thank you in advance !