How to do GridSearchCV for F1-score in classification problem with scikit-learn?

I'm working on a multi-class classification problem with a neural network in scikit-learn, and I'm trying to figure out how I can optimize my hyperparameters (number of layers, neurons per layer, and eventually other things).

I found out that GridSearchCV is the way to do it, but the code I'm using returns the average accuracy, while I actually want to evaluate the F1-score. Does anyone have an idea how I can edit this code to make it work for the F1-score?

In the beginning, when I had to evaluate the precision/accuracy, I thought it was 'enough' to just take the confusion matrix and draw a conclusion from it, while doing trial and error, changing the number of layers and neurons in my neural network again and again.

Today I figured out that there's more to it than that: GridSearchCV. I just need to figure out how I can evaluate the F1-score, because I need to do research on how the neural network's accuracy depends on the number of layers, nodes, and eventually other alternatives...

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Note: (1), (2), (3) are just the ints 1, 2, 3; MLPClassifier treats an int n
# as a single hidden layer with n neurons. Use (1,), (2,), ... for tuples.
parameter_space = {
    'hidden_layer_sizes': [(1), (2), (3)],
}

mlp = MLPClassifier(max_iter=600)
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, y_train.values.ravel())

print('Best parameters found:\n', clf.best_params_)

# Mean and standard deviation of the test score for each parameter combination
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

output:

Best parameters found:
 {'hidden_layer_sizes': 3}
0.842 (+/-0.089) for {'hidden_layer_sizes': 1}
0.882 (+/-0.031) for {'hidden_layer_sizes': 2}
0.922 (+/-0.059) for {'hidden_layer_sizes': 3}

So here my output gives me the mean accuracy (which I found is the default scoring for GridSearchCV with a classifier). How can I change this to return the average F1-score instead of accuracy?

Crassus answered 10/5, 2019 at 20:41 Comment(0)

You can create your own scorer with make_scorer. In this case you can wrap sklearn's f1_score, but you can use your own metric function if you prefer:

from sklearn.metrics import f1_score, make_scorer

f1 = make_scorer(f1_score , average='macro')


Once you have made your scorer, you can plug it directly into the grid search as the scoring parameter:

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring=f1)


Note that I've used average='macro' as the multi-class averaging parameter for f1. This calculates the metric for each label and then takes their unweighted mean. There are other options for computing F1 with multiple labels; you can find them in the scikit-learn f1_score documentation.
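For illustration, here is a minimal sketch (with made-up toy labels, purely hypothetical) of how the averaging options behave:

from sklearn.metrics import f1_score

# Hypothetical multi-class labels, only for illustration
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean of the per-class F1 scores
print(f1_score(y_true, y_pred, average='weighted'))  # mean weighted by each class's support
print(f1_score(y_true, y_pred, average='micro'))     # computed from the global TP/FP/FN counts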


Note: answer completely edited for better understanding

Visional answered 10/5, 2019 at 22:13 Comment(3)
Well it gave me this output: ValueError: Sample-based precision, recall, fscore is not meaningful outside multilabel classification. See the accuracy_score instead. – Crassus
Sorry, I didn't try it; I thought you could plug it in directly that way. I've updated my answer. It should work now. – Visional
Your idea is good for recall macro, but how do you define it to be maximized and not minimized? – Picky

According to: https://scikit-learn.org/stable/modules/model_evaluation.html

You can simply write: scoring='f1_macro'
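For example, a minimal sketch reusing mlp, parameter_space, X_train and y_train from the question (assumed to be defined as above), passing the predefined scorer name directly:

from sklearn.model_selection import GridSearchCV

# 'f1_macro' is one of scikit-learn's predefined scorer strings,
# so no make_scorer wrapper is needed
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring='f1_macro')
clf.fit(X_train, y_train.values.ravel())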

Picky answered 7/6, 2022 at 11:46 Comment(2)
It fails with ValueError: 'f1_score' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options. – Antlia
It's not f1_score but f1_macro. – Picky
