I am training an SVM classifier with cross-validation (StratifiedKFold) using the scikit-learn interfaces. For each of the k test sets I get a classification result, and I want a single confusion matrix covering all the results. scikit-learn provides a confusion matrix function: sklearn.metrics.confusion_matrix(y_true, y_pred). My question is how I should accumulate the y_true and y_pred values, which are NumPy arrays. Should I pre-allocate the arrays based on my k-fold parameter, and then add each fold's y_true and y_pred to them?
scikit-learn confusion matrix with cross validation
I found a solution to this problem. For each iteration (through my k folds) I create a confusion matrix and add it to the previous total, so I end up with one matrix that contains all the values. With NumPy this cumulative matrix is easy to build (cm_total += cm_fold). –
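The accumulation described above can be sketched as follows. This is a minimal example using the modern scikit-learn API (sklearn.model_selection; the original post predates it) with the iris dataset as a stand-in for the asker's data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
labels = np.unique(y)

# Running total over all folds; note the pattern is cm_total += cm_fold
# (adding a matrix to itself would only double one fold's counts).
cm_total = np.zeros((len(labels), len(labels)), dtype=int)

for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf = SVC().fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # labels= keeps the matrix shape identical across folds, even if a
    # fold happens to contain no samples of some class
    cm_total += confusion_matrix(y[test_idx], y_pred, labels=labels)
```

Because each sample lands in exactly one test fold, cm_total.sum() equals the total number of samples, which is a handy sanity check.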
Pallaten
But I still have a problem if I want an accumulated report of precision/recall (classification_report). Each iteration has its own y_true and y_pred. How do I get a final report? –
Pallaten
Throughout cross-validation, y_true stays fixed: it is the ground truth for the whole dataset. For y_pred, you can follow the same procedure as with the confusion matrix and aggregate the predictions across all folds. –
Barnebas
You can either use an aggregate confusion matrix, or compute one matrix per CV partition and then take the mean and the standard deviation (or standard error) of each component across partitions as a measure of variability.
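Both options can be sketched with NumPy by stacking the per-fold matrices into a 3-D array and reducing along the fold axis. The iris data and SVC below are stand-ins, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
labels = np.unique(y)

# One confusion matrix per CV partition
fold_cms = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf = SVC().fit(X[train_idx], y[train_idx])
    fold_cms.append(
        confusion_matrix(y[test_idx], clf.predict(X[test_idx]), labels=labels)
    )

fold_cms = np.array(fold_cms)        # shape: (n_folds, n_classes, n_classes)
aggregate_cm = fold_cms.sum(axis=0)  # option 1: aggregate confusion matrix
mean_cm = fold_cms.mean(axis=0)      # option 2: per-component mean...
std_cm = fold_cms.std(axis=0)        # ...and variability across folds
```

The aggregate matrix is just the element-wise sum of the fold matrices, so the two views are consistent: mean_cm times the number of folds reproduces aggregate_cm.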
For the classification report, the code would need to be modified to accept two-dimensional inputs, so that the predictions for each CV partition could be passed in and the mean score and standard deviation computed per class.
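Short of modifying classification_report itself, a simpler route in current scikit-learn is to collect the out-of-fold predictions for every sample and score them in one pooled report. A sketch, again assuming the iris data as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# cross_val_predict returns exactly one out-of-fold prediction per sample,
# so y and y_pred line up and a single report covers every CV partition
y_pred = cross_val_predict(SVC(), X, y, cv=StratifiedKFold(n_splits=5))
print(classification_report(y, y_pred))
```

Note the trade-off: this pooled report gives aggregate precision/recall per class, but not the per-fold mean and standard deviation the answer describes; for that you would keep a separate report (or score dict) per fold and average the numbers yourself.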
How can you create an aggregate confusion matrix? –
Haematoblast