Good ROC curve but poor precision-recall curve


Asked 23/10, 2015 at 3:49 Answered 23/10, 2015 at 7:40

Solved machine-learning scikit-learn performance-testing roc precision-recall

I have some machine learning results that I don't quite understand. I am using Python scikit-learn on 2+ million samples with about 14 features. The classification of 'ab' looks pretty bad on the precision-recall curve, but the ROC curve for 'ab' looks just as good as for most of the other groups. What can explain that?

Took answered 23/10, 2015 at 3:49 Comment(3)

Is your set balanced? (i.e. as many ab as non-ab) – Rill 23/10, 2015 at 7:5

No, it's very unbalanced; ab is less than 2% – Took 23/10, 2015 at 7:21

Here you go. Try oversampling to mitigate the issue. – Rill 23/10, 2015 at 7:41

Class imbalance.

Unlike ROC curves, PR curves are very sensitive to class imbalance. If you optimize your classifier for a good AUC on unbalanced data, you are likely to obtain poor precision-recall results.
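To make the effect concrete, here is a minimal sketch (not part of the original answer) that converts a single ROC operating point into the precision it implies at a given class prevalence. The function name `precision_from_rates` and the operating point (TPR 0.80, FPR 0.05) are illustrative assumptions; only the roughly 2% prevalence comes from the thread.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by one ROC point (fpr, tpr) at a given positive-class prevalence.

    Hypothetical helper for illustration; assumes the point yields at least
    one predicted positive (otherwise the denominator would be zero).
    """
    true_pos = tpr * prevalence            # fraction of all samples that are true positives
    false_pos = fpr * (1.0 - prevalence)   # fraction of all samples that are false positives
    return true_pos / (true_pos + false_pos)


# Assumed ROC operating point; it is identical in both scenarios below.
tpr, fpr = 0.80, 0.05

balanced = precision_from_rates(tpr, fpr, prevalence=0.50)    # 50% positives
imbalanced = precision_from_rates(tpr, fpr, prevalence=0.02)  # 2% positives, as in the question

print(f"balanced:   precision = {balanced:.3f}")
print(f"imbalanced: precision = {imbalanced:.3f}")
```

With these assumed numbers, the same ROC point gives a precision of about 0.94 on balanced data but only about 0.25 when positives make up 2% of the data. The ROC axes are rates computed within each class, so they do not move with prevalence, whereas precision mixes the two classes and drops as negatives come to dominate.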

Rill answered 23/10, 2015 at 7:40 Comment(4)

I see, but what does it really mean in terms of the performance of the test? Is it good (based on ROC) or bad (based on P-R)? How can a test be good if, in the above P-R curve, the best it can do is 40% for both precision and recall? – Took 23/10, 2015 at 13:4

It means that you have to be careful when you report the performance of a test on unbalanced data. In medical applications it can have a terrible impact (see AIDS testing as a textbook case); in others it can be fine. It really depends on your specific application. – Rill 23/10, 2015 at 22:35

I didn't tweak the default settings, as I am using scikit-learn, but as you said it seems to optimize based on AUC. Is there a way to optimize based on the precision/recall pair on unbalanced data? – Took 25/10, 2015 at 3:58

You should post this as a new question. – Rill 26/10, 2015 at 6:41
