I started using the cocoapi to evaluate a model trained using the Object Detection API. After reading various sources that explain mean average precision (mAP) and recall, I am confused about the "maximum detections" parameter used in the cocoapi.
From what I understood (e.g. here, here or here), one calculates mAP by computing precision and recall at various model score thresholds. This gives the precision-recall curve, and mAP is calculated as an approximation of the area under this curve, or, expressed differently, as the average of the maximum precision within defined recall ranges (0:0.1:1).
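To be explicit about the method I have in mind, here is a minimal sketch of that interpolated average (my own toy code with made-up precision/recall values, not taken from the cocoapi):

```python
import numpy as np

def eleven_point_ap(recall, precision):
    """Average of the maximum precision at or beyond each
    recall threshold 0.0, 0.1, ..., 1.0 (11-point interpolation)."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = recall >= t
        p_max = precision[mask].max() if mask.any() else 0.0
        ap += p_max / 11.0
    return ap

# toy precision/recall pairs obtained by sweeping the score threshold
recall    = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.9, 0.7, 0.6, 0.4])
print(eleven_point_ap(recall, precision))
```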
However, the cocoapi seems to calculate precision and recall for a given number of highest-scoring detections per image (maxDets), and from there it builds the precision-recall curves for maxDets = 1, 10, 100. Why is this a good metric, given that it is clearly not the same as the method above (it potentially excludes data points)?
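If I read pycocotools' cocoeval.py correctly, this limit comes from params.maxDets, which defaults to [1, 10, 100] and can apparently be overridden before calling evaluate(). A rough sketch of what I mean (the file paths are just placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('instances_val.json')          # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes('detections.json')  # my detections (placeholder path)

coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.params.maxDets = [1, 10, 3000]      # raise the cap above my ~3000 objects per image
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # note: some versions of summarize() may still assume maxDets=100 for certain lines
```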
In my example, I have roughly 3000 objects per image. Evaluating the results using the cocoapi gives terrible recall because it limits the number of detected objects to 100 per image.
For testing purposes, I feed the evaluation dataset as both the ground truth and the detected objects (with some artificial scores). I would expect precision and recall to be pretty much perfect, which is indeed what happens. But as soon as I feed in more than 100 objects per image, precision and recall go down as the number of "detected objects" increases, even though they are all "correct". How does that make sense?
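For reference, this is roughly how I set up that test, assuming the annotations are in standard COCO format (the path and the fixed score are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('instances_val.json')  # placeholder path to my evaluation dataset

# Turn every ground-truth annotation into a "detection" with an artificial score,
# so every detection matches a ground-truth box exactly.
fake_dets = [
    {
        'image_id': ann['image_id'],
        'category_id': ann['category_id'],
        'bbox': ann['bbox'],
        'score': 0.9,  # arbitrary artificial score
    }
    for ann in coco_gt.loadAnns(coco_gt.getAnnIds())
]

coco_dt = coco_gt.loadRes(fake_dets)
coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # precision/recall drop once an image has more than 100 objects
```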