After hours of research, I could not find any example of multi-label prediction with the Object Detection API. Basically, I would like to predict more than one label per instance in an image, as in the image shown below:
I would like to predict the clothing category, but also attributes such as color and pattern.
From my understanding, I need to attach one additional classification head per attribute to the second-stage ROI feature map and sum the attributes' losses. Is that right? However, I am having trouble implementing this in the object detection code. Can somebody give me some tips on which functions I should start modifying? Thank you.
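To make the idea concrete, here is a minimal NumPy sketch of what I have in mind. It is not code from the Object Detection API: `multi_head_loss`, the `heads` dictionary, the per-head `weights`, and the random `roi_features` array are all hypothetical names and stand-ins I made up for illustration. In the real model the features would be the pooled second-stage ROI features, and each head would probably be more than a single linear layer.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for logits [N, C] and integer labels [N]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multi_head_loss(roi_features, heads, labels, weights=None):
    """Sum per-attribute classification losses over shared ROI features.

    roi_features: [N, D] array, one pooled feature vector per ROI (assumed).
    heads: dict mapping attribute name -> (W [D, C], b [C]) linear classifier.
    labels: dict mapping attribute name -> integer labels of shape [N].
    weights: optional dict of per-attribute loss weights (default 1.0 each).
    """
    weights = weights or {}
    per_head = {}
    for name, (W, b) in heads.items():
        logits = roi_features @ W + b  # every head reads the same features
        per_head[name] = softmax_cross_entropy(logits, labels[name])
    total = sum(weights.get(name, 1.0) * loss for name, loss in per_head.items())
    return total, per_head

# Toy data: 4 ROIs with 8-dimensional features and three attribute heads.
rng = np.random.default_rng(0)
num_rois, feat_dim = 4, 8
num_classes = {"category": 5, "color": 3, "pattern": 2}
roi_features = rng.normal(size=(num_rois, feat_dim))
heads = {
    name: (rng.normal(scale=0.1, size=(feat_dim, c)), np.zeros(c))
    for name, c in num_classes.items()
}
labels = {name: rng.integers(0, c, size=num_rois) for name, c in num_classes.items()}

total_loss, per_head_loss = multi_head_loss(roi_features, heads, labels)
print(total_loss, per_head_loss)
```

Each attribute gets its own softmax, so exactly one category, one color, and one pattern is predicted per ROI. If an attribute can take several values at once, I assume that head would need a sigmoid cross-entropy loss instead.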