I am trying to extract region features for detections whose class score is higher than some threshold, using the detectron2 framework. I will be using these features later in my pipeline (similar to ViLBERT, section 3.1, "Training ViLBERT"). So far I have trained a Mask R-CNN with this config and fine-tuned it on some custom data. It performs well. What I would like to do now is extract the features from my trained model for each predicted bounding box.
Why am I only getting one prediction instance when more than one of the prediction CLS scores passes the threshold?
I believe this is the correct way of producing the ROI features:
import os

import torch
from detectron2 import model_zoo
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import get_cfg
from detectron2.modeling import build_model
from detectron2.structures import ImageList

# preprocessed input tensor
images = ImageList.from_tensors(lst[:1], size_divisibility=32).to("cuda")
# set up config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pneumonia)
# just run these lines if you already have the trained model in memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model
# build model
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()  # make sure it's in eval mode
# run model
with torch.no_grad():
    features = model.backbone(images.tensor.float())
    proposals, _ = model.proposal_generator(images, features)
    instances = model.roi_heads._forward_box(features, proposals)
Then
pred_boxes = [x.pred_boxes for x in instances]
rois = model.roi_heads.box_pooler([features[f] for f in model.roi_heads.in_features], pred_boxes)
These should be my ROI features.
What confuses me is this: instead of using the bounding boxes produced at inference, I could use the proposals and their proposal_boxes, together with their class scores, to get the top-n features for this image. So I tried the following:
proposal_boxes = [x.proposal_boxes for x in proposals]
proposal_rois = model.roi_heads.box_pooler([features[f] for f in model.roi_heads.in_features], proposal_boxes)
#found here: https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html
box_features = model.roi_heads.box_head(proposal_rois)
predictions = model.roi_heads.box_predictor(box_features)
# the second return value holds the indices of the kept proposals per image, not losses
pred_instances, kept_indices = model.roi_heads.box_predictor.inference(predictions, proposals)
This should give me the proposal box features, with the CLS scores for each box in my predictions object. Inspecting this predictions object, I see the raw (pre-softmax) scores for each box:
CLS Scores in Predictions object
(tensor([[ 0.6308, -0.4926],
[-1.6662, 1.5430],
[-0.2080, 0.4856],
...,
[-6.9698, 6.6695],
[-5.6361, 5.4046],
[-4.4918, 4.3899]], device='cuda:0', grad_fn=<AddmmBackward>),
After applying softmax to these CLS scores, placing them in a DataFrame, and keeping rows where column 0 is above 0.6, I get:
pred_df = pd.DataFrame(predictions[0].softmax(-1).tolist())
pred_df[pred_df[0] > 0.6]
           0         1
0   0.754618  0.245382
6   0.686816  0.313184
38  0.722627  0.277373
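As a sanity check on what these columns mean, here is a minimal pure-Python sketch that reproduces row 0 from the first logit pair printed above. It assumes column 0 is my single foreground class and the last column is background, which is my reading of how detectron2 orders the K+1 scores; the `softmax` helper is only for illustration and is not a library call.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# First row of the CLS scores printed above.
# Assumed column order: [foreground class, background].
row0_logits = [0.6308, -0.4926]
row0_probs = softmax(row0_logits)
print(row0_probs)
```

The first value should agree with the 0.754618 shown in row 0 of the DataFrame, up to the rounding of the printed logits.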
In my prediction instances I get the same top score, but only 1 instance rather than the 2 I expected, since rows 0 and 38 both score above 0.7 (I set cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7):
Prediction Instances:
[Instances(num_instances=1, image_height=800, image_width=800, fields=[pred_boxes: Boxes(tensor([[548.5992, 341.7193, 756.9728, 438.0507]], device='cuda:0',
grad_fn=<IndexBackward>)), scores: tensor([0.7546], device='cuda:0', grad_fn=<IndexBackward>), pred_classes: tensor([0], device='cuda:0')])]
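One thing that might account for the missing instance: as far as I can tell from the detectron2 source, box inference first drops boxes whose score is not above SCORE_THRESH_TEST and then runs non-maximum suppression (cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST, 0.5 by default), so two overlapping boxes that both pass the threshold can collapse into one. Below is a minimal pure-Python sketch of that order of operations. The second and third boxes are hypothetical (I have not printed the real boxes for rows 38 and 6), and `iou` and `threshold_then_nms` are illustrative helpers, not detectron2 functions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def threshold_then_nms(boxes, scores, score_thresh, nms_thresh):
    """Return indices kept after score filtering followed by greedy NMS."""
    # Step 1: keep only boxes scoring above the threshold, best first.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Step 2: greedily keep a box unless it overlaps an already kept one too much.
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept

boxes = [
    (548.6, 341.7, 757.0, 438.1),  # the instance box printed above (rounded)
    (540.0, 335.0, 760.0, 445.0),  # HYPOTHETICAL overlapping box standing in for row 38
    (102.7, 352.5, 329.5, 440.7),  # HYPOTHETICAL box elsewhere, standing in for row 6
]
scores = [0.7546, 0.7226, 0.6868]

kept = threshold_then_nms(boxes, scores, score_thresh=0.7, nms_thresh=0.5)
print(kept)
```

If this is what happens, comparing the IoU of the decoded boxes for rows 0 and 38, or temporarily raising NMS_THRESH_TEST, should show whether the second instance is being suppressed.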
The predictions also contain a tensor of shape Nx4 or Nx(Kx4) holding bounding box regression deltas. I don't know exactly what these do; they look like this:
Bounding box regression deltas in Predictions object
tensor([[ 0.2502, 0.2461, -0.4559, -0.3304],
[-0.1359, -0.1563, -0.2821, 0.0557],
[ 0.7802, 0.5719, -1.0790, -1.3001],
...,
[-0.8594, 0.0632, 0.2024, -0.6000],
[-0.2020, -3.3195, 0.6745, 0.5456],
[-0.5542, 1.1727, 1.9679, -2.3912]], device='cuda:0',
grad_fn=<AddmmBackward>)
Something else that seems strange is that my proposal boxes and my prediction boxes are similar but not identical:
Proposal bounding boxes
[Boxes(tensor([[532.9427, 335.8969, 761.2068, 438.8086],#this box vs the instance box
[102.7041, 352.5067, 329.4510, 440.7240],
[499.2719, 317.9529, 764.1958, 448.1386],
...,
[ 25.2890, 379.3329, 28.6030, 429.9694],
[127.1215, 392.6055, 328.6081, 489.0793],
[164.5633, 275.6021, 295.0134, 462.7395]], device='cuda:0'))]
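The small difference between the first proposal box and the instance box looks consistent with the regression deltas being applied to the proposal. Here is a minimal sketch of that, assuming the usual R-CNN (dx, dy, dw, dh) parameterization and the weights (10, 10, 5, 5), which I believe are detectron2's default for cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_WEIGHTS. `apply_deltas` here is an illustrative re-implementation, not the library call, and it omits the clamping of dw and dh that the library applies.

```python
import math

def apply_deltas(box, deltas, weights=(10.0, 10.0, 5.0, 5.0)):
    """Shift and rescale an (x1, y1, x2, y2) box by (dx, dy, dw, dh) deltas."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Undo the (assumed) weighting applied to the regression targets.
    dx, dy, dw, dh = (d / k for d, k in zip(deltas, weights))
    # Move the centre proportionally to the box size, scale the size exponentially.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )

# First proposal box and first row of deltas, copied from the output above.
proposal = (532.9427, 335.8969, 761.2068, 438.8086)
deltas = (0.2502, 0.2461, -0.4559, -0.3304)

predicted = apply_deltas(proposal, deltas)
print(predicted)
```

With these numbers the result lands within a small fraction of a pixel of the instance box [548.5992, 341.7193, 756.9728, 438.0507], which suggests the prediction box is simply the first proposal refined by its deltas.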