Detectron2 - Extract region features at a threshold for object detection

I am trying to extract region features where the class detection score is higher than some threshold, using the Detectron2 framework. I will be using these features later in my pipeline (similar to ViLBERT, section 3.1 "Training ViLBERT"). So far I have trained a Mask R-CNN with this config and fine-tuned it on some custom data, and it performs well. What I would like to do is extract the features from my trained model for the predicted bounding boxes.

Why am I only getting one prediction instance, when more than one of the prediction CLS scores passes the threshold?

I believe this is the correct way of producing the ROI features:

import os
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model
from detectron2.structures import ImageList

images = ImageList.from_tensors(lst[:1], size_divisibility=32).to("cuda")  # lst is my list of preprocessed input tensors

# set up the config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pneumonia)
# just run these lines if you already have the trained model in memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model

# build the model and load the trained weights
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()  # make sure it is in eval mode

# run the model
with torch.no_grad():
    features = model.backbone(images.tensor.float())
    proposals, _ = model.proposal_generator(images, features)
    instances = model.roi_heads._forward_box(features, proposals)

Then

pred_boxes = [x.pred_boxes for x in instances]
rois = model.roi_heads.box_pooler([features[f] for f in model.roi_heads.in_features], pred_boxes)

These should be my ROI features.
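To sanity-check the shapes, my understanding (assuming the default FPN box pooler and box head from the mask_rcnn_R_101_FPN_3x config, so this is an assumption rather than something I have verified) is:

print(rois.shape)                           # expected (N, 256, 7, 7) with the default 7x7 pooler
box_feats = model.roi_heads.box_head(rois)  # the box head flattens each ROI into a per-box feature vector
print(box_feats.shape)                      # expected (N, 1024) with the default two-FC box head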

What I am confused about is whether, instead of using the bounding boxes produced at inference, I could use the proposals and their proposal_boxes, together with their class scores, to get the top-n features for this image. So I have tried the following:

proposal_boxes = [x.proposal_boxes for x in proposals]
proposal_rois = model.roi_heads.box_pooler([features[f] for f in model.roi_heads.in_features], proposal_boxes)
#found here: https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html
box_features = model.roi_heads.box_head(proposal_rois)
predictions = model.roi_heads.box_predictor(box_features)
pred_instances, losses = model.roi_heads.box_predictor.inference(predictions, proposals)

This is where I should be getting my proposal box features and their class scores in the predictions object. Inspecting the predictions object, I see the scores for each box:

CLS Scores in Predictions object

(tensor([[ 0.6308, -0.4926],
         [-1.6662,  1.5430],
         [-0.2080,  0.4856],
         ...,
         [-6.9698,  6.6695],
         [-5.6361,  5.4046],
         [-4.4918,  4.3899]], device='cuda:0', grad_fn=<AddmmBackward>),

After applying a softmax, placing these CLS scores in a DataFrame, and setting a threshold of 0.6, I get:

pred_df = pd.DataFrame(predictions[0].softmax(-1).tolist())
pred_df[pred_df[0] > 0.6]
    0           1
0   0.754618    0.245382
6   0.686816    0.313184
38  0.722627    0.277373

In my prediction instances, however, I get the same top score but only one instance rather than two (I set cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7):

Prediction Instances:

[Instances(num_instances=1, image_height=800, image_width=800, fields=[pred_boxes: Boxes(tensor([[548.5992, 341.7193, 756.9728, 438.0507]], device='cuda:0',
        grad_fn=<IndexBackward>)), scores: tensor([0.7546], device='cuda:0', grad_fn=<IndexBackward>), pred_classes: tensor([0], device='cuda:0')])]

The predictions also contain a tensor of Nx4 or Nx(Kx4) bounding-box regression deltas, which I don't exactly know what they do. They look like this:

Bounding box regression deltas in Predictions object

tensor([[ 0.2502,  0.2461, -0.4559, -0.3304],
        [-0.1359, -0.1563, -0.2821,  0.0557],
        [ 0.7802,  0.5719, -1.0790, -1.3001],
        ...,
        [-0.8594,  0.0632,  0.2024, -0.6000],
        [-0.2020, -3.3195,  0.6745,  0.5456],
        [-0.5542,  1.1727,  1.9679, -2.3912]], device='cuda:0',
       grad_fn=<AddmmBackward>)

Something else strange is that my proposal boxes and my prediction boxes are different but similar:

Proposal bounding boxes

[Boxes(tensor([[532.9427, 335.8969, 761.2068, 438.8086],#this box vs the instance box
         [102.7041, 352.5067, 329.4510, 440.7240],
         [499.2719, 317.9529, 764.1958, 448.1386],
         ...,
         [ 25.2890, 379.3329,  28.6030, 429.9694],
         [127.1215, 392.6055, 328.6081, 489.0793],
         [164.5633, 275.6021, 295.0134, 462.7395]], device='cuda:0'))]
Asked by Ardeb, 18 June 2020

You are almost there. Looking at roi_heads.box_predictor.inference() you will see that it doesn't simply sort the scores of the box candidates. First, it applies the box deltas to readjust the proposal boxes. Then it filters the candidates by the score threshold and runs Non-Maximum Suppression (NMS) to remove overlapping duplicate boxes. Finally, it keeps only the top-k boxes by score. That explains why your method produces the same box scores but a different number of output boxes and different coordinates.
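For intuition, here is a rough sketch of those steps. It is a simplification rather than Detectron2's exact implementation, and it assumes the single-class setup and the predictions / proposals variables from your snippets:

from detectron2.layers import batched_nms

scores, deltas = predictions
probs = scores.softmax(-1)[:, :-1]  # drop the background column
# 1. apply the regression deltas to refine the proposal boxes
boxes = model.roi_heads.box_predictor.box2box_transform.apply_deltas(
    deltas, proposals[0].proposal_boxes.tensor)
# 2. keep only candidates above the score threshold
keep = probs.max(dim=1).values > cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST
boxes, probs = boxes[keep], probs[keep]
# 3. NMS removes overlapping duplicates of the same object
classes = probs.argmax(dim=1)
keep = batched_nms(boxes, probs.max(dim=1).values, classes,
                   cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST)
# 4. keep at most the configured number of detections per image
keep = keep[:cfg.TEST.DETECTIONS_PER_IMAGE]
final_boxes, final_scores = boxes[keep], probs[keep].max(dim=1).values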

Back to your original question, here is the way to extract the features of the proposed boxes in one inference pass:

import cv2
import torch

image = cv2.imread('my_image.jpg')
height, width = image.shape[:2]
image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))  # HWC (BGR) -> CHW tensor
inputs = [{"image": image, "height": height, "width": width}]
with torch.no_grad():
    images = model.preprocess_image(inputs)  # don't forget to preprocess
    features = model.backbone(images.tensor)  # set of cnn features
    proposals, _ = model.proposal_generator(images, features, None)  # RPN

    features_ = [features[f] for f in model.roi_heads.box_in_features]
    box_features = model.roi_heads.box_pooler(features_, [x.proposal_boxes for x in proposals])
    box_features = model.roi_heads.box_head(box_features)  # features of all 1k candidates
    predictions = model.roi_heads.box_predictor(box_features)
    pred_instances, pred_inds = model.roi_heads.box_predictor.inference(predictions, proposals)
    pred_instances = model.roi_heads.forward_with_given_boxes(features, pred_instances)

    # output boxes, masks, scores, etc
    pred_instances = model._postprocess(pred_instances, inputs, images.image_sizes)  # scale box to orig size
    # features of the proposed boxes
    feats = box_features[pred_inds]
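As a quick sanity check (a sketch, assuming the default FPN box head whose per-box features are 1024-dimensional), each row of feats lines up with the corresponding detection in pred_instances:

instances = pred_instances[0]["instances"]
assert len(instances) == feats.shape[0]
for box, score, feat in zip(instances.pred_boxes, instances.scores, feats):
    print(box.tolist(), round(score.item(), 3), feat.shape)  # feat.shape is torch.Size([1024]) by default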
Answered by Threepence, 4 July 2020

Comments:
Ardeb: I had one more question. Could you point me in the right direction on where I could learn how to visualize these features?

Threepence: If you want to visualise the features of the detected boxes, use a dimensionality-reduction method such as PCA or, better, t-SNE (see here). You should expect the box features of the same semantic class to be close to each other. If you simply want to visualise the box coordinates, use the built-in Visualizer class of Detectron2 (see this).
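A minimal sketch of that t-SNE suggestion (assuming scikit-learn and matplotlib are installed, and that feats and the predicted class labels have been collected over several images with the answer's code):

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# feats: (N, 1024) box features stacked over several images; labels: (N,) predicted class ids
embedded = TSNE(n_components=2).fit_transform(feats.cpu().numpy())
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5)
plt.show()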
Ardeb: Thank you! Sorry if my question was not clear. I am interested in visualizing the features similar to how we can visualize a CNN feature map as a heatmap overlaid on the original image. Or I would be satisfied with just visualizing these high-level features, as in Deep Dream. Could we just take the feature maps from the pooler and overlay them on the image?

Threepence: Unfortunately I don't know a direct way to do that. Here are some alternative directions: (1) use the detected masks and box scores for the heatmap area and strength respectively; (2) use an activation-visualisation method like Grad-CAM to visualise the most activated area for a given objective. Note that in case (2) you need to define your own objective function for Grad-CAM in Detectron2, because Grad-CAM was originally designed for single-object classification.

Ardeb: Thank you so much for your help! I actually figured it out; it isn't too bad. I used technique (1). I even developed an attention-based multimodal model for classification using this and designed a visualization tool that maps attention between text and the bounding boxes. Thank you again, this went directly into my dissertation!

Deviation: @TuBui The features of the 1000 proposal boxes have shape (1000, 1024), so each feature is a tensor of size 1024. Regarding your suggestion (1), "use the detected masks and box scores for heatmap area and strength respectively", can you please elaborate on what these features signify and how we can plot them on a feature map?

Threepence: For visualisation purposes you don't need the features. Just use the scores to control the transparency of the mask: the more transparent, the less confident the prediction. This is just my suggestion for Kevin's request. I am personally happy with the default visualisation settings, where the confidence score is shown as a percentage on top of each detected box.

Deviation: @TuBui Thanks for the explanation, but can you please explain clearly (as I have not been able to find a clarification anywhere) what this array of 1024 float values signifies? Apologies, I am not able to interpret it.

Threepence: @Deviation The above code returns two things: pred_instances (which is what Detectron2 returns by default) and feats (the 1024 float values representing the semantic visual features of the detected objects, as requested by the OP).

Pension: @TuBui Could you please answer this question? stackoverflow.com/q/73829914/5254777
