Validation loss for pytorch Faster-RCNN

I'm currently doing object detection on a custom dataset using transfer learning from a PyTorch pretrained Faster-RCNN model (as in the torchvision tutorial). I would like to compute the validation loss dict (as in train mode) at the end of each epoch. I can just run the model in train mode for validation, like this:

model.train()
for images, targets in data_loader_val:
    images = [image.to(device) for image in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

    with torch.no_grad():
        val_loss_dict = model(images, targets)
        print(val_loss_dict)

but I don't think that this is the "correct" way to validate (because some special layers like dropout and batch norm work differently in eval/train mode). And in eval mode the model returns predicted bboxes (as expected). Can I use some built-in function for this?

Thanks.

Horseshoes answered 21/2, 2020 at 13:10 Comment(3)
I'm sorry that I don't seem to quite understand the question, but what speaks against model.eval()? - Acklin
@Horseshoes Calling model.eval() disables dropout and changes batch norm to use historical statistics, so call it before validation. Similarly, model.train() should be called before training. By default, modules are in train mode. - Eolith
This is a valid issue. While both losses and outputs are always calculated, currently torchvision returns losses only in training mode, see this line: github.com/pytorch/vision/blob/… - Sayre
S
6

There was some discussion about this issue here. The conclusion there is that it is absolutely valid to calculate validation loss in train mode. The numerical value of the validation loss is not meaningful in itself; only its trend matters for detecting overfitting. Therefore, while train mode does alter the numerical value of the loss, it's still valid to use.
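
To make the "only the trend matters" point concrete, here is a minimal sketch of how the per-epoch validation loss could be reduced to a single number and watched for a plateau. The helper names total_loss and epochs_without_improvement, the example history, and the patience value are all assumptions made for illustration; none of them come from torchvision. Loss values are plain floats here (a zero-dimensional tensor would also pass through float()).

```python
def total_loss(loss_dict):
    """Sum the components of one loss dict (e.g. loss_classifier, loss_box_reg)."""
    return sum(float(v) for v in loss_dict.values())


def epochs_without_improvement(history):
    """Number of epochs elapsed since the lowest validation loss in `history`."""
    if not history:
        return 0
    # Index of the first occurrence of the minimum loss.
    best_epoch = min(range(len(history)), key=history.__getitem__)
    return len(history) - 1 - best_epoch


# Hypothetical per-epoch validation losses: improving, then drifting upward.
val_history = [1.9, 1.4, 1.1, 1.2, 1.3]
patience = 2  # assumed threshold; tune it for your own dataset
stop = epochs_without_improvement(val_history) >= patience
```

Because only the relative movement of val_history is inspected, a constant offset caused by running in train mode would not change the stopping decision.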


There is however another issue with efficiency here, in case you also need the model outputs in the validation process (for calculating IoU, accuracy, etc. as is often the case). Right now RCNN in torchvision gives you either losses or outputs, depending on training/eval mode.
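
As an illustration of why the model outputs are needed during validation, here is a minimal sketch of the IoU computation mentioned above, for two boxes in (x1, y1, x2, y2) format, which is the box format the torchvision detection models return. The function name box_iou_single is an assumption chosen for this example; for batched tensors, torchvision.ops.box_iou does the same job.

```python
def box_iou_single(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero for boxes that do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box would typically be counted as a hit when its IoU with a ground-truth box of the same label exceeds a chosen threshold such as 0.5.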

UPDATE: Unfortunately, I realized that the fix below does not work. All submodules would have to be patched to calculate both losses and outputs. Too bad.

My dirty solution was patching the GeneralizedRCNN class from which FasterRCNN inherits. The problem is in this line, in eager_outputs(). The workaround:

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    def eager_outputs_patch(losses, detections):
        # Always return both, regardless of train/eval mode.
        return losses, detections

    model = fasterrcnn_resnet50_fpn()
    model.eager_outputs = eager_outputs_patch

Now you can get both outputs after a single inference run:

    model.train()
    with torch.no_grad():
        loss_dict, outputs = model(images, targets)
    # yaay, now we have both!

Note that you still need to put your model in train mode in order to get the losses too. In eval mode GeneralizedRCNN's submodules (rpn, roi_heads) don't calculate any loss, and loss_dict is empty.

Sayre answered 17/12, 2020 at 20:14 Comment(0)
I
3

I solved this problem by editing GeneralizedRCNN, RPN, and roi_heads. Just add an if-statement so that the loss is still calculated when targets are passed, even if the model is not in training mode. For example, in RPN:

losses = {}
if self.training:
    assert targets is not None
    labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
    regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
        objectness, pred_bbox_deltas, labels, regression_targets)
    losses = {
        "loss_objectness": loss_objectness,
        "loss_rpn_box_reg": loss_rpn_box_reg,
    }
else:
    if targets is not None:
        labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
        regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
        loss_objectness, loss_rpn_box_reg = self.compute_loss(
            objectness, pred_bbox_deltas, labels, regression_targets)
        losses = {
            "loss_objectness": loss_objectness,
            "loss_rpn_box_reg": loss_rpn_box_reg,
        }
Incipit answered 11/2, 2021 at 14:19 Comment(0)
F
2

To complete @mkisantal's and @Colin Axel's answers, here is the complete list of modifications you need to make in PyTorch's Faster-RCNN code to get the following behaviors:

  • in training, i.e. when network.train() is set and targets are provided, produce losses and output,
  • in validation, i.e. when network.eval() is set and targets are provided, produce losses and output,
  • in inference, i.e. when network.eval() is set and no targets are provided, produce output.

Tested today, with torchvision 0.12.

In generalized_rcnn.py (all files are in torchvision/models/detection/):
Replace

if torch.jit.is_scripting():
    if not self._has_warned:
        warnings.warn("RCNN always returns a (Losses, Detections) tuple in scripting")
        self._has_warned = True
    return losses, detections
else:
    return self.eager_outputs(losses, detections)

by

return losses, detections

In rpn.py:
Replace

if self.training:
    assert targets is not None
    labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
    regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
        objectness, pred_bbox_deltas, labels, regression_targets
    )
    losses = {
        "loss_objectness": loss_objectness,
        "loss_rpn_box_reg": loss_rpn_box_reg,
    }

by

if targets is not None:
    labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
    regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
    loss_objectness, loss_rpn_box_reg = self.compute_loss(
        objectness, pred_bbox_deltas, labels, regression_targets
    )
    losses = {
        "loss_objectness": loss_objectness,
        "loss_rpn_box_reg": loss_rpn_box_reg,
    }

In roi_heads.py:
Replace

if self.training:
    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
else:
    labels = None
    regression_targets = None
    matched_idxs = None

by

if targets is not None:
    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)
else:
    labels = None
    regression_targets = None
    matched_idxs = None

and replace

if self.training:
    assert labels is not None and regression_targets is not None
    loss_classifier, loss_box_reg = fastrcnn_loss(class_logits, box_regression, labels, regression_targets)
    losses = {"loss_classifier": loss_classifier, "loss_box_reg": loss_box_reg}
else:
    boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
    num_images = len(boxes)
    for i in range(num_images):
        result.append(
            {
                "boxes": boxes[i],
                "labels": labels[i],
                "scores": scores[i],
            }
        )

by

if labels is not None and regression_targets is not None:
    loss_classifier, loss_box_reg = fastrcnn_loss(class_logits, box_regression, labels, regression_targets)
    losses = {"loss_classifier": loss_classifier, "loss_box_reg": loss_box_reg}
boxes, scores, labels = self.postprocess_detections(class_logits, box_regression, proposals, image_shapes)
num_images = len(boxes)
for i in range(num_images):
    result.append(
        {
            "boxes": boxes[i],
            "labels": labels[i],
            "scores": scores[i],
        }
    )
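
With these modifications in place, a validation pass could look roughly like the sketch below. It assumes that model(images, targets) returns a (loss_dict, detections) tuple, which is the patched behaviour described in this answer and not what stock torchvision does. The names validate and fake_patched_model are made up for this example, and the fake model only stands in for a real network so that the sketch runs without torch. With a real model you would also call model.eval(), wrap the loop in torch.no_grad(), and move the batch to the device first.

```python
def validate(model, data_loader):
    """Average each loss component over the loader and collect all detections.

    Assumes `model(images, targets)` returns `(loss_dict, detections)`,
    i.e. the patched behaviour, not stock torchvision.
    """
    sums, num_batches, all_detections = {}, 0, []
    for images, targets in data_loader:
        loss_dict, detections = model(images, targets)
        for name, value in loss_dict.items():
            # float() accepts plain numbers and zero-dimensional tensors alike.
            sums[name] = sums.get(name, 0.0) + float(value)
        all_detections.extend(detections)
        num_batches += 1
    mean_losses = {name: s / max(num_batches, 1) for name, s in sums.items()}
    return mean_losses, all_detections


def fake_patched_model(images, targets):
    # Hypothetical stand-in: constant losses and one empty detection per image.
    losses = {"loss_classifier": 0.5, "loss_box_reg": 0.25}
    detections = [{"boxes": [], "labels": [], "scores": []} for _ in images]
    return losses, detections


# Two batches: one with two images, one with a single image.
loader = [(["img0", "img1"], [{}, {}]), (["img2"], [{}])]
mean_losses, detections = validate(fake_patched_model, loader)
```

The averaged loss dict can be logged once per epoch, and the collected detections can be passed to whatever metric code you already use.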
Fascism answered 20/5, 2022 at 16:10 Comment(0)
B
0

I just noticed that with the change:

if targets is not None:
    proposals, matched_idxs, labels, regression_targets = self.select_training_samples(proposals, targets)

the results change (I get a much higher mAP with it), so I think that's not a valid change.

Bank answered 7/11, 2022 at 23:18 Comment(0)
