How to modify ssd mobilenet config to detect small objects using tensorflow object detection API?

I am trying to detect small objects in IP camera video streams using SSD MobileNetV2. The model was trained on high-resolution images, downloaded from the internet, in which these small objects are very close to the camera. I found that changing the anchor box scales and modifying the feature extractor .py file are the proposed solutions to overcome this. Can anyone guide me on how to do this?

Chorography answered 5/3, 2020 at 3:25 Comment(1)
I hope someone can help me with this. - Chorography

mobilenet-ssd works well for large objects, but its performance on small objects is pretty poor. It is always better to train with anchors tuned to the aspect ratios and sizes of the objects you expect. Also keep in mind that the first branch is the one that detects the smallest objects, and its resolution is 1/16 of the input. Consider adding another branch at the 1/8 feature map, which will help with small objects.
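
One way to ground the anchor tuning is to look at the box sizes in your own annotations first. The sketch below is only illustrative: the `boxes` list is made-up sample data in normalized (xmin, ymin, xmax, ymax) form, and `box_stats` is a hypothetical helper, not part of the Object Detection API.

```python
import math
import statistics


def box_stats(boxes):
    """Return (scales, aspect_ratios) for normalized (xmin, ymin, xmax, ymax) boxes.

    scale = sqrt(width * height), aspect ratio = width / height.
    """
    scales, ratios = [], []
    for xmin, ymin, xmax, ymax in boxes:
        w, h = xmax - xmin, ymax - ymin
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        scales.append(math.sqrt(w * h))
        ratios.append(w / h)
    return scales, ratios


# Made-up sample annotations, for illustration only.
boxes = [
    (0.10, 0.10, 0.14, 0.18),  # portrait box
    (0.50, 0.40, 0.58, 0.44),  # landscape box
    (0.30, 0.60, 0.35, 0.65),  # square box
]
scales, ratios = box_stats(boxes)
print("smallest scale:", round(min(scales), 3))
print("median scale:", round(statistics.median(scales), 3))
print("aspect ratios:", [round(r, 2) for r in ratios])
```

If the scales you measure sit well below the min_scale in your config, the default anchors are unlikely to match those objects well, which is a hint that the anchor settings need adjusting.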

How to change anchor sizes and aspect ratios: Take for example the pipeline.config file that is used for the training configuration - https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config. There you will find the following arguments:

    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
  • num_layers - the number of branches, starting from the branch at 1/16 of the input.
  • min_scale / max_scale - min_scale is the anchor scale of the first branch and max_scale is the anchor scale of the last branch. The branches in between get their scale by linear interpolation: min_scale + (max_scale - min_scale)/(num_layers - 1) * (#branch), with #branch counted from 0 (same as defined in SSD: Single Shot MultiBox Detector - https://arxiv.org/pdf/1512.02325.pdf)
  • aspect_ratios - the list of aspect ratios that define the anchors, so you can decide which AR anchors to add. AR=1.0 means a square anchor, 2.0 means a landscape anchor whose width is 2x its height, and 0.5 means a portrait anchor whose height is 2x its width. The code can be found at: https://github.com/tensorflow/models/blob/master/research/object_detection/anchor_generators/grid_anchor_generator.py and https://github.com/tensorflow/models/blob/master/research/object_detection/anchor_generators/multiscale_grid_anchor_generator.py
  • One more thing: in mobilenet-v1-ssd the first branch has only 3 anchors. I'm not sure how many mobilenet-v2-ssd has, but you may want to add more anchors. You will need to change this in the code (in multiple_grid_anchor_generator.py, lines 320-321):

        if layer == 0 and reduce_boxes_in_lowest_layer:
          layer_box_specs = [(0.1, 1.0), (scale, 2.0), (scale, 0.5)]

    As you can see, it is hard-coded to three anchors.
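
To make the interpolation concrete, here is a minimal sketch that computes the per-branch scales from the sample config values and converts them to approximate pixel sizes. It rests on a few assumptions: a 300x300 input, the usual SSD convention of width = scale * sqrt(AR) and height = scale / sqrt(AR), and it ignores any special-casing such as the hard-coded first-layer anchors described above. `ssd_scales` and `anchor_size` are hypothetical helper names, not API functions.

```python
import math


def ssd_scales(min_scale, max_scale, num_layers):
    """Linearly interpolate one anchor scale per branch (branch index starts at 0)."""
    if num_layers == 1:
        return [min_scale]
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]


def anchor_size(scale, aspect_ratio):
    """Normalized (width, height) of an anchor, where aspect_ratio = width / height."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


input_size = 300  # assumed input resolution in pixels
for i, s in enumerate(ssd_scales(0.2, 0.95, 6)):
    w, h = anchor_size(s, 2.0)
    print(f"branch {i}: scale={s:.2f}, square anchor ~{s * input_size:.0f}px, "
          f"AR=2.0 anchor ~{w * input_size:.0f}x{h * input_size:.0f}px")
```

Under these assumptions, lowering min_scale is the most direct way to get smaller anchors on the first branch; rerunning the loop with your own values shows roughly what object sizes each branch would target.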

How to start the branches earlier

This also needs to be changed inside the code. Each predefined model has its own model file, e.g. for ssd_mobilenet_v2: https://github.com/tensorflow/models/blob/master/research/object_detection/models/ssd_mobilenet_v2_feature_extractor.py

Lines 111-117:

feature_map_layout = {
    'from_layer': ['layer_15/expansion_output', 'layer_19', '', '', '', ''
                  ][:self._num_layers],
    'layer_depth': [-1, -1, 512, 256, 256, 128][:self._num_layers],
    'use_depthwise': self._use_depthwise,
    'use_explicit_padding': self._use_explicit_padding,
}

You can choose what layers to start from by their name.
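
To see what starting a branch earlier buys you, the following sketch estimates the grid size of each branch for a given first-branch stride. It is a rough model only, assuming a 300x300 input, 'SAME'-style padding (grid = ceil(input / stride)), and that each later branch halves the resolution of the previous one. `branch_grids` is a hypothetical helper.

```python
import math


def branch_grids(input_size, first_stride, num_layers):
    """Return [(stride, grid_size), ...], doubling the stride for each later branch."""
    grids = []
    for i in range(num_layers):
        stride = first_stride * 2 ** i
        grids.append((stride, math.ceil(input_size / stride)))
    return grids


for first_stride in (16, 8):
    print(f"first branch at 1/{first_stride} of the input:")
    for stride, grid in branch_grids(300, first_stride, 6):
        print(f"  stride {stride:>3}: {grid}x{grid} cells")
```

A first branch at 1/8 gives a much denser grid of anchor positions than one at 1/16, which is why it tends to help with small objects, at the cost of more anchors to evaluate.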

Now for my 2 cents: I haven't tried mobilenet-v2-ssd and have mainly used mobilenet-v1-ssd, but in my experience it is not a good model for small objects. I guess it can be optimized a bit by editing the anchors, but I'm not sure that will be sufficient for your needs. For a one-stage, SSD-like network, consider using ssd_mobilenet_v1_fpn_coco: it works on a 640x640 input size, and its first branch starts at 1/8 of the input size. (Cons: a bigger model and higher inference time.)

Yurt answered 13/4, 2020 at 12:7 Comment(3)
Thanks a lot for your well-explained answer. I was not active on Stack Overflow for a while. I had tried changing the anchor size and removing layers, following the answers from other similar posts, but it didn't help in my case. Anyway, I fixed the issue to a great extent by adding more images with small objects and by using data augmentation. Thanks again for the response. - Chorography
Hey Tamir, can you shed some light on how to write feature_map_layout? Is there an easy way to find the names of the layers? I have tried TensorBoard, but it is not easy to work out the names and order from it. - Shin
Hey @prateekkhandelwal, you will have to find the tensor names in the backbone network, because these feature map names correspond to the backbone network. - Yurt

Late to the party, posting for posterity. I had better luck with small objects using the ssd_mobilenet_v2_fpnlite... variants

Read about FPN here: https://towardsdatascience.com/review-fpn-feature-pyramid-network-object-detection-262fc7482610

Nerveless answered 12/6, 2023 at 18:54 Comment(0)
