Are modern CNN (convolutional neural network) as DetectNet rotate invariant?

Asked 3/12, 2016 at 20:27 Answered 4/5, 2022 at 1:54

Solved machine-learning computer-vision neural-network deep-learning conv-neural-network

As known nVidia DetectNet - CNN (convolutional neural network) for object detection is based on approach from Yolo/DenseBox: https://devblogs.nvidia.com/parallelforall/deep-learning-object-detection-digits/

DetectNet is an extension of the popular GoogLeNet network. The extensions are similar to approaches taken in the Yolo and DenseBox papers.

And as shown here, DetectNet can detects objects (cars) with any rotations: https://devblogs.nvidia.com/parallelforall/detectnet-deep-neural-network-object-detection-digits/

Are modern CNN (convolutional neural network) as DetectNet rotate invariant?

Can I train DetectNet on thousands different images with one the same rotation angle of object, to detect objects on any rotation angles?

And what about rotate invariant of: Yolo, Yolo v2, DenseBox on which based DetectNet?

Subreption answered 3/12, 2016 at 20:27 Comment(0)

In classification problems, CNNs are not rotate invariant. You need to include in your training set images with every possible rotation.

You can train a CNN to classify images into predefined categories (if you want to detect several objects in a image as in your example you need to scan every place of a image with your classifier).

However, this is an object detection problem, not only a classification problem.

In object detection problems, you can use a sliding window approach, but it is extremely inefficient. Instead a simple CNN other architectures are the state of art. For example:

Faster RCNN: https://arxiv.org/pdf/1506.01497.pdf
YOLO NET: https://pjreddie.com/darknet/yolo/
SSD: https://arxiv.org/pdf/1512.02325.pdf

These architectures can detect the object anywhere in the image, but you also must include in the training set samples with different rotations (and the training set must be labelled using bounding boxes, that it is very time consuming).

Procarp answered 3/12, 2016 at 22:37 Comment(6)

Are you talking about sliding window approach? – Hygrostat 17/5, 2017 at 8:57

@mrgloom. It has nothing to do with sliding window. – Rager 1/6, 2017 at 8:13

@Rager scan every place of a image with your classifier ? – Hygrostat 1/6, 2017 at 9:4

A CNN is invariant to small horizontal or vertical movements in your training data mainly because of max pooling. – Rager 2/6, 2017 at 3:59

But here in this answer it is being said that such invariant property can be achieved by using dropout. Although I myself failed to achieve it practically even if I used a significant percentage of dropout. I don't know why, I might be wrong. Here is the answer I am talking about: stats.stackexchange.com/a/239079/245577 – European 23/2, 2020 at 0:43

Exactly which transformations you are considering as "movements" ? – European 23/2, 2020 at 1:3

Adding on to Rob's answer, in general CNN itself is translation invariant, but not rotation and scale. However, it is not compulsory to include all possible rotations into your training data. A max pooling layer would introduce rotation invariant.

This image posted by Franck Dernoncourt here might be what you're looking for.

Secondly, regarding Kershaw's comment on Rob's answer which says:

A CNN is invariant to small horizontal or vertical movements in your training data mainly because of max pooling.

The main reason CNNs are translation invariant is the convolution. The filter would extract the feature regardless of where it is in the image since the filter will be moving across the entire image. It is when the image is rotated or scaled that the filter would fail because of the difference in pixel representation of the features.

Source: Aditya Kumar Praharaj's answer from this link.

Centric answered 26/10, 2017 at 7:43 Comment(0)

The CNN is rotation-invariant providing all convolution kernel with a property K = T{K} (e.g., use of symmetrical kernels) and replace the 1st flatten layer with a merge convolution layer. I called it transformation-identical CNN (TI-CNN), https://arxiv.org/abs/1806.03636 and https://arxiv.org/abs/1807.11156

If you wish to establish a rotation-identical CNN (virtually arbitrary small angle), I would introduce the geared rotation-identical CNN (GRI-CNN) https://arxiv.org/abs/1808.01280

Ambie answered 4/5, 2022 at 1:54 Comment(0)

Detectron2 added Rotated Faster RCNN network recently. To create such model, you should create annotations for vehicles with rotated bounding box, which is:

rbbox = [center_x, center_x, width, height, angle]

For example:

Visit this link for more information.

Issuant answered 26/8, 2021 at 14:46 Comment(0)

Recommended topics

Hot tags