Viola Jones face detection - variations in object/face size

I'm trying to understand the Viola-Jones method, and I think I've mostly got it.

It uses simple Haar-like features boosted into strong classifiers and organized into layers (a cascade) to improve performance by not bothering with obvious 'non-object' regions.

I think I understand the integral image, and I understand how the feature values are computed.

The only thing I can't figure out is how the algorithm deals with variations in face size.

As far as I know, they use a 24x24 subwindow that slides over the image; within it, the algorithm goes through the classifiers and tries to decide whether it contains a face/object or not.

My question is: what if one face is 10x10 in size and another is 100x100? What happens then?

I'm also dying to know what the first two features (in the first layer of the cascade) are and what they look like, keeping in mind that these two features, according to Viola & Jones, will almost never miss a face and will eliminate 60% of the incorrect ones. How?

And how is it possible to construct these features so that they achieve these statistics for different face sizes in the image?

Am I missing something, or have I figured it all wrong?

If I'm not clear enough, I'll try to explain my confusion better.

Eberhart answered 2/9, 2012 at 17:30 Comment(0)

Training

The Viola-Jones classifier is trained on 24x24-pixel images. Each of the face images contains a similarly scaled face. This produces a set of feature detectors, each built out of two, three, or four rectangles, optimised for faces of one particular size.

Face size

Different face sizes are detected by repeating the classification at different scales. The original paper notes that good results are obtained by trying different scales a factor of 1.25 apart.
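As a rough illustration of the multi-scale search, the sketch below lists the window sizes that would be tried on an image, starting from the 24x24 training size and growing by a factor of 1.25 each step. The function name `window_sizes` and the rounding rule are assumptions made for this example, not details taken from the paper.

```python
def window_sizes(image_w, image_h, base=24, factor=1.25):
    """Return the detector window sizes (in pixels) that fit in the image.

    Hypothetical helper: starts at the training size and multiplies the
    scale by `factor` until the window no longer fits.
    """
    sizes = []
    scale = 1.0
    limit = min(image_w, image_h)
    while True:
        size = int(base * scale)  # truncate to whole pixels (an assumption)
        if size > limit:
            break
        sizes.append(size)
        scale *= factor
    return sizes


if __name__ == "__main__":
    # Each size is then slid over the whole image as a separate pass.
    print(window_sizes(320, 240))
```

Note that the smallest window is the training size, so a 10x10 face is below what this search can find, while a 100x100 face is picked up by one of the larger windows.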

Note that the integral image means that it is easy to compute the rectangular features at any scale by simply scaling the coordinates of the corners of the rectangles.
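A minimal sketch of that idea, in plain Python: the integral image is built once, and a rectangle defined in the 24x24 training frame is evaluated at a larger scale just by multiplying its corner coordinates. The names `integral_image`, `rect_sum` and `scaled_rect_sum` are made up for this example, and rounding the scaled coordinates is one possible choice among several.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) table where ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def scaled_rect_sum(ii, rect, scale, ox, oy):
    """Evaluate a rectangle given in training-frame coordinates at `scale`.

    (ox, oy) is the top-left corner of the detection window in the image.
    """
    x, y, w, h = (int(round(v * scale)) for v in rect)
    return rect_sum(ii, ox + x, oy + y, w, h)


if __name__ == "__main__":
    image = [[(r * 48 + c) % 256 for c in range(48)] for r in range(48)]
    ii = integral_image(image)
    # Same rectangle at the training scale and at twice the size.
    print(scaled_rect_sum(ii, (2, 4, 8, 6), 1.0, 0, 0))
    print(scaled_rect_sum(ii, (2, 4, 8, 6), 2.0, 0, 0))
```

The cost of each rectangle sum is the same four lookups whatever the scale, which is why rescaling the detector is cheaper than rescaling the image. A real implementation would also need to compensate for the larger rectangle area when comparing against trained thresholds; that step is omitted here.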

Best features

The original paper contains pictures of the first two features selected in a typical cascade (see page 4).

The first feature detects the wide dark rectangle of the eyes above a wide brighter rectangle of the cheeks.

----------
----------
++++++++++
++++++++++

The second feature detects the bright thin rectangle of the bridge of the nose between the darker rectangles on either side containing the eyes.

---+++---
---+++---
---+++---
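To make the two diagrams concrete, here is a small sketch that evaluates both features on a synthetic 24x24 window with a dark eye band and a bright nose bridge. The rectangle positions and sizes, the pixel values, and the names `EYES_VS_CHEEKS`, `NOSE_BRIDGE`, `feature_value` and `synthetic_face` are all hypothetical choices for illustration; the trained cascade in the paper has its own coordinates and thresholds. The sums are computed by brute force to keep the example short.

```python
# Each feature is a list of (x, y, w, h, weight) rectangles in the 24x24
# frame: weight -1 for the '-' regions and +1 for the '+' regions above.
EYES_VS_CHEEKS = [(4, 6, 16, 4, -1), (4, 10, 16, 4, +1)]
NOSE_BRIDGE = [(4, 6, 5, 4, -1), (9, 6, 5, 4, +1), (14, 6, 5, 4, -1)]


def feature_value(window, feature):
    """Weighted sum of pixel intensities over the feature's rectangles."""
    total = 0
    for x, y, w, h, weight in feature:
        area_sum = sum(window[r][c]
                       for r in range(y, y + h)
                       for c in range(x, x + w))
        total += weight * area_sum
    return total


def synthetic_face():
    """Bright window with a dark eye band (rows 6-9) split by a bright bridge."""
    win = [[200] * 24 for _ in range(24)]
    for r in range(6, 10):
        for c in range(24):
            if not 9 <= c < 14:  # columns 9-13 stay bright (nose bridge)
                win[r][c] = 50
    return win


if __name__ == "__main__":
    face = synthetic_face()
    flat = [[200] * 24 for _ in range(24)]
    print(feature_value(face, EYES_VS_CHEEKS), feature_value(flat, EYES_VS_CHEEKS))
    print(feature_value(face, NOSE_BRIDGE), feature_value(flat, NOSE_BRIDGE))
```

On the face-like window both features give a large positive response, while on the uniform window they do not, so a threshold on each value is enough to reject many non-face windows cheaply.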
Brattice answered 2/9, 2012 at 19:2 Comment(2)
@Peter Thank you for such a helpful answer. I have a similar question: does the scale of the face in the image affect the detection result with Viola-Jones? For example, if I stand 1 foot from the webcam and then stand 5 feet from it, does that affect the result of this algorithm? - Thrift
@Thrift Yes, the algorithm should still work the same, provided that the face still covers at least 24*24 pixels and that the implementation tests a wide enough range of scales (implementations often search only for large faces to avoid false positives). - Brattice
