The best way to calculate the best threshold with P. Viola, M. Jones Framework

I'm trying to implement the P. Viola and M. Jones detection framework in C++ (to begin with, just the sequential classifier, not the cascaded version). I think I have designed all the required classes and modules (e.g. integral images, Haar features), except the most important one: the AdaBoost core algorithm.

I have read the original P. Viola and M. Jones paper and many other publications. Unfortunately, I still don't understand how to find the best threshold for a single weak classifier. I have found only brief references to "weighted median" and "Gaussian distribution" algorithms, and many mathematical formulas...

I have tried to use the OpenCV Train Cascade module sources as a template, but they are so comprehensive that reverse-engineering the code is very time-consuming. I have also written my own simple code to understand the idea of adaptive boosting.

The question is: could you explain to me the best way to calculate the optimal threshold for a single weak classifier?

Below I present the AdaBoost pseudo code, rewritten from a sample found on Google, but I'm not convinced it's the correct approach. Calculating one weak classifier is very slow (a few hours), and I especially have doubts about the method of calculating the best threshold.

(1) AdaBoost::FindNewWeakClassifier
(2) AdaBoost::CalculateFeatures
(3) AdaBoost::FindBestThreshold
(4) AdaBoost::FindFeatureError
(5) AdaBoost::NormalizeWeights
(6) AdaBoost::FindLowestError
(7) AdaBoost::ClassifyExamples
(8) AdaBoost::UpdateWeights

DESCRIPTION (1)
-Generates all possible arrangements of features in the detection window and puts them into a vector
DO IN LOOP
    -Runs the main calculating function (2)
END

DESCRIPTION (2)
-Normalizes weights (5)
DO FOR EACH HAAR FEATURE
    -Applies the next feature from the list to all integral images
    -Finds the best threshold for the current feature (3)
    -Finds the error of the current feature at its best threshold (4)
    -Saves the error of the current feature in an array
    -Saves the threshold of the current feature in an array
    -Saves the threshold sign of the current feature in an array
END LOOP
-Finds the index of the classifier with the lowest error from the above loop (6)
-Gets the error value of the best feature
-Evaluates the best feature on all integral images (7)
-Updates weights (8)
-Adds the new weak classifier to the vector

DESCRIPTION (3)
-Calculates the error for each candidate threshold of the feature on the positive integral images, separately for the "+" and "-" signs (4)
-Returns the threshold and sign of the feature with the lowest error

DESCRIPTION (4)
-Returns the feature error over all samples, by evaluating the inequality f(x) * sign < sign * threshold

DESCRIPTION (5)
-Ensures that the sample weights form a probability distribution

DESCRIPTION (6)
-Finds the classifier with the lowest error

DESCRIPTION (7)
-Evaluates the best feature on all integral images
-Counts the number of false positives and false negatives

DESCRIPTION (8)
-Adjusts the weights depending on the classification results
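
To make the bookkeeping concrete, here is a minimal C++ sketch of the weak classifier rule (4), the weight normalization (5) and the weight update (8) described above; the Sample struct and the function names are invented for illustration and are not taken from OpenCV or the paper:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical sample record: its current AdaBoost weight and its label (1 = face, 0 = non-face).
    struct Sample {
        double weight;
        int    label;
    };

    // DESCRIPTION (4): the weak classifier fires when  sign * f(x) < sign * threshold.
    inline int classify(double featureValue, double threshold, int sign) {
        return (sign * featureValue < sign * threshold) ? 1 : 0;
    }

    // DESCRIPTION (5): rescale the weights so that they form a probability distribution.
    void normalizeWeights(std::vector<Sample>& samples) {
        double sum = 0.0;
        for (const Sample& s : samples) sum += s.weight;
        for (Sample& s : samples) s.weight /= sum;
    }

    // DESCRIPTION (8): once the best weak classifier of the round is known (weighted error e),
    // correctly classified samples are down-weighted by beta = e / (1 - e); predictions[i] is
    // the weak classifier's output for sample i. The returned alpha = log(1 / beta) is the
    // vote this weak classifier gets in the final strong classifier.
    double updateWeights(std::vector<Sample>& samples,
                         const std::vector<int>& predictions,
                         double error) {
        double beta = error / (1.0 - error);
        for (std::size_t i = 0; i < samples.size(); ++i) {
            if (predictions[i] == samples[i].label)
                samples[i].weight *= beta;
        }
        return std::log(1.0 / beta);
    }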

Thank you for any help

Charin asked 19/3, 2012 at 20:18

In the original Viola-Jones paper, Section 3.1, Learning Discussion (paragraph 4, to be precise), you will find the procedure for finding the optimal threshold.

I'll sum up the method quickly below.


The optimal threshold for each feature depends on the sample weights and is therefore recalculated in every iteration of AdaBoost. The best weak classifier's threshold is saved, as mentioned in the pseudo code.

In every round, for each weak classifier, you have to sort the N training samples according to their feature value. Placing a threshold splits this sequence into 2 parts. Each part will have a majority of either positive or negative samples, along with a few samples of the other type.

  • T+ : total sum of positive sample weights
  • T- : total sum of negative sample weights
  • S+ : sum of positive sample weights below the threshold
  • S- : sum of negative sample weights below the threshold

The error for this particular threshold is:

e = MIN((S+) + (T-) - (S-), (S-) + (T+) - (S+))

Why the minimum? Here's an example.
If the samples and threshold are like this:

+ + + + + - - | + + - - - - -

In the first round, if all weights are equal (= w), taking the minimum gives you an error of 4*w instead of 10*w.

You calculate this error for all N possible ways of separating the samples.
The minimum error will give you the range of threshold values. The actual threshold is probably the average of the adjacent feature values (I'm not sure though, do some research on this).
This was the second step in your DO FOR EACH HAAR FEATURE loop.
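
In code, the scan could look roughly like this; a minimal C++ sketch with invented type and function names, assuming the feature values, labels and current weights for one feature have already been collected (this is not OpenCV's implementation):

    #include <algorithm>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // One training sample as seen by a single feature: its feature value f(x),
    // its label (1 = positive, 0 = negative) and its current AdaBoost weight.
    struct FeatureSample {
        double value;
        int    label;
        double weight;
    };

    struct ThresholdResult {
        double threshold;
        int    sign;    // +1: positives below the threshold, -1: positives above
        double error;
    };

    // Sort by feature value, then sweep the N candidate split points while keeping
    // the running sums S+ and S-, and keep the split with the smallest
    //   e = min( S+ + (T- - S-),  S- + (T+ - S+) ).
    ThresholdResult findBestThreshold(std::vector<FeatureSample> samples) {
        std::sort(samples.begin(), samples.end(),
                  [](const FeatureSample& a, const FeatureSample& b) { return a.value < b.value; });

        double totalPos = 0.0, totalNeg = 0.0;               // T+ and T-
        for (const FeatureSample& s : samples)
            (s.label == 1 ? totalPos : totalNeg) += s.weight;

        double belowPos = 0.0, belowNeg = 0.0;               // S+ and S-
        ThresholdResult best = {0.0, +1, std::numeric_limits<double>::max()};

        for (std::size_t i = 0; i < samples.size(); ++i) {
            // error if the samples below the split are labelled negative / positive respectively
            double errPosAbove = belowPos + (totalNeg - belowNeg);   // S+ + (T- - S-)
            double errPosBelow = belowNeg + (totalPos - belowPos);   // S- + (T+ - S+)
            double err = std::min(errPosAbove, errPosBelow);

            if (err < best.error) {
                best.error     = err;
                best.threshold = samples[i].value;   // or the midpoint to the previous value
                best.sign      = (errPosBelow < errPosAbove) ? +1 : -1;
            }
            // move sample i below the split before evaluating the next candidate
            (samples[i].label == 1 ? belowPos : belowNeg) += samples[i].weight;
        }
        return best;
    }

Since the sort order of a feature's values never changes between boosting rounds (only the weights do), caching the sorted index per feature and just re-accumulating the running sums each round avoids repeating the sort, which should noticeably cut the training times mentioned in the question.
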
The cascades shipped with OpenCV were created by Rainer Lienhart, and I don't know what method he used. You could closely follow the OpenCV source code for any further improvements on this procedure.

Cybill answered 21/3, 2012 at 7:37
Thank you. I had read another version of the original paper, in which the "Learning Discussion" chapter is incomplete. Following your advice and the original text, I have implemented a completely new version of my algorithm. It can select one weak classifier in 16 minutes instead of the 10 hours taken by my previous (found on the web) algorithm! I'm trying to test it, but this is not a trivial task because it is very complex. Currently I have two problems: the algorithm gives me an error equal to zero with a very small data set (do you think that is correct?), and I don't have a good idea of how to test it quickly. – Charin
OK. Please verify my thinking. The algorithm is numerically unstable for a small training set. Why? Because then it's simply possible to find one classifier with 100% correctness. This implies that the classification error is zero, so the beta factor is also zero, and consequently it's impossible to calculate alpha. So the best and only way to avoid this situation is to feed the algorithm thousands of training samples. E.g. one feature arranges the values in this order: - - - - | + + + + + +. Then (T+) = 6w, (T-) = 4w, (S+) = 0w, (S-) = 4w, so (S+) + ((T-) - (S-)) = 0w and (S-) + ((T+) - (S+)) = 10w. Min = 0. Am I right? – Charin
You are right! If a feature can correctly separate faces from non-faces, then ideally that is your final classifier!! But that is far from what happens in practice. That is why Viola-Jones used 20,000 carefully selected training examples and trained on all possible features (160,000 in total). Practically this is huge; it could take months of fine-tuning. One thing I don't understand is whether you want to write code only for the actual face detection, or for both training and detection. You can do the training separately in OpenCV and run the cascade through your own detection code, which will be simpler. – Cybill
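
One common way of dealing with this degenerate case in practice (not specific to the paper, just a standard trick) is to clamp the weighted error away from 0 and 1 before computing beta, so that alpha stays finite; a minimal sketch, with an arbitrary epsilon:

    #include <algorithm>
    #include <cmath>

    // Keep the weighted error strictly inside (0, 1) so that beta = e / (1 - e)
    // and alpha = log(1 / beta) remain finite even when one weak classifier
    // happens to separate a small training set perfectly.
    double safeAlpha(double error, double eps = 1e-10)
    {
        double e    = std::max(eps, std::min(error, 1.0 - eps));
        double beta = e / (1.0 - e);
        return std::log(1.0 / beta);
    }
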
It is simpler to use the OpenCV framework, of course, but I need to write my own software because it is part of my MSc thesis. Viola and Jones used 200 weak classifiers in the sequential version. If I need only about 15 minutes to train one weak classifier on 20,000 19x19 px samples, then I think I need only about two days (50 hours) to train the sequential AdaBoost. In the future I plan to design a cascade version and to use a genetic algorithm for fast feature selection. I hope that I can do it. BTW, which face database (besides FERET) could you recommend for training? – Charin
I haven't tried it myself, but I remember reading that the MIT-CMU face database is good for frontal faces and the Sheffield database is good for profile faces. You can follow Naotoshi Seo's blog for the training procedure. Also remember that the training dataset cannot be your testing dataset, so keep a well-recognized dataset for testing purposes. Good luck with your project. :) – Cybill
If for each feature you have to calculate all the values for each sample and sort them, how do you have enough memory for all that? If you have 160,000 features and, let's say, only 3,000 samples, that's almost half a billion values, and I run out of memory before I can continue further in the AdaBoost. – Mcnamee
@Mcnamee 1.78 GB will run you out of memory? – Fromm
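
For reference, the arithmetic behind that figure, assuming one 4-byte float per feature value:

    160,000 features x 3,000 samples = 4.8 x 10^8 values
    4.8 x 10^8 values x 4 bytes = 1.92 x 10^9 bytes, i.e. roughly 1.8 GB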
