How can I train an IsolationForest model to give the minimum number of false positives?

As of v0.20.3, scikit-learn implements isolation forests (sklearn.ensemble.IsolationForest). Isolation forests handle high-dimensional, multivariate data fairly well:

"the data is recursively partitioned with axis-parallel cuts at randomly chosen partition points in randomly selected attributes, so as to isolate the instances into nodes with fewer and fewer instances until the points are isolated into singleton nodes containing one instance." -- Charu C. Aggarwal (in Chapter 5 of Outlier Analysis)

I can't say for certain that it gives the fewest false positives, because that depends on many factors, including your training data. As far as I can tell, it does a good job of identifying anomalies and outliers (even in discrete time series).
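If you have ground-truth labels for at least part of your data, you can count the false positives directly instead of guessing. The following is a minimal sketch on synthetic data, not a benchmark: the two clusters, the variable names, and the label array `is_anomaly` are all assumptions made for illustration, and the model uses the library defaults apart from a fixed `random_state`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic, labelled data (assumed for illustration only):
# 960 "normal" points around the origin and 40 anomalies far away.
rng = np.random.RandomState(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(960, 2))
anomalies = rng.uniform(low=6.0, high=8.0, size=(40, 2))
X = np.vstack([normal, anomalies])
is_anomaly = np.r_[np.zeros(960, dtype=bool), np.ones(40, dtype=bool)]

# Fit with default settings; predict returns -1 for outliers, 1 for inliers.
clf = IsolationForest(random_state=0)
flagged = clf.fit_predict(X) == -1

# A false positive is a normal point that the model flagged as an outlier.
false_positives = int(np.sum(flagged & ~is_anomaly))
true_positives = int(np.sum(flagged & is_anomaly))
false_positive_rate = false_positives / int(np.sum(~is_anomaly))

print("flagged:", int(flagged.sum()))
print("false positives:", false_positives)
print("false positive rate:", false_positive_rate)
```

On real data the numbers will depend heavily on how well separated the anomalies are, so treat this as a way to measure rather than a result to expect.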

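You can set the contamination parameter to whatever proportion your heart desires, as long as it's a float in (0., 0.5). A lower value flags fewer points as outliers, which tends to mean fewer false positives at the cost of possibly missing some real anomalies.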

"The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function."

In v0.20.3 the default is 0.1 (10%); from v0.22 onward it is 'auto'. To flag fewer points you could set, for example, contamination=0.04 (4%).

from sklearn.ensemble import IsolationForest

clf = IsolationForest(contamination=0.04)
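To see what the parameter does in practice, here is a small sketch that fits the same forest with a few contamination values and reports the fraction of training points flagged. The data is synthetic and the chosen values (0.02, 0.04, 0.1) are only examples. Since the number of flagged points is an upper bound on the number of false positives, lowering contamination caps them as well.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic training data (assumed for illustration only).
rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 3))

fraction_flagged = {}
for contamination in (0.02, 0.04, 0.1):
    # Same random_state each time, so only the threshold changes.
    clf = IsolationForest(contamination=contamination, random_state=42)
    labels = clf.fit_predict(X)  # -1 = outlier, 1 = inlier
    fraction_flagged[contamination] = float(np.mean(labels == -1))

for contamination, fraction in fraction_flagged.items():
    print(contamination, fraction)
```

On the training set the flagged fraction should track the contamination value closely, because the threshold is derived from the training scores. On new data it can drift, so it is worth checking against held-out samples.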
