What is the range of Scikit-Learn's IsolationForest decision_function scores?

Scikit-Learn's IsolationForest class has a method decision_function that returns the anomaly scores of the input samples. However, the documentation does not state what the possible range of these scores is, and only states that "the lower [the score], the more abnormal."

Edit: after reading jmunsch's comment I looked at the source code again and here is my updated guess: If the exponent in the scores formula is always negative, then scores will always be between 0 and 1, which would mean the returned range is [-0.5, 0.5] since 0.5 - scores is returned by the method. But I'm not certain if the exponent would always be negative.

Intranuclear answered 20/7, 2017 at 19:44 Comment(2)
When in doubt, look at the source: github.com/scikit-learn/scikit-learn/blob/ab93d65/sklearn/… - Intension
@Intension I forgot to mention in my question that I did look at the source, but it wasn't apparent to me what the range would be. If the exponent is always negative, then scores will always be between 0 and 1, which would mean the range is [-0.5, 0.5]. But I'm not certain if the exponent would always be negative. - Intranuclear

In Scikit-Learn's IsolationForest, the decision_function returns values in the range [-0.5, 0.5], where -0.5 is the most anomalous.

Or so I believe, and I have never seen evidence otherwise. The documentation for Scikit-Learn's IsolationForest references the paper Isolation-based Anomaly Detection by Liu et al., where equation 2 defines the anomaly score as s(x, psi) = 2^(-E(h(x)) / c(psi)). In the paper the anomaly score ranges between 0 and 1, where 1 is most anomalous. In the scores function you reference, on line 267 the variable depths.mean(axis=1) corresponds to E(h(x)) and _average_path_length(self.max_samples_) corresponds to c(psi) in the paper. The mean depth is never negative and c(psi) is positive, so the exponent is never positive and the score stays between 0 and 1. Thus on line 272, when the function returns 0.5 minus the score, we get the bounds of [-0.5, 0.5].
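
To make the bounds concrete, here is a minimal pure-Python sketch of the paper's formula followed by the 0.5 shift. The function names (average_path_length, paper_score, shifted_score) are hypothetical and are not Scikit-Learn's internals. The harmonic number is approximated as ln(n - 1) plus Euler's constant, as in the paper, and the sketch assumes a subsample size of at least 2.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points.

    Assumes n >= 2 (hypothetical helper, not the library's own function).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def paper_score(mean_depth, n):
    """Anomaly score from the paper: 2 ** (-E(h(x)) / c(n)), in (0, 1]."""
    return 2.0 ** (-mean_depth / average_path_length(n))


def shifted_score(mean_depth, n):
    """0.5 minus the paper's score, so lower means more anomalous."""
    return 0.5 - paper_score(mean_depth, n)


psi = 256  # assumed subsample size for illustration
for depth in (0.0, average_path_length(psi), 50.0):
    print(depth, paper_score(depth, psi), shifted_score(depth, psi))
```

A mean depth of 0 gives the lower bound of -0.5, a mean depth equal to c(psi) gives 0, and the value approaches 0.5 as the mean depth grows, without ever reaching it.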

Edit/Bonus: The predict method of isolation forest effectively just compares the decision_function values to a threshold that is stored in model.threshold_. So after calling the model's predict method on some data, the anomalous items are the same items that meet the criterion: model.decision_function(data) < model.threshold_.
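
A quick way to check this relationship is sketched below. It assumes a recent Scikit-Learn release (0.22 or later) with contamination="auto". As far as I know, threshold_ is no longer available in those releases and predict compares decision_function against 0 instead, so the sketch uses 0 as the threshold. The synthetic data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (illustrative only): a Gaussian blob plus a few scattered points.
rng = np.random.RandomState(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=-8.0, high=8.0, size=(10, 2))
data = np.vstack([inliers, outliers])

model = IsolationForest(contamination="auto", random_state=0).fit(data)
scores = model.decision_function(data)
labels = model.predict(data)  # -1 for anomalies, 1 for inliers

# Assumption: in recent releases the threshold is 0 rather than model.threshold_.
flagged_by_predict = labels == -1
flagged_by_score = scores < 0

print(np.array_equal(flagged_by_predict, flagged_by_score))
print(scores.min(), scores.max())
```

On an older release that still exposes threshold_, the comparison in the second-to-last step would be scores < model.threshold_ instead.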

Tithable answered 16/8, 2018 at 18:6 Comment(0)
