Anomaly Detection vs Supervised Learning

I have a very small amount of data belonging to the positive class and a large set of data from the negative class. According to Prof. Andrew Ng (anomaly detection vs supervised learning), I should use anomaly detection instead of supervised learning because the data is highly skewed.

Please correct me if I am wrong, but both techniques look the same to me: in both (supervised) anomaly detection and standard supervised learning, we train on data with both normal and anomalous samples and test on unknown data. Is there any difference?

Should I just under-sample the negative class or over-sample the positive class so that both classes are the same size? Does this affect the overall accuracy?

Novgorod answered 25/4, 2014 at 16:42 Comment(2)
There are several types of supervised learning classifiers. Which one specifically are you referring to? A neural network, for example, is very different from logistic regression. – Burnight
I am not set on anything specific, but what if I use a neural network? – Novgorod

In supervised learning, the data set is labelled (e.g. 'good', 'bad') and you pass those labels in while training the model, so that it learns parameters that separate the 'good' results from the 'bad' ones.
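
For concreteness, here is a minimal sketch of that idea (scikit-learn's `LogisticRegression` on synthetic two-class data; the dataset and parameters are made up purely for illustration):

```python
# Minimal supervised sketch: the model sees *both* classes, with labels, at
# training time. All data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_good = rng.normal(0.0, 1.0, size=(500, 2))   # labelled 'good' samples
X_bad = rng.normal(4.0, 1.0, size=(500, 2))    # labelled 'bad' samples
X = np.vstack([X_good, X_bad])
y = np.array([0] * 500 + [1] * 500)            # 0 = good, 1 = bad

clf = LogisticRegression().fit(X, y)           # learns a boundary between the classes
print(clf.predict(rng.normal(4.0, 1.0, size=(1, 2))))  # most likely [1]
```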

Anomaly detection, by contrast, is unsupervised in the sense that you do not pass any labels during training: you train using only the 'non-anomalous' data. You then select an epsilon (threshold) value and evaluate it with a numerical metric (such as the F1 score), so that your model strikes a good balance between true positives and false alarms.
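
A rough sketch of that workflow, assuming the Gaussian-density approach from Ng's course (the synthetic data, the single multivariate Gaussian, and the epsilon sweep are illustrative choices, not code from the course):

```python
# Sketch of density-based anomaly detection: fit a Gaussian to non-anomalous
# data only, then sweep epsilon on a small labelled validation set and keep
# the threshold with the best F1. Data and model choice are assumptions here.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))             # 'normal' data only
X_val = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),    # normal
                   rng.normal(5.0, 1.0, size=(10, 2))])    # a few anomalies
y_val = np.array([0] * 200 + [1] * 10)                     # 1 = anomaly

mu = X_train.mean(axis=0)
cov = np.cov(X_train, rowvar=False)
p = multivariate_normal(mu, cov).pdf(X_val)                # p(x) under the model

best_eps, best_f1 = 0.0, -1.0
for eps in np.linspace(p.min(), p.max(), 1000):
    f1 = f1_score(y_val, (p < eps).astype(int), zero_division=0)
    if f1 > best_f1:
        best_eps, best_f1 = eps, f1
print(best_eps, best_f1)                                   # chosen threshold and score
```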

Regarding over-sampling or under-sampling so that your data is not skewed, there are two things to consider:

  1. Prof. Ng's point was roughly that if your positive class has only about 10 examples out of 10k or 100k, you should use anomaly detection, since your data is highly skewed.
  2. Supervised learning makes sense if you know what typical 'bad' values look like. If you only know what is 'normal'/'good', and the 'bad' values can look very different every time, that is a good case for anomaly detection.
Jacobsohn answered 6/4, 2016 at 0:45 Comment(0)

In anomaly detection you determine the model parameters from the portion of the data that is well supported (as Andrew explains). Since your negative class has many instances, you would use those data for 'learning'. Kernel density estimation and Gaussian mixture models (GMMs) are examples of approaches typically used here. A model of 'normalcy' can thus be learnt, and thresholding can then be used to detect instances that are anomalous with respect to the derived model. The difference between this approach and conventional supervised learning lies in the fact that you use only a portion of the data (the negative class, in your case) for training. After training, you would expect your positive instances to be identified as anomalous.
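
For instance, a minimal sketch of that recipe with scikit-learn's `KernelDensity` standing in for the density model (synthetic data in place of your negative class; the bandwidth and threshold are arbitrary choices):

```python
# Sketch of a 'normalcy' model learnt from the negative class alone, using
# scikit-learn's KernelDensity (a GaussianMixture would work the same way);
# points whose density falls below a threshold are flagged as anomalous.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)
X_neg = rng.normal(0.0, 1.0, size=(5000, 2))        # abundant negative class
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),
                    rng.normal(6.0, 1.0, size=(5, 2))])  # last 5 should look anomalous

kde = KernelDensity(bandwidth=0.5).fit(X_neg)       # density of 'normal' data only
log_dens = kde.score_samples(X_test)                # log p(x) for each test point

threshold = np.percentile(kde.score_samples(X_neg), 1)  # bottom 1% of training density
print(log_dens < threshold)                         # True = flagged as anomalous
```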

As for your second question: under-sampling the negative class will result in a loss of information, whilst over-sampling the positive class doesn't add any information. I don't think that following that route is desirable.
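
To make the trade-off concrete, a naive sketch of both options in plain NumPy (synthetic data, purely illustrative):

```python
# Naive resampling sketch: under-sampling simply discards negatives,
# over-sampling merely repeats positives; neither adds any real
# information about the positive class.
import numpy as np

rng = np.random.default_rng(3)
X_pos = rng.normal(4.0, 1.0, size=(20, 2))      # rare positive class
X_neg = rng.normal(0.0, 1.0, size=(2000, 2))    # abundant negative class

# Under-sampling: keep only as many negatives as positives (information is lost).
keep = rng.choice(len(X_neg), size=len(X_pos), replace=False)
X_neg_under = X_neg[keep]

# Over-sampling: repeat positives until the classes match (no new information).
rep = rng.choice(len(X_pos), size=len(X_neg), replace=True)
X_pos_over = X_pos[rep]
print(X_neg_under.shape, X_pos_over.shape)      # (20, 2) (2000, 2)
```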

Cowgirl answered 27/4, 2014 at 21:59 Comment(0)
