Support Vector Machine vs K Nearest Neighbours

Asked 17/10, 2013 at 8:43 Answered 17/10, 2013 at 13:15

I have a data set to classify. Using the kNN algorithm I get an accuracy of 90%, whereas with SVM I am only able to get just over 70%. Is SVM not better than kNN? I know this might be a stupid question, but which SVM parameters will give results approximately as good as kNN? I am using the libsvm package on MATLAB R2008.

Split answered 17/10, 2013 at 8:43 Comment(5)

This question appears to be off-topic because it is about machine learning and would be more suitable for stats.stackexchange.com – Bettyannbettye 17/10, 2013 at 8:47

So is machine learning not a part of coding? This is an open platform and anyone is free to ask any question as long as it is related to coding and requires people to brainstorm. So if you don't find it useful you can stay away from this discussion and let others participate. – Split 17/10, 2013 at 9:38

I do not claim it does not belong here, I simply think you'll find a more informed audience and better answers at a more dedicated forum such as stats.stackexchange.com – Bettyannbettye 17/10, 2013 at 10:23

Not an expert, but you will rarely find two completely different methods where one is always 'better' than another regardless of the underlying data structure. – Deannadeanne 17/10, 2013 at 12:34

Did you tune hyperparameters when using SVM? If not, that's why its performance sucks. – Agapanthus 10/1, 2014 at 11:17

kNN and SVM represent different approaches to learning. Each approach implies a different model for the underlying data.

SVM assumes there exists a hyperplane separating the data points (quite a restrictive assumption), while kNN attempts to approximate the underlying distribution of the data in a non-parametric fashion (a crude approximation of a Parzen-window estimator).

You'll have to look at the specifics of your scenario to make a better decision as to what algorithm and configuration are best used.

Bettyannbettye answered 17/10, 2013 at 8:56 Comment(3)

"SVM assumes there exists a hyperplane separating the data points (quite a restrictive assumption)" It is not restrictive at all actually: an SVM with an RBF kernel can shatter any dataset with any combination of labels. – Outdistance 17/10, 2013 at 21:22

@Outdistance indeed, kernel SVMs are very powerful tools – Bettyannbettye 18/10, 2013 at 9:17

@ValentinHeinitz you don't really expect to get an accurate explanation in 10 lines? – Bettyannbettye 10/1, 2014 at 13:45

kNN basically says "if you're close to coordinate x, then the classification will be similar to observed outcomes at x." In SVM, a close analog would be using a high-dimensional kernel with a "small" bandwidth parameter, since this will cause SVM to overfit more. That is, SVM will be closer to "if you're close to coordinate x, then the classification will be similar to those observed at x."
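To make the analogy concrete, here is the standard decision function of an SVM with a Gaussian (RBF) kernel. The notation is an assumption on my part: I use the libsvm-style parameter gamma, so a "small" bandwidth sigma corresponds to a large gamma.

```latex
f(x) = \operatorname{sign}\left( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, \exp\left(-\gamma \lVert x - x_i \rVert^2\right) + b \right),
\qquad \gamma = \frac{1}{2\sigma^2}
```

As gamma grows, each term decays quickly with distance, so the sum tends to be dominated by the support vectors closest to x. Roughly speaking, the prediction is then driven by nearby training labels, which is the kNN-like behaviour described above.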

I recommend that you start with a Gaussian kernel and check the results for different parameters. From my own experience (which is, of course, focused on certain types of datasets, so your mileage may vary), tuned SVM outperforms tuned kNN.
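A minimal sketch of "check the results for different parameters", assuming scikit-learn is available (the question uses libsvm on MATLAB, where the corresponding options are, as far as I know, `-t 2` for the RBF kernel plus `-c` and `-g`). The dataset is a synthetic stand-in and the grid values are only illustrative, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with your own X and y.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Scale features first: the RBF kernel depends on Euclidean distances.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Coarse logarithmic grid over C and gamma (illustrative values).
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

# 5-fold cross-validation on the training part selects the parameters.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
# Accuracy on data that was never used for fitting or tuning.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

If the best values land on the edge of the grid, it is usually worth extending the grid in that direction and searching again.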

Questions for you:

1) How are you selecting k in kNN?

2) What parameters have you tried for SVM?

3) Are you measuring accuracy in-sample or out-of-sample?
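Questions 1 and 3 are related, so here is a small pure-Python sketch of what I mean. It is a toy kNN on made-up Gaussian data (both the data and the helper names `knn_predict`, `accuracy` and `cv_accuracy` are my own illustration, not from any library). It shows that in-sample accuracy with k=1 is trivially perfect, and that k can be chosen by out-of-sample (cross-validated) accuracy instead.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label is correct."""
    hits = sum(knn_predict(train, x, k) == label for x, label in test)
    return hits / len(test)


def cv_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` splits, each scored on held-out points."""
    scores = []
    for i in range(folds):
        held_out = data[i::folds]
        rest = [p for j, p in enumerate(data) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds


# Made-up data: two overlapping 2-D Gaussian blobs, 100 points each.
rng = random.Random(0)
data = []
for label, center in ((0, (0.0, 0.0)), (1, (1.5, 1.5))):
    for _ in range(100):
        point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
        data.append((point, label))
rng.shuffle(data)

# In-sample with k=1: every point is its own nearest neighbour.
in_sample_k1 = accuracy(data, data, 1)

# Out-of-sample: pick k by cross-validated accuracy (odd k avoids ties).
cv_scores = {k: cv_accuracy(data, k) for k in (1, 3, 5, 9, 15)}
best_k = max(cv_scores, key=cv_scores.get)
print(in_sample_k1, cv_scores, best_k)
```

If your 90% for kNN was measured in-sample, it is not comparable to an out-of-sample SVM figure, so make sure both numbers come from the same held-out data.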

Swamper answered 17/10, 2013 at 13:15 Comment(0)

It really depends on the dataset you are using. If you have something like the first row of this image ( http://scikit-learn.org/stable/_images/plot_classifier_comparison_1.png ), kNN will work really well and a linear SVM really badly.

If you want SVM to perform better, you can use a kernel-based SVM like the one in the picture (it uses an RBF kernel).
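The contrast can be reproduced on synthetic data. This is only a sketch: `make_circles` is my own choice of a non-linearly-separable toy dataset, not necessarily the one in the linked image, and all classifiers use default settings apart from the kernel and k=5.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes.
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

models = {
    "linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "kNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

On data like this the linear SVM should stay near chance level while the RBF SVM and kNN both do well, which matches the kind of gap described in the question.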

If you are using scikit-learn for Python, you can play a bit with the code here to see how to use the kernel SVM: http://scikit-learn.org/stable/modules/svm.html

Dalessandro answered 17/10, 2013 at 9:1 Comment(0)
