SVC vs LinearSVC in scikit-learn: difference in loss functions

According to this post, SVC and LinearSVC in scikit-learn are very different. But when reading the official scikit-learn documentation, the difference is not that clear.

Especially for the loss functions, the formulas in the documentation seem equivalent. [Screenshot of the scikit-learn loss formulas omitted.]

And this post says that the loss functions are different:

  • SVC: 1/2 ||w||^2 + C Σ ξ_i
  • LinearSVC: 1/2 ||[w, b]||^2 + C Σ ξ_i

It seems that in the case of LinearSVC, the intercept is regularized, but the official documentation says otherwise.
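A quick way to probe this empirically (a sketch only; exact coefficients depend on solver settings and convergence, and the shifted toy data is my own construction to make the intercept matter):

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

# Toy data shifted away from the origin so the intercept is large.
rng = np.random.RandomState(0)
X = rng.randn(100, 2) + 5.0
y = (X[:, 0] + X[:, 1] > 10).astype(int)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lsvc = LinearSVC(C=1.0, loss="hinge", max_iter=100000).fit(X, y)

# If LinearSVC regularizes the intercept, its solution can differ
# from SVC's even with the same C and the same hinge loss.
print("SVC       coef:", svc.coef_, "intercept:", svc.intercept_)
print("LinearSVC coef:", lsvc.coef_, "intercept:", lsvc.intercept_)
```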

Does anyone have more information?

Postconsonantal answered 8/10, 2020 at 7:39 Comment(0)
P
4

SVC is a wrapper of the LIBSVM library, while LinearSVC is a wrapper of LIBLINEAR.

LinearSVC is generally faster than SVC and can work with much larger datasets, but it can only use a linear kernel, hence its name. So the difference lies not in the formulation but in the implementation approach.

Quoting LIBLINEAR FAQ:

When to use LIBLINEAR but not LIBSVM

There are some large data for which with/without nonlinear mappings gives similar performances. Without using kernels, one can quickly train a much larger set via a linear classifier. Document classification is one such application. In the following example (20,242 instances and 47,236 features; available on LIBSVM data sets), the cross-validation time is significantly reduced by using LIBLINEAR:

% time libsvm-2.85/svm-train -c 4 -t 0 -e 0.1 -m 800 -v 5 rcv1_train.binary
Cross Validation Accuracy = 96.8136%
345.569s

% time liblinear-1.21/train -c 4 -e 0.1 -v 5 rcv1_train.binary
Cross Validation Accuracy = 97.0161%
2.944s

Warning: While LIBLINEAR's default solver is very fast for document classification, it may be slow in other situations. See Appendix C of our SVM guide about using other solvers in LIBLINEAR.
Warning: If you are a beginner and your data sets are not large, you should consider LIBSVM first.
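The same speed gap can be reproduced in scikit-learn (a rough sketch; the dataset here is synthetic rather than the rcv1 text data above, and absolute timings depend on hardware):

```python
import time
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for a large classification problem (sizes illustrative).
X, y = make_classification(n_samples=3000, n_features=300, random_state=0)

t0 = time.perf_counter()
LinearSVC(C=4.0, max_iter=10000).fit(X, y)
t_liblinear = time.perf_counter() - t0

t0 = time.perf_counter()
SVC(kernel="linear", C=4.0).fit(X, y)
t_libsvm = time.perf_counter() - t0

print(f"LinearSVC (LIBLINEAR): {t_liblinear:.2f}s")
print(f"SVC (LIBSVM, linear):  {t_libsvm:.2f}s")
```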
Putput answered 15/10, 2020 at 12:16 Comment(7)
The difference is not only the speed; they are different. I made a simple example here. And you can also read this. – Postconsonantal
My question is about the loss function of the two classifiers. Thank you. – Postconsonantal
You can find more implementation details in the Appendices of the original LIBLINEAR paper. – Putput
The answer in the post is correct. LIBLINEAR does include the bias term in the optimization, while LIBSVM does not. – Putput
SVC defaults to L1 loss and L2 penalty. This is why you can create conditions under which the results of both are almost equal, if you set loss="hinge" for LinearSVC and make intercept_scaling large enough. The bias term is included in LIBLINEAR because the weight vector is implicitly extended as w = [w; b]. If you center your data before optimizing, it should effectively set the bias to zero. – Putput
So, there is an error in the scikit-learn documentation? For LinearSVC, the math formula should include a penalty for the bias b, right? – Postconsonantal
It surely is confusing and incomplete. There is an issue discussing adding a warning in the code that the intercept is regularized (and eventually not doing it), but no mention whatsoever in the documentation. – Putput
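The recipe from the comments can be sketched as follows (my own toy data; centering the features and using a large intercept_scaling are the stated assumptions, and how closely the two solutions agree depends on convergence):

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)
X -= X.mean(axis=0)  # center: the (regularized) bias should end up near zero

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lsvc = LinearSVC(loss="hinge", C=1.0, intercept_scaling=100.0,
                 max_iter=200000).fit(X, y)

print("max |coef diff|:", np.abs(svc.coef_ - lsvc.coef_).max())
print("intercepts:", svc.intercept_, lsvc.intercept_)
```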

© 2022 - 2024 — McMap. All rights reserved.