LinearSVC() differs from SVC(kernel='linear')

When the data is offset (not centered at zero), LinearSVC() and SVC(kernel='linear') give wildly different results. (EDIT: the problem might be that LinearSVC does not handle non-normalized data well.)

import matplotlib.pyplot as plot
plot.ioff()
import numpy as np
from sklearn.datasets import make_blobs  # the samples_generator module was removed in newer scikit-learn
from sklearn.svm import LinearSVC, SVC


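# Plot the separating line w[0]*x + w[1]*y + intercept = 0 across the data's x-range.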
def plot_hyperplane(m, X):
    w = m.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(np.min(X[:, 0]), np.max(X[:, 0]))
    yy = a*xx - (m.intercept_[0]) / w[1]
    plot.plot(xx, yy, 'k-')

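# Two blobs generated near the origin, then shifted far away (to roughly 100-111)
# so the data is offset rather than centered at zero.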
X, y = make_blobs(n_samples=100, centers=2, n_features=2,
                  center_box=(0, 1))
X[y == 0] = X[y == 0] + 100
X[y == 1] = X[y == 1] + 110

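# Fit both classifiers and draw their decision boundaries side by side.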
for i, m in enumerate((LinearSVC(), SVC(kernel='linear'))):
    m.fit(X, y)
    plot.subplot(1, 2, i+1)
    plot_hyperplane(m, X)

    plot.plot(X[y == 0, 0], X[y == 0, 1], 'r.')
    plot.plot(X[y == 1, 0], X[y == 1, 1], 'b.')

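    # Predict on a grid covering the data to visualize each model's decision regions.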
    xv, yv = np.meshgrid(np.linspace(98, 114, 10), np.linspace(98, 114, 10))
    _X = np.c_[xv.reshape((xv.size, 1)), yv.reshape((yv.size, 1))]
    _y = m.predict(_X)

    plot.plot(_X[_y == 0, 0], _X[_y == 0, 1], 'r.', alpha=0.4)
    plot.plot(_X[_y == 1, 0], _X[_y == 1, 1], 'b.', alpha=0.4)

plot.show()

This is the result I get:

[plot omitted: left = LinearSVC(), right = SVC(kernel='linear')]

sklearn.__version__ = 0.17, but I also tested on Ubuntu 14.04, which ships with 0.15.

I thought about reporting this as a bug, but the behaviour seems too obvious to have gone unnoticed. What am I missing?

Aeschines answered 15/1, 2016 at 13:6 Comment(0)

Reading the documentation, they use different underlying implementations: LinearSVC uses liblinear, whereas SVC uses libsvm.

Looking closely at the coefficients and intercept, it seems LinearSVC applies regularization to the intercept, whereas SVC does not.
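For instance, a minimal inspection sketch (assuming the X and y generated in the question above):

from sklearn.svm import LinearSVC, SVC

# Fit both models on the question's offset blobs and print their parameters.
lin = LinearSVC().fit(X, y)
svc = SVC(kernel='linear').fit(X, y)

# liblinear shrinks the intercept together with the weights, so LinearSVC's
# intercept stays small even though the data sits far from the origin;
# libsvm leaves the bias term unpenalized.
print('LinearSVC coef_:', lin.coef_, 'intercept_:', lin.intercept_)
print('SVC       coef_:', svc.coef_, 'intercept_:', svc.intercept_)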

By adding intercept_scaling, I was able to obtain the same results from both:

LinearSVC(loss='hinge', intercept_scaling=1000)

[plot omitted: comparison after intercept scaling]
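To check this, a minimal sketch (assuming the X and y from the question; the hyperplane helper is only for illustration) that compares the two boundaries after normalizing by ||w||, since (w, b) and (c·w, c·b) describe the same line:

import numpy as np
from sklearn.svm import LinearSVC, SVC

lin = LinearSVC(loss='hinge', intercept_scaling=1000).fit(X, y)
svc = SVC(kernel='linear').fit(X, y)

def hyperplane(m):
    # Scale (w, b) to unit-norm w so the two models are directly comparable.
    w, b = m.coef_[0], m.intercept_[0]
    return w / np.linalg.norm(w), b / np.linalg.norm(w)

print('LinearSVC:', hyperplane(lin))
print('SVC:      ', hyperplane(svc))  # should now be close to the line above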

Cristionna answered 15/1, 2016 at 16:55 Comment(4)
Looking at this more closely, it seems there is an optimization problem related to the scale of the variables. Going to extend my answer later. – Cristionna
Thank you. So, if I do not want to normalize my datasets and have no time to go through them one by one, I should stick with SVC(kernel='linear'), right? – Aeschines
SVC seems much less finicky :-). It is generally a good idea with any gradient descent optimizer to use feature scaling and mean centering. Any reason you are avoiding it? It can be easily implemented in scikit-learn with a Pipeline and StandardScaler (see the sketch after these comments). It only becomes annoying if you are trying to interpret the coefficients yourself. – Cristionna
It is just that we are using data from UCI, and we wanted to compare a method of ours against SVC with a linear kernel. But yeah, we are going to standardize the data for now, and maybe consider using SVC for a final run. LinearSVC is naturally much faster. – Aeschines
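A minimal sketch of the Pipeline/StandardScaler approach mentioned in the comments (assuming the X and y from the question):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Standardizing centers the features, so the fitted intercept stays small and
# the intercept penalty in liblinear no longer distorts the solution.
model = make_pipeline(StandardScaler(), LinearSVC())
model.fit(X, y)             # X, y as in the question
print(model.predict(X[:5]))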
