Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

I am doing multi-label classification where I am trying to predict the correct tags for questions:

(X = questions, y = list of tags for each question from X).

I am wondering which decision_function_shape for sklearn.svm.SVC should be used with OneVsRestClassifier?

From the docs we can read that decision_function_shape can have two values, 'ovo' and 'ovr':

decision_function_shape : ‘ovo’, ‘ovr’ or None, default=None

Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). The default of None will currently behave as ‘ovo’ for backward compatibility and raise a deprecation warning, but will change to ‘ovr’ in 0.19.

But I still don't understand what is the difference between:

# First: decision_function_shape set to 'ovo'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo'))

# Second: decision_function_shape set to 'ovr'
estim = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr'))

Which decision_function_shape should be used for multi-label classification problem?

EDIT: Question asking a similar thing with no answer.

Haldas answered 19/4, 2017 at 20:26 Comment(1)
You could try both and check which one gives better results for your specific data. – Ous

I think the question of which should be used is situational; it could easily be part of your GridSearch, as in the sketch below.
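
A minimal sketch of that grid search, assuming X and y already hold your vectorized questions and binarized tag matrix (placeholder names, not fitted here):

from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Parameters of the wrapped SVC are reached via the 'estimator__' prefix
param_grid = {'estimator__decision_function_shape': ['ovo', 'ovr']}
search = GridSearchCV(OneVsRestClassifier(SVC(kernel='linear')), param_grid, cv=3)
# search.fit(X, y)  # X, y: your own features and binarized tag matrix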

But just intuitively I would expect that, as far as differences go, you are going to be doing the same thing either way. Here is my reasoning:

OneVsRestClassifier is designed to model each class against all of the other classes independently, creating one classifier per class. The way I understand this process is that OneVsRestClassifier grabs a class and creates a binary label for whether a point is or isn't that class. This labelling then gets fed into whatever estimator you have chosen to use. I believe the confusion arises because SVC also allows you to make this same choice, but in effect, with this implementation, the choice will never matter, because you will only ever be feeding two classes into each SVC.

And here is an example:

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

data = load_iris()
X, y = data.data, data.target

# One wrapper per setting; each inner SVC only ever sees a binary problem
estim1 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovo'))
estim1.fit(X, y)

estim2 = OneVsRestClassifier(SVC(kernel='linear', decision_function_shape='ovr'))
estim2.fit(X, y)

print(estim1.coef_ == estim2.coef_)
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]

So you can see the coefficients are identical for all three binary estimators built by the two models. Granted, this dataset only has 150 samples and 3 classes, so these results could differ on a more complex dataset, but it's a simple proof of concept.
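
Here is a further minimal sketch of why the setting is moot inside OneVsRestClassifier: each wrapped SVC only ever sees a binary problem, and for two classes both 'ovo' and 'ovr' collapse to the same single-column decision function (the toy data below is made up):

import numpy as np
from sklearn.svm import SVC

# Binary toy problem: the n_class * (n_class - 1) / 2 'ovo' columns and
# the n_class 'ovr' columns both reduce to one column when n_class = 2
X = [[0.], [1.], [2.], [3.]]
y = [0, 0, 1, 1]

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
ovr = SVC(kernel='linear', decision_function_shape='ovr').fit(X, y)

print(np.allclose(ovo.decision_function(X), ovr.decision_function(X)))
# True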

Dis answered 19/4, 2017 at 22:2 Comment(1)
To add here: SVC will internally always use OvO; decision_function_shape only affects the shape of the output for compatibility and is not used in fitting the data, so coef_ will always be equal. I have answered this in the linked question. – Ous

The shapes of the decision functions differ because ovo trains one classifier for each pair of classes, whereas ovr trains one classifier for each class fitted against all the other classes.

The best example I could find is from the scikit-learn documentation (http://scikit-learn.org):

SVC and NuSVC implement the “one-against-one” approach (Knerr et al., 1990) for multi-class classification. If n_class is the number of classes, then n_class * (n_class - 1) / 2 classifiers are constructed and each one trains data from two classes. To provide a consistent interface with other classifiers, the decision_function_shape option allows to aggregate the results of the “one-against-one” classifiers to a decision function of shape (n_samples, n_classes)

>>> from sklearn import svm
>>> X = [[0], [1], [2], [3]]
>>> Y = [0, 1, 2, 3]
>>> clf = svm.SVC(decision_function_shape='ovo')
>>> clf.fit(X, Y) 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes: 4*3/2 = 6
6
>>> clf.decision_function_shape = "ovr"
>>> dec = clf.decision_function([[1]])
>>> dec.shape[1] # 4 classes
4

What does this mean in simple terms?

To understand what n_class * (n_class - 1) / 2 means, generate two-class combinations using itertools.combinations.

import itertools

def ovo_classifiers(classes):
    # 'ovo' builds n_class * (n_class - 1) / 2 pairwise classifiers
    n_class = len(classes)
    n = n_class * (n_class - 1) // 2   # integer division: counts are whole numbers
    combos = itertools.combinations(classes, 2)
    return (n, list(combos))

>>> ovo_classifiers(['a', 'b', 'c'])
(3, [('a', 'b'), ('a', 'c'), ('b', 'c')])
>>> ovo_classifiers(['a', 'b', 'c', 'd'])
(6, [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')])

Which estimator should be used for multi-label classification?

In your situation, you have a question with multiple tags (like here on StackOverflow). If you know your tags (classes) in advance, I would suggest OneVsRestClassifier(LinearSVC()), but you could also try DecisionTreeClassifier or RandomForestClassifier (I think):

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier

df = pd.DataFrame({
  'Tags': [['python', 'pandas'], ['c#', '.net'], ['ruby'],
           ['python'], ['c#'], ['sklearn', 'python']],
  'Questions': ['This is a post about python and pandas is great.',
           'This is a c# post and i hate .net',
           'What is ruby on rails?', 'who else loves python',
           'where to learn c#', 'sklearn is a python package for machine learning']},
                  columns=['Questions', 'Tags'])

X = df['Questions']
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df['Tags'].values)

pipeline = Pipeline([
  # token_pattern restricts the vocabulary to the known tags (a quick regex hack)
  ('vect', CountVectorizer(token_pattern='|'.join(mlb.classes_))),
  ('linear_svc', OneVsRestClassifier(LinearSVC()))
  ])
pipeline.fit(X, y)

final = pd.DataFrame(pipeline.predict(X), index=X, columns=mlb.classes_)

def predict(text):
  return pd.DataFrame(pipeline.predict(text), index=text, columns=mlb.classes_)

test = ['is python better than c#', 'should i learn c#',
        'should i learn sklearn or tensorflow',
        'ruby or c# i am a dinosaur',
        'is .net still relevant']
print(predict(test))

Output:

                                      .net  c#  pandas  python  ruby  sklearn
is python better than c#                 0   1       0       1     0        0
should i learn c#                        0   1       0       0     0        0
should i learn sklearn or tensorflow     0   0       0       0     0        1
ruby or c# i am a dinosaur               0   1       0       0     1        0
is .net still relevant                   1   0       0       0     0        0
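
And if you want to try RandomForestClassifier, here is a minimal sketch reusing mlb, X, y and test from the snippet above; random forests handle a multi-label 0/1 indicator matrix natively, so no OneVsRestClassifier wrapper is needed (n_estimators and random_state are arbitrary choices):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

rf_pipeline = Pipeline([
  ('vect', CountVectorizer(token_pattern='|'.join(mlb.classes_))),
  # A 2D indicator y makes this a multi-output (multi-label) forest
  ('rf', RandomForestClassifier(n_estimators=100, random_state=0))
  ])
rf_pipeline.fit(X, y)
print(pd.DataFrame(rf_pipeline.predict(test), index=test, columns=mlb.classes_))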
Wariness answered 20/4, 2017 at 7:0 Comment(1)
Great answer. Could you please explain why you would use LinearSVC() instead of SVC(kernel='linear') for this particular problem? Aren't they basically the same, just with some implementation differences, according to the docs? (I know about only one advantage of LinearSVC(): it scales better for bigger datasets.) – Haldas
