Making SVM run faster in python

Asked 28/7, 2015 at 15:53 Answered 2/1, 2021 at 20:15

Using the code below for svm in python:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))
clf.fit(X, y)
proba = clf.predict_proba(X)

But it is taking a huge amount of time.

Actual Data Dimensions:

train-set (1422392,29)
test-set (233081,29)

How can I speed it up (in parallel or some other way)? Please help. I have already tried PCA and downsampling.

I have 6 classes. Edit: Found http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html but I need probability estimates, and it does not seem to provide them for SVM.

Edit:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV
import multiprocessing
import numpy as np
import math

def new_func(a):                              # maps a scalar x to 1 / (1 + e^(-x))
    return 1 / (1 + math.exp(-a))

if __name__ == '__main__':
    iris = datasets.load_iris()
    cores = max(1, multiprocessing.cpu_count() - 2)     # leave two cores free, but use at least one
    X, y = iris.data, iris.target                       # loading dataset

    C_range = 10.0 ** np.arange(-4, 4)                  # C value range
    param_grid = dict(estimator__C=C_range.tolist())

    svr = OneVsRestClassifier(LinearSVC(class_weight='balanced'), n_jobs=cores)   # LinearSVC: fast
    # svr = OneVsRestClassifier(SVC(kernel='linear', probability=True,            # SVC: slow
    #     class_weight='balanced'), n_jobs=cores)

    clf = GridSearchCV(svr, param_grid, n_jobs=cores, verbose=2)   # grid search
    clf.fit(X, y)                                                  # training the SVM model

    decisions = clf.decision_function(X)          # decision function values
    # prob = clf.predict_proba(X)                 # only available with SVC: outputs probabilities
    print(decisions[:5, :])
    vecfunc = np.vectorize(new_func)
    prob = vecfunc(decisions)                     # converts decisions to 1 / (1 + e^(-x))
    print(prob[:5, :])

Edit 2: The answer by user3914041 yields very poor probability estimates.

Sieracki answered 28/7, 2015 at 15:53 Comment(20)

Quantify "huge amount of time." What have you used to profile your code? – Hinterland 28/7, 2015 at 16:11

@tristan Thanks for the comment. I am only estimating it roughly from a few runs of the code, going by the output checks in the code, which is a bad way to measure it. Does that answer your question? – Sieracki 28/7, 2015 at 16:16

Do you need all 1.4 million training examples? According to the docs, the fit time complexity is more than quadratic in the number of training examples. Additionally, do you need the probability estimates? They require an additional run of cross-validation to generate. – Magistral 28/7, 2015 at 16:19

@NBartley. Thanks for the info! As mentioned, I can downsample but it is not preferable. Yes, I need probability estimates, as required by a competition format. – Sieracki 28/7, 2015 at 16:21

The OneVsRestClassifier comes with an option for parallelism, but be warned that it may eat up many of your resources, as it will take a significant time to fit each of the models. Try setting the n_jobs parameter according to the docs here. – Magistral 28/7, 2015 at 16:24

Try MKL Optimizations from Continuum, see store.continuum.io/cshop/mkl-optimizations. They offer a 30 day free trial and cost is $99. I am not a sales rep, but I use their Anaconda Python distribution and like it - it was recommended at Spark Summit training. Incidentally Spark supports SVM and running it on even a small Spark cluster would greatly improve performance, see spark.apache.org/docs/1.1.0/…. – Kain 28/7, 2015 at 16:33

@TrisNefzger Spark won't work because it does not support probability estimates for SVM – Devanagari 28/7, 2015 at 16:40

@TrisNefzger Thanks for the useful knowledge! I do have an HPC cluster available. But if it doesn't offer probability estimates then it wouldn't be of much use. – Sieracki 28/7, 2015 at 16:41

I haven't looked much into it, but I think IPython Parallel / Starcluster might be worth checking out as well. Here's a gist with demo code from one of the sklearn contributors' tutorials. But to build off Tris's comment, you're going to want to try to move over to a cluster at some point. And if sklearn doesn't work easily on a cluster, you might want to consider writing your own code on top of these other libraries that gives you the probability estimates you need. – Magistral 28/7, 2015 at 16:48

@NBartley Thanks for the reply again! I tried using OneVsRestClassifier in parallel. I allocated it 14 cores but it seemed to use only 6 of them (which is equal to the number of classes). Do you know any reason for this? I am unsure how parallel gradient works. If I cannot run on more cores than the number of classes, I don't think using a cluster would help. (Also, I already have more than enough RAM, ~48GB on my desktop, so memory is not a problem.) – Sieracki 28/7, 2015 at 17:29

Yes, SVM takes a lot of time and is very slow on CPUs. You will need to whiten the PCA data to make it faster, or try to find a library that runs on a GPU. – Deafanddumb 28/7, 2015 at 17:31

@Deafanddumb Thanks for the reply! I do whiten the data. I can't find any such library. Can you mention why using GPU would help. – Sieracki 28/7, 2015 at 17:42

It's not really parallel gradient so much as it's fitting the 6 separate OneVsRest models in parallel, so it makes sense that it won't parallelize more than that. If you intend to stay with Python and sklearn.SVC because of the probability estimates then it seems to me like your best bet might be to downsample, PCA, and use OneVsRest with 6 jobs. – Magistral 28/7, 2015 at 17:47

@NBartley Thanks for the info! I myself would prefer Matlab over Python, but most of the other code is in Python. Also, time doesn't permit the change. – Sieracki 28/7, 2015 at 19:41

Running on a GPU is 20X faster, plus running native C/C++ code adds to the speed. Python is always slow (at least for me!). Take a look here: devtalk.nvidia.com/default/topic/485456/… – Deafanddumb 28/7, 2015 at 23:48

@NBartley Downsampling and PCA don't give good results. If you come to know of any other possibility please let me know. Probably using LinearSVC to produce probability estimates. – Sieracki 29/7, 2015 at 18:22

If that isn't working for your purposes, then I agree. LinearSVC with calibrated probability estimates is then another good option. I would imagine that you can also try regularized Logistic Regression again with appropriate parameters, even if it has yielded lower accuracy as you mention below. It's very difficult to gauge what will work best for you without knowing anything else about the data. :/ – Magistral 29/7, 2015 at 19:38

@NBartley Hi, thanks for the info! I tried some code for LinearSVC with calibrated probability estimates, please check (Edit in Question). From what I could find, using maximum likelihood should probably be better: 1/(1+exp(Ax+B)), where A and B are parameters learned by maximum likelihood estimation. Can you help with how to implement it? I can't seem to find a starting point. – Sieracki 30/7, 2015 at 17:26

You should check this new library to speed up the training process > intel.github.io/scikit-learn-intelex – Chiastic 8/1, 2022 at 12:52

The parallelism doesn't happen in SVC but in the multi-class wrapper. I ran it, and that seems to be the case – Intrust 7/7, 2023 at 10:20

137

If you want to stick with SVC as much as possible and train on the full dataset, you can use ensembles of SVCs that are trained on subsets of the data to reduce the number of records per classifier (which apparently has quadratic influence on complexity). scikit-learn supports that with the BaggingClassifier wrapper. That should give you similar (if not better) accuracy compared to a single classifier, with much less training time. The training of the individual classifiers can also be set to run in parallel using the n_jobs parameter.

Alternatively, I would also consider using a Random Forest classifier - it supports multi-class classification natively, it is fast and gives pretty good probability estimates when min_samples_leaf is set appropriately.

I ran a quick test on the iris dataset blown up 100 times with an ensemble of 10 SVCs, each one trained on 10% of the data. It is more than 10 times faster than a single classifier. These are the numbers I got on my laptop:

Single SVC: 45s

Ensemble SVC: 3s

Random Forest Classifier: 0.5s

See below the code that I used to produce the numbers:

import time
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Blow the dataset up 100 times by repeating every row
X = np.repeat(X, 100, axis=0)
y = np.repeat(y, 100, axis=0)

# Baseline: a single one-vs-rest SVC on the full data
start = time.time()
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))
clf.fit(X, y)
end = time.time()
print("Single SVC", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

# Ensemble: 10 SVCs per class, each trained on 10% of the data
n_estimators = 10
start = time.time()
clf = OneVsRestClassifier(BaggingClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'), max_samples=1.0 / n_estimators, n_estimators=n_estimators))
clf.fit(X, y)
end = time.time()
print("Bagging SVC", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

# Random forest for comparison
start = time.time()
clf = RandomForestClassifier(min_samples_leaf=20)
clf.fit(X, y)
end = time.time()
print("Random Forest", end - start, clf.score(X, y))
proba = clf.predict_proba(X)

If you want to make sure that each record is used at most once when training a given base classifier in the BaggingClassifier, you can set the bootstrap parameter to False, so that each subset is drawn without replacement.
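As a rough check of what the bootstrap parameter does, the sketch below fits a small BaggingClassifier on the plain iris data with bootstrap=False and inspects which rows each base SVC saw. The variable names and random_state=0 are my own choices for illustration, not part of the benchmark above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

n_estimators = 10
bag = BaggingClassifier(
    SVC(kernel="linear"),
    n_estimators=n_estimators,
    max_samples=1.0 / n_estimators,  # each base SVC sees about 10% of the rows
    bootstrap=False,                 # draw each subset without replacement
    random_state=0,                  # assumption: fixed seed for repeatability
)
bag.fit(X, y)

# estimators_samples_ holds the row indices used to train each base estimator
sizes = [len(idx) for idx in bag.estimators_samples_]
no_repeats = all(len(np.unique(idx)) == len(idx) for idx in bag.estimators_samples_)
print(sizes, no_repeats)
```

Note that the subsets of different base classifiers are drawn independently, so the same row may still appear in more than one subset.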

Cyrille answered 15/8, 2015 at 14:23 Comment(7)

Thanks for the amazing answer!! I didn't know about these. In addition to speed, accuracy is also my prime concern. Could you give a comparison of that if possible? I am not bound to SVC, please suggest other good approaches also if you want. – Sieracki 17/8, 2015 at 3:56

Also you could check out the sklearn.ensemble.AdaBoostClassifier for use with random forest or decision trees. – Maupin 4/10, 2016 at 16:45

If you want a linear kernel, you can use sklearn.svm.LinearSVC which is basically the same, but implemented with a faster library than the sklearn.svm.SVC. – Describe 19/10, 2017 at 9:25

The RandomForestClassifier works amazingly fast, but from what I understand it doesn't use linear / poly kernels like SVC does, so it gives lower accuracy. Can I improve the accuracy of RandomForestClassifier? – Deryl 27/12, 2017 at 20:31

@Alexander Bauer Sorry, could you explain what the chaining of the classifiers does? OneVsRestClassifier(BaggingClassifier(SVC ... – Carlow 12/5, 2018 at 2:36

This is a great approach! I got similar F1 scores; without BaggingClassifier it took 4d 3h 27min, but with BaggingClassifier it took 31min 8s – Centaurus 4/6, 2020 at 0:1

This is my first time training an SVM. I tried increasing n_jobs to more than 1, but it does not run at all. It only runs when n_jobs = 1. Do you know what could be the cause? – Ghostly 15/11, 2022 at 2:21

SVM classifiers don't scale so easily. From the docs, about the complexity of sklearn.svm.SVC.

The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

In scikit-learn you have svm.LinearSVC, which scales better. It should be able to handle a dataset of your size.

Alternatively you could just go with another classifier. If you want probability estimates I'd suggest logistic regression. Logistic regression also has the advantage of not needing probability calibration to output 'proper' probabilities.

Edit:

I did not know about the complexity of LinearSVC; I finally found this information in the user guide:

Also note that for the linear case, the algorithm used in LinearSVC by the liblinear implementation is much more efficient than its libsvm-based SVC counterpart and can scale almost linearly to millions of samples and/or features.

To get probabilities out of a LinearSVC, check out this link. It is just a couple of links away from the probability calibration guide I linked above and contains a way to estimate probabilities. Namely:

    prob_pos = clf.decision_function(X_test)
    prob_pos = (prob_pos - prob_pos.min()) / (prob_pos.max() - prob_pos.min())

Note the estimates will probably be poor without calibration, as illustrated in the link.
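For the calibration step itself, one option is scikit-learn's CalibratedClassifierCV with method="sigmoid", which fits a Platt-style sigmoid of the form 1/(1+exp(A*f+B)) on the decision values. The following is only a minimal sketch on the iris data; the train/test split, cv=3 and max_iter=10000 are assumptions chosen for illustration and would need tuning on the real dataset.

```python
import numpy as np
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Wrap the fast linear SVM; the sigmoid parameters are learned on held-out folds
calibrated = CalibratedClassifierCV(LinearSVC(max_iter=10000), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)   # one column per class
row_sums = proba.sum(axis=1)               # each row should sum to 1
print(proba[:5])
```

Unlike the min-max rescaling above, the wrapper exposes predict_proba directly and handles the multi-class case by calibrating each class one-vs-rest and normalizing the result.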

Heretofore answered 28/7, 2015 at 17:13 Comment(14)

Thanks for the reply! @NBartley has already mentioned the scaling issue. I have tried logistic regression; it gives lower accuracy. – Sieracki 28/7, 2015 at 17:24

Thanks for reply! But linearSVC has no option of outputting the probability estimates. – Sieracki 28/7, 2015 at 17:40

You're right. A possible workaround is to use the decision_function attribute, as it is done with LinearSVC in the link I gave about probability calibration. You'll definitely need to calibrate for the probabilities to make sense though. – Heretofore 28/7, 2015 at 18:5

Can you elucidate more on the calibration part. – Sieracki 28/7, 2015 at 19:21

If you have specific questions feel free to ask but for the concept I won't be able to do a better job than the link I gave in the post. – Heretofore 28/7, 2015 at 20:23

How can I convert the decision function (scikit-learn.org/stable/modules/generated/…) to probability estimates? Simply using 1/(1+exp(-x)) doesn't seem good. Maybe using 1/(1+exp(Ax+B)), where A and B are parameters learned by maximum likelihood estimation, is better. But I am still very new to both ML and Python. Can you provide some starting code for the conversion? – Sieracki 29/7, 2015 at 17:33

@AbhishekBhatia I edited my answer. It takes some time but you'd probably learn a lot (at least I did) by reading the guide on probability calibration. – Heretofore 29/7, 2015 at 17:50

Thanks a great deal! Is the implementation of predict_proba() similar to what you have mentioned? – Sieracki 30/7, 2015 at 17:29

This gives very poor results. Thus, I have unaccepted the answer. Please suggest alternative approaches – Sieracki 12/8, 2015 at 6:28

@AbhishekBhatia Very poor as in "compared to SVC with One vs Rest and predict_proba on a small sample, using xxx as a metric/heuristic"? Are you using probability calibration? – Heretofore 12/8, 2015 at 7:6

No I am not using any calibration. Can you elucidate more please. – Sieracki 12/8, 2015 at 7:13

I already told you I could not do better than this link scikit-learn.org/stable/auto_examples/calibration/…. I also mentioned in my answer that the estimates would probably be poor without calibration. If you don't even try to read it I can't help you. – Heretofore 12/8, 2015 at 7:52

@user3914014 Thanks for the response again. After reading your advice I tried some other approaches for calibration, namely CalibratedClassifierCV (no support for multilabel). But I am not able to find a fitting one. Can you please point me in the right direction by giving some example? – Sieracki 12/8, 2015 at 9:29

@Heretofore thanks for tip, I'm facing an issue with multilabel classification where CalibratedClassifierCV does not accept multilabel (MultiLabelBinarizer) vectors. I'm using OneVsRestClassifier(LinearSVC()). Do you know of a way to calibrate this way? – Piscina 19/3, 2018 at 10:10

You can use the kernel_approximation module to scale up SVMs to a large number of samples like this.
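A minimal sketch of that idea, assuming an RBF kernel approximated with Nystroem and followed by a linear SVM; the gamma and n_components values here are placeholders for illustration, not recommendations for the dataset in the question.

```python
from sklearn import datasets
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = datasets.load_iris(return_X_y=True)

approx_svm = make_pipeline(
    StandardScaler(),
    # Map the data into an approximate RBF feature space of 50 dimensions
    Nystroem(kernel="rbf", gamma=0.2, n_components=50, random_state=0),
    # A linear model in that space behaves roughly like a kernel SVM
    LinearSVC(max_iter=10000),
)
approx_svm.fit(X, y)

accuracy = approx_svm.score(X, y)              # training accuracy, for a quick sanity check
decisions = approx_svm.decision_function(X)    # one column of scores per class
print(accuracy)
```

The cost of the linear step grows with n_components rather than with the square of the number of samples, which is what makes this usable on millions of rows.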

Scarce answered 14/7, 2017 at 19:42 Comment(0)

It was briefly mentioned in the top answer; here is the code. The quickest way to do this is via the n_jobs parameter: replace the line

clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))

with

clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'), n_jobs=-1)

This will use all available CPUs on your computer, while still doing the same computation as before.
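As noted in the comments on the question, the parallelism here is over the per-class binary models, so the number of busy cores is capped by the number of classes. A small sketch on iris (3 classes) that shows how many models there are to distribute:

```python
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# n_jobs=-1 asks joblib to use every available core for the per-class fits
clf = OneVsRestClassifier(SVC(kernel="linear", probability=True), n_jobs=-1)
clf.fit(X, y)

# One binary SVC per class: this is the maximum number of parallel tasks
n_binary_models = len(clf.estimators_)
print(n_binary_models)
```

With the 6 classes in the question, at most 6 fits can run at the same time, however many cores are allocated.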

Airdrome answered 21/3, 2017 at 17:52 Comment(3)

Would you pass the n_jobs parameter to the OVR Classifier or to the Bagging Classifier? – Sufi 5/1, 2021 at 14:48

The top-level, in this case OvR – Airdrome 8/1, 2021 at 11:12

For my SVM, it could only train when n_jobs = 1. For other values it just seemed to get stuck. Is it because I am using images as my input data? – Ghostly 15/11, 2022 at 2:22

For large datasets consider using LinearSVC or SGDClassifier instead, possibly after a Nystroem transformer.

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
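A hedged sketch of the SGDClassifier route: since the question needs probabilities and predict_proba is not available with the hinge (SVM) loss, this example swaps in loss="modified_huber", which is a smoothed relative of the hinge loss and does expose predict_proba. That substitution, the Nystroem settings and the seeds are my assumptions for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)

sgd_svm = make_pipeline(
    StandardScaler(),                            # SGD is sensitive to feature scale
    Nystroem(n_components=50, random_state=0),   # approximate RBF feature map
    # modified_huber is NOT the SVM hinge loss; it is used here for predict_proba
    SGDClassifier(loss="modified_huber", max_iter=1000, tol=1e-3, random_state=0),
)
sgd_svm.fit(X, y)

proba = sgd_svm.predict_proba(X)   # rows are normalized across the classes
row_sums = proba.sum(axis=1)
print(proba[:5])
```

These probabilities are a simple rescaling of the decision values, so for competition scoring it may still be worth wrapping the model in a calibration step.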

Rhythmandblues answered 2/1, 2021 at 20:15 Comment(0)
