I am using the code below for SVM in Python:
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target
# class_weight='balanced' (the old 'auto' value was removed from scikit-learn)
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='balanced'))
clf.fit(X, y)
proba = clf.predict_proba(X)
But it is taking a huge amount of time.
Actual data dimensions:
train set: (1422392, 29)
test set: (233081, 29)
How can I speed it up (in parallel or some other way)? Please help. I have already tried PCA and downsampling.
I have 6 classes.
Edit: I found http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, but I want probability estimates, and it does not seem to provide them for SVM.
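A side note on that point: SGDClassifier does expose predict_proba, just not for the default hinge (SVM) loss. A minimal sketch on the iris data, using the 'modified_huber' loss (the logistic loss, named 'log_loss' in recent scikit-learn releases, also works):

```python
from sklearn import datasets
from sklearn.linear_model import SGDClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Losses 'modified_huber' (and the logistic loss) make predict_proba
# available; the default 'hinge' loss does not.
clf = SGDClassifier(loss='modified_huber', class_weight='balanced', random_state=0)
clf.fit(X, y)
proba = clf.predict_proba(X)
print(proba.shape)  # one row per sample, one column per class
```

Since SGDClassifier trains in a single pass per epoch, this scales far better to millions of rows than kernel SVC.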
Edit:
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV  # replaces the removed sklearn.grid_search
import multiprocessing
import numpy as np

def new_func(a):  # converts array elements x to 1/(1 + e^(-x))
    return 1 / (1 + np.exp(-a))  # np.exp works element-wise on arrays

if __name__ == '__main__':
    iris = datasets.load_iris()
    cores = multiprocessing.cpu_count() - 2
    X, y = iris.data, iris.target  # load dataset
    C_range = 10.0 ** np.arange(-4, 4)  # C value range
    param_grid = dict(estimator__C=C_range.tolist())
    svr = OneVsRestClassifier(LinearSVC(class_weight='balanced'), n_jobs=cores)  # LinearSVC: faster
    # svr = OneVsRestClassifier(SVC(kernel='linear', probability=True,       # SVC: slow
    #                               class_weight='balanced'), n_jobs=cores)
    clf = GridSearchCV(svr, param_grid, n_jobs=cores, verbose=2)  # grid search
    clf.fit(X, y)  # train the SVM model
    decisions = clf.decision_function(X)  # decision-function values
    # prob = clf.predict_proba(X)  # only SVC outputs probabilities
    print(decisions[:5, :])
    prob = new_func(decisions)  # maps decisions through 1/(1 + e^(-x))
    print(prob[:5, :])
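A caveat on that last step: running raw decision values through a sigmoid does not yield calibrated probabilities, and the resulting rows do not sum to 1. Also, since NumPy's exp is already element-wise, no np.vectorize wrapper is needed. A minimal sketch with made-up decision values (not from the question's data), including a crude row-normalization:

```python
import numpy as np

# Hypothetical decision-function values: 2 samples, 3 classes
decisions = np.array([[1.2, -0.5, 0.3],
                      [-2.0, 0.8, 1.5]])

scores = 1.0 / (1.0 + np.exp(-decisions))          # element-wise sigmoid
prob = scores / scores.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
print(prob.sum(axis=1))  # each row now sums to 1.0
```

These normalized scores are only a rough stand-in for proper Platt-scaled probabilities.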
Edit 2: The answer by user3914041 yields very poor probability estimates.
I tried running OneVsRestClassifier in parallel. I allocated it 14 cores, but it seemed to use only 6 of them (which equals the number of classes). Do you know a reason for this? I am unsure how the parallelism works here. If I cannot use more cores than the number of classes, I don't think a cluster would help either. (Also, I already have more than enough RAM, ~48 GB, on the desktop, so memory is not a problem.) – Sieracki

If you need sklearn.SVC because of the probability estimates, then it seems to me your best bet might be to downsample, apply PCA, and use OneVsRest with 6 jobs. – Magistral

About the sigmoid 1/(1 + exp(Ax + B)), where A and B are parameters learned by maximum-likelihood estimation: can you help with how to implement it? I can't seem to find a starting point. – Sieracki
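Regarding that sigmoid 1/(1 + exp(Ax + B)): scikit-learn already implements the maximum-likelihood fit of A and B (Platt scaling) in CalibratedClassifierCV with method='sigmoid', so it does not need to be hand-rolled. A minimal sketch on the iris data used above; the same pattern applies to the large training set from the question:

```python
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# method='sigmoid' fits Platt scaling, 1/(1 + exp(A*f(x) + B)), with A and B
# estimated by maximum likelihood on held-out folds (cv=3 here)
base = LinearSVC(class_weight='balanced')
clf = CalibratedClassifierCV(base, method='sigmoid', cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X)
print(proba[:5, :])
```

This keeps LinearSVC's training speed while producing properly calibrated per-class probabilities that sum to 1.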