Finding the values of C and gamma to optimise SVM

I applied an SVM (scikit-learn) to a dataset and wanted to find the values of C and gamma that give the best accuracy on the test set.

I first fixed C to some integer and then iterated over many values of gamma until I found the gamma that gave the best test-set accuracy for that C. Then I fixed that gamma, iterated over values of C to find the C that gives the best accuracy, and so on...

But these steps are not guaranteed to find the combination of gamma and C that produces the best test-set accuracy.
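
To illustrate the concern, here is a small pure-Python sketch. The score table is entirely made up (hypothetical numbers, not real SVM results). With it, the alternating search described above stops at a local maximum, while trying every (C, gamma) pair, which is what a grid search does, finds the best one.

```python
import itertools

# Hypothetical "accuracy" for each (C, gamma) pair; the numbers are invented
# purely to illustrate the search behaviour, not produced by an SVM.
C_values = [1, 10, 100]
gamma_values = [0.001, 0.01, 0.1]
score = {
    (1, 0.001): 0.80, (1, 0.01): 0.70, (1, 0.1): 0.60,
    (10, 0.001): 0.70, (10, 0.01): 0.65, (10, 0.1): 0.70,
    (100, 0.001): 0.60, (100, 0.01): 0.70, (100, 0.1): 0.90,
}


def coordinate_search(C_start, max_rounds=10):
    """Alternately pick the best gamma (C fixed) and the best C (gamma fixed)."""
    C, gamma = C_start, None
    for _ in range(max_rounds):
        new_gamma = max(gamma_values, key=lambda g: score[(C, g)])
        new_C = max(C_values, key=lambda c: score[(c, new_gamma)])
        if (new_C, new_gamma) == (C, gamma):
            break  # no single-parameter move improves the score: local maximum
        C, gamma = new_C, new_gamma
    return C, gamma


def grid_search():
    """Evaluate every (C, gamma) pair and keep the best one."""
    return max(itertools.product(C_values, gamma_values), key=lambda p: score[p])


local = coordinate_search(C_start=1)
best = grid_search()
print(local, score[local])  # (1, 0.001) 0.8
print(best, score[best])    # (100, 0.1) 0.9
```

The exhaustive search is only "global" over the values you list, so the grid itself still has to cover the right ranges.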

Can anyone help me find a way to get this (gamma, C) combination in scikit-learn?

Winer answered 20/9, 2017 at 19:28 Comment(3)
Surely not! Because there is a high chance that I will get stuck in a local maximum, and the combination of C and gamma will not give me the best accuracy. – Winer
Did you try implementing it, or are you just guessing? Grid search tries every combination in the grid, so it won't get stuck in a local maximum. – Sicilia
@MohammedKashif I tried it, but the process seems endless: fixing one parameter, iterating over the other, and then doing the same for the other one. – Winer

You are looking for hyperparameter tuning. In hyperparameter tuning you pass a dictionary containing a list of possible values for each of your classifier's parameters; then, depending on the method you choose (e.g. GridSearchCV, RandomizedSearchCV), the best parameters found are returned. GridSearchCV tries every combination in the dictionary, while RandomizedSearchCV samples a fixed number of them. You can read more about it here.

For example:

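from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

#Create a dictionary of possible parameters
#(gamma is ignored by the 'linear' kernel; of these two kernels only 'rbf' uses it)
params_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.0001, 0.001, 0.01, 0.1],
               'kernel': ['linear', 'rbf']}

#Create the GridSearchCV object
grid_clf = GridSearchCV(SVC(class_weight='balanced'), params_grid)

#Fit on your training data (X_train, y_train); every combination is cross-validated
grid_clf.fit(X_train, y_train)

#Print the best estimator and its parameters
print(grid_clf.best_estimator_)
print(grid_clf.best_params_)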
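
If the grid gets too large, RandomizedSearchCV samples a fixed number of combinations instead of trying them all. The sketch below is an illustration, not part of the original answer: it assumes SciPy 1.4 or later (for `scipy.stats.loguniform`), uses the iris dataset only as stand-in data, and the sampling ranges and `n_iter=20` are arbitrary choices.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Stand-in data; replace with your own X_train, y_train
X_train, y_train = load_iris(return_X_y=True)

# Sample C and gamma on a log scale instead of listing fixed values
# (these ranges are assumptions; adjust them to your problem)
param_distributions = {
    'C': loguniform(1e-3, 1e2),
    'gamma': loguniform(1e-4, 1e-1),
    'kernel': ['rbf'],
}

# n_iter sets how many random (C, gamma) combinations are evaluated
search = RandomizedSearchCV(SVC(class_weight='balanced'), param_distributions,
                            n_iter=20, cv=5, random_state=0)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which matches how C and gamma are usually searched.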

You can read more about GridSearchCV here and RandomizedSearchCV here. A word of caution, though: SVM training is CPU-intensive, and GridSearchCV fits one model per parameter combination per cross-validation fold, so the cost grows with the product of the list lengths. Be careful with the number of parameter values you pass; depending on your data, the search may take a long time.
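
To see how quickly the work adds up, `ParameterGrid` (the class GridSearchCV uses to enumerate candidates) can count the combinations. This sketch also shows a list-of-dicts grid, which is an addition to the answer: since gamma has no effect on the linear kernel, splitting the grid avoids refitting the same linear model once per gamma value. The fold count of 5 is an assumption matching the default `cv` in recent scikit-learn versions.

```python
from sklearn.model_selection import ParameterGrid

# The grid from the answer: every C x every gamma x every kernel
params_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.0001, 0.001, 0.01, 0.1],
               'kernel': ['linear', 'rbf']}
print(len(ParameterGrid(params_grid)))  # 48 (= 6 * 4 * 2)

# A list of grids: gamma is only searched for the rbf kernel
params_grid_split = [
    {'kernel': ['linear'], 'C': [0.001, 0.01, 0.1, 1, 10, 100]},
    {'kernel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100],
     'gamma': [0.0001, 0.001, 0.01, 0.1]},
]
print(len(ParameterGrid(params_grid_split)))  # 30 (= 6 + 6 * 4)

# Each candidate is fitted once per CV fold (plus one final refit of the best)
n_folds = 5
print(len(ParameterGrid(params_grid_split)) * n_folds)  # 150
```

`GridSearchCV` accepts the same list of dicts directly as its `param_grid` argument.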

This link also contains an example.

Sicilia answered 20/9, 2017 at 19:40 Comment(6)
you are creating a variable "params_grid" and using "params_grids". Please correct that. Also, this gives an error saying "'SVC' object has no attribute 'best_estimators'". Can you please provide complete code? – Polo
@VipulSharma use clf.best_params_ (on the clf object) – Chyle
Thanks for the answer. After getting the optimal parameters, how can we verify that they are good? Is it using X_test? Can we use cross-validation instead? :) – Coherent
@Emi you need to use X_test to test your classifier. If you want to use cross-validation, just set the cv parameter of GridSearchCV. – Sicilia
@Gambit thanks a lot :) btw please let me know if you know an answer for this: #55609839 thank you :) – Coherent
@Gambit thanks a lot for the great answer. Yes, it is very helpful. Just a quick question: is there a way to get the selected features from rfecv? Moreover, how can we validate X_test using the selected features? Looking forward to hearing from you. Thank you very much once again :) – Coherent
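
Putting the comments about X_test and cv together, a minimal end-to-end sketch might look like the following. The iris data, the 25% test split and the grid values are illustrative assumptions: cross-validation on the training part picks C and gamma, and the held-out test set is scored only once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data; substitute your own X and y
X, y = load_iris(return_X_y=True)

# Hold out a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1]}

# cv=5: each (C, gamma) pair is scored by 5-fold cross-validation on X_train only
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)            # mean cross-validated accuracy on the training data
print(grid.score(X_test, y_test))  # accuracy of the refitted best model on the test set
```

Choosing C and gamma by test-set accuracy directly would make the test score an optimistic estimate, which is why the selection here happens inside cross-validation.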

© 2022 - 2024 — McMap. All rights reserved.