Is there a way to use GridSearchCV or any other built-in scikit-learn function to find the best hyperparameters for a OneClassSVM classifier?
What I currently do is perform the search myself using a train/test split, like this:
The gamma and nu grids are defined as:

import numpy as np

gammas = np.logspace(-9, 3, 13)
nus = np.linspace(0.01, 0.99, 99)
The loop below explores all hyperparameter combinations, records the scores for each, and then picks the best ones:
from sklearn.svm import OneClassSVM
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

clf = OneClassSVM()
results = []
train_x = vectorizer.fit_transform(train_contents)
test_x = vectorizer.transform(test_contents)
for gamma in gammas:
    for nu in nus:
        clf.set_params(gamma=gamma, nu=nu)
        clf.fit(train_x)
        y_pred = clf.predict(test_x)
        if 1. in y_pred:  # Check that at least one review is predicted to be in the class
            results.append(((gamma, nu), (accuracy_score(y_true, y_pred),
                                          precision_score(y_true, y_pred),
                                          recall_score(y_true, y_pred),
                                          f1_score(y_true, y_pred),
                                          roc_auc_score(y_true, y_pred),
                                          ))
                           )

# Determine and print the best parameter settings and their performance
print_best_parameters(results, best_parameters(results))
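For reference, here is a hedged sketch (with synthetic data, not my actual reviews) of how I imagine GridSearchCV itself could be bent to this task: since OneClassSVM.fit ignores y, the labels would only be consumed by the scorer, and a PredefinedSplit could reproduce the single train/test split I use above. I have not confirmed this is the intended usage:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.model_selection import GridSearchCV, PredefinedSplit

rng = np.random.RandomState(0)
X_train = rng.randn(80, 2)                       # inliers only, used for fitting
X_test = np.vstack([rng.randn(40, 2),            # inliers
                    rng.uniform(4, 6, (10, 2))]) # outliers
y_test = np.array([1] * 40 + [-1] * 10)          # OneClassSVM predicts +1 / -1

# One predefined split: rows marked -1 are only ever in the training fold,
# rows marked 0 form the single test fold.
X = np.vstack([X_train, X_test])
y = np.concatenate([np.ones(len(X_train)), y_test])
split = PredefinedSplit([-1] * len(X_train) + [0] * len(X_test))

param_grid = {"gamma": np.logspace(-4, 1, 6), "nu": [0.05, 0.1, 0.2]}
grid = GridSearchCV(OneClassSVM(), param_grid, scoring="f1", cv=split)
grid.fit(X, y)  # y is ignored by OneClassSVM.fit, used only for scoring
print(grid.best_params_, grid.best_score_)
```

This only optimizes one metric at a time (whatever `scoring` is set to), whereas my manual loop records all five scores per parameter pair.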
Results are stored in a list of tuples of the form:

((gamma, nu), (accuracy_score, precision_score, recall_score, f1_score, roc_auc_score))
To find the parameters with the best accuracy, f1 and roc_auc scores, I wrote my own function:

best_parameters(results)
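For illustration, a hypothetical sketch of what such a function might look like (my real implementation is not shown here): it picks the (gamma, nu) pair maximizing one of the recorded scores, here f1, which sits at index 3 of each score tuple:

```python
# Hypothetical helper, not the actual best_parameters from the question:
# select the entry with the highest score at score_index (3 = f1_score).
def best_parameters(results, score_index=3):
    params, scores = max(results, key=lambda r: r[1][score_index])
    return params, scores

# Toy results in the ((gamma, nu), (acc, prec, rec, f1, roc_auc)) format
results = [((0.1, 0.5), (0.8, 0.7, 0.6, 0.65, 0.75)),
           ((1.0, 0.1), (0.9, 0.8, 0.85, 0.82, 0.88))]
print(best_parameters(results))  # ((1.0, 0.1), (0.9, 0.8, 0.85, 0.82, 0.88))
```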