Hyperopt: optimal parameters change with every rerun

I am trying to use Bayesian optimization (Hyperopt) to obtain optimal parameters for the SVM algorithm. However, I find that the optimal parameters change with every run.

Provided below is a simple reproducible case. Can you please shed some light on this?

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score

# Use only the first two features of the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

def hyperopt_train_test(params):
    # Mean cross-validated accuracy for a given set of SVC hyperparameters
    clf = svm.SVC(**params)
    return cross_val_score(clf, X, y).mean()

# Search space: C and gamma are drawn log-uniformly (exp of a uniform draw over [-3, 3])
space4svm = {
    'C': hp.loguniform('C', -3, 3),
    'gamma': hp.loguniform('gamma', -3, 3),
}

def f(params):
    # Hyperopt minimizes the loss, so return the negative accuracy
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=1000, trials=trials)

print('best:')
print(best)

Following are the optimal values reported by three separate runs.

best: {'C': 0.08776548401545513, 'gamma': 1.447360198193232}

best: {'C': 0.23621788050791617, 'gamma': 1.2467882092108042}

best: {'C': 0.3134163250819116, 'gamma': 1.0984778155489887}

Squat answered 15/12, 2018 at 14:33 Comment(0)

That's because during the execution of fmin, hyperopt randomly draws different values of 'C' and 'gamma' from the defined search space space4svm on each run of the program.

To fix this and produce deterministic results, you need to use the rstate parameter of fmin, which is documented as follows:

rstate :

    numpy.RandomState, default numpy.random or `$HYPEROPT_FMIN_SEED`

    Each call to `algo` requires a seed value, which should be different
    on each call. This object is used to draw these seeds via `randint`.
    The default rstate is numpy.random.RandomState(int(env['HYPEROPT_FMIN_SEED']))
    if the 'HYPEROPT_FMIN_SEED' environment variable is set to a non-empty
    string, otherwise np.random is used in whatever state it is in.

So if rstate is not set explicitly, fmin first checks whether the environment variable 'HYPOPT_FMIN_SEED'.replace is set; more precisely, if 'HYPEROPT_FMIN_SEED' is set to a non-empty string it is used as the seed, and otherwise a fresh random state is used on every run, which is why your results differ.
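If you want to rely on that environment variable, a minimal sketch (assuming the variable is set in the same process before fmin is called):

import os

# Any fixed integer; per the docstring, fmin reads it as the RandomState seed
os.environ['HYPEROPT_FMIN_SEED'] = '42'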

Alternatively, you can pass rstate explicitly:

rstate = np.random.RandomState(42)   # any fixed seed works; just keep it constant between runs

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100, trials=trials, rstate=rstate)
Insecurity answered 18/12, 2018 at 11:4 Comment(1)
I changed the code to rstate = np.random.default_rng(42) to get it to work. It might be a versioning thing (hyperopt==0.2.7). – Donne
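For newer hyperopt releases (around 0.2.7, as the comment above suggests), rstate is expected to be a NumPy Generator rather than the legacy RandomState, so the deterministic call would look roughly like this (a sketch, assuming f and space4svm are defined as in the question):

import numpy as np
from hyperopt import fmin, tpe, Trials

trials = Trials()

# Newer hyperopt accepts a numpy Generator for rstate
best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100,
            trials=trials, rstate=np.random.default_rng(42))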
