Optuna Suggests the Same Parameter Values in Many Trials (Duplicate Trials That Waste Time and Budget)
Optuna's TPESampler and RandomSampler suggest the same integer values (and possibly the same floats and loguniform values) for a given parameter more than once, and I couldn't find a way to stop them from suggesting the same values over and over again. Out of 100 trials, quite a few are duplicates: the count of unique suggested values ends up around 80-90 out of 100. If I include more parameters for tuning, say 3, I even see all 3 of them getting identical value combinations a few times within 100 trials.

For example, min_data_in_leaf = 75 was suggested 3 times:

[I 2020-11-14 14:44:05,320] Trial 8 finished with value: 45910.54012028659 and parameters: {'min_data_in_leaf': 75}. Best is trial 4 with value: 45805.19030897498.

[I 2020-11-14 14:44:07,876] Trial 9 finished with value: 45910.54012028659 and parameters: {'min_data_in_leaf': 75}. Best is trial 4 with value: 45805.19030897498.

[I 2020-11-14 14:44:10,447] Trial 10 finished with value: 45831.75933279074 and parameters: {'min_data_in_leaf': 43}. Best is trial 4 with value: 45805.19030897498.

[I 2020-11-14 14:44:13,502] Trial 11 finished with value: 46125.39810101329 and parameters: {'min_data_in_leaf': 4}. Best is trial 4 with value: 45805.19030897498.

[I 2020-11-14 14:44:16,547] Trial 12 finished with value: 45910.54012028659 and parameters: {'min_data_in_leaf': 75}. Best is trial 4 with value: 45805.19030897498.

Example code below:

import numpy as np
import lightgbm as lgb
import optuna
from optuna.samplers import TPESampler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold

def lgb_optuna(trial):
    rmse = []

    params = {
        "seed": 42,
        "objective": "regression",
        "metric": "rmse",
        "verbosity": -1,
        "boosting": "gbdt",
        "num_iterations": 1000,
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 1, 100),
    }

    # random_state has no effect unless shuffle=True, so it is omitted here
    cv = StratifiedKFold(n_splits=5, shuffle=False)
    # tfd_train: the training array (last column used for stratification, second-to-last is the target)
    for train_index, test_index in cv.split(tfd_train, tfd_train[:, -1]):
        X_train, X_test = tfd_train[train_index], tfd_train[test_index]
        y_train = X_train[:, -2].copy()
        y_test = X_test[:, -2].copy()

        dtrain = lgb.Dataset(X_train[:, :-2], label=y_train)
        dtest = lgb.Dataset(X_test[:, :-2], label=y_test)

        booster_gbm = lgb.train(params, dtrain, valid_sets=dtest, verbose_eval=False)

        y_predictions = booster_gbm.predict(X_test[:, :-2])
        final_mse = mean_squared_error(y_test, y_predictions)
        final_rmse = np.sqrt(final_mse)
        rmse.append(final_rmse)

    return np.mean(rmse)

study = optuna.create_study(sampler=TPESampler(seed=42), direction='minimize')
study.optimize(lgb_optuna, n_trials=100)
Jimenez answered 14/11, 2020 at 16:33 Comment(0)

The problem is the sampler you specified in this line:

study = optuna.create_study(sampler=TPESampler(seed=42), direction='minimize')

TPESampler is not a uniform sampler. It tries to sample from a promising range of values. See details here and here. That is why you are seeing a lot of duplicates: to the optimizer they are promising values, so it keeps exploring them further, possibly in different combinations.

To get truly uniform sampling, change your sampler to:

sampler=RandomSampler(seed)

This will not guarantee the absence of duplicates, but the values will be more evenly distributed.
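For reference, a minimal sketch of that change against the code in the question (only the sampler line differs):

import optuna
from optuna.samplers import RandomSampler

# Same study as before, but with uniform random sampling instead of TPE.
study = optuna.create_study(sampler=RandomSampler(seed=42), direction='minimize')
study.optimize(lgb_optuna, n_trials=100)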

If you want to ensure that only distinct combinations are searched, you should use GridSampler (see the sketch below). As stated in the docs:

the trials suggest all combinations of parameters in the given search space during the study.

doc here
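A minimal sketch of how GridSampler could be wired up for the question's objective; the grid values here are arbitrary examples and must lie inside the range passed to trial.suggest_int:

import optuna
from optuna.samplers import GridSampler

# Hypothetical grid; every listed value must be reachable by the objective's suggest_int range.
search_space = {'min_data_in_leaf': list(range(1, 101, 5))}

study = optuna.create_study(sampler=GridSampler(search_space), direction='minimize')
# One trial per grid point, so every combination is evaluated exactly once.
study.optimize(lgb_optuna, n_trials=len(search_space['min_data_in_leaf']))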

Iphigenia answered 14/11, 2020 at 17:44 Comment(5)
As I described at the beginning of the question, RandomSampler produces more or less the same number of duplicates. I want to use TPESampler because it supposedly finds optimal hyperparameters faster than random or grid search given the same budget/time. Are there any workarounds to stop it from reusing previously tried parameter values (e.g. checking the trial history first), or is this simply how TPESampler and RandomSampler operate? – Jimenez
@LordPermaximum In any case, RandomSampler is quite different from TPESampler. From the results of an experiment I ran, its values are more or less evenly distributed. – Iphigenia
I know GridSampler works, but it's time-consuming; that's why I wanted to use TPESampler in the first place. There must be a way to use TPE or some other Bayesian optimization without duplicating previous trials. It sounds simple, but I don't want to mess with the base classes of the optuna module. Besides, I don't think TPE sampling is the problem here: I suspect TPESampler uses random sampling for a number of startup trials, say 40 or so, and then switches to TPE sampling. – Jimenez
UPDATE: This is an acknowledged problem in Optuna that has yet to be solved. The above answer has nothing to do with it. github.com/optuna/optuna/issues/2021 – Jimenez
@LordPermaximum As stated in my answer, this is not a bug but the natural behaviour of TPESampler. As stated in the issue you mention above: "By its nature, TPESampler tends to sample similar values with the number of trials increasing since it narrows the search space based on Bayesian optimization with the number of trials increasing. So I think it's a bit hard to avoid the same values suggested." It is possible to change this behaviour, but I don't think my answer is wrong, in the sense that this is how the algorithm works (for now). – Iphigenia

I have my objective function check study.trials_dataframe() to see whether these parameters have been run before, and if they have, I simply return the value recorded for that earlier trial.
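A minimal sketch of that idea, using the study's trial list (available inside the objective via trial.study) rather than the dataframe; run_cross_validation is a hypothetical placeholder for the real evaluation:

import optuna

def objective(trial):
    trial.suggest_int('min_data_in_leaf', 1, 100)

    # If a completed trial already used exactly these parameters, reuse its value
    # instead of re-running the expensive evaluation.
    for past in trial.study.get_trials(deepcopy=False):
        if past.state == optuna.trial.TrialState.COMPLETE and past.params == trial.params:
            return past.value

    return run_cross_validation(trial.params)  # hypothetical stand-in for the real objective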

Schauer answered 12/4, 2021 at 7:13 Comment(0)

As mentioned, I would also suggest experimenting with different samplers and their hyperparameters, e.g. TPESampler(seed=i, multivariate=True). Sampler hyperparameters sometimes improve the optimization, both time-wise and in the final outcome. Try independent and relative sampling as well. In addition, try modifying the search space and optimizing in blocks: design your optimization experiment so that you gradually narrow the search space around what should be the minimum (assuming RMSE is your evaluation metric). In a large search space the algorithm may be able to escape local minima; this may be one reason why it does not suggest new values. Modifying the learning rate may also help to escape local minima. Hope this helps, good luck.
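For example (a sketch only; multivariate TPE is marked experimental in Optuna and does not by itself guarantee fewer duplicates):

import optuna
from optuna.samplers import TPESampler

# Joint (multivariate) TPE models the parameters together instead of independently.
sampler = TPESampler(seed=42, multivariate=True)
study = optuna.create_study(sampler=sampler, direction='minimize')
study.optimize(lgb_optuna, n_trials=100)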

Canaletto answered 31/5 at 21:0 Comment(1)
As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Jahdol
