Bayesian optimization for a LightGBM model

I am able to successfully improve the performance of my XGBoost model through Bayesian optimization, but the best I can achieve through Bayesian optimization when using LightGBM (my preferred choice) is worse than what I was able to achieve by using its default hyperparameters and following the standard early-stopping approach.

When tuning via Bayesian optimization, I made sure to include the algorithm's default hyperparameters in the search space, for reference purposes.

The code below shows the RMSE from the LightGBM model with default hyperparameters, using seaborn's diamonds dataframe as an example of my workflow:

#pip install bayesian-optimization

import seaborn as sns
from sklearn.model_selection import train_test_split
import lightgbm as lgb
from bayes_opt import BayesianOptimization

df = sns.load_dataset('diamonds')

df["color"] = df["color"].astype('category')
df["color_cat"] = df["color"].cat.codes
df = df.drop(["color"],axis = 1)

df["cut"] = df["cut"].astype('category')
df["cut_cat"] = df["cut"].cat.codes
df = df.drop(["cut"],axis = 1)

df["clarity"] = df["clarity"].astype('category')
df["clarity_cat"] = df["clarity"].cat.codes
df = df.drop(["clarity"],axis = 1)

y = df['price']
X = df.drop(['price'], axis=1)

seed = 7
test_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size,random_state=seed)

train_lgb = lgb.Dataset(X_train, y_train)
eval_lgb = lgb.Dataset(X_test, y_test, reference = train_lgb)

params = {'objective': 'regression',
          'metric': 'RMSE',
          'learning_rate': 0.02}

lgb_reg = lgb.train(params,
                    train_lgb,
                    num_boost_round=10000,
                    early_stopping_rounds=50,
                    verbose_eval=100,
                    valid_sets=eval_lgb)

Results

OUT:
Training until validation scores don't improve for 50 rounds.
Early stopping, best iteration is:
[1330 (n_estimators)] valid_0's rmse: 538.728

Here is my attempt to implement Bayesian optimization and the resulting RMSE values:

def modelFitter(colsampleByTree, subsample, maxDepth, num_leaves):
    model = lgb.LGBMRegressor(learning_rate=0.02,
                              n_estimators=10000,
                              max_depth=maxDepth.astype("int32"),
                              subsample=subsample,
                              colsample_bytree=colsampleByTree,
                              num_leaves=num_leaves.astype("int32"))

    evalSet = [(X_test, y_test)]
    model.fit(X_train, y_train, eval_metric="rmse", eval_set=evalSet,
              early_stopping_rounds=50, verbose=False)

    # best_score_ is keyed by the eval set name ('valid_0' here), then by metric
    bestScore = model.best_score_[list(model.best_score_.keys())[0]]['rmse']

    # negate so that BayesianOptimization's maximize() minimizes the RMSE
    return -bestScore

# Bounded region of parameter space
pbounds = {'colsampleByTree': (0.8,1.0), 'subsample': (0.8,1.0), 'maxDepth': (2,5), 'num_leaves': (24, 45)}

optimizer = BayesianOptimization(
    f=modelFitter,
    pbounds=pbounds,
    random_state=1)

optimizer.maximize(init_points=5, n_iter=5)  # init_points = random exploration steps, n_iter = Bayesian optimization steps

Results

|   iter    |  target   | colsam... | maxDepth  | num_le... | subsample |
-------------------------------------------------------------------------
|  1        | -548.7    |  0.8834   |  4.161    |  24.0     |  0.8605   |
|  2        | -642.4    |  0.8294   |  2.277    |  27.91    |  0.8691   |
|  3        | -583.5    |  0.8794   |  3.616    |  32.8     |  0.937    |
|  4        | -548.7    |  0.8409   |  4.634    |  24.58    |  0.9341   |
|  5        | -583.5    |  0.8835   |  3.676    |  26.95    |  0.8396   |
|  6        | -548.7    |  0.8625   |  4.395    |  24.29    |  0.8968   |
|  7        | -548.7    |  0.8435   |  4.603    |  24.42    |  0.9298   |
|  8        | -551.5    |  0.9271   |  4.266    |  24.11    |  0.8035   |
|  9        | -548.7    |  0.8      |  4.11     |  24.08    |  1.0      |
|  10       | -548.7    |  0.8      |  4.44     |  24.45    |  0.9924   |

The RMSE (-1 x "target") generated during Bayesian optimization should be better than that generated by LightGBM's default values, but I cannot achieve a better RMSE (I am looking for a result better than the 538.728 achieved through the "normal" early-stopping process above, i.e. a "target" higher than -538.728).
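For completeness, after maximize() finishes, the best point found can be read back via bayes_opt's optimizer.max attribute, for example:

best = optimizer.max            # a dict like {'target': ..., 'params': {...}}
print("best (negative) RMSE:", best['target'])
print("best hyperparameters:", best['params'])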

The maxDepth and num_leaves should be integers; it looks like there is an open pull request to enforce this in bayes_opt (i.e. bringing in "ptypes"): https://github.com/fmfn/BayesianOptimization/pull/131/files
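In the meantime, a minimal workaround sketch, assuming bayes_opt keeps proposing floats, is to round inside the objective before handing the values to LightGBM (the name modelFitterInt below is only for illustration):

def modelFitterInt(colsampleByTree, subsample, maxDepth, num_leaves):
    # bayes_opt proposes floats, so cast the tree-structure parameters here
    maxDepth = int(round(maxDepth))
    num_leaves = int(round(num_leaves))

    model = lgb.LGBMRegressor(learning_rate=0.02, n_estimators=10000,
                              max_depth=maxDepth, num_leaves=num_leaves,
                              subsample=subsample, colsample_bytree=colsampleByTree)
    model.fit(X_train, y_train, eval_metric="rmse", eval_set=[(X_test, y_test)],
              early_stopping_rounds=50, verbose=False)

    # negate the best validation RMSE so maximize() minimizes it
    return -model.best_score_['valid_0']['rmse']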

Is there a reason why the Bayesian optimization doesn't seem to find a better solution with LightGBM but it does with XGBoost?

Keg answered 8/5, 2019 at 14:48 Comment(8)
What's your coding-related question? This looks like it belongs on stats-exchange. - Chandless
Hi @Chandless - Is there something in my above-mentioned code that is resulting in the Bayesian optimization not working? - Keg
Please answer my question. Then I can answer yours :) - Chandless
@Chandless - I was hoping I had made some error in the code which could be corrected. - Keg
Hi @Chandless - I put this question on stats-exchange, but I got this from the moderators: "Put on hold as off-topic. This question appears to be off-topic because EITHER it is not about statistics, machine learning, data analysis, data mining, or data visualization, OR it focuses on programming, debugging, or performing routine operations within a statistical computing platform." So it seems I am caught between the two. If you know the answer, please can you share? - Keg
Yeah, I sincerely thought you might have a better chance there. I don't have an answer for you because it requires high specialization and lots of free time to answer, a rare commodity around here. So you might have to wait a long time or post on Code Review. - Chandless
Did you try to fit an LGBMRegressor with default parameters and see the resulting metrics? The reason is that defaults for the native API (lgb.train) and the scikit-learn API (LGBMRegressor) might be different (they should not be, but I'm not sure the authors provide any guarantees). Also, the default that you use in the native API is max_depth=-1, whereas your optimisation boundaries are different from that. Limiting the depth can lead to a different tree structure. - Doersten
Hi @Mykhailo Lisovyi - thanks for this - altering the max_depth within the pbounds works. - Keg

This question belongs on stats.SE; I would encourage you to ask on the Meta over there why it was put on hold. It might be a little too broad, as there are a few possible reasons for the difference.

1) Double-check that the hyperparameter space you're optimizing over is consistent in both models (the pbounds parameters seem to be defined only for the LightGBM model right now).

2) If the range of the search space is too small, there could be a local maximum at the default values, which are usually a heuristic, rule-of-thumb, "pretty good" set of defaults to start from; a small sketch of widening the bounds so the defaults are reachable follows this list.

3) Both are gradient-boosting models, but they determine the best split value in different ways. This means the solution space may have different optimal values from each algorithm's perspective, since each can only make a best guess given its own architecture, and when the function being optimized changes, the best solution may land, by chance, on the default LightGBM hyperparameter values.

4) If you're looking at an extremely sub-optimal search space, similar to looking at a small subspace, you'll end up with mediocre results at best, which fall significantly short of the default settings. (It's like looking for the peak of a mountain in the ocean, whereas the default might be some local hill.)
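As a sketch of points 1) and 2), one could widen pbounds so that LightGBM's documented defaults (num_leaves=31, subsample=1.0, colsample_bytree=1.0 and max_depth=-1, i.e. no depth limit) are actually reachable. The exact bounds below are illustrative assumptions, and to_lgb_max_depth is just a hypothetical helper for mapping a bounded suggestion back to "unlimited":

# Illustrative bounds only: wide enough that LightGBM's defaults fall inside them.
pbounds = {'colsampleByTree': (0.6, 1.0),
           'subsample': (0.6, 1.0),
           'maxDepth': (2, 12),      # treat the upper edge as "no depth limit"
           'num_leaves': (16, 64)}

def to_lgb_max_depth(maxDepth, sentinel=12):
    """Round the continuous suggestion; map the sentinel to LightGBM's default of -1 (unlimited)."""
    maxDepth = int(round(maxDepth))
    return -1 if maxDepth >= sentinel else maxDepth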

Interdental answered 22/5, 2019 at 22:9 Comment(0)

For regression I managed to achieve improved results using the cv-function in the lightgbm package.

The "black box" function in my BayesianOptimization() returns the minimum of the l1-mean.

def black_box_lgbm():
    params = {...}  # Your params here
    # train_data is an lgb.Dataset; with metrics='mae' the CV results are keyed as 'l1-mean'
    cv_results = lgb.cv(params, train_data, nfold=5, metrics='mae',
                        verbose_eval=200, stratified=False)
    return min(cv_results['l1-mean'])

After calling maximize() on the BayesianOptimization() object and grabbing the result with the lowest l1 error, I retrained a model and compared it to the defaults. This consistently results in a lower MSE compared to the defaults.
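One detail worth flagging: since maximize() maximizes its objective, the error is usually negated when wiring a function like this into bayes_opt. The sketch below shows one possible wiring; the tuned parameters, the bounds, and the assumption that train_data is an lgb.Dataset are mine rather than the answer's actual setup, and the 'l1-mean' key name can differ between lightgbm versions:

def black_box_lgbm(num_leaves, learning_rate):
    # Hypothetical search parameters; plug in whichever ones you are tuning.
    params = {'objective': 'regression',
              'metric': 'mae',
              'num_leaves': int(round(num_leaves)),
              'learning_rate': learning_rate}
    cv_results = lgb.cv(params, train_data, nfold=5, metrics='mae', stratified=False)
    # Negate so that BayesianOptimization's maximize() drives the CV error down.
    return -min(cv_results['l1-mean'])

optimizer = BayesianOptimization(
    f=black_box_lgbm,
    pbounds={'num_leaves': (16, 64), 'learning_rate': (0.01, 0.1)},
    random_state=1)
optimizer.maximize(init_points=5, n_iter=20)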

Jacaranda answered 19/1, 2021 at 15:30 Comment(0)
