Multiclass Classification with LightGBM

Asked 18/11, 2017 at 19:39 Answered 30/8, 2019 at 1:52

Solved python machine-learning predict multiclass-classification lightgbm

I am trying to build a classifier for a multi-class classification problem (3 classes) using LightGBM in Python. I used the following parameters:

params = {'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'multiclass',
    'num_class':3,
    'metric': 'multi_logloss',
    'learning_rate': 0.002296,
    'max_depth': 7,
    'num_leaves': 17,
    'feature_fraction': 0.4,
    'bagging_fraction': 0.6,
    'bagging_freq': 17}

All the categorical features of the dataset are label-encoded with LabelEncoder. I trained the model after running cross-validation with early stopping, as shown below.

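import numpy as np
import lightgbm as lgbm

# d_train: lgbm.Dataset of the label-encoded training data (LightGBM < 4.0 API)
lgb_cv = lgbm.cv(params, d_train, num_boost_round=10000, nfold=3, shuffle=True, stratified=True, verbose_eval=20, early_stopping_rounds=100)

# Entry i of the CV history corresponds to boosting round i + 1
nround = int(np.argmin(lgb_cv['multi_logloss-mean'])) + 1
print(nround)

model = lgbm.train(params, d_train, num_boost_round=nround)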
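The history returned by lgbm.cv holds one mean score per boosting round, so the list position of the minimum is one less than the number of rounds that produced it. A minimal sketch of that bookkeeping, with a hypothetical helper name and made-up loss values:

```python
import numpy as np


def best_num_rounds(history):
    """Number of boosting rounds that reaches the lowest CV loss.

    history holds one mean loss per round, so list position i is round i + 1.
    """
    return int(np.argmin(history)) + 1


# Hypothetical CV history: the loss bottoms out in the third round
print(best_num_rounds([1.05, 0.92, 0.88, 0.90]))  # 3
```

When early stopping has already truncated the history at the best round, this should typically equal len(history).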

After training, I made predictions with the model like this:

preds = model.predict(test)
print(preds)

The output was a 2-D array like this:

[[  7.93856847e-06   9.99989550e-01   2.51164967e-06]
 [  7.26332978e-01   1.65316511e-05   2.73650491e-01]
 [  7.28564308e-01   8.36756769e-06   2.71427325e-01]
 ..., 
 [  7.26892634e-01   1.26915179e-05   2.73094674e-01]
 [  5.93217601e-01   2.07172044e-04   4.06575227e-01]
 [  5.91722491e-05   9.99883828e-01   5.69994435e-05]]

Since each row of preds holds the class probabilities, I used np.argmax() to find the predicted class:

predictions = []

for x in preds:
    predictions.append(np.argmax(x))
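The loop can also be written as a single vectorized call: np.argmax with axis=1 returns the column index of the largest probability in each row. A sketch using only the six rows printed above (the elided "..." rows are not included):

```python
import numpy as np

# The six rows shown in the printed output
preds = np.array([
    [7.93856847e-06, 9.99989550e-01, 2.51164967e-06],
    [7.26332978e-01, 1.65316511e-05, 2.73650491e-01],
    [7.28564308e-01, 8.36756769e-06, 2.71427325e-01],
    [7.26892634e-01, 1.26915179e-05, 2.73094674e-01],
    [5.93217601e-01, 2.07172044e-04, 4.06575227e-01],
    [5.91722491e-05, 9.99883828e-01, 5.69994435e-05],
])

# Column index of the most probable class for every row at once
predictions = np.argmax(preds, axis=1)
print(predictions)  # [1 0 0 0 0 1]
```

In four of these rows class 2 is the runner-up (about 0.27 to 0.41), but it never has the highest probability, so argmax never selects it.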

While analyzing the predictions, I found that they contain only 2 classes: 0 and 1. Class 2 was the second-largest class in the training set, but it never appears in the predictions. Evaluated against the true labels, the predictions give about 78% accuracy.
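A quick way to confirm which classes the model ever predicts is to count the argmax results per class; minlength=3 makes an unused class show up as an explicit 0. A sketch using the argmax results of the six printed rows:

```python
import numpy as np

# argmax results for the six printed rows of preds
predictions = np.array([1, 0, 0, 0, 0, 1])

# Number of samples assigned to each of the 3 classes
counts = np.bincount(predictions, minlength=3)
print(counts)  # [4 2 0]
```

Running the same count on the training labels (e.g. a hypothetical y_train array) gives a side-by-side view of the class distribution the model saw versus what it predicts.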

So why didn't my model predict class 2 for any of the cases? Is there anything wrong with the parameters I used?

Isn't this the proper way to interpret the predictions made by the model? Should I change any of the parameters?

Civies answered 18/11, 2017 at 19:39 Comment(2)

I don't know exactly what is wrong with this code, but it seems to me that your problem is a binary classification problem while you are using multi-class classification metrics to evaluate it. I would suggest using binary_logloss for your problem. You can find more about this here – Eri 3/1, 2018 at 7:30

I have 3 classes in my target. I have cross-checked. – Civies 3/1, 2018 at 16:13

Try troubleshooting by swapping classes 0 and 2 and re-running the training and prediction process.
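For an integer-encoded target, one way to swap labels 0 and 2 is to index a small lookup array with the labels. A sketch; y below is a hypothetical example target, not the asker's data:

```python
import numpy as np

# Hypothetical encoded target with classes 0, 1, 2
y = np.array([0, 2, 1, 2, 0, 1])

# swap_map[old_label] gives the new label: 0 -> 2, 1 -> 1, 2 -> 0
swap_map = np.array([2, 1, 0])
y_swapped = swap_map[y]
print(y_swapped)  # [2 0 1 0 2 1]
```

The mapping is its own inverse, so applying swap_map to the new predictions translates them back to the original label numbering.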

If the new predictions only contain classes 1 and 2 (most likely given your provided data):

The classifier may not have learned the third class: perhaps its features overlap with those of a larger class, and the classifier defaults to the larger class to minimise the objective function. Try providing a balanced training set (the same number of samples per class) and retry.
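One way to build such a balanced set is random undersampling: keep a random subset of each class equal in size to the rarest class. A minimal NumPy sketch, assuming X and y are NumPy arrays; the function name and the toy data are hypothetical:

```python
import numpy as np


def undersample_balanced(X, y, seed=0):
    """Return (X, y) with every class cut down to the size of the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Randomly pick n_min row indices from each class, without replacement
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)  # mix the classes back together
    return X[keep], y[keep]


# Hypothetical imbalanced data: 6 rows of class 0, 3 of class 1, 2 of class 2
X = np.arange(22).reshape(11, 2)
y = np.array([0] * 6 + [1] * 3 + [2] * 2)
X_bal, y_bal = undersample_balanced(X, y)
print(np.bincount(y_bal))  # [2 2 2]
```

Undersampling throws data away, so it is mainly useful as a diagnostic here: if the third class starts appearing in the predictions, class imbalance is a likely factor.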

If the new predictions do contain all 3 classes:

Something went wrong in your code somewhere. More information is needed to determine what exactly went wrong.

Hope this helps.

Designedly answered 13/6, 2018 at 2:52 Comment(0)

From the output you are providing there seems to be nothing wrong in the predictions.

The model produces three probabilities, as you show, and in the first row of your output, [ 7.93856847e-06 9.99989550e-01 2.51164967e-06], class 2 (the second column) has the highest probability, so I can't see a problem here.

Index 0 is the first class, index 1 is actually the second class (your "class 2"), and index 2 is the third class. So I guess nothing is wrong.
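To make the 0-based indexing concrete, argmax returns a column position, which can be mapped to a class name by position. A sketch using the first printed row; the class names are hypothetical placeholders:

```python
import numpy as np

# Hypothetical names for the three classes, in the column order of preds
class_names = np.array(["first class", "second class", "third class"])

first_row = np.array([7.93856847e-06, 9.99989550e-01, 2.51164967e-06])
idx = int(np.argmax(first_row))  # 0-based column index
print(idx, class_names[idx])     # 1 second class
```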

Eastwardly answered 13/4, 2018 at 19:51 Comment(1)

The model doesn't predict class 3 (the third class) for any input sample, even for the ones it was trained on! – Civies 14/4, 2018 at 14:36


The solution is:

best_preds_svm = [np.argmax(line) for line in preds]

This gives, for each sample, the index of the class with the highest predicted probability, which you can then print.

Bret answered 5/4, 2018 at 6:20 Comment(0)


import numpy as np
import pandas as pd

# Row-wise argmax over the probability columns
pd.DataFrame(preds).apply(lambda x: np.argmax(x), axis=1)

Acetic answered 30/8, 2019 at 1:52 Comment(0)
