Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]
E

9

28

I trained a model on my dataset using XGBClassifier, but I get this error on my local machine. It worked on Colab, and my friends don't have any problem with the same code. I don't know what the error means...

Invalid classes inferred from unique values of y. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]

This is my code, but I guess it's not the cause.

import time
from xgboost import XGBClassifier

start_time = time.time()
xgb = XGBClassifier(n_estimators=400, learning_rate=0.1, max_depth=3)
xgb.fit(X_train.values, y_train)
print('Fit time : ', time.time() - start_time)
Enyo answered 25/4, 2022 at 8:32 Comment(3)
How are you creating the y_train and y_test vectors? It looks like one of them starts numbering at 1 and the other at 0.Krimmer
You can also transform your variables into a one-hot encoded representation.Aldin
You must use the same LabelEncoder to encode the target for training and evaluation datasets: xgboosting.com/…Trireme
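As a rough sketch of the last comment's point, assuming scikit-learn is installed and using made-up label arrays (`y_train_raw` and `y_test_raw` are hypothetical names, not from the question), the encoder is fitted once on the training labels and then reused for the test labels:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels that start at 1, as in the question
y_train_raw = np.array([1, 2, 3, 4, 5, 6, 1, 2])
y_test_raw = np.array([6, 3, 1])

le = LabelEncoder()
y_train_enc = le.fit_transform(y_train_raw)  # learns the mapping 1..6 -> 0..5
y_test_enc = le.transform(y_test_raw)        # reuses the same mapping, no refit

print(le.classes_)
print(y_train_enc)
print(y_test_enc)
```

Calling `fit_transform` again on the test labels could produce a different mapping if the test set happens to be missing a class, which is why `transform` is used there.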
L
48

That happens because the class labels have to start from 0 (as required since version 1.3.2). An easy way to solve that is to use LabelEncoder from the sklearn.preprocessing module.

Solution (works for version 1.6):

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)

Then run your code again:

start_time = time.time()
xgb = XGBClassifier(n_estimators = 400, learning_rate = 0.1, max_depth = 3)
xgb.fit(X_train.values, y_train)
print('Fit time : ', time.time() - start_time)
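
To see whether a label array would trip this check, a small helper can compare its unique values with 0..n-1. The function name `labels_are_zero_based` is made up for illustration and is not part of XGBoost; it only mirrors the condition described in the error message.

```python
import numpy as np

def labels_are_zero_based(y):
    """Return True if the unique values of y are exactly 0, 1, ..., n_classes - 1."""
    classes = np.unique(y)
    return np.array_equal(classes, np.arange(len(classes)))

print(labels_are_zero_based([1, 2, 3, 4, 5, 6]))  # labels from the question
print(labels_are_zero_based([0, 1, 2, 3, 4, 5]))  # labels after encoding
```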
Lyingin answered 5/5, 2022 at 19:23 Comment(0)
U
13

This happens because newer versions of XGBoost require y_train to be encoded before training, i.e., you must apply a categorical transformation such as a label encoder:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)

Then train the XGBoost model on the encoded labels:

from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X=X_train, y=y_train)

After training, to compute the confusion matrix you must inverse-transform the predicted y values back to the original labels, as shown:

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
y_pred = le.inverse_transform(y_pred)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))
Usance answered 16/10, 2022 at 5:44 Comment(0)
C
4

Try adding stratify to the train_test_split call:

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=test_size, stratify = labels)
Clift answered 8/6, 2022 at 16:35 Comment(1)
It's not the case. Using LabelEncoder works fine.Dissyllable
L
3

The error comes with the new version of xgboost. Uninstall the current xgboost and install xgboost 0.90:

pip uninstall xgboost 

pip install xgboost==0.90
Lawman answered 2/5, 2022 at 9:35 Comment(0)
S
2

Downgrading to 1.5.0 worked for me.

I also got this warning message during execution:

UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release.

Using the label encoder in 1.6 returns this error for me:

MultiClassEvaluation: label must be in [0, num_class), num_class=6 but found 6 in label

Striptease answered 20/10, 2022 at 17:54 Comment(0)
D
1

If it helps, I just rolled back to version 1.2.1.

Duggan answered 15/9, 2022 at 23:57 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Yorke
F
0

Use Python 3.7, the version used in Colab.

Farquhar answered 25/6, 2022 at 19:16 Comment(0)
E
0

It happens because of your xgboost version, so try this:

y_train_xgb = y_train.map({"1": 0, "2": 1, "3": 2})
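
A hard-coded dictionary only covers the labels it lists, and the string keys above match only if the labels are stored as strings. As a sketch with invented sample values (plain Python lists here, so nothing beyond the standard library is assumed), the mapping can be built from the data instead:

```python
# Hypothetical raw labels; in the question they run from 1 to 6
y_train = [3, 1, 2, 6, 5, 4, 1]

# Build the mapping from the sorted unique labels to 0..n-1
mapping = {label: idx for idx, label in enumerate(sorted(set(y_train)))}
y_train_xgb = [mapping[label] for label in y_train]

print(mapping)
print(y_train_xgb)
```

For a pandas Series, the same dictionary could be passed to `y_train.map(mapping)`.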
Entresol answered 17/10, 2023 at 19:27 Comment(0)
T
0

I verified in the source code of xgboost that LabelEncoder() was deprecated in version 1.3 with this PR:

https://github.com/dmlc/xgboost/pull/6269/files

And then LabelEncoder() was removed in version 1.6.0 with this PR: https://github.com/dmlc/xgboost/pull/7357

which was then merged here: https://github.com/dmlc/xgboost/commit/3c4aa9b2ead21d11ef1589059db2ea50208c55ea

The approach mentioned by @jefferson-santos to explicitly use LabelEncoder() is correct, and worked for me.
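
Since the behaviour depends on the installed version, which would also explain why the same code runs on Colab but not locally, a quick check may help. This is only a sketch: the helper names are invented, and the 1.6 threshold is taken from the PRs linked above rather than from any XGBoost API.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_xgboost_version():
    """Return the installed xgboost version string, or None if it is not installed."""
    try:
        return version("xgboost")
    except PackageNotFoundError:
        return None

def needs_manual_label_encoding(version_string):
    """True for 1.6 and later, where the built-in label encoder was removed (per the PRs above)."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) >= (1, 6)

found = installed_xgboost_version()
if found is None:
    print("xgboost is not installed in this environment")
else:
    print(found, needs_manual_label_encoding(found))
```

Running this in both environments should show whether they sit on opposite sides of the 1.6 boundary.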

Throb answered 26/3, 2024 at 3:46 Comment(0)
