I am using an XGBClassifier and trying to do a grid search to tune some parameters, but I get this warning whenever I run xgb.cv(): WARNING: ../src/learner.cc:1517: Empty dataset at worker: 0. It does not work with default parameters either.
Can anybody help? I am kind of lost here!
Here is a minimal example that reproduces the warning when I run it:
import pandas as pd
import xgboost as xgb
from xgboost import cv
import numpy as np
seed = 10
nfold = 10
X = pd.DataFrame(np.random.randint(0,100,size=(6, 8)), columns=list("ABCDEFGH"))
y = pd.Series(np.array([0,0,0,1,1,1]), index=[0,1,2,3,4,5])
X_test = pd.DataFrame(np.random.randint(0,100,size=(2, 8)), columns=list("ABCDEFGH"))
X_test.index = [6,7]
y_test = pd.Series(np.array([0,1]), index=[6,7])
dtrain = xgb.DMatrix(X, label=y)
dtest = xgb.DMatrix(X_test, label = y_test)
# This part just trains a baseline model for comparison; with this little data it will probably overfit.
params = {'objective':'multi:softmax'}
params['eval_metric'] = "mlogloss"
params['num_class'] = np.unique(y).size # Count how many levels in the response variable.
num_boost_round = 999
default_model = xgb.train(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    evals=[(dtest, "Test")],
    early_stopping_rounds=10)
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=num_boost_round,
    seed=seed,
    nfold=nfold,
    metrics={'mlogloss'},
    early_stopping_rounds=10)
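One thing I noticed while putting this example together: the training set has only 6 rows while nfold is 10. I am not sure this is the cause, but a quick sketch (assuming the folds are formed by splitting the row indices into nfold roughly equal groups, as numpy's array_split does; I have not checked xgboost's actual fold logic) shows that some folds would necessarily be empty:

```python
import numpy as np

# Hypothetical illustration, not xgboost's actual code: partitioning the
# 6 training row indices into nfold=10 groups leaves several groups empty.
n_samples = 6
nfold = 10
folds = np.array_split(np.arange(n_samples), nfold)
print([len(fold) for fold in folds])  # several folds contain 0 rows
```

So each empty fold would correspond to a worker with no data, which might be what the warning is complaining about.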