What am I trying to do?

I am trying to use StratifiedKFold() inside GridSearchCV().
Then, what confuses me?
When we use K Fold cross-validation, we just pass the number of folds to the cv parameter of GridSearchCV(), like the following:

grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=5, scoring='f1', return_train_score=True, n_jobs=2)
Then, when I need to use StratifiedKFold(), I think the procedure should remain the same. That is, set only the number of splits and pass StratifiedKFold(n_splits=5) to cv:

grid_search_m = GridSearchCV(rdm_forest_clf, param_grid, cv=StratifiedKFold(n_splits=5), scoring='f1', return_train_score=True, n_jobs=2)
But this answer says:

whatever the cross validation strategy used, all that is needed is to provide the generator using the function split, as suggested:

kfolds = StratifiedKFold(5)
clf = GridSearchCV(estimator, parameters, scoring=qwk, cv=kfolds.split(xtrain, ytrain))
clf.fit(xtrain, ytrain)
Moreover, one of the answers to this question also suggests doing this. That is, they suggest calling the split function, StratifiedKFold(n_splits=5).split(xtrain, ytrain), when using GridSearchCV(). However, I have found that calling split() and not calling split() give me the same f1 score.
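For reference, here is a minimal sketch of the comparison I ran. The data, the grid, and all variable names are my own stand-ins (synthetic data from make_classification, a tiny parameter grid), just to make the comparison reproducible:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, slightly imbalanced binary classification data
X, y = make_classification(n_samples=200, weights=[0.7, 0.3], random_state=0)
param_grid = {"n_estimators": [10, 50]}
rdm_forest_clf = RandomForestClassifier(random_state=0)

# Variant 1: pass the splitter object itself to cv
gs_obj = GridSearchCV(rdm_forest_clf, param_grid,
                      cv=StratifiedKFold(n_splits=5), scoring="f1")
gs_obj.fit(X, y)

# Variant 2: pass the generator returned by split()
gs_gen = GridSearchCV(rdm_forest_clf, param_grid,
                      cv=StratifiedKFold(n_splits=5).split(X, y), scoring="f1")
gs_gen.fit(X, y)

# Both variants report the same best f1 score for me
print(gs_obj.best_score_, gs_gen.best_score_)
```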
Hence, my questions:

1. I do not understand why we need to call the split() function for Stratified K Fold, since we do not need to do anything like that for K Fold CV.
2. If the split() function is called, how will GridSearchCV() work, given that split() returns the training and test set indices? That is, I want to know how GridSearchCV() uses those indices.
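To make the second question concrete: as far as I can tell, split() yields pairs of index arrays, one pair per fold, as in this toy example (the data here is made up by me). My guess, which I would like confirmed, is that GridSearchCV() simply indexes X and y with these arrays to build each fold's training and test sets:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 5 samples per class

kfolds = StratifiedKFold(n_splits=5)
for train_idx, test_idx in kfolds.split(X, y):
    # Each fold preserves the 50/50 class ratio in both partitions
    print("train:", train_idx, "test:", test_idx)
    # Is this what GridSearchCV does internally with the indices?
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```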