I'm trying to work my head around the example of Nested vs. Non-Nested CV in Sklearn. I checked multiple answers but I am still confused on the example. To my knowledge, a nested CV aims to use a different subset of data to select the best parameters of a classifier (e.g. C in SVM) and validate its performance. Therefore, from a dataset X, the outer 10-folds CV (for simplicity n=10) creates 10 training sets and 10 test sets:
(Tr0, Te0),..., (Tr0, Te9)
Then, the inner 10-CV splits EACH outer training set into 10 training and 10 test sets:
From Tr0: (Tr0_0,Te_0_0), ... , (Tr0_9,Te0_9)
From Tr9: (Tr9_0,Te_9_0), ... , (Tr9_9,Te9_9)
Now, using the inner CV, we can find the best values of C for every single outer Training set. This is done by testing all the possible values of C with the inner CV. The value providing the highest performance (e.g. accuracy) is chosen for that specific outer Training set. Finally, having discovered the best C values for every outer Training set, we can calculate an unbiased accuracy using the outer Test sets. With this procedure, the samples used to identify the best parameter (i.e. C) are not used to compute the performance of the classifier, hence we have a totally unbiased validation.
The example provided in the Sklearn page is:
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)
# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svm, param_grid=p_grid, cv=inner_cv)
clf.fit(X_iris, y_iris)
non_nested_scores[i] = clf.best_score_
# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
nested_scores[i] = nested_score.mean()
From what I understand, the code simply calculates the scores using two different cross-validations (i.e. different splits into training and test set). Both of them used the entire dataset. The GridCV identifies the best parameters using one (of the two CVs), then cross_val_score calculates, with the second CV, the performance when using the best parameters.
Am I interpreting a Nested CV in the wrong way? What am I missing from the example?
X_tr
is splitted intoX_tr_tr
andX_tr_te
for number of folds defined ininner_cv
. Then according to theouter_cv
some other part of the data becomesX_tr
which is then again sent toinner_cv
. Hope it makes sense. – Soluble