I'm trying to understand how, and at which point in an algorithm, to apply K-fold CV and GridSearchCV. If I understand correctly, GridSearchCV is used for hyperparameter tuning, i.e. finding which argument values give the best result, while K-fold CV is used to improve generalization: we train on different folds, which reduces bias if the data happens to be ordered in some particular way. Now the question is: isn't GridSearchCV already doing cross-validation too, via its cv parameter? So why do we need K-fold CV at all, and if we do, should we apply it before GridSearchCV? A little outline of the process would be extremely helpful.
`GridSearchCV` is a higher-level construct than `KFold`: the former uses the latter (or others like it).

`KFold` is a relatively low-level construct that gives you a sequence of train/test indices. You can use these indices to do several things, including finding the OOB (out-of-fold) performance of a model, and/or tuning hyperparameters (which basically means searching for hyperparameters based on OOB performance).
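As a minimal sketch of that low-level usage (dataset and model chosen here just for illustration), `KFold` only hands you index arrays; scoring each held-out fold yourself gives the OOB performance estimate:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    # Fit a fresh model on the training folds only
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    # Score on the held-out fold: one out-of-fold accuracy estimate
    scores.append(model.score(X[test_idx], y[test_idx]))

print(np.mean(scores))  # mean accuracy across the 5 held-out folds
```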
`GridSearchCV` is a higher-level construct that takes a CV engine like `KFold` (in its `cv` argument). It uses the CV engine to search over hyperparameters (in this case, using a grid search over the parameters).
You could run `KFold` on a pipeline which has a step that is running `GridSearchCV`, but, even then, at each level, you'd be running only one of them. – Pyretic

Grid search is used to choose the best combination of hyperparameters of a predictive algorithm (tuning the hyper-parameters of an estimator), whereas `KFold` provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as a validation set while the k - 1 remaining folds form the training set. This gives a better measure of prediction accuracy (which we can use as a proxy for the model's goodness of fit).
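Putting the two together, one common outline is nested CV: an outer `KFold` estimates how well the whole "tune-then-fit" procedure generalizes, while `GridSearchCV` runs its own inner CV purely for hyperparameter tuning. The estimator and grid below are illustrative, not prescribed by the answers above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # for tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # for evaluation

# Inner loop: on each outer training set, pick C and gamma by grid search
tuned = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1]},
    cv=inner_cv,
)

# Outer loop: score the tuned estimator on folds it never tuned against
outer_scores = cross_val_score(tuned, X, y, cv=outer_cv)
print(outer_scores.mean())  # generalization estimate of the full procedure
```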