Kfold Cross Validation and GridSearchCV

Asked 19/3, 2018 at 7:3 Answered 19/3, 2018 at 7:29

machine-learning scikit-learn cross-validation hyperparameters

I'm trying to understand how, and at which point in a workflow, to apply KFold CV and GridSearchCV. If I understand correctly, GridSearchCV is used for hyperparameter tuning, i.e. finding which argument values give the best result, while KFold CV is used to improve generalization: we train on different folds, which reduces bias if the data is ordered in some particular way. Now the question: isn't GridSearchCV already doing cross-validation through its cv parameter? If so, why do we need KFold CV, and if we do need it, do we run it before GridSearchCV? A short outline of the process would be extremely helpful.

Propound answered 19/3, 2018 at 7:3 Comment(0)

GridSearchCV is a higher-level construct than KFold. The former uses the latter (or others like it).

KFold is a relatively low-level construct that gives you a sequence of train/test indices. You can use these indices to do several things, including estimating the out-of-sample (held-out fold) performance of a model and/or tuning hyperparameters (which basically means searching for the hyperparameters that give the best held-out performance).
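To make the "sequence of train/test indices" concrete, here is a minimal sketch of using KFold by hand to estimate held-out performance. The dataset (iris), the model (LogisticRegression), and the choice of 5 shuffled folds are my own assumptions for illustration, not something from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Illustrative data and model only; any estimator with fit/score would do.
X, y = load_iris(return_X_y=True)

# shuffle=True because the iris rows are ordered by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Fit on the k - 1 training folds, score on the single held-out fold.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

mean_score = float(np.mean(fold_scores))
print(fold_scores, mean_score)
```

Wrapping this loop in another loop over candidate hyperparameter values is, roughly, the manual version of a grid search.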

GridSearchCV is a higher-level construct that takes a CV engine like KFold (in its cv argument). It uses the CV engine to score each candidate hyperparameter combination (in this case, every point of a parameter grid) and keeps the best one.
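As a rough sketch of how the two fit together, a KFold object can be passed straight into GridSearchCV through cv. The estimator (SVC), the parameter grid, and the dataset below are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The CV engine: decides how the data is split for every candidate.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative grid: 3 values of C x 2 kernels = 6 candidates.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

search = GridSearchCV(SVC(), param_grid, cv=cv)
search.fit(X, y)  # each candidate is fitted and scored on all 5 folds

print(search.best_params_, search.best_score_)
```

If you pass an integer instead (for example cv=5), scikit-learn builds the splitter for you; for classifiers that is a StratifiedKFold rather than a plain KFold.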

Pyretic answered 19/3, 2018 at 7:13 Comment(2)

Thanks for explaining the difference. One further question: since GridSearchCV uses k-fold (or something like it) internally and directly gives the best parameter values, and the same result could be obtained with k-fold by looping over the parameter values myself, it's more of an either-or situation, right? I mean, can we use either of them, or do both have to be used? – Propound 19/3, 2018 at 7:38

Very roughly, I also think that it is an either/or situation. One exception would be nested cross-validation, where you'd run KFold externally on a pipeline that has a step running GridSearchCV, but, even then, at each level, you'd be running only one of them. – Pyretic 19/3, 2018 at 7:43
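A minimal sketch of the nested cross-validation case mentioned in the comment above. For brevity it hands the GridSearchCV object directly to cross_val_score instead of wrapping it in a Pipeline; the estimator, grid, and fold counts are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: used by GridSearchCV to pick hyperparameters.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: used to estimate how well the tuned model generalizes.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner_cv)

# For each outer training split, the whole grid search is rerun from scratch,
# then the refitted best model is scored on the outer held-out fold.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print(nested_scores.mean())
```

The outer score is an estimate for the whole "tune, then fit" procedure, so the selected C may differ from one outer fold to the next.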

Grid search is used to choose the best combination of hyperparameters for a predictive algorithm (tuning the hyperparameters of an estimator), whereas KFold provides train/test indices to split data into train/test sets. It splits the dataset into k consecutive folds (without shuffling by default).

Each fold is then used once as a validation set while the k - 1 remaining folds form the training set. This gives a better measure of prediction accuracy (which we can use as a proxy for the goodness of fit of the model).
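The "consecutive folds, each used once for validation" behaviour is easy to see on a toy array. The six-row array here is made up purely for the demonstration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Six toy samples; only the row positions matter here.
X = np.arange(12).reshape(6, 2)

# Default KFold: no shuffling, so folds are consecutive blocks of rows.
kf = KFold(n_splits=3)

splits = [(train.tolist(), test.tolist()) for train, test in kf.split(X)]
for train, test in splits:
    print("train:", train, "validation:", test)
# train: [2, 3, 4, 5] validation: [0, 1]
# train: [0, 1, 4, 5] validation: [2, 3]
# train: [0, 1, 2, 3] validation: [4, 5]
```

If the rows are sorted (by class or by time, say), these consecutive blocks can be unrepresentative, which is when passing shuffle=True is worth considering.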

Supermarket answered 19/3, 2018 at 7:29 Comment(0)
