10*10 fold cross validation in scikit-learn?
Is

class sklearn.cross_validation.ShuffleSplit(
    n, 
    n_iterations=10, 
    test_fraction=0.1, 
    indices=True, 
    random_state=None
)

the right way for 10*10 fold CV in scikit-learn? (By changing the random_state to 10 different numbers.)

Because I didn't find any random_state parameter in Stratified K-Fold or K-Fold, and the splits produced by K-Fold are always identical for the same data.

If ShuffleSplit is the right one, my concern is that the documentation mentions:

Note: contrary to other cross-validation strategies, random splits do not guarantee that all folds will be different, although this is still very likely for sizeable datasets

Is this always the case for 10*10 fold CV?
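
For concreteness, here is a sketch of the loop I have in mind (clf, X and y are placeholders for my estimator and data; old sklearn.cross_validation API):

>>> from sklearn.cross_validation import ShuffleSplit
>>> for i in range(10):
...     # one 10-iteration ShuffleSplit per seed => 10 * 10 = 100 splits overall
...     ss = ShuffleSplit(len(y), n_iterations=10, test_fraction=0.1,
...                       random_state=i)
...     for train_idx, test_idx in ss:
...         clf.fit(X[train_idx], y[train_idx])
...         clf.score(X[test_idx], y[test_idx])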

Director answered 26/11, 2011 at 19:36

I am not sure what you mean by 10*10 cross validation. The ShuffleSplit configuration you give will make you call the fit method of the estimator 10 times. You can either call it 10 times yourself in an explicit outer loop, or directly call it 100 times, with 10% of the data reserved for testing, in a single loop if you use instead:

>>> from sklearn.cross_validation import ShuffleSplit
>>> # 100 random 90% train / 10% test splits in a single generator
>>> ss = ShuffleSplit(X.shape[0], n_iterations=100, test_fraction=0.1,
...                   random_state=42)
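
This generator can then be fed straight to cross_val_score (clf, X and y assumed to be your estimator and data), yielding one test score per split:

>>> from sklearn.cross_validation import cross_val_score
>>> scores = cross_val_score(clf, X, y, cv=ss)  # 100 fit/score rounds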

If you want to do 10 runs of StratifiedKFold with k=10 you can shuffle the dataset between the runs (that would lead to a total of 100 calls to the fit method with a 90% train / 10% test split for each call to fit):

>>> from sklearn.utils import shuffle
>>> from sklearn.cross_validation import StratifiedKFold, cross_val_score
>>> for i in range(10):
...     # reshuffle the data with a different seed for each run
...     X, y = shuffle(X_orig, y_orig, random_state=i)
...     skf = StratifiedKFold(y, 10)  # 10 stratified folds on the shuffled labels
...     print(cross_val_score(clf, X, y, cv=skf))
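
As a side note, in recent scikit-learn versions the cross_validation module has been replaced by model_selection, and RepeatedStratifiedKFold expresses this same 10x10 scheme in a single object. A minimal sketch, assuming the same clf, X and y:

>>> from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
>>> rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
>>> scores = cross_val_score(clf, X, y, cv=rskf)  # 100 fit calls in total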
Blindstory answered 26/11, 2011 at 20:05
Thanks, it's exactly what I was looking for. BTW, I saw 42 many times in examples on the web page, any story for that? – Director
You are asking the wrong question :) en.wikipedia.org/wiki/… – Blindstory
More seriously, in the examples and tests we want to have reproducible outcomes, hence we fix the PRNG seed to an arbitrary value. Feel free to tweak the value; the outcome should still "look good" but may be slightly different (some algorithms have non-convex objective functions with several good local optima). – Blindstory
@Blindstory Hi. If I use a StratifiedShuffleSplit, do I still need the outer loop? I want to do a 10x10 SSS inside a Pipeline. – Hefner
