I wonder if we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier with and without a PCA transformation ahead of it. In practice, this might be a pipeline with an extra parameter that toggles the PCA step, so that I can optimize over it via GridSearch and similar tools. I don't see such an implementation in the sklearn source, but is there a workaround?
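One possible workaround, sketched here under some assumptions: recent scikit-learn releases (0.21 and later, where GridSearchCV lives in sklearn.model_selection) accept the string "passthrough" in place of a pipeline step, so the step itself can be treated as a searchable parameter. The step names "reduce" and "clf" and the synthetic dataset are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("reduce", PCA(n_components=5)),  # the step we want to make optional
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# The whole "reduce" step is a parameter: either a PCA instance or the
# string "passthrough", which leaves the data untouched at that step.
param_grid = {"reduce": [PCA(n_components=5), "passthrough"]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The grid has two candidates, one with PCA and one without, and the search reports whichever scores better in cross-validation.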
Furthermore, since the possible parameter values of a later step in the pipeline might depend on the parameters of an earlier step (e.g., valid values of ExtraTreesClassifier.max_features depend on PCA.n_components), is it possible to specify such a conditional dependency in sklearn.pipeline and sklearn.grid_search?
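As far as I know there is no built-in syntax for conditional parameters, but GridSearchCV accepts a list of dictionaries, and each dictionary is expanded as a separate sub-grid. That allows a rough approximation of the dependency: one sub-grid per upstream setting, each listing only the downstream values that make sense for it. The sketch below assumes a recent scikit-learn, and the step names and data are again illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data with 10 raw features, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("reduce", PCA()),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

param_grid = [
    # No PCA: max_features is bounded by the 10 raw features.
    {"reduce": ["passthrough"], "clf__max_features": [2, 5, 10]},
    # PCA with 4 components: max_features stays at or below 4.
    {"reduce": [PCA()], "reduce__n_components": [4],
     "clf__max_features": [2, 4]},
    # PCA with 8 components: max_features stays at or below 8.
    {"reduce": [PCA()], "reduce__n_components": [8],
     "clf__max_features": [2, 4, 8]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(len(search.cv_results_["params"]), "candidates evaluated")
print(search.best_params_)
```

Only the combinations listed inside each dictionary are tried, so invalid pairings never reach the estimator. The cost is that the dependency has to be spelled out by hand for every upstream value.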
Thank you!
ExtraTreesClassifier.max_features can be a float value between 0.0 and 1.0 instead of an integer. This is useful when the actual number of features is variable, as in your case. – Sultry
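A minimal sketch of that suggestion, assuming a recent scikit-learn and illustrative step names and data: with fractional max_features, the same grid stays valid whether or not PCA changes the number of features reaching the classifier.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data, used only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("reduce", PCA(n_components=4)),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

param_grid = {
    # Either 4 PCA components or the 10 raw features reach the classifier.
    "reduce": [PCA(n_components=4), "passthrough"],
    # Fractions of whatever feature count the classifier actually sees.
    "clf__max_features": [0.25, 0.5, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the fractions are resolved against the input the classifier receives at fit time, a single flat grid covers both the PCA and the no-PCA case.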