Post-process classifier output in scikit learn Pipeline
I am using a Pipeline in scikit-learn to group some preprocessing together with a OneClassSVM as the final classifier. To compute reasonable metrics, I need a post-processing step that maps the -1/1 output of the OneClassSVM to 0 and 1. Is there any structured way to add such post-processing to a Pipeline? Transformers cannot be used after the final estimator.

Binding answered 6/11, 2015 at 14:6 Comment(3)
You can use a second pipeline on top of the first :) – Pontus
@Olologin Really? Because the first pipeline will not implement transform in its last step. – Binding
But then you would somehow have to turn the first pipeline into a transformer, because if its last estimator is a predictor, the whole pipeline becomes a predictor. I think it's better to inherit from Pipeline and extend it with your custom functionality. After all, the possibility of such OOP tricks is one of the main benefits of scikit-learn. – Pontus
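The Pipeline-subclassing trick suggested in the last comment could look something like this; a minimal sketch, assuming the class name and step names, which are illustrative rather than from the comment:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

class PostProcessedPipeline(Pipeline):
    """Hypothetical Pipeline subclass that post-processes predict() output."""

    def predict(self, X, **predict_params):
        y = super().predict(X, **predict_params)
        return (y + 1) // 2  # map {-1, 1} to {0, 1}

X = np.random.random((10, 2))
pipe = PostProcessedPipeline([('scale', StandardScaler()),
                              ('svm', OneClassSVM(gamma='auto'))])
pipe.fit(X)  # OneClassSVM is unsupervised, so no y is needed
print(pipe.predict(X))  # values in {0, 1}
```

Everything else (fit, set_params, grid search) is inherited unchanged from Pipeline; only predict is wrapped.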

We developed PipeGraph, an extension to scikit-learn's Pipeline that lets you access intermediate data, build graph-like workflows, and, in particular, solve this problem (see the examples in the gallery at http://mcasl.github.io/PipeGraph ).

Kelly answered 18/2, 2018 at 22:19 Comment(7)
Thanks. This somehow provides a solution to the problem, so I can accept this as an answer. – Binding
Can you explain a bit more about how you solve the problem of post-processing outputs from your model? – Elasmobranch
In a standard pipeline all but the last step are transformers, while the last is the predictor. In PipeGraph there is no such restriction, so you can put further steps after the predictor to post-process the outputs. In fact, since it is a graph structure, this can happen not only after or before, but also in parallel branches. – Girish
Are there other libraries for making such connections between estimators inside a pipeline? I tried PipeGraph but it did not work with LBM – Idellaidelle
@Idellaidelle If you add a toy example, maybe I can help you out and find the reason why it is not working for you. – Girish
@ManuelCastejónLimas I opened an issue directly on your GitHub a few days ago: github.com/mcasl/PipeGraph/issues/4 – Idellaidelle
Great, I will investigate the possible cause. – Girish

You can use the class sklearn.compose.TransformedTargetRegressor with your SVM as the regressor and use the inverse_func argument to transform the labels after prediction.

However, since TransformedTargetRegressor is meant to transform your labels into a new space before fitting and remap the predictions back to the original space, it expects an array of labels at fit time and does not accept an empty or None target. You therefore need to provide a dummy target to your pipeline, which can make your code a bit confusing.

Example:

import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.svm import OneClassSVM
from sklearn.pipeline import Pipeline

X = np.random.random((10, 2))

regressor = OneClassSVM(gamma='auto')
svm = TransformedTargetRegressor(
    regressor=regressor,
    # Remap the {-1, 1} labels of OneClassSVM to {0, 1}
    inverse_func=lambda x: (x + 1) // 2,
    # Required: the provided inverse_func is not the inverse of the
    # default func argument, which is the identity function
    check_inverse=False)

pipeline = Pipeline([
    ('svm', svm)
])

pipeline.fit(X, np.zeros((1, 1)))  # A dummy label array; OneClassSVM ignores it
pipeline.predict(X)

Output:

array([[0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0]])

Note that if you need to pass parameters to your OneClassSVM classifier via the Pipeline with a dictionary, for instance in a grid search with GridSearchCV, you must insert regressor__ between the step name svm__ and the parameter name. For instance, svm__kernel becomes svm__regressor__kernel.
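As a sketch of that parameter naming in a grid search, assuming arbitrary nu values and a custom scorer (the default r2 score is meaningless against a constant dummy target, so the scorer here is an illustrative stand-in):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import OneClassSVM

X = np.random.random((20, 2))
y_dummy = np.zeros(len(X))  # dummy target, same length as X so CV can split it

svm = TransformedTargetRegressor(
    regressor=OneClassSVM(gamma='auto'),
    func=lambda y: y,                    # identity, paired with inverse_func
    inverse_func=lambda x: (x + 1) // 2, # remap {-1, 1} to {0, 1}
    check_inverse=False)
pipeline = Pipeline([('svm', svm)])

# Note the regressor__ segment between the step name and the parameter name
param_grid = {'svm__regressor__nu': [0.3, 0.5]}

def zero_fraction(estimator, X, y):
    # Illustrative scorer: fraction of samples predicted as inliers-mapped-to-0
    return float(np.mean(estimator.predict(X) == 0))

grid = GridSearchCV(pipeline, param_grid, cv=2, scoring=zero_fraction)
grid.fit(X, y_dummy)
print(grid.best_params_)
```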

Truncation answered 3/3, 2019 at 16:5 Comment(0)

Two more ways to consider:

(1) Create a wrapper classifier around OneClassSVM. Inside the wrapper's predict method, call OneClassSVM's predict and apply the transformation before returning. See the following link for a classifier template: https://scikit-learn.org/stable/developers/develop.html
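A minimal sketch of option (1), assuming a hypothetical class name and exposing only a few of OneClassSVM's parameters for illustration:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import OneClassSVM

class ZeroOneOneClassSVM(BaseEstimator, ClassifierMixin):
    """Hypothetical wrapper that remaps OneClassSVM output to {0, 1}."""

    def __init__(self, kernel='rbf', gamma='auto', nu=0.5):
        # Store constructor args unmodified so get_params/set_params work
        self.kernel = kernel
        self.gamma = gamma
        self.nu = nu

    def fit(self, X, y=None):
        # Fit the underlying one-class SVM; y is ignored
        self.estimator_ = OneClassSVM(kernel=self.kernel, gamma=self.gamma,
                                      nu=self.nu).fit(X)
        return self

    def predict(self, X):
        # Remap the {-1, 1} output to {0, 1} before returning
        return (self.estimator_.predict(X) + 1) // 2

X = np.random.random((10, 2))
clf = ZeroOneOneClassSVM().fit(X)
print(clf.predict(X))  # values in {0, 1}
```

Since the wrapper is a regular estimator, it can be used as the final step of a standard Pipeline with no further tricks.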

(2) Create a simple classifier that does the transformation, and then chain OneClassSVM and that classifier together using StackingClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html

Tongs answered 18/5, 2020 at 2:40 Comment(0)
