I am using a Pipeline in scikit-learn to group some preprocessing together with a OneClassSVM as the final classifier. To compute reasonable metrics, I need a post-processing step that transforms the -1/1 output of the OneClassSVM to 0 and 1. Is there any structured way to add such post-processing to a Pipeline?
Transformers cannot be used after the final estimator.
We developed PipeGraph, an extension to the scikit-learn Pipeline that allows you to get intermediate data, build graph-like workflows, and, in particular, solve this problem (see the examples in the gallery at http://mcasl.github.io/PipeGraph )
You can use the class sklearn.compose.TransformedTargetRegressor with your SVM classifier as the regressor, and use the inverse_func argument to transform your labels after classification.
However, since TransformedTargetRegressor is meant to transform your labels into a new space before fitting and map the predictions back to the original space, it expects an array of labels to transform before fitting and does not accept an empty or None target as input. Therefore, you need to provide a dummy target to your pipeline, which can make your code a bit confusing.
Example:
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.svm import OneClassSVM
from sklearn.pipeline import Pipeline

X = np.random.random((10, 2))

regressor = OneClassSVM(gamma='auto')
svm = TransformedTargetRegressor(regressor=regressor,
                                 func=lambda y: y,  # Identity; newer scikit-learn releases reject inverse_func without func
                                 inverse_func=lambda x: (x + 1) // 2,  # Remaps the predicted labels: -1 -> 0, 1 -> 1
                                 check_inverse=False)  # inverse_func is deliberately not the inverse of func, so skip that check
pipeline = Pipeline([
    ('svm', svm)
])
pipeline.fit(X, np.zeros((len(X), 1)))  # Dummy target, one row per sample; OneClassSVM ignores it
pipeline.predict(X)
Output (the values vary between runs because X is drawn at random):
array([[0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0]])
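If a flat vector is handier than the column array above (many metrics expect 1-D labels), a 1-D dummy target should do the trick: TransformedTargetRegressor restores the dimensionality of the target it saw in fit, so the predictions come back squeezed to 1-D. A minimal sketch under that assumption; the fixed seed is an illustrative choice so runs are repeatable, and func is passed explicitly as the identity because newer scikit-learn releases refuse an inverse_func given without func.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)  # illustrative fixed seed
X = rng.random((10, 2))

svm = TransformedTargetRegressor(
    regressor=OneClassSVM(gamma="auto"),
    func=lambda y: y,                     # identity on the dummy target
    inverse_func=lambda p: (p + 1) // 2,  # -1 -> 0, 1 -> 1
    check_inverse=False,                  # inverse_func is not the inverse of func
)
pipeline = Pipeline([("svm", svm)])

# 1-D dummy target with one entry per sample; OneClassSVM ignores it
pipeline.fit(X, np.zeros(len(X)))
labels = pipeline.predict(X)
print(labels.shape)  # (10,)
```

The only change from the column-shaped version is the shape of the dummy target.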
Note that if you need to pass parameters to your OneClassSVM classifier via the Pipeline with a dictionary, for instance in a grid search with GridSearchCV, you need to insert regressor__ into the parameter key between svm__ and the parameter name. For instance, svm__kernel becomes svm__regressor__kernel.
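The extra regressor__ level can be checked directly with get_params and set_params. A minimal sketch, assuming the same single-step pipeline as in the example above (with func given explicitly as the identity); the grid values are arbitrary placeholders, not recommendations.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from sklearn.svm import OneClassSVM

svm = TransformedTargetRegressor(
    regressor=OneClassSVM(gamma="auto"),
    func=lambda y: y,
    inverse_func=lambda p: (p + 1) // 2,
    check_inverse=False,
)
pipeline = Pipeline([("svm", svm)])

params = pipeline.get_params()
print("svm__kernel" in params)             # False: the wrapper step has no kernel
print("svm__regressor__kernel" in params)  # True: OneClassSVM sits one level deeper

# set_params and GridSearchCV's param_grid use the same nested names
pipeline.set_params(svm__regressor__kernel="linear")
param_grid = {
    "svm__regressor__kernel": ["rbf", "linear"],  # placeholder values
    "svm__regressor__nu": [0.1, 0.5],
}
```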
Two more ways to consider:
(1) Create a wrapper classifier around OneClassSVM. In the wrapper's predict method, call OneClassSVM's predict and apply the transformation before returning. See the link below for a classifier template: https://scikit-learn.org/stable/developers/develop.html
(2) Create a simple classifier that does the transformation, and then chain OneClassSVM and the simple classifier together using StackingClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html
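Option (1) could look roughly like the sketch below. The class name OneClassZeroOne and its estimator parameter are hypothetical; the sketch assumes the mapping used in the answer above (inliers, +1, become 1; outliers, -1, become 0) and follows the ClassifierMixin/BaseEstimator pattern from the developer guide. Because the wrapped model is a constructor parameter, its settings stay reachable through nested names such as ocsvm__estimator__nu, and no dummy target is needed.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


class OneClassZeroOne(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: fits a one-class model and maps its -1/1 output to 0/1."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        # y is accepted only for Pipeline compatibility and is ignored
        base = self.estimator if self.estimator is not None else OneClassSVM()
        self.estimator_ = clone(base).fit(X)
        self.classes_ = np.array([0, 1])  # fixed output label set
        return self

    def predict(self, X):
        # inlier (+1) -> 1, outlier (-1) -> 0
        return (self.estimator_.predict(X) == 1).astype(int)


rng = np.random.default_rng(0)  # illustrative data
X = rng.random((20, 2))

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ocsvm", OneClassZeroOne(OneClassSVM(gamma="auto"))),
])
pipe.fit(X)  # no dummy target needed
labels = pipe.predict(X)
```

Since the wrapper inherits ClassifierMixin, pipe.score(X, y_true) computes accuracy against 0/1 ground-truth labels.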