Return coefficients from a Pipeline object in sklearn

Asked 8/5, 2017 at 19:56 Answered 28/10, 2022 at 14:46

Solved python machine-learning scikit-learn cross-validation scikit-learn-pipeline

I've fit a Pipeline object with RandomizedSearchCV:

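import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])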

param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator = pipe_sgd, 
                                         param_distributions=param_dist_sgd, 
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute of the best_estimator_ but I'm unable to do that. I've tried accessing coef_ with the code below.

sgd_randomized_pipe.best_estimator_.coef_

However I get the following AttributeError...

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the class of the classifier step in my best_estimator_.

What am I doing wrong?

Ossieossietzky answered 8/5, 2017 at 19:56 Comment(0)

You can always reach the individual steps by the names you assigned when building the pipeline, using the named_steps dict.

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

and then access any attribute, such as coef_ or intercept_, that is available on the corresponding fitted estimator.

This is the formal attribute exposed by the Pipeline as specified in the documentation:

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
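For anyone who wants to try this end to end, here is a minimal sketch. It assumes synthetic data from make_classification in place of the asker's X_train and y_train, and a reduced search space with the default SGDClassifier loss so that it runs on recent scikit-learn versions. It is not the asker's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary data stands in for the asker's X_train, y_train.
X_train, y_train = make_classification(n_samples=200, n_features=6, random_state=0)

pipe_sgd = Pipeline([('scl', StandardScaler()),
                     ('clf', SGDClassifier(random_state=0))])

# Assumption: a smaller search space than in the question, for illustration only.
param_dist_sgd = {'clf__penalty': ['l1', 'l2', 'elasticnet'],
                  'clf__alpha': np.linspace(0.15, 0.35, 10)}

search = RandomizedSearchCV(pipe_sgd, param_distributions=param_dist_sgd,
                            cv=3, n_iter=5, random_state=0)
search.fit(X_train, y_train)

# best_estimator_ is the refitted Pipeline; its steps are reachable by name.
best_pipe = search.best_estimator_
scaler = best_pipe.named_steps['scl']
classifier = best_pipe.named_steps['clf']

print(classifier.coef_.shape)       # (1, 6): one row of weights for a binary problem
print(classifier.intercept_.shape)  # (1,)
print(scaler.mean_.shape)           # (6,): per-feature means learned by the scaler
```

The coefficients live on the fitted SGDClassifier step, not on the Pipeline wrapper, which is why the lookup goes through named_steps first.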

Mcnamara answered 9/5, 2017 at 2:11 Comment(0)

I think this should work:

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_

Raeraeann answered 21/10, 2018 at 1:31 Comment(0)

I've found one way to do this is by chained indexing with the steps attribute...

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?
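One case where positional access is convenient is a pipeline built with make_pipeline, where the step names are generated automatically. A small sketch, assuming synthetic data and using steps[-1][1] rather than a hard-coded index so that it keeps pointing at the final estimator if more preprocessing steps are added:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data, used only to get a fitted pipeline.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

pipe = make_pipeline(StandardScaler(), SGDClassifier(random_state=0)).fit(X, y)

# make_pipeline names each step after its lowercased class name.
print([name for name, _ in pipe.steps])  # ['standardscaler', 'sgdclassifier']

# The last (name, estimator) tuple holds the final estimator, whatever its name is.
final_step = pipe.steps[-1][1]
assert final_step is pipe.named_steps['sgdclassifier']
print(final_step.coef_.shape)  # (1, 4)
```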

Ossieossietzky answered 8/5, 2017 at 20:8 Comment(2)

The named_steps approach described above is preferred – Farant 3/5, 2018 at 9:47

This worked well when using make_pipeline with many different classifiers! – Inextricable 20/12, 2021 at 12:5

In short, scikit-learn offers two ways to access the estimators chained together in a Pipeline: by index or by name. (Each way in turn has two flavours: directly on the Pipeline object, or indirectly through one of its attributes.)

Firstly, as the User Guide of sklearn points out,

The Pipeline is built using a list of (key, value) pairs (i.e. steps), where the key is a string containing the name you want to give this step and value is an estimator object.

Which indicates that:

a pipeline is built from one or more estimator objects, in order (just like a list):

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
>>> pipe = Pipeline(estimators)
>>> pipe
Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

and each estimator object has a name, either assigned by the user (via the key) or set automatically (e.g. by the make_pipeline utility function):

>>> from sklearn.pipeline import make_pipeline
>>> pipe = make_pipeline(PCA(), SVC())
>>> pipe
Pipeline(steps=[('pca', PCA()), ('svc', SVC())])

So finally, we can access the estimators in a Pipeline either

by indexing the Pipeline:

directly through the Pipeline object (just like a list)
```
>>> pipe[0]
PCA()
>>> pipe[1]
SVC()
```

indirectly through the steps attribute (actually a list of tuples)

>>> pipe.steps
[('pca', PCA()), ('svc', SVC())]
>>> pipe.steps[0][1]
PCA()
>>> pipe.steps[1][1]
SVC()

or by the name of steps/estimators:

directly through the Pipeline object (just like a dict or namedtuple)
```
>>> pipe["pca"]
PCA()
>>> pipe["svc"]
SVC()
```

indirectly through the named_steps attribute (actually a subclass of dict)

>>> pipe.named_steps
{'pca': PCA(), 'svc': SVC()}
>>> pipe.named_steps["pca"]
PCA()
>>> pipe.named_steps["svc"]
SVC()
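As a quick sanity check, the four access paths above can be compared on a fitted pipeline. This is only a sketch: the data comes from make_classification (an assumption, any dataset would do), and it also shows slicing, which returns a sub-Pipeline rather than a single estimator.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC

# Assumption: synthetic data, used only so the steps have fitted attributes.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
pipe = make_pipeline(PCA(), SVC()).fit(X, y)

# All four access paths return the very same estimator object.
pca = pipe[0]
assert pca is pipe.steps[0][1]
assert pca is pipe["pca"]
assert pca is pipe.named_steps["pca"]
assert pca is pipe.named_steps.pca  # named_steps also allows attribute access

# Slicing gives a new Pipeline holding the selected steps.
sub = pipe[:-1]
assert isinstance(sub, Pipeline)

# Fitted attributes are then read from the individual steps.
print(pca.components_.shape)  # (5, 5): PCA() keeps all 5 components by default
print(pipe[-1].classes_)      # [0 1]
```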

From here on, I hope we can play around with pipelines like a skilled plumber.

Elison answered 28/10, 2022 at 14:46 Comment(0)
