return coefficients from Pipeline object in sklearn

I've fit a Pipeline object with RandomizedSearchCV:

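import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_sgd = Pipeline([('scl', StandardScaler()),
                     ('clf', SGDClassifier(n_jobs=-1))])

# Newer scikit-learn: loss 'log' is now 'log_loss', and n_iter was replaced by max_iter
param_dist_sgd = {'clf__loss': ['log_loss'],
                  'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                  'clf__alpha': np.linspace(0.15, 0.35),
                  'clf__max_iter': [3, 5, 7]}

sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                         param_distributions=param_dist_sgd,
                                         cv=3, n_iter=30, n_jobs=-1)

# X_train, y_train: the training data, defined elsewhere
sgd_randomized_pipe.fit(X_train, y_train)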

I want to access the coef_ attribute of the best_estimator_, but I'm unable to. I've tried accessing coef_ with the code below:

sgd_randomized_pipe.best_estimator_.coef_

However, I get the following AttributeError:

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the final step of my best_estimator_ pipeline.

What am I doing wrong?

Ossieossietzky asked 8/5, 2017 at 19:56

You can always retrieve the steps by the names you assigned to them when building the pipeline, using the named_steps dict:

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

and then access any attribute of the corresponding fitted estimator, such as coef_ and intercept_.

This is the public attribute exposed by Pipeline, as specified in the documentation:

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.
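A minimal end-to-end sketch of this approach. It assumes synthetic data from make_classification in place of the question's X_train/y_train, and a reduced search space (default hinge loss, no n_iter/max_iter, fewer sampled candidates) so it runs quickly; the loss choice does not affect whether coef_ exists.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's X_train / y_train (binary, 5 features)
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

pipe_sgd = Pipeline([('scl', StandardScaler()),
                     ('clf', SGDClassifier(random_state=0))])

# Reduced search space so the sketch runs quickly
param_dist_sgd = {'clf__penalty': ['l1', 'l2', 'elasticnet'],
                  'clf__alpha': np.linspace(0.15, 0.35)}

sgd_randomized_pipe = RandomizedSearchCV(pipe_sgd, param_distributions=param_dist_sgd,
                                         cv=3, n_iter=5, random_state=0)
sgd_randomized_pipe.fit(X_train, y_train)

best_pipe = sgd_randomized_pipe.best_estimator_  # refitted Pipeline (refit=True by default)
scaler = best_pipe.named_steps['scl']            # fitted StandardScaler
classifier = best_pipe.named_steps['clf']        # fitted SGDClassifier

print(classifier.coef_.shape)       # (1, 5): one row for a binary problem
print(classifier.intercept_.shape)  # (1,)
print(scaler.mean_.shape)           # (5,): per-feature means learned by the scaler
```

The key point is that best_estimator_ is itself a Pipeline, so the fitted attributes live one level further down, on the named step.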

Mcnamara answered 9/5, 2017 at 2:11

I think this should work (going through best_estimator_, since the search object itself has no named_steps):

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_
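named_steps belongs to the Pipeline, not to the search object that wraps it. A small check, assuming synthetic data and a tiny GridSearchCV standing in for the question's RandomizedSearchCV (both expose best_estimator_ the same way):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=150, n_features=4, random_state=0)
pipe = Pipeline([('scl', StandardScaler()), ('clf', SGDClassifier(random_state=0))])
search = GridSearchCV(pipe, {'clf__alpha': [0.1, 0.2]}, cv=3).fit(X, y)

# The search object does not expose the pipeline's named_steps itself
print(hasattr(search, 'named_steps'))                 # False
# The refitted best pipeline does
coef = search.best_estimator_.named_steps['clf'].coef_
print(coef.shape)                                     # (1, 4)
```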

Raeraeann answered 21/10, 2018 at 1:31

One way I've found to do this is chained indexing with the steps attribute:

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?
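Positional indexing works, but steps[1] silently points at a different estimator if a step is later inserted before 'clf'. A small sketch, with synthetic data standing in for the question's X_train/y_train, showing that the positional and named lookups return the very same fitted object (pipe[-1] requires scikit-learn 0.21 or later):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(random_state=0))]).fit(X, y)

by_position = pipe.steps[1][1]        # chained indexing into the (name, estimator) list
by_last = pipe.steps[-1][1]           # same, but robust to steps added before 'clf'
by_name = pipe.named_steps['clf']     # lookup by the name given at construction
by_index = pipe[-1]                   # Pipeline indexing (scikit-learn >= 0.21)

# All four refer to the identical fitted SGDClassifier object
print(by_position is by_last is by_name is by_index)  # True
print(by_name.coef_.shape)                            # (1, 4)
```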

Ossieossietzky answered 8/5, 2017 at 20:08
Comments:
The named_steps method described above is preferred. - Farant
This worked well when using make_pipeline with many different classifiers! - Inextricable

In short, scikit-learn offers two ways to access the estimators chained together in a Pipeline: by index or by name. (And each way again has two flavours: direct and indirect.)


First, as the scikit-learn User Guide points out,

The Pipeline is built using a list of (key, value) pairs (i.e. steps), where the key is a string containing the name you want to give this step and value is an estimator object.

This indicates that:

  1. a pipeline is constructed from one or more estimator objects, in order (just like a list):

    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.svm import SVC
    >>> from sklearn.decomposition import PCA
    >>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
    >>> pipe = Pipeline(estimators)
    >>> pipe
    Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
    
  2. and each estimator object has a name, either chosen by the user (the key) or set automatically (e.g. by the make_pipeline utility function):

    >>> from sklearn.pipeline import make_pipeline
    >>> pipe = make_pipeline(PCA(), SVC())
    >>> pipe
    Pipeline(steps=[('pca', PCA()), ('svc', SVC())])
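The automatically generated names are the lowercased class names. A sketch of what happens when the same class appears twice; the numeric suffixes shown are what recent scikit-learn versions produce, but treat the exact scheme as an implementation detail and check it on your version:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler appears twice, so make_pipeline has to disambiguate its name
pipe = make_pipeline(StandardScaler(), PCA(), StandardScaler())

names = [name for name, _ in pipe.steps]
print(names)  # ['standardscaler-1', 'pca', 'standardscaler-2']

# The generated names work with every name-based lookup
second_scaler = pipe.named_steps['standardscaler-2']
print(second_scaler is pipe[-1])  # True
```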
    

So, finally, we can access the estimators in a Pipeline either

  1. by indexing the Pipeline:
    • directly through the Pipeline object (just like a list)
      >>> pipe[0]
      PCA()
      >>> pipe[1]
      SVC()
      
    • indirectly through the steps attribute (actually a list of tuples)
      >>> pipe.steps
      [('pca', PCA()), ('svc', SVC())]
      >>> pipe.steps[0][1]
      PCA()
      >>> pipe.steps[1][1]
      SVC()
      
  2. or by the name of steps/estimators:
    • directly through the Pipeline object (just like a dict)
      >>> pipe["pca"]
      PCA()
      >>> pipe["svc"]
      SVC()
      
    • indirectly through the named_steps attribute (actually a subclass of dict)
      >>> pipe.named_steps
      {'pca': PCA(), 'svc': SVC()}
      >>> pipe.named_steps["pca"]
      PCA()
      >>> pipe.named_steps["svc"]
      SVC()
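Because indexing behaves like a list, the Pipeline object also accepts slices (scikit-learn 0.21 or later), which return a new Pipeline over the selected steps. A sketch with synthetic data, assuming PCA(n_components=2) so the output shape is easy to check; pipe[:-1] is handy for inspecting what the final estimator actually receives:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=6, random_state=0)
pipe = make_pipeline(PCA(n_components=2), SVC()).fit(X, y)

# The four access routes listed above all return the same fitted PCA object
print(pipe[0] is pipe.steps[0][1] is pipe["pca"] is pipe.named_steps["pca"])  # True

# A slice returns a new Pipeline; the estimators inside are shared, not copied
head = pipe[:-1]
print(isinstance(head, Pipeline))   # True
print(head[0] is pipe[0])           # True
print(head.transform(X).shape)      # (100, 2): output of the fitted PCA step
```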
      

From here on, I hope we can play around with pipelines like a skilled plumber.

Elison answered 28/10, 2022 at 14:46
