GridSearchCV on LogisticRegression in scikit-learn
I am trying to optimize a logistic regression in scikit-learn with a cross-validated grid parameter search, but I can't seem to get it to work.

The error says that LogisticRegression does not implement get_params(), but the documentation says it does. How can I go about optimizing this model on my ground truth data?

>>> param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000] }
>>> clf = GridSearchCV(LogisticRegression(penalty='l2'), param_grid)
>>> clf
GridSearchCV(cv=None,
       estimator=LogisticRegression(C=1.0, intercept_scaling=1, dual=False, fit_intercept=True,
          penalty='l2', tol=0.0001),
       fit_params={}, iid=True, loss_func=None, n_jobs=1,
       param_grid={'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]},
       pre_dispatch='2*n_jobs', refit=True, score_func=None, verbose=0)
>>> clf = clf.fit(gt_features, labels)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-x86_64.egg/sklearn/grid_search.py", line 351, in fit
    base_clf = clone(self.estimator)
  File "/Library/Python/2.7/site-packages/scikit_learn-0.14_git-py2.7-macosx-10.8-x86_64.egg/sklearn/base.py", line 42, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object 'LogisticRegression(C=1.0, intercept_scaling=1, dual=False, fit_intercept=True,
          penalty='l2', tol=0.0001)' (type <class 'scikits.learn.linear_model.logistic.LogisticRegression'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
>>> 
Houdon answered 26/9, 2013 at 2:26
The class name scikits.learn.linear_model.logistic.LogisticRegression refers to a very old version of scikit-learn; the top-level package has been called sklearn for at least the last two or three releases. You very likely have several versions of scikit-learn installed concurrently on your Python path. Uninstall them all, then reinstall 0.14 or later and try again.
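With only the new package on your path, the imports look roughly like this (a minimal sketch; gt_features and labels are the arrays from the question):

# the modern top-level package is "sklearn", not "scikits.learn"
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV  # moved to sklearn.model_selection in 0.18+

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
clf = GridSearchCV(LogisticRegression(penalty='l2'), param_grid)
clf.fit(gt_features, labels)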

Sortition answered 26/9, 2013 at 9:29
Thanks very much. All I had to do was switch the import statement and it worked. Simple answer. Can you (or anyone else) also tell me which parameters can be optimized in a LogisticRegression grid search? Is it just C? Here is the full object: clf.best_estimator_ gives LogisticRegression(C=1, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001). – Houdon
Yes, C is the most important one. – Eclogue
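For reference, once clf.fit(...) from the question has run, the chosen C and its cross-validated score can be read back out directly from the fitted GridSearchCV object:

print(clf.best_params_)   # e.g. {'C': 1}
print(clf.best_score_)    # mean cross-validated score for that C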
You can also give penalty as a parameter along with C, e.g.:

grid_values = {'penalty': ['l1', 'l2'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
model_lr = GridSearchCV(lr, param_grid=grid_values)
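Note that lr then has to be an estimator whose solver supports both penalties. A minimal, self-contained sketch (liblinear is chosen here because it handles l1 as well as l2, and X_train / y_train stand in for your training data):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# liblinear supports both the l1 and l2 penalties
lr = LogisticRegression(solver='liblinear')
grid_values = {'penalty': ['l1', 'l2'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}
model_lr = GridSearchCV(lr, param_grid=grid_values)
model_lr.fit(X_train, y_train)
print(model_lr.best_params_)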

Highwrought answered 24/8, 2017 at 7:57
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# the base estimator is assumed here; liblinear supports both l1 and l2
logreg = LogisticRegression(solver='liblinear')

Depending on the power of your computer, you could go for this smaller grid:

# a single dict searches penalty and C together;
# a list of separate dicts would tune them one at a time
parameters = {'penalty': ['l1', 'l2'],
              'C': [1, 10, 100, 1000]}

grid_search = GridSearchCV(estimator=logreg,
                           param_grid=parameters,
                           scoring='accuracy',
                           cv=5,
                           verbose=0)

grid_search.fit(X_train, y_train)

or this deeper one:

# pair each solver with the penalties it actually supports,
# so the search never tries an invalid combination
parameters = [{'solver': ['newton-cg', 'lbfgs', 'sag'],
               'penalty': ['l2'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'solver': ['liblinear', 'saga'],
               'penalty': ['l1', 'l2'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'solver': ['saga'],
               'penalty': ['elasticnet'],
               'l1_ratio': [0.5],
               'C': [0.001, 0.01, 0.1, 1, 10, 100]}]

grid_search = GridSearchCV(estimator=logreg,
                           param_grid=parameters,
                           scoring='accuracy',
                           cv=5,
                           verbose=0)

grid_search.fit(X_train, y_train)
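Once either search has finished, you can pull out the winning settings and check them on held-out data (a small sketch; X_test / y_test are assumed to be your held-out split):

print(grid_search.best_params_)            # best solver / penalty / C combination
print(grid_search.best_score_)             # mean cross-validated accuracy
best_model = grid_search.best_estimator_   # already refit on the full training data
print(best_model.score(X_test, y_test))    # accuracy on the held-out split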
Homothallic answered 25/4, 2022 at 18:38
