How to pass argument to scoring function in scikit-learn's LogisticRegressionCV call
Problem

I am trying to use scikit-learn's LogisticRegressionCV with roc_auc_score as the scoring metric.

from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

clf = LogisticRegressionCV(scoring=roc_auc_score)

But when I attempt to fit the model (clf.fit(X, y)), it throws an error.

 ValueError: average has to be one of (None, 'micro', 'macro', 'weighted', 'samples')

That's cool. It's clear what's going on: roc_auc_score needs to be called with the average argument specified, per its documentation and the error above. So I tried that.

clf = LogisticRegressionCV(scoring=roc_auc_score(average='weighted'))

But it turns out that roc_auc_score can't be called with only an optional argument; this throws another error.

TypeError: roc_auc_score() takes at least 2 arguments (1 given)

Question

Any thoughts on how I can use roc_auc_score as the scoring metric for LogisticRegressionCV in a way that I can specify an argument for the scoring function?

I can't find an SO question on this or a discussion of it in scikit-learn's GitHub repo, but surely someone has run into this before?

Fewell answered 19/8, 2016 at 17:26 Comment(2)
According to the docs you linked to, average has a default value of "macro", so that shouldn't be causing the error. – Grisgris
Yeah, not sure why it's asking for a definition of that argument. I thought it might be the version I'm using (0.16.1), but the docs for that version show the same thing. – Fewell

I found a way to solve this problem!

scikit-learn offers a make_scorer function in its metrics module that allows a user to create a scoring object from one of its native scoring functions with arguments specified to non-default values (see the make_scorer entry in the scikit-learn docs for more information).
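
For context (my reading of how scikit-learn handles a scoring callable): the CV loop invokes the callable as scoring(estimator, X, y). Passing roc_auc_score directly therefore lines the arguments up as roc_auc_score(y_true=estimator, y_score=X, average=y), which is how the label array ends up in the average parameter and triggers the original ValueError. make_scorer fixes this by building a callable with the right signature and baking the keyword arguments in. A rough sketch of the idea (not scikit-learn's actual implementation):

from sklearn.metrics import roc_auc_score

# Rough equivalent of make_scorer(roc_auc_score, average='weighted'):
# a callable with the (estimator, X, y) signature the CV loop expects,
# with average='weighted' baked in. Note it scores hard predictions from
# estimator.predict, which matches make_scorer's default behavior; see
# the comments below about needs_proba if you want probabilities.
def roc_auc_weighted_scorer(estimator, X, y):
    return roc_auc_score(y, estimator.predict(X), average='weighted')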

So, I created a scoring object with the average argument specified.

from sklearn.metrics import make_scorer, roc_auc_score

roc_auc_weighted = make_scorer(roc_auc_score, average='weighted')

Then, I passed that object in the call to LogisticRegressionCV and it ran without any issues!

clf = LogisticRegressionCV(scoring=roc_auc_weighted)
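
End to end, on a toy dataset (a sketch; make_classification is just there to have some data to fit):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import make_scorer, roc_auc_score

# toy binary classification problem
X, y = make_classification(random_state=0)

roc_auc_weighted = make_scorer(roc_auc_score, average='weighted')
clf = LogisticRegressionCV(scoring=roc_auc_weighted)
clf.fit(X, y)

# dict mapping each class to an array of scores, shape (n_folds, n_Cs)
print(clf.scores_)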
Fewell answered 19/8, 2016 at 17:52 Comment(3)
You're on the right path, but you have to be a bit careful here. The reason is that I think the scorer might be using the .predict method internally. You would need to set make_scorer(..., needs_proba=True). Have a look at my answer. – Leavelle
Gyan, is your solution with roc_auc_weighted working for you? I don't like the idea of creating a hackish def roc_auc_score_proba(y_true, proba); it's having problems when called from Jupyter. – Horsepowerhour
Oh, figured it out. Somehow including needs_proba=True in roc_auc_weighted made it throw errors, but your example works fine as is. – Horsepowerhour

You can use make_scorer, e.g.

from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score, make_scorer
from sklearn.datasets import make_classification

# some example data
X, y = make_classification()

# little hack to pick out Proba(y == 1)
def roc_auc_score_proba(y_true, proba):
    return roc_auc_score(y_true, proba[:, 1])

# define your scorer
auc = make_scorer(roc_auc_score_proba, needs_proba=True)

# define your classifier
clf = LogisticRegressionCV(scoring=auc)

# train
clf.fit(X, y)

# have look at the scores
print(clf.scores_)
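
One caveat for readers on current scikit-learn: needs_proba was deprecated in version 1.4 (and later removed) in favor of a response_method argument, so double-check against your installed version. Under that newer API, for a binary target the scorer hands the metric the positive-class probability column directly, so the column-picking hack above shouldn't be needed. A sketch assuming scikit-learn >= 1.4:

from sklearn.metrics import make_scorer, roc_auc_score

# response_method replaces needs_proba on scikit-learn >= 1.4; for a
# binary target the scorer passes the positive-class probabilities.
auc = make_scorer(roc_auc_score, response_method="predict_proba")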
Leavelle answered 19/8, 2016 at 20:16 Comment(2)
Yup, discovered make_scorer, but the problem was with the average argument; the probability extraction piece wasn't actually part of this problem. Good flag for those who want probabilities rather than binary predictions, though. – Fewell
I understand, yes. I guess I focused on the proba thing because you're using the AUC score (which doesn't make much sense if you don't use probabilities). – Leavelle

A bit late (4 years later), but today you can use:

clf = LogisticRegressionCV(scoring='roc_auc')

Also, all other scoring keys can be obtained through:

from sklearn.metrics import SCORERS
print(SCORERS.keys())
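
If that import fails on a recent scikit-learn release (SCORERS was deprecated and later removed), the supported replacement, available since roughly version 1.0, is get_scorer_names:

from sklearn.metrics import get_scorer_names

# lists the same scoring strings on newer scikit-learn versions
print(get_scorer_names())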
Rapturous answered 4/6, 2021 at 10:14 Comment(0)
