Scikit-learn Ridge classifier: extracting class probabilities
I'm currently using sklearn's Ridge classifier, and am looking to ensemble this classifier with classifiers from sklearn and other libraries. In order to do this, it would be ideal to extract the probability that a given input belongs to each class in a list of classes. Currently, I'm zipping the classes with the output of model.decision_function(x), but this returns the distance from the hyperplane as opposed to a straightforward probability. These distance values vary from around -1 to around 1.

distances = dict(zip(clf.classes_, clf.decision_function(x)[0]))  

How can I convert these distances to a more concrete set of probabilities (a series of positive values that sum to 1)? I'm looking for something like clf.predict_proba() that is implemented for the SVC in sklearn.

Serilda answered 20/3, 2014 at 15:43 Comment(2)
There is no predict_proba on RidgeClassifier because it's not easily interpreted as a probability model, AFAIK. A logistic transform or just thresholding at [-1, 1] and mapping that to [0, 1] are possible, but both are hacks.Santana
Yeah, the best I could do was take the softmax of the decision function, but at least that a) maintains relative ordering and b) makes ensembling simpler.Serilda

Further exploration led to using the softmax function:

import numpy as np

d = clf.decision_function(x)[0]
probs = np.exp(d) / np.sum(np.exp(d))

This guarantees a 0-1 bounded distribution that sums to 1.
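As a sanity check, here is a small self-contained sketch (the scores are made up, standing in for clf.decision_function(x)[0]) showing that the softmax output is positive, sums to 1, and preserves the ranking of the raw scores:

```python
import numpy as np

# made-up decision-function scores for one sample over three classes
d = np.array([-0.8, 0.1, 0.6])

# softmax: exponentiate and normalize; subtracting the max first is a
# standard trick that avoids overflow without changing the result
e = np.exp(d - d.max())
probs = e / e.sum()

print(probs)                             # positive, sums to ~1.0
print(np.argmax(probs) == np.argmax(d))  # True: ranking is preserved
```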

Serilda answered 23/3, 2014 at 4:20 Comment(0)

A quick look at the source code of predict shows that decision_function is in fact the logit transform of the class probabilities: if the decision function is f, then the class probability of class 1 is exp(f) / (1 + exp(f)). This corresponds to the following check in the sklearn source:

    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

This check says: if the decision function is greater than zero, predict class 1, otherwise predict class 0, which is the classical logit approach.
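To see why thresholding the raw score at zero matches the logit view, note that exp(f) / (1 + exp(f)) > 0.5 exactly when f > 0. A quick sketch with made-up scores:

```python
import numpy as np

scores = np.array([-2.0, -0.1, 0.0, 0.3, 1.5])  # made-up binary scores
probs = np.exp(scores) / (1 + np.exp(scores))   # logistic transform

# thresholding the raw score at 0 gives the same predictions as
# thresholding the transformed probability at 0.5
print(np.array_equal(scores > 0, probs > 0.5))  # True
```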

So, you will have to turn the decision function into something like:

import numpy as np

d = clf.decision_function(x)[0]
probs = np.exp(d) / (1 + np.exp(d))

Then zip these with the classes as before.
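Putting it together for the binary case (a sketch with made-up values standing in for clf.classes_ and the decision score, since this assumes an already-fitted classifier):

```python
import numpy as np

classes = np.array(['neg', 'pos'])  # stands in for clf.classes_
d = 0.4                             # stands in for clf.decision_function(x)[0]

p = np.exp(d) / (1 + np.exp(d))     # probability of the positive class
probs = dict(zip(classes, [1 - p, p]))
print(probs)  # the two values sum to 1
```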

Noam answered 20/3, 2014 at 16:14 Comment(2)
1) Seems similar to softmax 2) Although the outputs of np.exp(d) / (1 + np.exp(d)) are bounded in the 0 - 1 range, they aren't normalized and don't seem to correspond to the proper distances to the plane. In other words, taking the argmax of the decision function scores doesn't return the same result as taking the argmax of np.exp(d) / (1 + np.exp(d)). Any ideas why?Serilda
Never mind, I think I've answered my own question. I believe the proper solution to the problem is to literally apply softmax: np.exp(d) / np.sum(np.exp(d)). You pointed me in the right direction, though.Serilda

The solutions provided here didn't work for me. I think the softmax function is the correct solution, so I extended the RidgeClassifierCV class with a predict_proba method similar to the one in LogisticRegressionCV:

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.utils.extmath import softmax

class RidgeClassifierCVwithProba(RidgeClassifierCV):
    def predict_proba(self, X):
        d = self.decision_function(X)
        # binary case: decision_function returns one score per sample,
        # so build a two-column score matrix before applying softmax
        d_2d = np.c_[-d, d]
        return softmax(d_2d)
Cohune answered 24/3, 2021 at 13:10 Comment(0)

This version is based on Emanuel's answer. It is agnostic to whether the target is binary or multiclass. It incorporates a temperature hyperparameter that scales the logits before applying the softmax function.

import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.utils.extmath import softmax

class RidgeClassifierWithProba(RidgeClassifier):
    def __init__(self, temperature=1.0, **kwargs):
        super().__init__(**kwargs)
        self.temperature = temperature

    def predict_proba(self, X):
        # temperature > 1 flattens the distribution, < 1 sharpens it
        d = self.decision_function(X) / self.temperature
        if d.ndim == 1:  # binary: one score per sample
            d = np.c_[-d, d]
        return softmax(d)
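A toy usage sketch (the dataset is made up, and the class body is repeated only so the snippet runs standalone) illustrating that a larger temperature yields a flatter distribution:

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.utils.extmath import softmax

class RidgeClassifierWithProba(RidgeClassifier):  # as defined above
    def __init__(self, temperature=1.0, **kwargs):
        super().__init__(**kwargs)
        self.temperature = temperature

    def predict_proba(self, X):
        d = self.decision_function(X) / self.temperature
        if d.ndim == 1:
            d = np.c_[-d, d]
        return softmax(d)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

sharp = RidgeClassifierWithProba(temperature=0.5).fit(X, y).predict_proba(X)
flat = RidgeClassifierWithProba(temperature=5.0).fit(X, y).predict_proba(X)

# rows are valid distributions, and the high-temperature version is
# closer to uniform (its largest probability is smaller)
print(np.allclose(sharp.sum(axis=1), 1.0), sharp.max() > flat.max())
```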
Manuscript answered 16/12, 2023 at 21:25 Comment(0)
