Multi-class, multi-label, ordinal classification with sklearn

I was wondering how to run a multi-class, multi-label, ordinal classification with sklearn. I want to predict a ranking of target groups, ranging from the one that is most prevalent at a certain location (1) to the one that is least prevalent (7). I don't seem to be able to get it right. Could you please help me out?


# Random Forest Classification

# Import
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.metrics import make_scorer, accuracy_score, confusion_matrix, f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Import dataset
dataset = pd.read_excel('alle_probs_edit.v2.xlsx')
X = dataset.iloc[:,4:-1].values
Y = dataset.iloc[:,-1].values

# Split in Train and Test
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 42)

# Scale the features (put all variables on the same scale); whether this is necessary depends on the chosen method
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Create classifier
classifier = RandomForestClassifier(criterion = 'entropy')

# Choose some parameter combinations to try
parameters = {'bootstrap': [True, False],
 'max_depth': [50],
 'max_features': ['auto', 'sqrt'],
 'min_samples_leaf': [1, 2, 3, 4],
 'min_samples_split': [9, 10, 11, 12, 13],
 'n_estimators': [500,1000,1500]}

# Type of scoring used to compare parameter combinations
acc_scorer = make_scorer(accuracy_score)

# Run the grid search
grid_obj = GridSearchCV(classifier, parameters, scoring=acc_scorer, cv = 3, n_jobs = -1)
grid_obj = grid_obj.fit(X_train, Y_train)

# Set the classifier to the best combination of parameters
classifier = grid_obj.best_estimator_

# Fit the best algorithm to the data
classifier.fit(X_train, Y_train)

# Predict the test data
Y_pred = classifier.predict(X_test)

# Confusion matrix
cm = pd.DataFrame(confusion_matrix(Y_test, Y_pred))

# Accuracy
accuracy1 = accuracy_score(Y_test, Y_pred)
print("Accuracy1: %.2f%%" % (accuracy1 * 100.0))

# k-fold cross-validation
accuracies = cross_val_score(estimator = classifier, X = X_train, y = Y_train, cv = 10)
print("CV accuracy: %.2f%% (std: %.2f%%)" % (accuracies.mean() * 100.0, accuracies.std() * 100.0))
Trench answered 19/8, 2019 at 17:9 Comment(0)

This may not be the precise answer you're looking for, but this article outlines a technique as follows:

We can take advantage of the ordered class value by transforming a k-class ordinal regression problem to a k-1 binary classification problem: we convert an ordinal attribute A* with ordinal values V1, V2, V3, … Vk into k-1 binary attributes, one for each of the original attribute's first k − 1 values. The ith binary attribute represents the test A* > Vi

Essentially, aggregate multiple binary classifiers (predict target > 1, target > 2, target > 3, target > 4) to be able to predict whether a target is 1, 2, 3, 4 or 5. The author creates an OrdinalClassifier class that stores multiple binary classifiers in a Python dictionary.
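For intuition, here is a tiny numeric sketch of the probability reconstruction for k = 5 (the binary-classifier outputs are made-up values):

# Hypothetical predict_proba outputs of the four binary classifiers
# for one sample: Pr(y > 1), Pr(y > 2), Pr(y > 3), Pr(y > 4)
p_gt = [0.9, 0.7, 0.3, 0.1]

pr_1 = 1 - p_gt[0]              # Pr(y = 1) = 1 - Pr(y > 1)
pr_2 = (1 - p_gt[1]) * p_gt[0]  # Pr(y = 2) = Pr(y <= 2) * Pr(y > 1)
pr_3 = (1 - p_gt[2]) * p_gt[1]  # Pr(y = 3) = Pr(y <= 3) * Pr(y > 2)
pr_4 = (1 - p_gt[3]) * p_gt[2]  # Pr(y = 4) = Pr(y <= 4) * Pr(y > 3)
pr_5 = p_gt[3]                  # Pr(y = 5) = Pr(y > 4)

Note that these reconstructed probabilities need not sum exactly to 1, a quirk of the method that comes up in the comments below. The OrdinalClassifier class implements this reconstruction vectorized over all samples: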

import numpy as np
from sklearn.base import clone, BaseEstimator, ClassifierMixin
from sklearn.metrics import accuracy_score

class OrdinalClassifier(BaseEstimator, ClassifierMixin):

    def __init__(self, clf):
        self.clf = clf
        self.clfs = {}
        self.unique_class = np.nan

    def fit(self, X, y):
        self.unique_class = np.sort(np.unique(y))
        if self.unique_class.shape[0] > 2:
            for i in range(self.unique_class.shape[0]-1):
                # for each k - 1 ordinal value we fit a binary classification problem
                binary_y = (y > self.unique_class[i]).astype(np.uint8)
                clf = clone(self.clf)
                clf.fit(X, binary_y)
                self.clfs[i] = clf
        return self

    def predict_proba(self, X):
        clfs_predict = {i: self.clfs[i].predict_proba(X) for i in self.clfs}
        predicted = []
        k = len(self.unique_class) - 1
        for i, y in enumerate(self.unique_class):
            if i == 0:
                # V1 = 1 - Pr(y > V1)
                predicted.append(1 - clfs_predict[0][:,1])
            elif i < k:
                # Vi = Pr(y <= Vi) * Pr(y > Vi-1)
                predicted.append((1 - clfs_predict[i][:,1]) * clfs_predict[i-1][:,1])
            else:
                # Vk = Pr(y > Vk-1)
                predicted.append(clfs_predict[k-1][:,1])
        return np.vstack(predicted).T

    def predict(self, X):
        return self.unique_class[np.argmax(self.predict_proba(X), axis=1)]

    def score(self, X, y, sample_weight=None):
        return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

The technique originates in the paper A Simple Approach to Ordinal Classification (Frank & Hall, 2001).
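A minimal usage sketch, assuming data like the X_train/Y_train from the question (the base estimator is just an example; any classifier with predict_proba works):

from sklearn.ensemble import RandomForestClassifier

ord_clf = OrdinalClassifier(RandomForestClassifier(n_estimators=500, criterion='entropy'))
ord_clf.fit(X_train, Y_train)            # Y_train holds ordinal labels, e.g. 1..7
Y_pred = ord_clf.predict(X_test)         # returns labels on the original ordinal scale
Y_proba = ord_clf.predict_proba(X_test)  # one column per ordinal level, sorted ascending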

Undistinguished answered 13/6, 2020 at 16:35 Comment(6)
You might want to add some inheritance for OrdinalClassifier. ``` from sklearn.base import clone, BaseEstimator, ClassifierMixin class OrdinalClassifier(BaseEstimator, ClassifierMixin): ... ``` Then, if you want to use something like GridSearchCV, you can create a subclass for a specific algorithm: ``` class KNeighborsOrdinalClassifier(OrdinalClassifier): def __init__(self, n_neighbors=5, ...): self.n_neighbors = n_neighbors ... self.clf = KNeighborsClassifier(n_neighbors=self.n_neighbors, ...) self.clfs = {} ```Guacin
@David Diaz I am currently working with the OrdinalClassifier from Kartik Chugh and was indeed looking for a way to use GridSearch or RandomSearch. I think I get what you propose, but I'm really not sure how to implement this. Could you maybe give a code example? Thanks in advance!Tica
There's some good reference material here I've used before: danielhnyk.cz/creating-your-own-estimator-scikit-learn Hope that helps!Undistinguished
@t.pellegrom, I've posted an example with KNN now. Hopefully close enough so you can kick it and get it running!Guacin
Inspired by this and agreeing with @DavidDiaz that a subclass is needed to support grid search, pipelines, etc., I took a stab at this with a generic wrapper for any classifier. I have not tested extensively yet, but I did stub in some code that addresses some problems raised here: towardsdatascience.com/… about how some probabilities don't sum to 1. I used sklearn's OvR as a starting point. github.com/leeprevost/OrdinalClassifier/blob/main/ordinal.py. Have not tested yet. Will post progress...Melbamelborn
To all commenters: feel free to make whatever edits you deem beneficial, as I am not an expert in some of these detailsUndistinguished

Here is an example using KNN that should be tuneable in an sklearn pipeline or grid search.

import numpy as np

from sklearn.neighbors import KNeighborsClassifier
from sklearn.base import clone, BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_is_fitted, check_array
from sklearn.utils.multiclass import check_classification_targets

class KNeighborsOrdinalClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, n_neighbors=5, *, weights='uniform', 
                 algorithm='auto', leaf_size=30, p=2, 
                 metric='minkowski', metric_params=None, n_jobs=None):
        
        self.n_neighbors = n_neighbors
        self.weights = weights
        self.algorithm = algorithm
        self.leaf_size = leaf_size
        self.p = p
        self.metric = metric
        self.metric_params = metric_params
        self.n_jobs = n_jobs
        
    def fit(self, X, y):
        X, y = check_X_y(X, y)
        check_classification_targets(y)
        
        self.clf_ = KNeighborsClassifier(**self.get_params())
        self.clfs_ = {}
        self.classes_ = np.sort(np.unique(y))
        if self.classes_.shape[0] > 2:
            for i in range(self.classes_.shape[0]-1):
                # for each k - 1 ordinal value we fit a binary classification problem
                binary_y = (y > self.classes_[i]).astype(np.uint8)
                clf = clone(self.clf_)
                clf.fit(X, binary_y)
                self.clfs_[i] = clf
        return self
    
    def predict_proba(self, X):
        X = check_array(X)
        check_is_fitted(self, ['classes_', 'clf_', 'clfs_'])
        
        clfs_predict = {k:self.clfs_[k].predict_proba(X) for k in self.clfs_}
        predicted = []
        # NOTE: assumes the classes are consecutive integers 0..k-1, so the class
        # label y doubles as the dictionary key (see the comments below this answer)
        for i,y in enumerate(self.classes_):
            if i == 0:
                # V1 = 1 - Pr(y > V1)
                predicted.append(1 - clfs_predict[y][:,1])
            elif y in clfs_predict:
                # Vi = Pr(y > Vi-1) - Pr(y > Vi)
                predicted.append(clfs_predict[y-1][:,1] - clfs_predict[y][:,1])
            else:
                # Vk = Pr(y > Vk-1)
                predicted.append(clfs_predict[y-1][:,1])
        return np.vstack(predicted).T
    
    def predict(self, X):
        X = check_array(X)
        check_is_fitted(self, ['classes_', 'clf_', 'clfs_'])
        
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
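For example, a sketch of tuning it with GridSearchCV (X_train/Y_train are placeholders, and per the note above the labels should be integers 0..k-1):

from sklearn.model_selection import GridSearchCV

param_grid = {'n_neighbors': [3, 5, 7, 9], 'weights': ['uniform', 'distance']}
search = GridSearchCV(KNeighborsOrdinalClassifier(), param_grid, cv=3, n_jobs=-1)
search.fit(X_train, Y_train)
print(search.best_params_, search.best_score_)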
Guacin answered 28/1, 2021 at 21:23 Comment(5)
Thank you very much. This helps a lot! I have one more question. This method appears to have a bias towards the middle classifications (if I have 10 classes, it classifies 80% of observations as classes 5 and 6, even though all classes appear exactly the same number of times in my data). What can cause this and how can I try to mitigate it?Tica
1. Visualize your data to see if you can see any ways to separate them given your input data. 2. You may also need to explore feature transformations (PCA, LDA, etc.) to better separate your classes. 3. If you have domain expertise to inform a preference for precision vs. recall, you could explore weighting of certain classes or customize a more informative scoring metric than f1, precision, recall, etc. 4. I've also found it helpful to make a "dummy" classifier for benchmarking.Guacin
Did you test it? I had to change the function to ``` def predict_proba(self, X): ... for i,y in enumerate(self.classes_): if i == 0: # V1 = 1 - Pr(y > V1) predicted.append(1 - clfs_predict[i][:,1]) elif i in clfs_predict: # Vi = Pr(y > Vi-1) - Pr(y > Vi) predicted.append(clfs_predict[i-1][:,1] - clfs_predict[i][:,1]) else: # Vk = Pr(y > Vk-1) predicted.append(clfs_predict[i-1][:,1]) return np.vstack(predicted).T ```Wireman
@FernandoFelix, I have not tested it. I think you are right if your target variable does not start at zero and increment by 1. It does seem like the edits you proposed would allow the target variable to have different levels, and is a more generalized version of what I wrote.Guacin
@FernandoFelix I think the revision you're suggesting could also be achieved by storing the classifiers in self.clfs_ using the class labels as the key to the dictionary instead of the place in the ordinal levels (i.e., using self.classes_[i] as the key instead of i).Guacin

Building off David Diaz's answer, the white paper, and Kartik's implementation above, along with others linked to on Medium and attributed in the readme, I'm working on an OrdinalClassifier that is built on the sklearn framework and which works well with sklearn pipelines, scoring, and cross-validation.

The OC performs very well vs. standard non-ordinal multiclass classification and gives greater control over optimizing for precision/recall on the positive class (i.e. "high" in, for example, the diabetes disease progression of low < medium < high classes). It supports any sklearn classifier that supports predict_proba. Cross-validation scores are shown on the repo.

OrdinalClassifier based on sklearn

https://github.com/leeprevost/OrdinalClassifier

At this time, I would not call it multi-label.

Melbamelborn answered 17/5, 2022 at 17:40 Comment(2)
In the original paper A Simple Approach to Ordinal Classification, on which Muhammad's blogpost is based, the classifiers' probabilities are multiplied, not subtracted. In other words, the probability of class V[i] given sample A is calculated as Pr(A<=V[i])*Pr(A>V[i-1]) rather than as Pr(A>V[i-1])-Pr(A>V[i]). You might want to adjust your code to correspond to the original paper, or at least give the user of your class an option to employ the algorithm of the original paper.Kipp
that sounds like a mistake on my part. I have abandoned this, as I found the problem I was trying to solve was solved better using linear regression than classification. But I agree, it would be nice to revisit and update when time permits.Melbamelborn
