How to set weights in multi-class classification in xgboost for imbalanced data?

I know that you can set scale_pos_weight for an imbalanced binary dataset. However, how do you deal with a multi-class classification problem on an imbalanced dataset? I have gone through https://datascience.stackexchange.com/questions/16342/unbalanced-multiclass-data-with-xgboost/18823 but don't quite understand how to set the weight parameter in DMatrix.

Can anyone please explain in detail?

Phytogeography answered 22/8, 2017 at 7:15
Look here: #35984065 and here: #42618991 – Fractostratus
You can use class_weights = dict(enumerate(len(y_train) / (len(np.unique(y_train)) * np.bincount(y_train)))) to calculate class weights for imbalanced classification. Also see this worked example. – Holp
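As a minimal sketch of how those per-class weights could be turned into per-sample weights and passed to DMatrix (X_train and y_train are placeholder names, and y_train is assumed to be an integer-encoded numpy array):

    import numpy as np
    import xgboost as xgb

    # per-class weights from the formula in the comment above
    class_weights = len(y_train) / (len(np.unique(y_train)) * np.bincount(y_train))
    # expand to one weight per training row by indexing with the class labels
    sample_weight = class_weights[y_train]

    dtrain = xgb.DMatrix(X_train, label=y_train, weight=sample_weight)
    params = {"objective": "multi:softprob", "num_class": len(np.unique(y_train))}
    booster = xgb.train(params, dtrain, num_boost_round=100)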

For an imbalanced dataset, I used the "weights" parameter in XGBoost, where weights is an array with one weight per training sample, assigned according to the class that sample belongs to.

import numpy as np

def CreateBalancedSampleWeights(y_train, largest_class_weight_coef):
    # unique class labels, sorted
    classes = np.unique(y_train, axis=0)
    classes.sort()
    # number of samples in each class
    class_samples = np.bincount(y_train)
    total_samples = class_samples.sum()
    n_classes = len(class_samples)
    # inverse-frequency weight for each class
    weights = total_samples / (n_classes * class_samples * 1.0)
    class_weight_dict = {key: value for (key, value) in zip(classes, weights)}
    # scale the weight of the second class by the majority-class share
    class_weight_dict[classes[1]] = class_weight_dict[classes[1]] * largest_class_weight_coef
    # look up one weight per training sample from its class label
    sample_weights = [class_weight_dict[y] for y in y_train]
    return sample_weights

Just pass the target column and the occurrence rate of the most frequent class (if the most frequent class has 75 out of 100 samples, that's 0.75):

    largest_class_weight_coef = max(df_copy['Category'].value_counts().values) / df_copy.shape[0]

    # pass y_train as a numpy array
    weight = CreateBalancedSampleWeights(y_train, largest_class_weight_coef)

    # and then use it like this
    xg = XGBClassifier(n_estimators=1000, weights=weight, max_depth=20)

That's it :)

Idalla answered 5/12, 2019 at 12:5

XGBClassifier does not support a weights parameter.

Instead, pass sample_weight to the fit() call: fit(X, y, sample_weight=...).
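A minimal sketch of that call, assuming X_train, y_train and the weight array from the answer above:

    from xgboost import XGBClassifier

    xg = XGBClassifier(n_estimators=1000, max_depth=20)
    # weight holds one value per row of X_train
    xg.fit(X_train, y_train, sample_weight=weight)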

For details, see the xgboost Python API documentation.

If you want to pass class weights directly in the constructor, use another algorithm such as random forest or LightGBM.
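For instance, a rough sketch of constructor-level class weighting with the scikit-learn style APIs (X_train and y_train are assumed to exist):

    from lightgbm import LGBMClassifier
    from sklearn.ensemble import RandomForestClassifier

    # both estimators accept per-class weighting in the constructor
    lgbm = LGBMClassifier(class_weight='balanced').fit(X_train, y_train)
    rf = RandomForestClassifier(class_weight='balanced').fit(X_train, y_train)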

Haematocryal answered 16/3, 2023 at 6:43
