AxisError: axis 1 is out of bounds for array of dimension 1 when calculating AUC

I have a classification problem: each row holds the pixel values of an 8x8 image together with the number the image represents, and my task is to predict the number (the 'Number' attribute) from the pixel values using RandomForestClassifier. The number can be 0-9.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")

Here it throws an AxisError.

Traceback (most recent call last):
  File "dap_hazi_4.py", line 44, in <module>
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
    multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)

AxisError: axis 1 is out of bounds for array of dimension 1
Pedalfer answered 18/4, 2020 at 12:20 Comment(4)
I managed to solve my problem. Because my classification problem is multiclass, the target column needed to be binarized before fitting and calculating the AUC score. - Maurer
What exactly did you do, @Bálint Béres? - Elodia
I used the approach from "Calculate sklearn.roc_auc_score for multi-class", @mclzc. - Maurer
When this error appears while using sklearn.model_selection.cross_validate and similar, you just need to set needs_proba=True in the scorer: make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True). - Viipuri
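
For the cross-validation case mentioned in the last comment, a possibly simpler route is scikit-learn's built-in scorer string "roc_auc_ovo", which asks the estimator for probabilities itself. This is a swapped-in alternative to the make_scorer call, not what the commenter wrote; the digits data and the small forest below are stand-ins chosen for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Assumption: load_digits (8x8 images, labels 0-9) stands in for the asker's data.
X, y = load_digits(return_X_y=True)

# A small forest keeps the sketch quick; the size is arbitrary.
clf = RandomForestClassifier(n_estimators=20, random_state=42)

# "roc_auc_ovo" scores with predict_proba and one-vs-one averaging.
results = cross_validate(clf, X, y, cv=3, scoring="roc_auc_ovo")
print(results["test_score"])  # one AUC value per fold
```

Using "roc_auc_ovr" instead would give the one-vs-rest variant.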

As others suggested, the error occurs because the problem you are solving is multi-class. Instead of predicting the class, predict the probabilities and pass them to roc_auc_score. I had the same problem before, and this solved it.

Here is how to do it:

# You might be predicting the class this way:
pred = clf.predict(X_valid)

# Predict the probabilities instead, which avoids the AxisError:
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
# 0.8164900342274142

# Shape before: one predicted label per sample
pred.shape
# (256,)
pred[:5]
# array([1, 2, 1, 1, 2])

# Shape after: one probability column per class
pred_prob.shape
# (256, 3)
pred_prob[:5]
# array([[0.  , 1.  , 0.  ],
#        [0.02, 0.12, 0.86],
#        [0.  , 0.97, 0.03],
#        [0.  , 0.8 , 0.2 ],
#        [0.  , 0.42, 0.58]])
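
To try this end to end, here is a minimal sketch on scikit-learn's bundled digits data (8x8 images, labels 0-9). That dataset is only assumed to resemble the one in the question, and the split and variable names are illustrative rather than the asker's.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: load_digits stands in for the asker's 8x8 pixel data.
X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Keep the full probability matrix: one column per class, rows sum to 1.
pred_prob = clf.predict_proba(X_valid)
print(pred_prob.shape)  # (n_samples, 10)

# y_valid stays 1D; only the scores need to be 2D.
auc = roc_auc_score(y_valid, pred_prob, average="macro", multi_class="ovr")
print(auc)
```

The key point is that no column is sliced off before scoring, so roc_auc_score can check that each row of probabilities sums to 1.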

Boron answered 15/11, 2021 at 5:25 Comment(0)

Because your problem is multi-class, the labels must be one-hot encoded. Once the labels are one-hot encoded, the 'multi_class' argument works, so providing one-hot encoded labels resolves the error.

Suppose you have 100 test labels with 5 unique classes; then the test-label matrix must have shape (100, 5), not (100, 1).
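
If you take the one-hot route with scikit-learn, label_binarize produces the (n_samples, n_classes) label matrix described above. A minimal sketch with made-up labels and scores (both hypothetical): with a binarized y_true, roc_auc_score treats the input as multilabel and averages the per-class AUCs, which should match the one-vs-rest macro average computed from the 1D labels.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Hypothetical data: 6 samples, 3 classes; each score row sums to 1.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.5, 0.3, 0.2],
])

# One-hot encode the labels: shape goes from (6,) to (6, 3).
y_onehot = label_binarize(y_true, classes=[0, 1, 2])
print(y_onehot.shape)  # (6, 3)

# Multilabel-style AUC from one-hot labels.
auc_onehot = roc_auc_score(y_onehot, y_score, average="macro")

# Same scores with the original 1D labels and one-vs-rest averaging.
auc_ovr = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
print(auc_onehot, auc_ovr)
```

Either way, the scores passed in still have to be the full (n_samples, n_classes) probability matrix.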

Moxley answered 19/10, 2020 at 22:1 Comment(3)
I am having the same problem here. How do I transform my pred from (45520,) to (45520, 5)? - Lukas
If you're using TensorFlow or Keras, you can do it with tf.keras.utils.to_categorical(.) or just keras.utils.to_categorical(.). - Moxley
If you're using scikit-learn, use LabelBinarizer to convert the labels into one-hot encoded format. scikit-learn.org/stable/modules/generated/… - Dhoti

Are you sure the [:,1] in test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1] is right? It selects a single column, so the result is probably a 1D array.
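
A small NumPy-only sketch of why that matters, using a made-up probability matrix (hypothetical values): slicing one column drops a dimension, and summing over axis=1, as the traceback shows roc_auc_score doing, then fails.

```python
import numpy as np

# Hypothetical predict_proba output: 4 samples, 3 classes.
proba = np.array([
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.4, 0.3],
])

# Selecting a single column collapses the array to 1D.
col = proba[:, 1]
print(proba.shape)  # (4, 3)
print(col.shape)    # (4,)

# A 1D array has no axis 1, so the row-sum check cannot run.
try:
    col.sum(axis=1)
    raised = False
except ValueError as exc:  # NumPy's AxisError is a ValueError subclass
    raised = True
    print(exc)
```

Dropping the [:,1] and keeping the whole matrix leaves axis 1 in place.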

Meleager answered 18/4, 2020 at 16:8 Comment(0)
