ValueError: continuous format is not supported

I have written a simple function where I am using the average_precision_score from scikit-learn to compute average precision.

My Code:

import numpy as np
from sklearn.metrics import average_precision_score

def compute_average_precision(predictions, gold):
    gold_predictions = np.zeros(predictions.size, dtype=np.int)
    for idx in range(gold):
        gold_predictions[idx] = 1
    return average_precision_score(predictions, gold_predictions)

When the function is executed, it produces the following error.

Traceback (most recent call last):
  File "test.py", line 91, in <module>
    total_avg_precision += compute_average_precision(np.asarray(probs), len(gold_candidates))
  File "test.py", line 29, in compute_average_precision
    return average_precision_score(predictions, gold_predictions)
  File "/if5/wua4nw/anaconda3/lib/python3.5/site-packages/sklearn/metrics/ranking.py", line 184, in average_precision_score
    average, sample_weight=sample_weight)
  File "/if5/wua4nw/anaconda3/lib/python3.5/site-packages/sklearn/metrics/base.py", line 81, in _average_binary_score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: continuous format is not supported

If I print the two numpy arrays predictions and gold_predictions for one example, they look alright. [One example is provided below.]

[ 0.40865014  0.26047812  0.07588802  0.26604077  0.10586583  0.17118802
  0.26797949  0.34618672  0.33659923  0.22075308  0.42288553  0.24908153
  0.26506338  0.28224747  0.32942101  0.19986877  0.39831917  0.23635269
  0.34715138  0.39831917  0.23635269  0.35822859  0.12110706]
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

What am I doing wrong here? What does the error mean?

Mawson answered 10/6, 2017 at 0:4 Comment(1)
What do these predictions represent? Are they the outputs of the predict() method of some estimator, probabilities of the positive class, or perhaps the output of predict_proba()? In any case, y_true (your gold_predictions) needs to be the first argument and predictions the second. – Guzel

Just taking a look at the sklearn docs for average_precision_score:

Parameters:

y_true : array, shape = [n_samples] or [n_samples, n_classes] True binary labels in binary label indicators.

y_score : array, shape = [n_samples] or [n_samples, n_classes] Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

So your first argument has to be an array of binary labels, but you are passing a float array of scores as the first argument. I believe you need to reverse the order of the arguments you are passing.
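
A minimal sketch of the corrected call, using the first few values of the two arrays printed in the question (truncated here for brevity):

import numpy as np
from sklearn.metrics import average_precision_score

# truncated versions of the arrays shown in the question
predictions = np.array([0.40865014, 0.26047812, 0.07588802, 0.26604077])   # scores
gold_predictions = np.array([1, 0, 0, 0])                                  # binary labels

average_precision_score(gold_predictions, predictions)     # OK: y_true first, y_score second
# average_precision_score(predictions, gold_predictions)   # ValueError: continuous format is not supported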

Matti answered 10/6, 2017 at 0:22 Comment(0)

Many of the metrics in scikit-learn work only on specific types of target data. Scikit-learn uses the utility function sklearn.utils.multiclass.type_of_target to check the type of the target data. The following are the possible types:

  • continuous, e.g. np.random.rand(100)
  • continuous-multioutput, e.g. np.random.rand(100,2)
  • binary, e.g. np.random.choice([0, 1], size=100)
  • multiclass, e.g. np.random.choice([0, 1, 2], size=100)
  • multiclass-multioutput, e.g. np.random.choice([0, 1, 2], size=(100,2))
  • multilabel-indicator, e.g. np.random.choice([0, 1], size=(100,2))
  • unknown, e.g. np.random.rand(100).astype(object)

The first argument passed to a metric function determines the target type. So, in the OP's example, the internal type check is performed on the predictions variable, as follows.

from sklearn.utils.multiclass import type_of_target
type_of_target(predictions)   # 'continuous'
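
As a quick diagnostic (a small sketch reusing truncated values from the question's arrays), checking both inputs shows which one is a valid y_true:

import numpy as np
from sklearn.utils.multiclass import type_of_target

predictions = np.array([0.40865014, 0.26047812, 0.07588802])    # model scores
gold_predictions = np.array([1, 0, 0])                          # 0/1 labels

type_of_target(predictions)        # 'continuous'  -> rejected when passed as y_true
type_of_target(gold_predictions)   # 'binary'      -> a valid y_true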

The most common way this error occurs is by passing an unsupported target type to a metric that assesses performance on a classification task given scores (for example, by ordering the arguments incorrectly, as in the OP). The following summarizes the target types supported by such metrics.

  • average precision score: binary, multilabel-indicator
  • coverage error: multilabel-indicator
  • dcg score: continuous-multioutput, multiclass-multioutput, multilabel-indicator
  • det curve: binary
  • label ranking average precision score: multilabel-indicator
  • label ranking loss: multilabel-indicator
  • ndcg score: continuous-multioutput, multiclass-multioutput, multilabel-indicator
  • precision recall curve: binary
  • roc auc score: binary, multiclass (multi_class= must be passed), multilabel-indicator
  • roc curve: binary
  • top-k accuracy score: binary, multiclass
There are also metrics that assess performance on a classification task given class predictions. They generally work on binary, multiclass or multilabel-indicator target types. If a "wrong" target type is fed to them, related errors such as ValueError: Unknown label type or ValueError: y should be a 1d array are thrown.

Yet another adjacent error is ValueError: Classification metrics can't handle a mix of target types. This occurs when the type of the predictions does not match the type of the true values, i.e. when type_of_target(y_true) != type_of_target(y_pred). Make sure they are the same.
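
A small illustration with accuracy_score and made-up arrays (any metric that compares labels directly behaves the same way):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])            # binary labels
y_pred = np.array([0.1, 0.9, 0.8, 0.3])    # continuous scores, not labels

# accuracy_score(y_true, y_pred)           # ValueError: Classification metrics can't handle a mix of binary and continuous targets
accuracy_score(y_true, (y_pred > 0.5).astype(int))   # 1.0 after thresholding the scores into labels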


Yet another way this error occurs is if you create a custom scorer using sklearn.metrics.make_scorer with needs_threshold=True. In that case, only binary or multilabel-indicator target types are accepted, even if the underlying metric passed to the scorer works on another target type. For example, sklearn.metrics.top_k_accuracy_score works on multiclass target types, but if it is made into a scorer via metrics.make_scorer with needs_threshold=True, it no longer works:

import numpy as np
from sklearn import linear_model, metrics, datasets

X, y = datasets.make_classification(n_informative=3, n_classes=3)   # multiclass
lr = linear_model.LogisticRegression()
lr.fit(X, y)

metrics.top_k_accuracy_score(y, lr.decision_function(X))   # <--- OK

scorer = metrics.make_scorer(metrics.top_k_accuracy_score, needs_threshold=True)
scorer(lr, X, y)                                           # <--- ValueError: multiclass format is not supported
Unfetter answered 24/5, 2023 at 20:40 Comment(0)
I
0

I appreciate the two previous detailed answers, but the solution is quite simple: you just need to swap the positions of the two inputs.

That is, average_precision_score(gold_predictions, predictions)


Indo answered 19/12, 2023 at 3:34 Comment(0)
