ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0
I applied logistic regression to the train set after splitting the data set into train and test sets, but I got the above error. I tried to debug it: when I print my response vector y_train in the console it shows integer values like 0 or 1, but when I write it to a file the values are floats like 0.0 and 1.0. If that's the problem, how can I overcome it?

from sklearn.linear_model import LogisticRegression
from sklearn import metrics

lenreg = LogisticRegression()

print y_train[0:10]
y_train.to_csv(path='ytard.csv')

lenreg.fit(X_train, y_train)
y_pred = lenreg.predict(X_test)
print metrics.accuracy_score(y_test, y_pred)

The stack trace is as follows:

Traceback (most recent call last):
  File "/home/amey/prog/pd.py", line 82, in <module>
    lenreg.fit(X_train, y_train)
  File "/usr/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1154, in fit
    self.max_iter, self.tol, self.random_state)
  File "/usr/lib/python2.7/dist-packages/sklearn/svm/base.py", line 885, in _fit_liblinear
    " class: %r" % classes_[0])
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

Meanwhile I've come across this link, which was unanswered. Is there a solution?

Contour answered 10/11, 2016 at 10:06
Comment: Some remarks: (1) LogisticRegression is classification, not regression, so you need classes. (2) y should consist of classes: either a 1-d boolean array per sample marking the class with a 1, or one number per sample giving the class (e.g. 5 classes -> numbers 0, 1, 2, 3, 4). (3) y needs to be of an integral type, not floats. (4) Check your y_train! – European
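Following remark (4), a minimal sketch for checking what actually ended up in the labels (assuming y_train is the NumPy array or pandas Series produced by the split):

import numpy as np

classes, counts = np.unique(y_train, return_counts=True)
print(classes, counts)   # e.g. [ 0.  1.] [480 520] -- at least 2 classes needed
if len(classes) < 2:
    print("y_train contains only one class; re-split or stratify the data")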
The problem here is that your y_train vector, for whatever reason, contains only zeros. It is not really your fault, and it is arguably a bug (I think): the classifier requires 2 classes or else it throws this error.

It makes sense in a way: if your y_train vector has only zeros (i.e. only one class), the classifier doesn't really need to do any work, since every prediction would just be that one class.

In my opinion the classifier should still complete, predict the one class (all zeros in this case), and emit a warning, but it doesn't; it throws the error instead.

One way to check for this condition (for 0/1 labels) is like this:

import numpy as np

lenreg = LogisticRegression()

print y_train[0:10]
y_train.to_csv(path='ytard.csv')

# all ones sums to len(y_train); all zeros sums to 0
if np.sum(y_train) in [len(y_train), 0]:
    print "all one class"
    #do something else
else:
    #OK to proceed
    lenreg.fit(X_train, y_train)
    y_pred = lenreg.predict(X_test)
    print metrics.accuracy_score(y_test, y_pred)
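If you still need a fitted model object in the single-class branch (so the rest of the pipeline keeps working), scikit-learn's DummyClassifier can stand in; a sketch:

from sklearn.dummy import DummyClassifier

# trains happily on a single class and always predicts it
fallback = DummyClassifier(strategy="most_frequent")
fallback.fit(X_train, y_train)
y_pred = fallback.predict(X_test)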

To overcome the problem more easily, I would recommend just including more samples in your test set, like 100 or 1000 instead of 10.
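Another option, if you build the split yourself, is a stratified split, which keeps the class proportions in both subsets so neither ends up single-class; a sketch (the test_size value is illustrative):

from sklearn.model_selection import train_test_split

# stratify=y preserves the 0/1 ratio in both train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)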

Salo answered 9/5, 2017 at 12:57
I had the same problem using learning_curve:

train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes,
    scoring="f1", random_state=RANDOM_SEED, shuffle=True)

Add the shuffle parameter, which randomizes the sets.

This doesn't prevent the error from happening, but it increases the chances that both classes appear in the subsets used by the function.
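Passing a stratified splitter as cv also helps, since every fold then keeps both classes (though very small train_sizes slices can still end up single-class); a sketch, reusing RANDOM_SEED from above:

from sklearn.model_selection import StratifiedKFold, learning_curve

# each fold preserves the class proportions of y
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X, y, cv=cv, scoring="f1",
    shuffle=True, random_state=RANDOM_SEED)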

Inshrine answered 2/9, 2019 at 23:06
I found it was because only 1s or only 0s wound up in my y_test, since my sample size was really small. Try changing your test_size value.

Valkyrie answered 2/7, 2019 at 16:22
You can wrap the estimator so that, when y contains a single class, fit just records that label and predict returns it for every row:

# python3
import numpy as np
from sklearn.svm import LinearSVC

def upgrade_to_work_with_single_class(SklearnPredictor):
    class UpgradedPredictor(SklearnPredictor):
        def __init__(self, *args, **kwargs):
            self._single_class_label = None
            super().__init__(*args, **kwargs)

        @staticmethod
        def _has_only_one_class(y):
            return len(np.unique(y)) == 1

        def _fitted_on_single_class(self):
            return self._single_class_label is not None

        def fit(self, X, y=None):
            # degenerate case: remember the single label instead of fitting
            if self._has_only_one_class(y):
                self._single_class_label = y[0]
            else:
                super().fit(X, y)
            return self

        def predict(self, X):
            if self._fitted_on_single_class():
                return np.full(X.shape[0], self._single_class_label)
            else:
                return super().predict(X)
    return UpgradedPredictor

LinearSVC = upgrade_to_work_with_single_class(LinearSVC)
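
A quick sanity check of the wrapper on single-class toy data (the arrays here are illustrative only):

import numpy as np

X = np.random.rand(10, 3)
y = np.zeros(10)                 # deliberately a single class

clf = LinearSVC().fit(X, y)      # the plain class would raise the ValueError here
print(clf.predict(X))            # ten 0.0s, no error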

Or the hard way (arguably more correct), which patches the class in place instead of subclassing it:

import numpy as np

from sklearn.svm import LinearSVC
from copy import deepcopy
from functools import wraps

def copy_class(cls):
    copy_cls = type(f'{cls.__name__}', cls.__bases__, dict(cls.__dict__))
    for name, attr in cls.__dict__.items():
        try:
            hash(attr)
        except TypeError:
            # Assume lack of __hash__ implies mutability. This is NOT
            # a bullet proof assumption but good in many cases.
            setattr(copy_cls, name, deepcopy(attr))
    return copy_cls

def upgrade_to_work_with_single_class(SklearnPredictor):
    SklearnPredictor = copy_class(SklearnPredictor)
    original_init = deepcopy(SklearnPredictor.__init__)
    original_fit = deepcopy(SklearnPredictor.fit)
    original_predict = deepcopy(SklearnPredictor.predict)

    @staticmethod
    def _has_only_one_class(y):
        return len(np.unique(y)) == 1

    def _fitted_on_single_class(self):
        return self._single_class_label is not None

    @wraps(SklearnPredictor.__init__)
    def new_init(self, *args, **kwargs):
        self._single_class_label = None
        original_init(self, *args, **kwargs)

    @wraps(SklearnPredictor.fit)
    def new_fit(self, X, y=None):
        if self._has_only_one_class(y):
            self._single_class_label = y[0]
        else:
            original_fit(self, X, y)
        return self

    @wraps(SklearnPredictor.predict)
    def new_predict(self, X):
        if self._fitted_on_single_class():
            return np.full(X.shape[0], self._single_class_label)
        else:
            return original_predict(self, X)

    setattr(SklearnPredictor, '_has_only_one_class', _has_only_one_class)
    setattr(SklearnPredictor, '_fitted_on_single_class', _fitted_on_single_class)
    SklearnPredictor.__init__ = new_init
    SklearnPredictor.fit = new_fit
    SklearnPredictor.predict = new_predict
    return SklearnPredictor

LinearSVC = upgrade_to_work_with_single_class(LinearSVC)

Rum answered 1/10, 2019 at 9:32
You can find the index of the first (or any) occurrence of each class, move those rows to the front of the arrays, and delete them from their original positions; that way at least one instance of each class ends up in the training set.
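A rough sketch of that idea (assuming X and y are NumPy arrays; the split parameters are illustrative):

import numpy as np
from sklearn.model_selection import train_test_split

# index of the first occurrence of each class
first_idx = [np.flatnonzero(y == c)[0] for c in np.unique(y)]
keep = np.ones(len(y), dtype=bool)
keep[first_idx] = False

# split the remaining rows, then prepend one guaranteed sample of each class
X_train, X_test, y_train, y_test = train_test_split(
    X[keep], y[keep], test_size=0.25)
X_train = np.concatenate([X[first_idx], X_train])
y_train = np.concatenate([y[first_idx], y_train])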

Montsaintmichel answered 17/3, 2021 at 17:49
This error relates to the dataset you are using: it contains only one class (for example 1/Benign), whereas it must contain two classes (1 and 0, or Benign and Attack).

Dextrality answered 11/4, 2021 at 23:02
Comment: This problem will also appear when the number of samples is one, or when the samples contain only one class. – Riyal
