Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection

Asked 5/4, 2017 at 5:54 Answered 29/12, 2022 at 15:11

Solved python pandas scikit-learn pipeline train-test-split

Not sure how to fix this. Any help is much appreciated. I saw the question "Vectorization: Not a valid collection", but I'm not sure I understood it.

train = df1.iloc[:,[4,6]]
target =df1.iloc[:,[0]]

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    classifier.fit(X_train, y_train)
    print ("Accuracy: %s" % classifier.score(X_test, y_test))
    return classifier

trial1 = Pipeline([
         ('vectorizer', TfidfVectorizer()),
         ('classifier', MultinomialNB()),])

train(trial1, train, target)

Error below:

    ----> 6 train(trial1, train, target)

    <ipython-input-140-ac0e8d32795e> in train(classifier, X, y)
          1 def train(classifier, X, y):
    ----> 2     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
          3 
          4     classifier.fit(X_train, y_train)
          5     print ("Accuracy: %s" % classifier.score(X_test, y_test))

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
       1687         test_size = 0.25
       1688 
    -> 1689     arrays = indexable(*arrays)
       1690 
       1691     if stratify is not None:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in indexable(*iterables)
        204         else:
        205             result.append(np.array(X))
    --> 206     check_consistent_length(*result)
        207     return result
        208 

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
        175     """
        176 
    --> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
        178     uniques = np.unique(lengths)
        179     if len(uniques) > 1:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in <listcomp>(.0)
        175     """
        176 
    --> 177     lengths = [_num_samples(X) for X in arrays if X is not None]
        178     uniques = np.unique(lengths)
        179     if len(uniques) > 1:

    /home/manisha/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py in _num_samples(x)
        124         if len(x.shape) == 0:
        125             raise TypeError("Singleton array %r cannot be considered"
    --> 126                             " a valid collection." % x)
        127         return x.shape[0]
        128     else:

    TypeError: Singleton array array(<function train at 0x7f3a311320d0>, dtype=object) cannot be considered a valid collection.


Bespoke answered 5/4, 2017 at 5:54 Comment(1)

In my case, I forgot to state test_size=0.25 (I only passed 0.25). You didn't forget. I hope this helps someone. – Xavier 15/3, 2020 at 23:30

This error arises because your function train masks your variable train, and hence it is passed to itself.

Explanation:

You define a variable train like this:

train = df1.iloc[:,[4,6]]

Then, a few lines later, you define a function train like this:

def train(classifier, X, y):

What actually happens is that the name train is rebound: the earlier value is replaced by the new definition. The name train no longer points to the DataFrame object as you intended, but to the function you defined. The error message makes this clear:

array(<function train at 0x7f3a311320d0>, dtype=object)

See the function train inside the error statement.

Solution:

Rename one of them (the variable or the function). Suggestion: rename the function to something like training or training_func.
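A minimal sketch of the shadowing and of the rename. It uses a plain list as a stand-in for the DataFrame slice so that it runs without pandas; the name run_training and the placeholder values are assumptions for illustration, not taken from the question.

```python
# Stand-in for `train = df1.iloc[:, [4, 6]]`; a plain list keeps the sketch self-contained.
train = [["some text", "more text"]]


def train(classifier, X, y):  # rebinds the name `train` to this function
    return classifier


# The data is no longer reachable through `train`; the name now refers to the function.
shadowed_is_function = callable(train)
print(shadowed_is_function)

# Fix: give the function its own name (hypothetical: run_training) and re-create the data.
train = [["some text", "more text"]]


def run_training(classifier, X, y):
    return classifier


data_is_list = isinstance(train, list)
print(data_is_list)
```

With the rename in place, a call such as run_training(trial1, train, target) would receive the data rather than the function itself.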

Epileptic answered 5/4, 2017 at 7:16 Comment(2)

Probably splitting hairs here and @vivek-kumar's answer is very good, but I would rename the dataframe containing your features to X, x, or x_train. While it's my preference, it's also very common across the ml community to name your train and test sets this way. – Sever 23/8, 2018 at 2:21

This explanation was really helpful. Thank you! – Inexactitude 30/3, 2021 at 4:33

I got the same error in another context (sklearn train_test_split). The reason was simply that I had passed a positional argument as a keyword argument, which led to it being misinterpreted by the called function.
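A sketch of how mixing up positional and keyword arguments can lead here, assuming the case where a value meant for a keyword parameter (such as test_size) is passed positionally. train_test_split takes its data as *arrays and its options as keyword arguments, so a bare 0.25 is collected as one more "array". The function split_like below is a hypothetical stand-in that only mimics that signature shape; it is not scikit-learn code.

```python
def split_like(*arrays, test_size=None, random_state=None):
    """Hypothetical stand-in mimicking a *arrays + keyword-only signature."""
    return {"arrays": arrays, "test_size": test_size}


X = [[0], [1], [2], [3]]
y = [0, 1, 0, 1]

wrong = split_like(X, y, 0.25)            # 0.25 is swallowed by *arrays
right = split_like(X, y, test_size=0.25)  # 0.25 reaches test_size

print(len(wrong["arrays"]), wrong["test_size"])
print(len(right["arrays"]), right["test_size"])
```

In the real function, the stray scalar then appears to be validated as if it were a dataset, which would explain the singleton-array message.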

Fanfare answered 8/7, 2018 at 10:22 Comment(3)

It's strange that sklearn doesn't have an Exception to catch this. – Lukelukens 14/2, 2019 at 0:4

This happened to me with the same function: I used stratify=True instead of stratify=y – Cuspidation 22/3, 2021 at 12:47

Likewise, I passed a string to stratify instead of an object – Dinesh 14/8, 2021 at 15:4

A variation on the first answer - another reason you could get this is if a column name in your data is the same as an attribute/method of the object containing the data.

In my case, I was trying to access the column "count" in the dataframe "df" with the ostensibly legal syntax df.count.

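However, count is already a method of pandas DataFrame objects, so df.count returns that method rather than the column. The resulting name collision creates the (rather befuddling) error. Accessing the column as df["count"] avoids the collision.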
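A rough sketch of why the collision happens, assuming that attribute-style column access is only a fallback used when normal attribute lookup finds nothing. TinyFrame is a hypothetical stand-in written for this illustration; it is not pandas code and only imitates that lookup order.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real methods win.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def count(self):
        return {name: len(values) for name, values in self._columns.items()}


df = TinyFrame({"count": [3, 1, 2], "label": ["a", "b", "c"]})

print(callable(df.count))  # the method, not the column
print(df["count"])         # bracket access returns the column
print(df.label)            # attribute access works when nothing collides
```

Passing something like df.count to a scikit-learn function hands it a method object instead of data, which matches the shape of the error in the question.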

Dissimilation answered 6/2, 2019 at 9:16 Comment(0)

I got the same error in sklearn.model_selection train_test_split, but in my case the reason was that I was passing the function an array derived from a Spark data frame, not an array from a pandas data frame. When I converted my data to a pandas data frame using the toPandas() function, as shown below, and then passed the pandas df to train_test_split, the issue was fixed.

pandas_df=spark_df.toPandas()

Code that raised the error:

features_to_use = ['Feature1', 'Feature2']
x5D = np.array(spark_df[ features_to_use ])
y5D = np.array(spark_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)

Fixed version:

pandas_df=spark_df.toPandas()
features_to_use = ['Feature1', 'Feature2']
x5D = np.array(pandas_df[ features_to_use ])
y5D = np.array(pandas_df['TargetFeature'])
X_train, X_test, y_train, y_test = train_test_split(x5D, y5D, train_size=0.8)

Assonance answered 15/4, 2019 at 19:56 Comment(0)

In my case, I just reopened the project by going to File -> open.

Somehow everything is reloaded and starts to work again.

Ary answered 8/9, 2020 at 21:18 Comment(0)

For those of you using PyTorch DataLoaders, pass the loader's dataset attribute instead of the loader itself:

for idx1, idx2 in kfold.split(train_dl.dataset):
    ...

Here, train_dl is a DataLoader.

Amarillis answered 27/10, 2020 at 7:34 Comment(0)

If you are here and none of the other answers solved your problem, make sure you did not make the silly mistake of not passing the target to a scikit-learn model:

clf.fit(X)

Instead of using

clf.fit(X, y)

Cuspidation answered 29/12, 2022 at 15:11 Comment(0)
