Getting TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection

I am trying different cross-validation methods. I first used the k-fold method in my code and it worked perfectly well, but when I use the RepeatedStratifiedKFold method it gives me this error:

TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection.

Can anyone help me with this? Below is the minimal code that produces the issue.

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold


ss = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

X = np.random.rand(100, 5)
y = np.random.rand(100, 1)

for train_index, test_index in ss.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Here is the full traceback:

start
Traceback (most recent call last):
  File "C:\Users\full details of final year project\AZU\test_tace_updated.py", line 81, in <module>
    main()
  File "C:\Users\AZU\test_tace_updated.py", line 54, in main
    for train, test in ss.split(X):
  File "C:\Users\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 1201, in split
    for train_index, test_index in cv.split(X, y, groups):
  File "C:\Users\anaconda3\lib\site-packages\sklearn\model_selection\_split.py", line 731, in split
    y = check_array(y, ensure_2d=False, dtype=None)
  File "C:\Users\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 667, in check_array
    n_samples = _num_samples(array)
  File "C:\Users\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 202, in _num_samples
    raise TypeError("Singleton array %r cannot be considered"
TypeError: Singleton array array(None, dtype=object) cannot be considered a valid collection.
Simonsimona asked 7/1, 2021 at 3:51. Comments (4):
Garrido: Hi, welcome to Stack Overflow. Please make sure that your error is reproducible before posting it here. You can check the guidelines for posting a question here: stackoverflow.com/help/how-to-ask. It's not possible to reproduce your issue without the F-E.mat file that you have. You can instead replace it with a random NumPy array of the same shape that gives the same error.
Garrido: If it can be helped, please do not link to external data like you have done, and in this case it can be helped: just replace your original file with a random NumPy array. You need to make it as easy as possible for people trying to help you. If they have to download an external dataset, a lot of people are simply going to move on to the next question. I have submitted an edit to your question that makes it minimal and reproducible and removes the link to that data.
Simonsimona: Oh, I was not familiar with that, and I removed the edit. What can I do now?
Garrido: I have submitted an edit; please do not make further edits to the post until someone has approved it. You don't have to do anything for now.

I am inclined to say that this is a bug in sklearn (though I'm definitely not sure about that), but if you also include y in your split function, the issue seems to go away. The following code runs as expected:

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

ss = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

X = np.random.rand(100, 5)
y = np.zeros(100)

for train_index, test_index in ss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

It's weird, because according to the documentation, y should be optional.
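The traceback hints at why passing y matters: RepeatedStratifiedKFold.split forwards X, y and groups to an inner StratifiedKFold.split (the cv.split(X, y, groups) frame), which calls check_array(y, ensure_2d=False, dtype=None). With y left at its default of None, that becomes the zero-dimensional array(None, dtype=object) which _num_samples rejects. Stratification is computed from the class labels in y, so in practice a stratified splitter needs y even though the repeated wrapper lets you omit it. The sketch below assumes a made-up, balanced binary label vector (50 samples per class) purely for illustration; it checks that every test fold keeps the 50/50 class ratio and that omitting y fails. The exact exception message can differ between scikit-learn versions, so only the exception type is inspected.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

rng = np.random.default_rng(0)
X = rng.random((100, 5))
# Illustrative class labels (an assumption): 50 samples of class 0, 50 of class 1.
y = np.repeat([0, 1], 50)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

test_class_counts = []
for train_index, test_index in cv.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Stratification keeps the class ratio of y in every test fold.
    test_class_counts.append(np.bincount(y_test, minlength=2).tolist())

print(len(test_class_counts))  # 10 (5 folds x 2 repeats)
print(test_class_counts[0])    # [10, 10]

# Without y there are no labels to stratify on, so splitting fails.
try:
    list(cv.split(X))
    y_missing_error = None
except (TypeError, ValueError) as exc:
    y_missing_error = type(exc).__name__
print(y_missing_error)
```

The split itself stays lazy (it is a generator), which is why the original error only appears once the for loop starts iterating rather than at the ss.split(X) call.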

Garrido answered 7/1, 2021 at 4:57. Comments (3):
Simonsimona: Yes, I did it the same way, but then my accuracy dropped to 0.04. I believe there might be some other way around this, and I might be doing it wrong.
Garrido: Your accuracy dropping should have nothing to do with whether or not you pass an additional variable to your splitting function, because you can, of course, simply choose not to use the corresponding y that you created. If your accuracy is low, it's some other issue.
Lubberly: Great catch on the bug, Ananda; I was struggling with this.
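One more thing worth checking, assuming the real target is continuous like the random y = np.random.rand(100, 1) in the question's example: StratifiedKFold only accepts binary or multiclass targets, so passing such a y raises a ValueError rather than the TypeError from the question. Since plain k-fold worked for the asker, RepeatedKFold (the non-stratified repeated variant, which does not use y to build the folds) may be the better fit for such targets. This is a sketch under that assumption; whether stratification is wanted at all depends on the actual task, which the question does not state.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, RepeatedStratifiedKFold

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = rng.random((100, 1))  # continuous target, like the question's example

# Stratified splitting needs discrete class labels; a continuous y is rejected.
try:
    list(RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0).split(X, y))
    stratified_error = None
except ValueError as exc:
    stratified_error = str(exc)
print(stratified_error)

# RepeatedKFold does not use y to build the folds, so y can be omitted here.
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=0)
test_sizes = []
for train_index, test_index in cv.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    test_sizes.append(len(test_index))

print(len(test_sizes))  # 10
print(test_sizes[0])    # 20
```

If class balance across folds does matter for a continuous target, one common workaround is to stratify on a binned copy of y instead, but that is a modelling choice beyond what this thread covers.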
