I wrote a text classification program. When I run it, it crashes with the following error (transcribed from the screenshot):
ValueError: With n_samples=0, test_size=0.2 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Here is my code:
from sklearn.model_selection import train_test_split
from gensim.models.word2vec import Word2Vec
from sklearn.preprocessing import scale
from sklearn.linear_model import SGDClassifier
import nltk, string, json
import numpy as np
def cleanText(corpus):
    reviews = []
    for dd in corpus:
        #for d in dd:
        try:
            words = nltk.word_tokenize(dd['description'])
            words = [w.lower() for w in words]
            reviews.append(words)
            #break
        except:
            pass
    return reviews

with open('C:\\NLP\\bad.json') as fin:
    text = json.load(fin)
    neg_rev = cleanText(text)

with open('C:\\NLP\\good.json') as fin:
    text = json.load(fin)
    pos_rev = cleanText(text)

#1 for positive sentiment, 0 for negative
y = np.concatenate((np.ones(len(pos_rev)), np.zeros(len(neg_rev))))
x_train, x_test, y_train, y_test = train_test_split(np.concatenate((pos_rev, neg_rev)), y, test_size=0.2)
The data I am using is available here:
How would I go about fixing this error?
What is the shape of your concatenated reviews and your y variable? – Diaeresis

The error says n_samples=0. So work backward from there and figure out what actually comes out of your parsing in pos_rev and neg_rev, because if you get no errors, it seems likely that the len() of each is 0. – Diaeresis
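Following that suggestion, one way pos_rev and neg_rev can both end up empty is that the bare except in cleanText silently swallows every failure in the loop, for example when the JSON file holds a dict of records rather than a list (iterating a dict yields string keys, and indexing a string with 'description' raises TypeError). A minimal sketch of that diagnosis, using str.split in place of nltk tokenization and hypothetical JSON shapes:

```python
import json

def clean_text(corpus):
    # Same structure as cleanText, but WITHOUT the bare except,
    # so any failure inside the loop is visible instead of silent.
    reviews = []
    for dd in corpus:
        words = [w.lower() for w in dd['description'].split()]
        reviews.append(words)
    return reviews

# Hypothetical shapes for illustration: the same record as a dict
# keyed by id, and as a list of records.
dict_shaped = json.loads('{"1": {"description": "Terrible product"}}')
list_shaped = json.loads('[{"description": "Terrible product"}]')

# Iterating the dict yields the key "1", so dd['description']
# raises TypeError; the original bare except would hide this and
# return an empty list, which is what makes n_samples=0 later.
try:
    clean_text(dict_shaped)
    print("no error")
except TypeError as exc:
    print("TypeError:", exc)

# The list-of-records shape parses as intended.
print(len(clean_text(list_shaped)))
```

So before the train_test_split call, printing len(pos_rev) and len(neg_rev) should confirm whether the parsing step, not the split itself, is producing zero samples.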