train-test-split Questions

2

Solved

I have the following data:

   Group_ID  Item_id  Target
0         1        1       0
1         1        2       0
2         1        3       1
3         2        4       0
4         2        5       1
5         2        6       1
6         3        7       0
7         4        8       0
8         5        9       0
9         5       10       1

I need to split the dataset into a training and testing set bas...
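The excerpt is cut off, so the exact splitting criterion is not known. Assuming the aim is to keep every row of a given Group_ID on the same side of the split, a minimal pure-Python sketch could look like the following. The helper name `group_split` and its parameters are hypothetical; scikit-learn's `GroupShuffleSplit` offers a library version of the same idea.

```python
import random

# Rows from the excerpt above: (Group_ID, Item_id, Target)
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1),
    (2, 4, 0), (2, 5, 1), (2, 6, 1),
    (3, 7, 0), (4, 8, 0),
    (5, 9, 0), (5, 10, 1),
]

def group_split(rows, test_fraction=0.4, seed=0):
    """Split rows so that each Group_ID lands entirely in train or in test."""
    groups = sorted({r[0] for r in rows})   # unique group ids
    rng = random.Random(seed)               # seeded for repeatable results
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[0] not in test_groups]
    test = [r for r in rows if r[0] in test_groups]
    return train, test

train_rows, test_rows = group_split(rows)
```

Because the split is made over group ids and not over rows, the row-level test fraction only approximates `test_fraction` when groups differ in size.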

7

Solved

Not sure how to fix this. Any help is much appreciated. I saw this: Vectorization: Not a valid collection, but I'm not sure I understood it. train = df1.iloc[:,[4,6]] target = df1.iloc[:,[0]] def train(class...
Bespoke asked 5/4, 2017 at 5:54

11

I'm trying to split my dataset into a training and a test set by using the train_test_split function from scikit-learn, but I'm getting this error: In [1]: y.iloc[:,0].value_counts() Out[1]: M2 3...
Navaho asked 3/4, 2017 at 8:0

8

Solved

So I have a main folder which contains sub-folders, which in turn contain images for the dataset, as follows:

-main_db
---CLASS_1
-----img_1
-----img_2
-----img_3
-----img_4
---CLASS_2
-----...
Aerogram asked 7/8, 2019 at 12:5

13

I have a single directory which contains sub-folders (according to labels) of images. I want to split this data into train and test set while using ImageDataGenerator in Keras. Although model.fit()...
Impudicity asked 24/2, 2017 at 16:43

4

Solved

I want to separate my data into train and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model?
Nonmaterial asked 23/3, 2018 at 7:13
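A commonly recommended order (not necessarily the accepted answer here) is to split first, compute the normalization statistics on the training set only, and reuse them on the test set, so that no test information leaks into preprocessing. The sketch below uses made-up values and only the standard library; scikit-learn's `StandardScaler` follows the same fit-on-train, transform-both pattern.

```python
import statistics

values = [2.0, 4.0, 6.0, 8.0, 100.0]   # made-up feature column
train, test = values[:4], values[4:]   # split first

# Statistics come from the training data only.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

def standardize(xs):
    """Apply the training-set mean and standard deviation to any subset."""
    return [(x - mean) / std for x in xs]

train_scaled = standardize(train)
test_scaled = standardize(test)        # reuses train statistics, no refit
```

Normalizing before the split would let the outlying test value shift the mean and standard deviation that the training data is scaled with.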

2

Solved

Is there any way to set a seed for train_test_split in Python's sklearn? I have set the parameter random_state to an integer, but I still cannot reproduce the result. Thanks in advance.
Summerlin asked 16/5, 2019 at 10:12
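As an illustration of what a fixed seed guarantees, here is a small standard-library sketch of a seeded split. The helper `seeded_split` is hypothetical and is not scikit-learn's implementation. The same seed and the same input order give the same split. If results still differ between runs, one possibility (a guess, since the question's code is not shown) is that the input order changes or that another unseeded source of randomness sits downstream.

```python
import random

def seeded_split(items, test_fraction=0.25, seed=42):
    """Shuffle a copy of items with a private seeded generator, then cut."""
    rng = random.Random(seed)      # private generator; global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = list(range(20))
first = seeded_split(data, seed=42)
second = seeded_split(data, seed=42)   # identical to first
other = seeded_split(data, seed=7)     # a different seed is free to differ
```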

3

Solved

I am using a datatable dataframe. How can I split the dataframe into train and test datasets? As with a pandas dataframe, I tried to use train_test_split(dt_df,classes) from sklearn.model_selection...
Steepen asked 21/7, 2020 at 19:48

3

I am following the IRIS example of tensorflow. My case now is I have all data in a single CSV file, not separated, and I want to apply k-fold cross validation on that data. I have data_set = tf...
Pharyngeal asked 28/9, 2016 at 13:15
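Independent of TensorFlow, k-fold cross validation on a single unsplit dataset comes down to generating index sets. The following is a minimal sketch with a hypothetical helper `k_fold_indices`; it does not shuffle, so for data stored in a meaningful order the indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx); each sample is in a test fold exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Each pair of index lists can then be used to select rows from the loaded CSV data for one training and evaluation round.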

2

On a jupyter notebook with Tensorflow-2.0.0, a train-validation-test split of 80-10-10 was performed in this way: import tensorflow_datasets as tfds from os import getcwd splits = tfds.Split.ALL.su...
Thole asked 20/10, 2020 at 18:44

3

Solved

I am trying to split the dataset into train and test subsets in Julia. So far, I have tried using MLDataUtils.jl package for this operation, however, the results are not up to the expectations. Bel...
Pentylenetetrazol asked 5/2, 2021 at 7:18

4

Solved

I know that train_test_split splits it randomly, but I need to know how to split it based on time. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42) # th...
Ferdinana asked 15/6, 2018 at 17:0
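One way to split by time, sketched here under the assumption that each record carries a timestamp, is to sort chronologically and hold out the most recent portion. The records and the helper `chronological_split` are hypothetical. With data that is already sorted, `train_test_split(..., shuffle=False)` in scikit-learn likewise keeps the original order and takes the test set from the end.

```python
from datetime import date

# Hypothetical records: (timestamp, feature, label), deliberately out of order
records = [
    (date(2018, 1, 5), 0.3, 1),
    (date(2018, 1, 1), 0.1, 0),
    (date(2018, 1, 9), 0.7, 1),
    (date(2018, 1, 3), 0.2, 0),
    (date(2018, 1, 7), 0.5, 0),
    (date(2018, 1, 11), 0.9, 1),
]

def chronological_split(records, test_size=0.33):
    """Train on the earliest records, test on the latest ones."""
    ordered = sorted(records, key=lambda r: r[0])   # oldest first
    n_test = int(round(len(ordered) * test_size))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

train, test = chronological_split(records)
```

This way every test record is later than every training record, which mirrors how the model would be used on future data.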

2

Solved

I have a dataset like this my_data= [['Manchester', '23', '80', 'CM', 'Manchester', '22', '79', 'RM', 'Manchester', '19', '76', 'LB'], ['Benfica', '26', '77', 'CF', 'Benfica', '22', '74',...
Astatine asked 22/9, 2020 at 6:27

4

Solved

I am curious if there is something similar to sklearn's http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html for apache-spark in the latest 2.0.1 rel...
Towhead asked 12/10, 2016 at 9:2

1

I have a data file with the following columns: 'customer', 'calibrat' - Calibration sample = 1; Validation sample = 0; 'churn', 'churndep', 'revenue', 'mou'. The data file contains some 40000 rows out ...

0

I am training an NER model using the python -m spacy train command line tool. I use gold.docs_to_json to convert my annotated documents to the JSON-serializable format. The command line training t...
Vitalis asked 26/1, 2020 at 18:40

4

Solved

I am trying to create a machine learning model using DecisionTreeClassifier. To train & test my data I imported the train_test_split method from scikit-learn. But I cannot understand one of its ar...

3

Solved

I am using this excellent article to learn Machine learning. https://stackabuse.com/python-for-nlp-multi-label-text-classification-with-keras/ The author has tokenized the X and y data after spli...
Patton asked 28/8, 2019 at 13:15

2

Actually, there is a contradiction of 2 facts that are the possible answers to the question: The conventional answer is to do it after splitting as there can be information leakage, if done befor...
Annisannissa asked 25/5, 2019 at 19:38

1

Solved

I'm working on a classification problem and I've split my data into train and test set. I have few categorical columns (around 4 -6) and I am thinking of using pd.get_dummies to convert my ...

2

There is already a description here of how to do stratified train/test split in scikit via train_test_split (Stratified Train/Test-split in scikit-learn) and a description of how to random train/va...
Enneagon asked 27/11, 2016 at 12:49

1

Solved

Before I lodge this question, I have to say I've thoroughly read more than 15 similar topics on this board, each with somehow different recommendations, but all of them just could not get me right....
Halide asked 21/8, 2017 at 19:14

3

I am at the moment trying to make a setup script capable of setting up a workspace for me, so that I don't need to do it manually. I started doing this in bash, but quickly realized that would ...
Phalanstery asked 29/8, 2016 at 16:17