Is there a way to partition a tf.Dataset with TensorFlow’s Dataset API?

I checked the docs but could not find a method for it. I want to do cross-validation, so I need it.

Note that I'm not asking how to split a tensor; I know that TensorFlow provides an API for that, and it has been answered in another question. I'm asking how to partition a tf.Dataset (which is an abstraction).

Puiia answered 6/5, 2018 at 21:18 Comment(2)
I posted my answer over here. I think it better answers your question. Oscillograph
Does this answer your question? Split a dataset created by Tensorflow dataset API in to Train and Test? Franke

You could either:

1) Use the shard transformation to partition the dataset into multiple "shards". Note that for best performance, sharding should be applied to data sources (e.g. filenames).

2) As of TensorFlow 1.12, you can also use the window transformation to build a dataset of datasets.
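To make both options concrete, here is a minimal sketch, assuming TensorFlow 2.x with eager execution. The helper name `kfold_splits` and the toy `Dataset.range(10)` input are hypothetical and only for illustration; the calls used are `Dataset.shard`, `Dataset.concatenate` and `Dataset.window`. It is one possible way to get cross-validation folds, not necessarily the best one for large datasets.

```python
import tensorflow as tf


def kfold_splits(dataset, k):
    """Yield (train, validation) dataset pairs built from k shards.

    Hypothetical helper: shard i holds every element whose position
    modulo k equals i. Assumes k >= 2.
    """
    shards = [dataset.shard(num_shards=k, index=i) for i in range(k)]
    for i in range(k):
        validation = shards[i]
        # Chain the remaining k - 1 shards into one training dataset.
        remaining = [s for j, s in enumerate(shards) if j != i]
        train = remaining[0]
        for part in remaining[1:]:
            train = train.concatenate(part)
        yield train, validation


# Toy dataset standing in for real data.
dataset = tf.data.Dataset.range(10)

# Option 1: shard-based folds, materialized as plain Python lists.
folds = [
    (
        sorted(int(x) for x in train.as_numpy_iterator()),
        sorted(int(x) for x in validation.as_numpy_iterator()),
    )
    for train, validation in kfold_splits(dataset, k=5)
]

# Option 2: window() builds a dataset whose elements are datasets.
windows = [
    [int(x) for x in window.as_numpy_iterator()]
    for window in dataset.window(size=5)
]
```

Note that each fold re-reads the source dataset, so with file-based input it is usually cheaper to shard the list of filenames before reading the records.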

Spleenwort answered 30/10, 2018 at 17:56 Comment(0)

I am afraid you cannot. The dataset API is a way to efficiently stream inputs to your net at run time. It is not a set of tools to manipulate datasets as a whole -- in that regard it might be a bit of a misnomer.

Also, if you could, this would probably be a bad idea. You would rather have this train/test split done once and for all.

  • it lets you review those sets offline
  • if the split is done each time you run an experiment there is a risk that samples start swapping sets if you are not extremely careful (e.g. when you add more data to your existing dataset)
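One way to avoid samples swapping sets, sketched here under assumptions and not taken from the answer above, is to derive each sample's set from a hash of a stable identifier. The function name `assign_split`, the `sample-N` IDs and the 20% test fraction are all hypothetical; the point is only that the assignment depends on the sample itself, not on its position in the dataset.

```python
import hashlib


def assign_split(sample_id: str, test_fraction: float = 0.2) -> str:
    """Map a stable sample ID to 'train' or 'test' deterministically."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Split an initial collection of (hypothetical) sample IDs.
original_ids = [f"sample-{i}" for i in range(1000)]
before = {sid: assign_split(sid) for sid in original_ids}

# Later, more data is added; the old samples are split again.
extended_ids = original_ids + [f"sample-{i}" for i in range(1000, 1500)]
after = {sid: assign_split(sid) for sid in extended_ids}

# Every original sample stays in the set it was first assigned to.
unchanged = all(before[sid] == after[sid] for sid in original_ids)
```

The resulting split is only approximately 80/20, which is the usual trade-off for getting an assignment that survives additions to the dataset.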

See also a related question about how to split a set into training & testing in tensorflow.

Detrude answered 7/5, 2018 at 8:6 Comment(1)
I believe that splitting the data is necessary for cross validation. I suppose one could partition it only once in many chunks... Omega
