TensorFlow dataset not saved in multiple shards

I want to use the TensorFlow dataset saving and loading functions, but I am not sure I understand the sharding method.

The documentation states:

The saved dataset is saved in multiple file "shards". By default, the dataset output is divided to shards in a round-robin fashion but custom sharding can be specified via the shard_func function.

But when I save a dataset with the save function, only one huge shard seems to be generated.

import os
import tempfile
import tensorflow as tf

path = os.path.join(tempfile.gettempdir(), "saved_data")
dataset = tf.data.Dataset.range(10**8)

dataset.save(path)

[Screenshot: generated dataset]

Am I missing something?

I use TensorFlow 2.10.0 and Python 3.9.7.

Tempered answered 12/9, 2022 at 15:39 Comment(2)
Did you resolve this? I'm getting one big 9.5GB shard ... – Satin
Unfortunately I got no answer, and the saving method still creates one big file with newer TF versions. – Tempered
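
The quoted documentation mentions that custom sharding can be specified via shard_func. The following is a minimal sketch of that option, not a confirmed fix for the single-shard behaviour described above. NUM_SHARDS, the small range of 1000 elements, and the modulo rule are assumptions chosen for illustration; it also assumes TensorFlow 2.10 or later, where Dataset.save and Dataset.load are available.

```python
import os
import tempfile

import tensorflow as tf

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration

# Fresh temporary directory so repeated runs do not collide.
path = os.path.join(tempfile.mkdtemp(), "saved_data_sharded")

# Small range (an assumption) so the sketch runs quickly.
dataset = tf.data.Dataset.range(1000)


def shard_func(element):
    # Called once per element; must return a scalar tf.int64 shard index.
    # Here elements are assigned to shards by their value modulo NUM_SHARDS.
    return element % NUM_SHARDS


dataset.save(path, shard_func=shard_func)

# Reload the dataset. Element order across shards is not assumed to be
# preserved, so the values are sorted before any comparison.
loaded = tf.data.Dataset.load(path)
values = sorted(int(v) for v in loaded.as_numpy_iterator())
```

Listing the contents of path afterwards should show how the output was actually split on a given setup; whether this yields the expected number of files is something to verify locally.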
