TensorFlow dataset not saved in multiple shards

I want to use the TensorFlow dataset saving and loading functions, but I am not sure I understand the sharding method. The documentation says:

The saved dataset is saved in multiple file "shards". By default, the dataset output is divided to shards in a round-robin fashion but custom sharding can be specified via the shard_func function.
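For reference, this is a minimal sketch of how I understand a custom shard_func would be passed, assuming it receives each element and must return a scalar tf.int64 shard index. NUM_SHARDS and shard_by_value are hypothetical names chosen for illustration, and I use a small range so it runs quickly.

```python
import os
import tempfile

import tensorflow as tf

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_by_value(element):
    # Assumption: shard_func returns a scalar tf.int64 shard index per element.
    # For a range dataset the element is already an int64 scalar.
    return element % NUM_SHARDS


path = os.path.join(tempfile.mkdtemp(), "saved_data_sharded")
dataset = tf.data.Dataset.range(100)
dataset.save(path, shard_func=shard_by_value)

# Reload and collect the elements; order across shards is not assumed,
# so the values are sorted before any comparison.
loaded = tf.data.Dataset.load(path)
restored = sorted(int(x) for x in loaded.as_numpy_iterator())
```

With this approach the number of shards is whatever set of distinct values the function returns, so it does not rely on the default behaviour.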

But when I save a dataset through the save function, it seems that only one huge shard is generated.

import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.gettempdir(), "saved_data")
dataset = tf.data.Dataset.range(10**8)

dataset.save(path)

[Screenshot: generated dataset]

Am I missing something?

I use TensorFlow 2.10.0 and Python 3.9.7.
