How to load MNIST via TensorFlow (including download)?

The TensorFlow documentation for MNIST recommends several different ways to load the MNIST dataset.

All of the ways described in the documentation throw many deprecation warnings with TensorFlow 1.8.

The way I'm currently loading MNIST and creating batches for training:

from tensorflow.examples.tutorials.mnist import input_data

class MNIST:
    def __init__(self, optimizer):
        ...
        self.mnist_dataset = input_data.read_data_sets("/tmp/data/", one_hot=True)
        self.test_data = self.mnist_dataset.test.images.reshape((-1, self.timesteps, self.num_input))
        self.test_label = self.mnist_dataset.test.labels
        ...

    def train_run(self, sess):
        batch_input, batch_output = self.mnist_dataset.train.next_batch(self.batch_size, shuffle=True)
        batch_input = batch_input.reshape((self.batch_size, self.timesteps, self.num_input))
        _, loss = sess.run(fetches=[self.train_step, self.loss], feed_dict={self.input_placeholder: batch_input, self.output_placeholder: batch_output})
        ...

    def test_run(self, sess):
        loss = sess.run(fetches=[self.loss], feed_dict={self.input_placeholder: self.test_data, self.output_placeholder: self.test_label})
        ...

How can I do exactly the same thing with the currently recommended approach?

I couldn't find any documentation on this.

It seems to me that the new way is something along the lines of:

train, test = tf.keras.datasets.mnist.load_data()
self.mnist_train_ds = tf.data.Dataset.from_tensor_slices(train)
self.mnist_test_ds = tf.data.Dataset.from_tensor_slices(test)

But how can I use these datasets in my train_run and test_run methods?

Joshi answered 3/6, 2018 at 12:58 Comment(0)

An example of loading the MNIST dataset using the tf.data Dataset API:


Create an MNIST dataset to load the train, validation, and test images:

You can create a dataset from numpy inputs using either Dataset.from_tensor_slices or Dataset.from_generator. Dataset.from_tensor_slices embeds the whole dataset in the computational graph as constants, so we will use Dataset.from_generator instead.

# Load MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

def create_mnist_dataset(data, labels, batch_size):
    def gen():
        for image, label in zip(data, labels):
            yield image, label

    ds = tf.data.Dataset.from_generator(gen, (tf.float32, tf.int32), ((28, 28), ()))
    return ds.repeat().batch(batch_size)

# Train and validation datasets with different batch sizes
train_dataset = create_mnist_dataset(x_train, y_train, 10)
valid_dataset = create_mnist_dataset(x_test, y_test, 20)

A feedable iterator that can toggle between training and validation

handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(
    handle, train_dataset.output_types, train_dataset.output_shapes)
image, label = iterator.get_next()

train_iterator = train_dataset.make_one_shot_iterator()
valid_iterator = valid_dataset.make_one_shot_iterator()

A sample run:

# A toy network
y = tf.layers.dense(tf.layers.flatten(image), 1, activation=tf.nn.relu)
loss = tf.losses.mean_squared_error(tf.cast(label, tf.float32), tf.squeeze(y))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # The `Iterator.string_handle()` method returns a tensor that can be evaluated
    # and used to feed the `handle` placeholder.
    train_handle = sess.run(train_iterator.string_handle())
    valid_handle = sess.run(valid_iterator.string_handle())

    # Run a training step
    train_loss, train_img, train_label = sess.run([loss, image, label],
                                                  feed_dict={handle: train_handle})
    # train_img.shape = (10, 28, 28)

    # Run a validation step
    valid_pred, valid_img = sess.run([y, image],
                                     feed_dict={handle: valid_handle})
    # valid_img.shape = (20, 28, 28)
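
To tie this back to the question's train_run/test_run structure: with the handle-based iterator, each sess.run pulls its own batch, so no image/label feed_dict is needed. Below is a minimal sketch of one possible adaptation, not the question's actual model: it reuses create_mnist_dataset from above, and the dense classifier and Adam optimizer are hypothetical stand-ins for the question's RNN-style network (the timesteps/num_input reshape would go where the model is built).

import tensorflow as tf

class MNIST:
    """Sketch of the question's class using the feedable-iterator pattern.

    Assumes create_mnist_dataset() from above; the model is a toy dense
    classifier, not the question's actual network.
    """
    def __init__(self, batch_size=10):
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
        train_dataset = create_mnist_dataset(x_train, y_train, batch_size)
        test_dataset = create_mnist_dataset(x_test, y_test, len(x_test))

        # One string handle toggles which dataset feeds the graph.
        self.handle = tf.placeholder(tf.string, shape=[])
        iterator = tf.data.Iterator.from_string_handle(
            self.handle, train_dataset.output_types, train_dataset.output_shapes)
        image, label = iterator.get_next()

        self.train_iterator = train_dataset.make_one_shot_iterator()
        self.test_iterator = test_dataset.make_one_shot_iterator()

        # Build the model directly on the iterator output instead of on
        # input/output placeholders fed via feed_dict.
        logits = tf.layers.dense(tf.layers.flatten(image), 10)
        self.loss = tf.losses.sparse_softmax_cross_entropy(labels=label, logits=logits)
        self.train_step = tf.train.AdamOptimizer().minimize(self.loss)

    def init_handles(self, sess):
        # Evaluate the string handles once per session.
        self.train_handle = sess.run(self.train_iterator.string_handle())
        self.test_handle = sess.run(self.test_iterator.string_handle())

    def train_run(self, sess):
        _, loss = sess.run([self.train_step, self.loss],
                           feed_dict={self.handle: self.train_handle})
        return loss

    def test_run(self, sess):
        return sess.run(self.loss, feed_dict={self.handle: self.test_handle})

After sess.run(tf.global_variables_initializer()), call init_handles(sess) once, then train_run and test_run work as before. Note that tf.keras.datasets.mnist.load_data() returns integer labels rather than one-hot vectors, hence the sparse softmax loss in this sketch.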
Bolen answered 3/6, 2018 at 17:31 Comment(3)
You didn't show where your "mnist" is coming from. I assumed mnist = tf.contrib.learn.datasets.load_dataset("mnist"), but testing this with TF 1.8 throws a ton of deprecation warnings. – Joshi
You can either use `tf.logging` to silence those warnings (see the snippet after these comments), or you can download the MNIST files from yann.lecun.com/exdb/mnist and convert them to numpy arrays. – Bolen
Sure, I could do this, but that was not the question. If something gets deprecated, I normally assume there is a replacement, a new way to do what the deprecated method did. So you are saying they just deprecated the MNIST loading without a replacement? If that is the case, then that would be the answer to my original question. – Joshi
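
Regarding the deprecation warnings mentioned in the comments: in TF 1.x they are emitted through TensorFlow's logging module, so they can be hidden by raising the log level. A minimal sketch (this only suppresses the messages; it does not replace the deprecated loader):

import tensorflow as tf

# Hide INFO and WARNING messages, including deprecation warnings, in TF 1.x
tf.logging.set_verbosity(tf.logging.ERROR)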
