How to use tf.data's initializable iterators within a tf.estimator's input_fn?

I would like to manage my training with a tf.estimator.Estimator, but I am having trouble using it alongside the tf.data API.

I have something like this:

import tensorflow as tf

def model_fn(features, labels, params, mode):
  # Defines the model's ops.
  # Initializes with tf.train.Scaffold.
  # Returns a tf.estimator.EstimatorSpec.
  pass  # placeholder for the actual model code

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.

  iterator = dataset.make_initializable_iterator()

  return iterator.get_next()

estimator = tf.estimator.Estimator(model_fn)
estimator.train(input_fn)

I can't use make_one_shot_iterator for my use case, so input_fn creates an iterator that has to be initialized explicitly, and that initialization would have to happen within model_fn (where I use tf.train.Scaffold to initialize local ops).

Also, as I understand it, we can't simply set input_fn = iterator.get_next, because the other ops would then not be added to the same graph.

What is the recommended way to initialize the iterator?

Gendron answered 10/7, 2017 at 12:12 Comment(2)
@guillaumeklin -- did you add tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer) within the input_fn()? - Gotcher
Yes, you can add this line in input_fn() just before return iterator.get_next(). - Gendron

As of TensorFlow 1.5, it is possible to make input_fn return a tf.data.Dataset, e.g.:

def input_fn():
  dataset = tf.data.TextLineDataset("test.txt")
  # map, shuffle, padded_batch, etc.
  return dataset

See c294fcfd.


For previous versions, you can add the iterator's initializer to the tf.GraphKeys.TABLE_INITIALIZERS collection inside input_fn, right after creating the iterator, and rely on the default initializer to run it:

tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer)
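
Why this works without touching Scaffold: as far as I know, the default local init op in TF 1.x runs every op registered in the TABLE_INITIALIZERS collection, so an initializer added there is picked up automatically. The snippet below is only a toy, pure-Python model of that mechanism, not TensorFlow code. ToyGraph, ToyIterator, run_default_local_init and the collection key string are names made up for illustration.

```python
# Toy model of graph collections. NOT TensorFlow code: every name here
# is hypothetical and only mimics the shape of the real mechanism.

TABLE_INITIALIZERS = "toy_table_initializers"  # arbitrary key for this sketch


class ToyGraph:
    """Holds named collections, loosely mirroring a graph's collections."""

    def __init__(self):
        self._collections = {}

    def add_to_collection(self, name, value):
        self._collections.setdefault(name, []).append(value)

    def get_collection(self, name):
        return list(self._collections.get(name, []))


class ToyIterator:
    """Stands in for an initializable iterator over some data."""

    def __init__(self, data):
        self._data = data
        self._it = None  # unusable until initializer() has run

    def initializer(self):
        self._it = iter(self._data)

    def get_next(self):
        if self._it is None:
            raise RuntimeError("iterator has not been initialized")
        return next(self._it)


def run_default_local_init(graph):
    """Models the default init step: run every registered initializer."""
    for init in graph.get_collection(TABLE_INITIALIZERS):
        init()


graph = ToyGraph()


def input_fn():
    iterator = ToyIterator(["line 1", "line 2"])
    # The workaround: register the initializer so the default step finds it.
    graph.add_to_collection(TABLE_INITIALIZERS, iterator.initializer)
    # Returned uncalled here, to imitate an op that is evaluated later.
    return iterator.get_next


get_next = input_fn()
run_default_local_init(graph)  # models what happens when the session starts
first = get_next()
```

If the registration line is removed from this toy input_fn, get_next() raises because nothing ever ran the initializer, which is the same failure the real collection trick avoids. Note that this only holds while the default local init op is in use; a custom local_init_op passed to Scaffold would have to run the initializer itself.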
Gendron answered 10/7, 2017 at 15:51 Comment(4)
Thanks! +1. Just to clarify the answer: you need to add the tf.add_to_collection... line before returning from input_fn(); then it works fine and you don't need to do anything with Scaffold and local_init_ops. - Hypocotyl
Excuse me, is it possible to specify names for each field of the dataset using the first method? For example, my dataset has 2 fields, "age" and "sex", and I want to return a dictionary that looks like {"age": tensor1, "sex": tensor2}. - Karr
@Hypocotyl @Gendron did you add the tf.add_to_collection(...) line within the def input_fn() or elsewhere within the model_fn()? If this was added in the model_fn(), would the line still be tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, iterator.initializer), or would iterator.initializer need to be changed to something else? - Gotcher
You should add it in input_fn(), just after the creation of the iterator. - Gendron
