tf.data.Dataset.map(map_func) with Eager Mode

I am using TF 1.8 with eager mode enabled.

I cannot print the value of the example inside mapfunc. When I call tf.executing_eagerly() from within mapfunc, it returns False.

import os
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

tfe = tf.contrib.eager
tf.enable_eager_execution()
x = tf.random_uniform([16,10], -10, 0, tf.int64)
print(x)
DS = tf.data.Dataset.from_tensor_slices((x))


def mapfunc(ex, con):
    import pdb; pdb.set_trace()
    new_ex = ex + con
    print(new_ex) 
    return new_ex

DS = DS.map(lambda x: mapfunc(x, [7]))
DS = DS.make_one_shot_iterator()

print(DS.next())

print(new_ex) outputs:

Tensor("add:0", shape=(10,), dtype=int64)

Outside mapfunc, everything works fine. Inside it, however, the example that is passed in has no concrete value and no .numpy() method.

Veratrine answered 25/5, 2018 at 23:38 Comment(0)

The tf.data transformations actually execute as a graph, so the body of the map function itself isn't executed eagerly. See #14732 for some more discussion on this.
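
To make the graph-versus-eager distinction concrete, here is a minimal sketch. It assumes TensorFlow 2.x (where eager execution is on by default) rather than the TF 1.8 setup in the question, and the names eager_flags and values are only illustrative. The map function is traced into a graph when map() is called, so the flag recorded inside it is False even though the surrounding program runs eagerly.

```python
import tensorflow as tf

# Records what tf.executing_eagerly() reports while the map function is traced.
eager_flags = []

def mapfunc(ex):
    # This body runs during graph tracing, so `ex` is a symbolic tensor here.
    eager_flags.append(tf.executing_eagerly())
    return ex + 7

ds = tf.data.Dataset.from_tensor_slices(tf.range(4, dtype=tf.int64))
ds = ds.map(mapfunc)  # tracing of mapfunc happens here

outside = tf.executing_eagerly()   # eager mode outside the map function
inside = eager_flags               # flags captured inside the map function
values = [int(v) for v in ds]      # iterating the dataset yields concrete values
print(outside, inside, values)
```

The elements only become concrete values once the dataset is iterated, which is why printing inside the function shows a symbolic Tensor.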

If you really need eager execution for the map function, you could use tf.contrib.eager.py_func, so something like:

DS = DS.map(lambda x: tf.contrib.eager.py_func(
  mapfunc,
  [x, tf.constant(7, dtype=tf.int64)], tf.int64))
# In TF 1.9+, the next line can be print(next(DS))
print(DS.make_one_shot_iterator().next())

Hope that helps.

Note that by adding a py_func to the dataset, the single-threaded Python interpreter will be in the loop for every element produced.

Mccubbin answered 26/5, 2018 at 0:14 Comment(2)
Thank you for the help. I am going for the performance optimization techniques that were advertised during the Dev Summit '18. I am using this for text, and also using it to pass different preprocessing functions in a more elegant way. - Veratrine
Another thing that I just noticed: in the argument list you can only pass tensors, not general objects such as another function. In my case I would want to pass a tokenizer, since I am working with text, but this is still doable inside the body of mapfunc. - Veratrine
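
One way to get a non-tensor object such as a tokenizer into the map function is to capture it in a closure instead of passing it through the argument list. The sketch below assumes TensorFlow 2.x and tf.py_function; make_map_fn is a hypothetical helper, and str.split stands in for a real tokenizer.

```python
import tensorflow as tf

def make_map_fn(tokenizer):
    """Hypothetical helper: builds a map function that closes over `tokenizer`."""

    def _count_tokens(text):
        # Runs eagerly inside tf.py_function, so .numpy() is available.
        tokens = tokenizer(text.numpy().decode("utf-8"))
        return tf.constant(len(tokens), dtype=tf.int64)

    def map_fn(text):
        count = tf.py_function(_count_tokens, inp=[text], Tout=tf.int64)
        count.set_shape([])  # py_function loses static shape information
        return count

    return map_fn

ds = tf.data.Dataset.from_tensor_slices(["a b c", "hello world"])
ds = ds.map(make_map_fn(str.split))  # str.split is a stand-in tokenizer
counts = [int(c) for c in ds]
print(counts)
```

Only the text tensor goes through inp; the tokenizer stays an ordinary Python object that the wrapped function can call.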

Anything inside map runs as a graph, no matter which mode is used outside. See https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map

As that page describes, there are three options:

  1. Rely on AutoGraph to convert Python code into an equivalent graph computation. The downside of this approach is that AutoGraph can convert some but not all Python code.
  2. Use tf.py_function, which allows you to write arbitrary Python code but will generally result in worse performance than 1)
  3. Use tf.numpy_function, which also allows you to write arbitrary Python code. Note that tf.py_function accepts tf.Tensor whereas tf.numpy_function accepts numpy arrays and returns only numpy arrays.
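
Option 1 can be sketched as follows, assuming TensorFlow 2.x. The map function uses only TensorFlow ops, so it stays in the graph, and tf.print replaces Python's print so that actual values are shown when the dataset is iterated. The small constant input is made up for illustration.

```python
import tensorflow as tf

def mapfunc(ex, con):
    new_ex = ex + con
    # tf.print is a graph op: it prints concrete values at iteration time,
    # whereas Python's print would only show the symbolic tensor once.
    tf.print("new_ex:", new_ex)
    return new_ex

x = tf.constant([[-3, -1], [-2, -5]], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices(x)
ds = ds.map(lambda ex: mapfunc(ex, tf.constant(7, dtype=tf.int64)))

rows = [row.numpy().tolist() for row in ds]
print(rows)
```

This keeps the Python interpreter out of the per-element path, at the cost of not being able to use a debugger or arbitrary Python code inside mapfunc.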

With tf.py_function() your line will become:

DS = DS.map(lambda y: tf.py_function(
                          (lambda x: mapfunc(x, [7])),
                          inp=[y], Tout=tf.int64
                      ))
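
For comparison, option 3 might look like the sketch below, again assuming TensorFlow 2.x. The wrapped function receives and returns NumPy arrays; np_mapfunc and the constant input are illustrative names and data, not part of the original question.

```python
import numpy as np
import tensorflow as tf

def np_mapfunc(ex):
    # `ex` arrives as a NumPy array, so ordinary Python/NumPy code works here.
    return ex + np.int64(7)

x = tf.constant([[-3, -1], [-2, -5]], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices(x)
ds = ds.map(lambda y: tf.numpy_function(np_mapfunc, inp=[y], Tout=tf.int64))

result = [row.numpy().tolist() for row in ds]
print(result)
```

The returned array's dtype has to match Tout, which is why the constant is written as np.int64(7).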

The same applies to tf.map_fn() and tf.vectorized_map().

Vogeley answered 19/4, 2022 at 15:14 Comment(0)
