TensorFlow tf.data AUTOTUNE

I was reading the Data Loading section of the TF performance guide. For prefetch, it says:

The tf.data API provides a software pipelining mechanism through the tf.data.Dataset.prefetch transformation, which can be used to decouple the time when data is produced from the time when data is consumed. In particular, the transformation uses a background thread and an internal buffer to prefetch elements from the input dataset ahead of the time they are requested. The number of elements to prefetch should be equal to (or possibly greater than) the number of batches consumed by a single training step. You could either manually tune this value, or set it to tf.data.experimental.AUTOTUNE which will prompt the tf.data runtime to tune the value dynamically at runtime.

What is AUTOTUNE doing internally? Which algorithms or heuristics are being applied?

Additionally, in practice, what kind of manual tuning is done?
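
For concreteness, here is a minimal sketch of the two options I mean (the pipeline itself is a toy stand-in for a real one):

    import tensorflow as tf

    # Toy pipeline standing in for a real one.
    dataset = tf.data.Dataset.range(100_000)
    dataset = dataset.map(lambda x: x * 2,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.batch(32)

    # Manual tuning: benchmark a few fixed buffer sizes (in batches)
    # and keep the one that gives the best step time.
    manual = dataset.prefetch(buffer_size=2)

    # Autotuning: let the tf.data runtime pick and adapt the buffer size.
    auto = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)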

Sailer answered 15/6, 2019 at 18:35 Comment(0)

tf.data builds a performance model of the input pipeline and runs an optimization algorithm to find a good allocation of its CPU budget across all parameters specified as AUTOTUNE. While the input pipeline is running, tf.data tracks the time spent in each operation, so that these times can be fed into the optimization algorithm.
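
For example, every parameter you set to AUTOTUNE becomes one knob for that optimization. A sketch (the file pattern and the parse step are placeholders, not a real dataset):

    import tensorflow as tf

    AUTOTUNE = tf.data.experimental.AUTOTUNE  # tf.data.AUTOTUNE in newer releases

    # Each AUTOTUNE below is a separate knob (a degree of parallelism or a
    # buffer size) that the runtime's optimization loop may adjust, based on
    # the per-op timings it collects while the pipeline runs.
    files = tf.data.Dataset.list_files("data/train-*.tfrecord")  # placeholder path
    dataset = files.interleave(tf.data.TFRecordDataset,
                               num_parallel_calls=AUTOTUNE)
    dataset = dataset.map(lambda record: tf.io.decode_raw(record, tf.uint8),
                          num_parallel_calls=AUTOTUNE)  # placeholder parse step
    dataset = dataset.batch(32).prefetch(AUTOTUNE)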

The tf.data.experimental.OptimizationOptions object gives some control over how autotuning behaves.
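
A minimal sketch of wiring those options in (the field names have moved between TF versions; in newer releases the same knobs live under tf.data.Options().autotune):

    import tensorflow as tf

    dataset = tf.data.Dataset.range(1_000).map(
        lambda x: x + 1, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    options = tf.data.Options()
    # OptimizationOptions fields as in older TF 2.x releases:
    options.experimental_optimization.autotune = True          # turn autotuning on/off
    options.experimental_optimization.autotune_cpu_budget = 4  # cap the cores it may use
    dataset = dataset.with_options(options)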

Eddy answered 26/12, 2019 at 20:27 Comment(2)
this doesn't say anything about AUTOTUNE :/ – Mathieu
It does mention AUTOTUNE: whichever dataset parameters are set to AUTOTUNE are monitored and optimized. However, I agree it does not fully answer the OP's question as to which heuristics it uses internally. The associated TF documentation is weak. – Catalepsy

The authors provide details about AUTOTUNE in their VLDB paper (https://vldb.org/pvldb/vol14/p2945-klimovic.pdf); see Section 3.3.2.
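
Very roughly, the paper describes tf.data as maintaining an analytical model of the pipeline's expected output latency and searching over the AUTOTUNE-d parameters subject to CPU/RAM budgets. Here is a loose, greedy hill-climbing paraphrase of that idea (the paper's actual search is gradient-based, and this cost model is a toy, not the real one):

    # Toy stand-in: each stage's latency shrinks with its parallelism.
    def expected_latency(params, stage_cost):
        return sum(cost / p for cost, p in zip(stage_cost, params))

    def autotune(stage_cost, cpu_budget):
        params = [1] * len(stage_cost)  # start every stage at parallelism 1
        while sum(params) < cpu_budget:
            # Greedily give one more core to whichever stage helps most.
            best = min(range(len(params)),
                       key=lambda i: expected_latency(
                           [p + (j == i) for j, p in enumerate(params)],
                           stage_cost))
            params[best] += 1
        return params

    # Three stages with measured per-element costs (ms), 8-core budget.
    print(autotune([4.0, 1.0, 2.0], cpu_budget=8))  # -> [4, 2, 2]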

Son answered 22/1, 2023 at 9:11 Comment(0)
