Where is Wengert List in TensorFlow?

TensorFlow uses reverse-mode automatic differentiation (reverse-mode AD), as discussed in https://github.com/tensorflow/tensorflow/issues/675.

Reverse-mode AD needs a data structure called a Wengert list - see https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation.

However, searching the TensorFlow repository for the keyword "Wengert List" turns up nothing.

Do they use a different name, or did they get rid of the Wengert list? If so, how?

Impresario asked 9/5, 2017 at 6:56

AD terminology is very old. It was invented when there was no Python and things were complicated. Nowadays you could just use a regular Python list for that purpose.
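
For intuition, here is a minimal sketch (not TensorFlow's code; all names are made up) of reverse-mode AD where the Wengert list really is just a Python list of recorded operations:

import math

tape = []  # the Wengert list: one record per executed primitive operation

def new_var(value, parents=(), local_grads=()):
    # A "variable" is just its index on the tape.
    tape.append({"value": value, "parents": parents, "local_grads": local_grads})
    return len(tape) - 1

def mul(i, j):
    return new_var(tape[i]["value"] * tape[j]["value"],
                   parents=(i, j),
                   local_grads=(tape[j]["value"], tape[i]["value"]))

def sin(i):
    return new_var(math.sin(tape[i]["value"]),
                   parents=(i,),
                   local_grads=(math.cos(tape[i]["value"]),))

# Forward pass for f(x, y) = sin(x * y); every operation is recorded in order.
x = new_var(2.0)
y = new_var(3.0)
z = sin(mul(x, y))

# Reverse pass: walk the tape backwards and accumulate adjoints.
adjoints = [0.0] * len(tape)
adjoints[z] = 1.0
for i in reversed(range(len(tape))):
    for parent, g in zip(tape[i]["parents"], tape[i]["local_grads"]):
        adjoints[parent] += adjoints[i] * g

print(adjoints[x], adjoints[y])  # y*cos(x*y) and x*cos(x*y)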

The implementation of reverse-mode AD is in the gradients function of gradients_impl.py.

The data structure used to store the tape is initialized on line 532, and it is a Python deque used as a queue:

import collections

# Initialize queue with to_ops.
queue = collections.deque()
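
Roughly speaking, that queue drives a backward traversal of the graph from the target ops toward the inputs. The real gradients function handles far more (control flow, multiple outputs, and so on), but a simplified sketch of the pattern, with purely illustrative names, looks like this:

import collections

def backprop(output_op, inputs_of, grad_fns):
    # inputs_of[op]: list of ops feeding op.
    # grad_fns[op]: maps the gradient flowing into op to a list of gradients
    # for op's inputs, in the same order as inputs_of[op].
    # First, count consumers of each op within the reachable subgraph, so an op
    # is only processed once all of its output gradients have arrived.
    pending = collections.defaultdict(int)
    stack, seen = [output_op], {output_op}
    while stack:
        op = stack.pop()
        for inp in inputs_of.get(op, []):
            pending[inp] += 1
            if inp not in seen:
                seen.add(inp)
                stack.append(inp)

    grads = collections.defaultdict(float)
    grads[output_op] = 1.0

    queue = collections.deque([output_op])  # cf. "Initialize queue with to_ops."
    while queue:
        op = queue.popleft()
        input_grads = grad_fns[op](grads[op]) if op in grad_fns else []
        for inp, g in zip(inputs_of.get(op, []), input_grads):
            grads[inp] += g
            pending[inp] -= 1
            if pending[inp] == 0:  # all downstream gradients accumulated
                queue.append(inp)
    return dict(grads)

# Tiny example: z = x * y with x = 2.0 and y = 3.0 baked into the gradient rule.
inputs_of = {"z": ["x", "y"]}
grad_fns = {"z": lambda gz: [gz * 3.0, gz * 2.0]}  # dz/dx = y, dz/dy = x
print(backprop("z", inputs_of, grad_fns))  # {'z': 1.0, 'x': 3.0, 'y': 2.0}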
Tropic answered 9/5, 2017 at 14:23
Comment (Koodoo): Actually, machines were simpler back then but harder to use, more "austere". Now syntactic sugar is everywhere, but when it breaks... well, you create an issue.

However, searching through the TensorFlow repository with the keyword "Wengert List", I get nothing.

This is because TensorFlow is not a tape-based AD system; it is a graph-based AD system.

A Wengert list would be the tape describing the order in which the operations were originally executed.

There is also source-code-transformation-based AD, and a nice example of that kind of system is Tangent.

Nowadays almost no one uses a tape (Wengert list) any more. Check, for instance, what PyTorch does (page 2).
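
For comparison (assuming PyTorch is installed), PyTorch's autograd records the operations performed on tensors that require gradients as they execute, and replays them in reverse when .backward() is called:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

z = torch.sin(x * y)  # the multiply and the sin are recorded as they run
z.backward()          # reverse pass

print(x.grad)  # y * cos(x * y)
print(y.grad)  # x * cos(x * y)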

Lariat answered 15/7, 2020 at 9:07

TensorFlow 2 uses a Wengert list (a tape), as do JAX and Autograd. These tools keep track of the operations on variables with some sort of gradient tape.
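
For example (assuming a TensorFlow 2.x install), the tape is exposed directly as tf.GradientTape:

import tensorflow as tf

x = tf.Variable(2.0)
y = tf.Variable(3.0)

# Operations executed inside the context manager are recorded on the tape.
with tf.GradientTape() as tape:
    z = tf.sin(x * y)

dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx.numpy(), dz_dy.numpy())  # y*cos(x*y) and x*cos(x*y)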

TensorFlow 1 did not use a Wengert list to keep track of the computation being performed; instead, it used a static graph. This had certain performance benefits, but it limited what TensorFlow 1 was capable of doing.
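
For contrast, a rough TensorFlow 1-style version of the same computation (runnable through the compat.v1 API, assuming TensorFlow 2.x is installed): the graph is built first, tf.gradients adds gradient ops to that same static graph, and only then does a session execute anything.

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # TF1-style static graph mode

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = tf.sin(x * y)

# tf.gradients adds gradient ops to the static graph; nothing runs yet.
dz_dx, dz_dy = tf.gradients(z, [x, y])

with tf.Session() as sess:
    print(sess.run([dz_dx, dz_dy], feed_dict={x: 2.0, y: 3.0}))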

Incomprehension answered 28/1, 2020 at 17:39
