The Wikipedia page for backpropagation has this claim:
The backpropagation algorithm for calculating a gradient has been rediscovered a number of times, and is a special case of a more general technique called automatic differentiation in the reverse accumulation mode.
Can someone expound on this and put it in layman's terms? What is the function being differentiated? What is the "special case"? Is it the adjoint values themselves that are used, or the final gradient?
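To make the question concrete, here is a toy sketch of my current understanding of reverse accumulation (my own code, not taken from the Wikipedia article or any book; the one-neuron network and all parameter values are made up for illustration). Is the "special case" simply that backprop is this procedure applied to the network's loss, viewed as a function of its weights?

    import math

    class Var:
        """Scalar node in the computation graph; .grad holds the adjoint dL/d(self)."""
        def __init__(self, value, parents=()):
            self.value = value
            self.grad = 0.0
            self._parents = parents  # list of (parent Var, local partial derivative)

        def __add__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def tanh(self):
            t = math.tanh(self.value)
            return Var(t, [(self, 1.0 - t * t)])

        def backward(self):
            """Reverse accumulation: propagate adjoints from the output backwards."""
            order, seen = [], set()
            def topo(v):
                if id(v) not in seen:
                    seen.add(id(v))
                    for p, _ in v._parents:
                        topo(p)
                    order.append(v)
            topo(self)
            self.grad = 1.0                        # dL/dL = 1
            for v in reversed(order):              # visit nodes in reverse order
                for parent, local in v._parents:
                    parent.grad += local * v.grad  # chain rule: adjoint times local partial

    # The function being differentiated is the loss of the whole network as a
    # function of its weights. Here: one tanh neuron, squared error, one example.
    w1, w2, b = Var(0.5), Var(-0.3), Var(0.1)   # parameters (hypothetical values)
    x1, x2, y = 1.0, 2.0, 1.0                   # one training example
    pred = (w1 * x1 + w2 * x2 + b).tanh()
    err = pred + (-y)
    loss = err * err

    loss.backward()
    print(w1.grad, w2.grad, b.grad)  # the same gradient backprop would produce

If I run loss.backward() here, the adjoints that end up in w1.grad, w2.grad, b.grad look like exactly what the backprop weight update would use, which is what prompted my question about adjoints versus the final gradient.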
Update: since writing this I have found that the topic is covered in section 6.5.9 of the Deep Learning book (https://www.deeplearningbook.org/). I have also found this paper informative on the subject: "Stable architectures for deep neural networks" by Haber and Ruthotto.