Implementing automatic differentiation for 2nd derivative: algorithm for traversing the computational graph?
Asked Answered
M

2

9

I am attempting to implement automatic differentiation for a Python statistics package (the problem formulation is similar to optimization problem formulations).

The computational graph is generated using operator overloading and factory functions for operations like sum(), exp(), etc. I have implemented automatic differentiation for the gradient using reverse accumulation. However, I have found implementing automatic differentiation for the second derivative (the Hessian) much more difficult. I know how to do the individual 2nd partial gradient calculations, but I have had trouble coming up with an intelligent way to traverse the graph and do the accumulations. Does anyone know of good articles that give algorithms for automatic differentiation for the second derivative or open source libraries that implement the same that I may try to learn from?

Mammalogy answered 4/7, 2010 at 0:54 Comment(4)
"Off-topic" my foot (commenting to the lone SOer who voted that way) -- this is all about programming, what else could "traversing the computational graph" be about?! (Though I don't understand why @John can't do the 2nd derivative by just applying his first-derivative functionality twice, that may be because I don't know what a "Hessian" is [[except a German-born soldier fighting for the Brits in 1776!-)]]).Delrosario
To answer your question, differentiating twice is non-trivial because of interactions between variables. If your function is a scalar (with n inputs), the 1st derivative is a vector length n, the 2nd derivative is an n^2 matrix the 3rd derivative is n^3 etc. For the first derivative, you have to travel up 1 path from independent dependent variable per term, for the second derivative you have to travel up two different paths. I /was/ a little worried this was off topic, but I don't know what a better forum for this question is; it's definitely not a math overflow thing.Mammalogy
Is automatic differentiation absolutely necessary? Every time I have considered it, I found that manually differentiating the algorithm by hand to be more straightforward, but then again, my Hessians have usually been quite simple (like diagonal, or computable by an analytic formula).Lavernelaverock
It's not absolutely necessary, but it is a very desirable feature. I am adding this feature to a library, so it would be very nice for users to be able to use algorithms which require the second derivative without supplying manual derivative information.Mammalogy
S
1

First you must decide if you want o calculate a sparse Hessian or something closer to a fully dense Hessian.

If sparse is what you want, there are currently two competitive ways of doing this. Only using the computational graph in a clever way, one reverse sweep of the computational graph you can calculate the Hessian matrix using the edge_pushing algorithm:

http://www.tandfonline.com/doi/full/10.1080/10556788.2011.580098

Or you can try graph colouring techniques to compact your Hessian matrix into a matrix of fewer columns, then use reverse accumulation to calculate each column

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.2603

If what you want is a dense Hessian (unusual in practice) then your probably better of calculating one column of the Hessian at a time using reverse accumulation (search for BRUCE CHRISTIANSON and reverse accumulation)

Shut answered 24/11, 2012 at 12:9 Comment(1)
That's pretty interesting. Do you have a pdf version of the first paper?Mammalogy
C
-1

The usual method for approximating the Hessian in 3 dimensions is the BFGS

The L-BFGS method is similar.

Here you can find the source code for the L-BFGS (which calculates the Hessian as an intermediate result for solving ODEs) in several languages (C#, C++, VBA, etc.) although not in python. I think it is not easy to translate.

If you are going to translate the alg from another language, pay particular attention to numerical errors and do sensitivity analysis (you'll need to calculate the inverse of the Hessian matrix)

Centrosome answered 6/7, 2010 at 0:31 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.