Is there any working implementation of reverse mode automatic differentiation for Haskell?

The closest implementation in Haskell I have seen is the forward mode provided by the fad package: http://hackage.haskell.org/packages/archive/fad/1.0/doc/html/Numeric-FAD.html.

The closest related research appears to be reverse mode for another functional language related to Scheme: the Stalingrad project at http://www.bcl.hamilton.ie/~qobi/stalingrad/.

I see reverse mode in Haskell as something of a holy grail for many tasks, in the hope that it could exploit Haskell's nested data parallelism to get a substantial speedup in heavy numerical optimization.

Granthem answered 30/4, 2010 at 13:53 Comment(2)
A possible alternative: I've had quite a bit of success with optimising large systems (e.g. 10,000-dimensional) with forward AD. (The code was C++, but largely written in a pure functional style.) The trick was to exploit the sparsity my problem had, so I could use a sparse type to represent the derivatives. It was faster than the reverse AD version for my problem (again written in C++, but not at all pure). – Denman
Really? I'm wondering how to accomplish such a thing. Any leads? – Granthem
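
In case it helps, here is a minimal Haskell sketch of the sparse forward-mode trick described in the first comment (my own illustration, not Denman's C++ code; all names are made up): a dual number whose derivative part is a sparse Map from variable index to partial derivative, so variables a subterm never touches cost nothing.

import qualified Data.Map.Strict as M

-- A dual number with a sparse derivative part: absent keys are zeros.
data SparseDual = SparseDual
  { primal   :: Double
  , partials :: M.Map Int Double
  } deriving Show

-- Introduce the i-th independent variable.
var :: Int -> Double -> SparseDual
var i x = SparseDual x (M.singleton i 1)

instance Num SparseDual where
  SparseDual x dx + SparseDual y dy =
    SparseDual (x + y) (M.unionWith (+) dx dy)
  SparseDual x dx * SparseDual y dy =
    SparseDual (x * y) (M.unionWith (+) (fmap (y *) dx) (fmap (x *) dy))
  negate (SparseDual x dx) = SparseDual (negate x) (fmap negate dx)
  abs    (SparseDual x dx) = SparseDual (abs x) (fmap (signum x *) dx)
  signum (SparseDual x _)  = SparseDual (signum x) M.empty
  fromInteger n            = SparseDual (fromInteger n) M.empty

-- f(x0, x1) = x0 * x1 + x0: each entry of the derivative map only
-- mentions variables that actually occur, preserving sparsity.
demo :: SparseDual
demo = let x0 = var 0 3; x1 = var 1 4 in x0 * x1 + x0
-- primal demo == 15.0; partials demo == fromList [(0,5.0),(1,3.0)]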

In response to this question, I've uploaded a package named rad to Hackage for handling reverse-mode automatic differentiation in Haskell.

Internally, it leverages a trick from Andy Gill's Kansas Lava to observe the sharing in the tape it records for backpropagation, and it uses type-level branding to avoid confusing sensitivities.
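
The package's internals aren't reproduced here, but the observable-sharing idea itself can be illustrated with StableNames (a toy sketch under my own assumptions, not the package's actual machinery): reverse mode wants the DAG the user built, not the exponentially larger unshared tree, and StableNames let us recover that DAG from pure-looking syntax.

import System.Mem.StableName

-- A toy expression type standing in for a tape node.
data Expr = Lit Double | Add Expr Expr | Mul Expr Expr

-- Do two subterms refer to the same heap object?
sameNode :: Expr -> Expr -> IO Bool
sameNode a b = do
  sa <- makeStableName $! a
  sb <- makeStableName $! b
  return (sa == sb)

main :: IO ()
main = do
  let shared = Add (Lit 1) (Lit 2)   -- built once, used twice
  sameNode shared shared >>= print   -- True: one node, not two
  sameNode shared (Add (Lit 1) (Lit 2)) >>= print
  -- False (typically, under GHC without aggressive optimisation):
  -- structurally equal, but not shared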

I've tried to keep the API relatively close to that of Barak Pearlmutter and Jeffrey Mark Siskind's fad package, but I couldn't resist making a couple of minor tweaks here and there for generality.

I still need to go through and finish the remaining unimplemented fad combinators, figure out a nice way to build a reverse-mode AD tower, validate that I didn't screw up my recollection of basic calculus, and provide a nice API for using this approach to get local reverse-mode checkpoints in an otherwise forward-mode AD program. But I am quite happy with how things have progressed thus far.
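
For a taste of what calling into this looks like, here is a small usage sketch written against the newer ad package mentioned in the comments below (the module layout of this first upload differed slightly):

import Numeric.AD (diff, grad)

-- Reverse-mode gradient of f(x, y) = x * y + sin x at (1, 2):
-- df/dx = y + cos x, df/dy = x.
gradient :: [Double]
gradient = grad (\[x, y] -> x * y + sin x) [1, 2]
-- [2.5403023058681398, 1.0]

-- Forward-mode derivative of sin at 0, for comparison.
slope :: Double
slope = diff sin 0   -- 1.0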

Nonmaterial answered 16/5, 2010 at 4:39 Comment(4)
Implementing an entire library in order to give this question a proper answer: now that's dedication! – Haynie
And I appreciate it!! Though I do hope that people other than me will find Edward's contribution useful. – Granthem
I'm in the process of putting together a larger library that includes both forward and reverse mode, along with mixed modes for Hessians. I'll keep you posted. – Nonmaterial
I've since added a newer package with wider scope, hackage.haskell.org/package/ad, which provides both forward and reverse mode and combinators that are intelligent about switching back and forth. – Nonmaterial

We have a bunch of forward mode AD implementations (I even have one in my monoids library!), but reverse mode AD for all of Haskell seems to be intractable.

Sadly, while Pearlmutter and Siskind give a translation for a lambda calculus, it doesn't map onto something you can do for arbitrary Haskell lambdas: you don't get the right introspection properties, and given the way the shapes of the types change under the translation, you don't get something amenable to being packed into a monad, arrow, or other control structure.

I had a go at it via a series of email exchanges with Pearlmutter, but ultimately the best I was able to obtain was a reverse-mode AD solution for a small EDSL in Haskell, not a solution for Haskell itself.
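
That EDSL was never published, but to show the general shape of such a thing, here is a deliberately naive reverse-mode sketch over a tiny expression type (my own toy, not the EDSL from those exchanges). Note that it walks the expression as a tree, without the sharing discussed in the comments below, so shared subterms get re-evaluated.

import Data.List (foldl')

-- A tiny first-order EDSL: variables by index, constants, (+) and (*).
data Expr = Var Int | Con Double | Add Expr Expr | Mul Expr Expr

-- Forward sweep: ordinary evaluation against an environment.
eval :: [Double] -> Expr -> Double
eval env (Var i)   = env !! i
eval _   (Con c)   = c
eval env (Add a b) = eval env a + eval env b
eval env (Mul a b) = eval env a * eval env b

-- Backward sweep: push the output sensitivity s down to each leaf.
-- Mul scales s by the sibling's value; this is the chain rule.
leaves :: [Double] -> Double -> Expr -> [(Int, Double)]
leaves _   _ (Con _)   = []
leaves _   s (Var i)   = [(i, s)]
leaves env s (Add a b) = leaves env s a ++ leaves env s b
leaves env s (Mul a b) =
  leaves env (s * eval env b) a ++ leaves env (s * eval env a) b

-- Accumulate leaf sensitivities into an n-dimensional gradient.
gradE :: Int -> [Double] -> Expr -> [Double]
gradE n env e = foldl' bump (replicate n 0) (leaves env 1 e)
  where bump g (i, d) = [if j == i then gj + d else gj | (j, gj) <- zip [0 ..] g]

-- f(x, y) = (x + y) * x; the gradient at (3, 4) is [2x + y, x] = [10, 3].
example :: [Double]
example = gradE 2 [3, 4] (Mul (Add (Var 0) (Var 1)) (Var 0))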

Nonmaterial answered 30/4, 2010 at 15:16 Comment(3)
What do you mean by "all of Haskell"? You can't expect to differentiate all functions; you only want to differentiate functions written to a particular interface such as Num. Pearlmutter has pointed out some issues with nesting derivatives, but that's not a roadblock to implementing reverse AD that can be used to solve real-world problems. The problems I've found with reverse AD in Haskell have been different: for efficiency you want explicit sharing in trees, and to store state in the tree as you sweep through it. This can all be implemented fine in pure Haskell, but it's not efficient. – Denman
I agree that I wouldn't need to differentiate every possible Haskell program, only the more typical numerical objective functions. Are you sharing your EDSL online anywhere? What sorts of subproblems can it differentiate? – Granthem
@Ian: I'll see about polishing it up and posting it when I get some downtime. @user207442: You can make sharing visible by a number of means; the way I usually go is through StableNames, which lets me avoid a lot of the ugliness of using something explicitly monadic or using Oleg-style explicit let_ bindings. I may have a go at it again, as I've had to tackle those issues in other settings since I last looked at reverse-mode AD. – Nonmaterial

Not that I'm aware of. I do know that some Haskell folks are interested in automatic differentiation, but some quick digging found little more than short asides mentioning the reverse mode; I expect you already found the same material I did.

I also note that the fad package and the Stalingrad project you found are in fact the work of the same two people, and that at least Prof. Pearlmutter has posted to the haskell-cafe mailing list. You may want to consider contacting him directly about his work; it's possible he has something in progress, or that he hit serious obstacles while attempting to implement reverse-mode AD.

Sorry I couldn't turn up anything more useful; if someone else wants to dig further, at least the links above are a starting point.

Haynie answered 30/4, 2010 at 15:16 Comment(1)
Thanks for your reply anyhow. You at least helped reassure me that I didn't miss something ;) – Granthem

I think forward mode is the way to go in Haskell. You shouldn't be able to do reverse mode on arbitrary functions, as Edward pointed out, but you responded that you should be able to do it on certain constrained functions, and those constraints lend themselves readily to forward mode. E.g., if you have a function:

foo :: Num a => a -> a -> a

Then you can instantiate a with a differentiable type, and thus differentiate foo in forward mode.
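
As a concrete (if simplistic) rendering of that trick, here is the textbook dense dual-number type, with an arbitrary body for foo picked just for demonstration; seeding the derivative part of one argument with 1 gives the partial derivative in that direction.

-- The textbook dual number: a value paired with its derivative.
data Dual = Dual Double Double deriving Show

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs    (Dual x dx)    = Dual (abs x) (signum x * dx)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Any body works, as long as it only demands Num:
foo :: Num a => a -> a -> a
foo x y = x * x + y

-- Partial derivative of foo in x at (3, 4): seed dx = 1, dy = 0.
dFooDx :: Dual
dFooDx = foo (Dual 3 1) (Dual 4 0)   -- Dual 13.0 6.0, i.e. dfoo/dx = 6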

See the vector-space library on Hackage for very elegant forward-mode automatic differentiation. It might not be entirely clear how to use it at first; read the paper about it, "Beautiful Differentiation" by Conal Elliott.

Fourpenny answered 1/5, 2010 at 0:22 Comment(3)
Thanks, but the key advantage of reverse mode is a cheap gradient. This is very important in my applications, which require gradients of functions of thousands of variables. I'll have a look at forward mode and see if it meets my needs, but I'm not optimistic. – Granthem
OK, I skimmed the paper and watched the video at vimeo.com/6622658, and I think vector-space might be promising. But I still don't see how to use it to actually compute derivatives of functions. Documentation seems to be lacking, or I'm slow. Maybe I'll open another question for this. – Granthem
For dense high-dimensional problems, forward mode is a non-starter. If you're optimising a fluid sim with 100,000 variables, it's simply impossible in forward mode, but reverse mode multiplies the cost of the fluid sim by only a small constant factor. Even for 100,000 variables! (If you have enough memory to store the execution tree.) – Denman
