How to do automatic differentiation on hmatrix?

Sooooo ... as it turns out, going from my fake matrices to the hmatrix datatypes is nontrivial :)

Preamble for reference:

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ParallelListComp #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}

import           Numeric.LinearAlgebra.HMatrix
import           Numeric.AD

reconstruct :: (Container Vector a, Num (Vector a)) 
            => [a] -> [Matrix a] -> Matrix a
reconstruct as φs = sum [ a `scale` φ | a <- as | φ <- φs ]

preserveInfo :: (Container Vector a, Num (Vector a))
     => Matrix a -> [a] -> [Matrix a] -> a
preserveInfo img as φs = sumElements (errImg * errImg)
    where errImg = img - (reconstruct as φs)
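
(For what it's worth, these definitions behave as expected once a is a concrete hmatrix element type. A quick GHCi sanity check with a made-up 2x2 image, using hmatrix's (><), ident and konst, should print 21.25:)

λ> let img = (2><2) [1,2,3,4] :: Matrix Double
λ> preserveInfo img [0.5, 0.25] [ident 2, konst 1 (2,2)]
21.25

The trouble only starts when a has to stay abstract enough for ad.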

And the call to the gradientDescent function:

gradientDescentOverAs :: forall m a. (Floating a, Ord a, Num (Vector a))
                      => Matrix a -> [Matrix a] -> [a] -> [[a]]
gradientDescentOverAs img φs as0 = gradientDescent go as0
  where go as = preserveInfo img as φs
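
For reference, gradientDescent from the ad package has roughly this type (as of ad-4.x; check your installed version), which is where the mismatch comes from: the objective has to be polymorphic in the Reverse s a element type, while img and φs pin a down to a single concrete hmatrix element type.

gradientDescent :: (Traversable f, Fractional a, Ord a)
                => (forall s. Reifies s Tape => f (Reverse s a) -> Reverse s a)
                -> f a
                -> [f a]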

edit: this is not the code from the original question, but a version boiled down as much as possible. GHC requires some constraints on the go sub-function, but the answer proposed in the linked question doesn't apply here.

edit2, quoting myself from below:

I have come to believe it can't be done. Matrix requires its elements to be in the Element class. The only element types there are Double, Float and their Complex forms, and none of these are accepted by gradientDescent.

So basically this is the same question as the one linked above, but for the hmatrix datatypes instead of my handrolled ones.
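
(For comparison, here is a sketch of the kind of thing that does typecheck: the same objective over nested lists, with ad's auto lifting the constant image and basis into the objective. The *L names are mine and purely illustrative; this is essentially the "handrolled" route of the linked question, not an hmatrix solution, and it assumes a non-empty basis.)

-- Same objective over plain lists; list elements stay fully polymorphic,
-- so ad's Reverse s a fits where hmatrix's Element does not.
reconstructL :: Num a => [a] -> [[[a]]] -> [[a]]
reconstructL as φs = foldr1 (zipWith (zipWith (+)))
                            [ map (map (a *)) φ | a <- as | φ <- φs ]

preserveInfoL :: Num a => [[a]] -> [a] -> [[[a]]] -> a
preserveInfoL img as φs = sum [ e * e | row <- errImg, e <- row ]
  where errImg = zipWith (zipWith (-)) img (reconstructL as φs)

-- auto is the Mode class method; import Numeric.AD.Mode (auto) if your
-- Numeric.AD does not already re-export it.
gradientDescentOverAsL :: (Fractional a, Ord a)
                       => [[a]] -> [[[a]]] -> [a] -> [[a]]
gradientDescentOverAsL img φs as0 =
  gradientDescent
    (\as -> preserveInfoL (map (map auto) img) as (map (map (map auto)) φs))
    as0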

edit3

Relevant: an email conversation between Edward Kmett and Dominic Steinitz on the topic: https://mail.haskell.org/pipermail/haskell-cafe/2013-April/107561.html

Supervisor answered 6/5, 2015 at 9:05

I found this series of blog posts to be very helpful: https://idontgetoutmuch.wordpress.com/2014/09/09/fun-with-extended-kalman-filters-4/ (both HMatrix with static size guarantees and the jacobian function from AD are demonstrated).
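
For a quick taste of the ad side on its own (independent of hmatrix): jacobian already works on anything Traversable with polymorphic elements. A minimal GHCi sketch with a made-up function:

λ> jacobian (\[x, y] -> [x * y, x + y]) [2, 3 :: Double]
[[3.0,2.0],[1.0,1.0]]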

HTH

Bedding answered 6/5, 2015 at 15:13

Comments (9):
Oooh ... yep ... that one crept up several times in my reading queue ... never had the persistence to work through it though. Guess now is the time. - Supervisor
Tried to work through it yesterday evening ... now I remember why I never got through it ... that's some dense stuff for someone without a background in machine learning (which I should have ... but still). - Supervisor
Perhaps you could boil down the smallest (non-)working example from the above, get rid of the Greek letters and give the functions some more expressive names? Also, what about getting rid of the signature for go in the definition of adEq5H? And why do you have two definitions for eqsH? - Bedding
I fear I can't accept this answer. Dominic is using the (experimental) static interface to hmatrix. I'd like to do that too, but it doesn't even provide a scale function. --- Regarding your questions: I can boil it down some more. I like the Greek letters. The function names are crap, but both (the Greek letters and the function names) refer to a paper I am trying to implement. I can't get rid of the signature for go ... in the linked question it is strictly necessary. And eqsH ... oops ... one of them is the derivative. - Supervisor
I have come to believe it can't be done. Matrix requires its elements to be in the Element class. The only element types there are Double, Float and their Complex forms, and none of these are accepted by gradientDescent. - Supervisor
@Supervisor You are correct. ad needs to be able to store its notes for either forward or reverse mode somewhere, and hmatrix is basically just using a nice, fast, dumb, flat BLAS-like layout with no room to store anything like that. Their approach favors the ability to talk to external tools over flexibility. - Roadability
@EdwardKMETT What are the notes you refer to? The factors of the partial derivatives? What format are they in? - Bedding
@ocramz: In forward mode they are just the partials themselves. For a fixed number of derivatives you can unpack that; we even have an Unbox instance for Vector for it. We have two reverse modes. One uses a tape, so you need to store the entries in the tape you reference for earlier derivatives; given that we use a tree with nodes of degree at most 2, that can be unpacked in theory. The tape is outside your structure. The third mode is the "Kahn" mode, where I record the full graph as a tree and then use observable sharing to topologically sort and recover sharing when I'm done. - Roadability
However, none of those forms is suitable for handing off to a normal BLAS. - Roadability
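
(To illustrate the forward-mode "notes" mentioned above, here is a minimal dual-number sketch, not ad's actual implementation: every scalar carries its partial derivative alongside its value. A flat buffer of Doubles, which is essentially what an hmatrix Matrix is, has no slot for that second component.)

-- Minimal forward-mode sketch: a value paired with its derivative ("the note").
data Dual a = Dual a a

instance Num a => Num (Dual a) where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')
  negate (Dual x x')    = Dual (negate x) (negate x')
  abs    (Dual x x')    = Dual (abs x)    (signum x * x')
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0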
