I wrote a basic package for Automatic Differentiation in R, called madness. While it is primarily designed for the multivariate delta method, it should also be usable for computing gradients automatically. An example usage:
require(madness)
# the 'fit' is the Frobenius norm of Y - L*R
# with a penalty for strongly negative R.
compute_fit <- function(R, L, Y) {
  Rmad <- madness(R)
  Err <- Y - L %*% Rmad
  penalty <- sum(exp(-0.1 * Rmad))
  norm(Err, 'f') + penalty
}
set.seed(1234)
R <- array(runif(5*20),dim=c(5,20))
L <- array(runif(1000*5),dim=c(1000,5))
Y <- array(runif(1000*20),dim=c(1000,20))
ftv <- compute_fit(R,L,Y)
show(ftv)
show(val(ftv))
show(dvdx(ftv))
I get back the following:
class: madness
        d (norm((numeric - (numeric %*% R)), 'f') + (sum(exp((numeric * R)), na.rm=FALSE) + numeric))
 calc: ------------------------------------------------------------------------------------------------
                                                      d R
  val: 207.6 ...
 dvdx: 4.214 4.293 4.493 4.422 4.672 2.899 2.222 2.565 2.854 2.718 4.499 4.055 4.161 4.473 4.069 2.467 1.918 2.008 1.874 1.942 0.2713 0.2199 0.135 0.03017 0.1877 5.442 4.811
       5.472 5.251 4.674 1.933 1.62 1.79 1.902 1.665 5.232 4.435 4.789 5.183 5.084 3.602 3.477 3.419 3.592 3.376 4.109 3.937 4.017 3.816 4.2 1.776 1.784 2.17 1.975
       1.699 4.365 4.09 4.475 3.964 4.506 1.745 1.042 1.349 1.084 1.237 3.1 2.575 2.887 2.524 2.902 2.055 2.441 1.959 2.467 1.985 2.494 2.223 2.124 2.275 2.546 3.497
       2.961 2.897 3.111 2.9 4.442 3.752 3.939 3.694 4.326 0.9582 1.4 0.8971 0.8756 0.9019 2.399 2.084 2.005 2.154 2.491 ...
 varx: ...
[,1]
[1,] 207.6
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27]
[1,] 4.214 4.293 4.493 4.422 4.672 2.899 2.222 2.565 2.854 2.718 4.499 4.055 4.161 4.473 4.069 2.467 1.918 2.008 1.874 1.942 0.2713 0.2199 0.135 0.03017 0.1877 5.442 4.811
[,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54]
[1,] 5.472 5.251 4.674 1.933 1.62 1.79 1.902 1.665 5.232 4.435 4.789 5.183 5.084 3.602 3.477 3.419 3.592 3.376 4.109 3.937 4.017 3.816 4.2 1.776 1.784 2.17 1.975
[,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81]
[1,] 1.699 4.365 4.09 4.475 3.964 4.506 1.745 1.042 1.349 1.084 1.237 3.1 2.575 2.887 2.524 2.902 2.055 2.441 1.959 2.467 1.985 2.494 2.223 2.124 2.275 2.546 3.497
[,82] [,83] [,84] [,85] [,86] [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100]
[1,] 2.961 2.897 3.111 2.9 4.442 3.752 3.939 3.694 4.326 0.9582 1.4 0.8971 0.8756 0.9019 2.399 2.084 2.005 2.154 2.491
Note that this is the derivative of a scalar with respect to the 5 x 20 matrix R, but it is flattened here into a vector (unfortunately a row vector).
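If you want the gradient back in the shape of R, you can reshape it yourself. A minimal sketch, assuming the flattened derivative is in column-major (vec) order, which matches base R's default:

# reshape the 1 x 100 gradient back into the 5 x 20 shape of R
# (assumes column-major / vec ordering of the flattened derivative)
grad_R <- array(dvdx(ftv), dim = dim(R))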
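As a sanity check that the gradient is correct, you can compare it against a numerical gradient. A quick sketch, assuming the numDeriv package is installed and the same column-major flattening; fit_fn is simply a plain (non-madness) restatement of the objective:

require(numDeriv)
# plain version of the objective, taking R as a flat vector
fit_fn <- function(r) {
  Rmat <- array(r, dim = c(5, 20))
  Err <- Y - L %*% Rmat
  norm(Err, 'f') + sum(exp(-0.1 * Rmat))
}
num_grad <- grad(fit_fn, as.numeric(R))
# should be close to zero
max(abs(num_grad - as.numeric(dvdx(ftv))))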