How to handle boundary constraints when using `nls.lm` in R
I asked this question a while ago and am not sure whether I should post this as an answer or a new question. I do not have an answer, but I "solved" the problem by applying the Levenberg-Marquardt algorithm using nls.lm in R and, when the solution is on the boundary, running the trust-region-reflective algorithm (TRR, implemented in R) to step away from it. Now I have new questions.

From my experience, with this approach the program reaches the optimum and is not so sensitive to the starting values. But it is only a practical workaround for the issues I encountered using nls.lm and other optimization functions in R. I would like to know why nls.lm behaves this way for optimization problems with boundary constraints, and how to handle boundary constraints when using nls.lm in practice.
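
In case it helps, here is a minimal sketch of that workaround. Everything model-specific is made up for illustration (the residual function resid_fn, the data t and y, and the bounds), and recent versions of minpack.lm let you pass lower and upper directly to nls.lm. Instead of a true TRR step, I restart from a perturbed interior point via nlminb, which is only a stand-in:

library(minpack.lm)

## Hypothetical one-compartment model: y ~ M0 * exp(-k * t), p = c(M0, k)
resid_fn <- function(p, t, y) y - p[1] * exp(-p[2] * t)

## Made-up data and box constraints, for illustration only
t <- c(0, 1, 3, 7, 14, 28)
y <- c(100, 90, 74, 50, 26, 7)
lower <- c(M0 = 0,   k = 0)
upper <- c(M0 = Inf, k = Inf)

fit <- nls.lm(par = c(M0 = 0.1, k = 0.01), lower = lower, upper = upper,
              fn = resid_fn, t = t, y = y)

## If a parameter ends up on the (finite) lower bound, step back into
## the interior and refit; the question uses a TRR run at this point,
## nlminb is just a stand-in for it.
if (any(fit$par <= lower + 1e-8)) {
  ssq <- function(p) sum(resid_fn(p, t, y)^2)
  inner <- nlminb(fit$par + 0.1, ssq, lower = lower, upper = upper)
  fit <- nls.lm(par = inner$par, lower = lower, upper = upper,
                fn = resid_fn, t = t, y = y)
}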

Below is an example illustrating two issues with nls.lm:

  1. It is sensitive to starting values.
  2. It stops when some parameter reaches the boundary.

A Reproducible Example: FOCUS Dataset D

library(devtools)
install_github("KineticEval", "zhenglei-gao")  # current devtools: install_github("zhenglei-gao/KineticEval")
library(KineticEval)
data(FOCUS2006D)

## Parent (SFO) -> metabolite m1 (SFO); the parent's M0 is started at 0.1
km <- mkinmod.full(
  parent = list(type = "SFO",
                M0 = list(ini = 0.1, fixed = 0, lower = 0, upper = Inf),
                to = "m1"),
  m1 = list(type = "SFO"),
  data = FOCUS2006D)

system.time(Fit.TRR <- KinEval(km, evalMethod = "NLLS", optimMethod = "TRR"))
system.time(Fit.LM  <- KinEval(km, evalMethod = "NLLS", optimMethod = "LM",
                               ctr = kingui.control(runTRR = FALSE)))
compare_multi_kinmod(km, rbind(Fit.TRR$par, Fit.LM$par))
dev.print(jpeg, "LMvsTRR.jpeg", width = 480)

[Figure: LM fit vs TRR fit]

The differential equations that describe the model/system are:

"d_parent = - k_parent * parent"                                                 
"d_m1 = - k_m1 * m1 + k_parent * f_parent_to_m1 * parent" 

In the graph, the left panel shows the model with the initial values, the middle panel the model fitted with "TRR" (similar to the algorithm in Matlab's lsqnonlin function), and the right panel the model fitted with "LM" via nls.lm. Looking at the fitted parameters (Fit.LM$par), you will find that one fitted parameter, f_parent_to_m1, is at the boundary 1. If I change the starting value for the parameter M0_parent from 0.1 to 100, I get the same results from nls.lm and lsqnonlin. I have many cases like this one.

newpars <- rbind(Fit.TRR$par,Fit.LM$par)
rownames(newpars)<- c("TRR(lsqnonlin)","LM(nls.lm)")
newpars

               M0_parent   k_parent        k_m1 f_parent_to_m1
TRR(lsqnonlin)  99.59848 0.09869773 0.005260654       0.514476
LM(nls.lm)      84.79150 0.06352110 0.014783294       1.000000
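
The starting-value experiment mentioned above looks like this with the KineticEval interface used earlier; only the ini value of M0 changes:

## Same model, but M0 of the parent started at 100 instead of 0.1;
## with this start, LM (nls.lm) reaches the same optimum as TRR.
km2 <- mkinmod.full(
  parent = list(type = "SFO",
                M0 = list(ini = 100, fixed = 0, lower = 0, upper = Inf),
                to = "m1"),
  m1 = list(type = "SFO"),
  data = FOCUS2006D)
Fit.LM2 <- KinEval(km2, evalMethod = "NLLS", optimMethod = "LM",
                   ctr = kingui.control(runTRR = FALSE))
Fit.LM2$par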

Apart from the above problems, it often happens that the Hessian returned by nls.lm is not invertible (especially when some parameters are on the boundary), so I cannot get an estimate of the covariance matrix. On the other hand, the "TRR" algorithm (in Matlab) almost always gives an estimate by calculating the Jacobian at the solution point. This seems useful, but I am also fairly sure that the R optimization routines I have tried decline to do this for a reason. I would like to know whether it is wrong to use the Matlab way of calculating the covariance matrix to get standard errors for the parameter estimates.
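
For reference, the "Matlab way" amounts to the Gauss-Newton covariance estimate cov = s^2 * (J'J)^(-1) with s^2 = RSS / (n - p). A sketch, reusing the hypothetical resid_fn, t, y and fit from the snippet near the top; numDeriv supplies a numerical Jacobian, and MASS::ginv serves as a fallback when J'J is singular:

library(numDeriv)
library(MASS)

J   <- jacobian(function(p) resid_fn(p, t, y), fit$par)
rss <- sum(resid_fn(fit$par, t, y)^2)
s2  <- rss / (length(y) - length(fit$par))
JtJ <- crossprod(J)                # t(J) %*% J
covmat <- s2 * tryCatch(solve(JtJ), error = function(e) ginv(JtJ))
sqrt(diag(covmat))                 # standard errors

Note that when a parameter sits exactly on a bound, the usual asymptotics behind this formula do not hold, which is presumably one reason the R routines decline to report it.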

One last note: I claimed in my previous post that Matlab's lsqnonlin outperforms R's optimization functions in almost all cases. I was wrong. The "Trust-Region-Reflective" algorithm used in Matlab is in fact slower (sometimes much slower) when implemented in R, as you can see from the above example. However, it is still more stable and reaches a better solution than R's basic optimization algorithms.

Tachycardia answered 11/6, 2013 at 9:47
First off, I am not an expert on Matlab or optimisation, and I have never used R.

I am not sure I see what your actual question is, but maybe I can shed some light on your puzzlement:

LM is a slightly enhanced Gauss-Newton approach; for problems with several local minima it is very sensitive to the initial values. Including boundaries typically generates more of those minima.
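
For context, with Jacobian J and residual vector r, a Gauss-Newton step d solves the normal equations

    (J'J) d = J'r

while LM, in its simplest form, damps them:

    (J'J + lambda * I) d = J'r

Large lambda gives small, gradient-descent-like steps; lambda -> 0 recovers Gauss-Newton. That damping is the whole "enhancement", so LM inherits Gauss-Newton's dependence on the starting point.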

TRR is akin to LM but more robust: it is better at "jumping out of" bad local minima. It is quite plausible that it will behave better, yet run slower, than LM. Actually explaining why is very hard; you would need to study the algorithms in detail and look at how they behave in this situation.

I cannot explain the difference between Matlab's and R's implementations, but there are several extensions to TRR that Matlab may use and R does not. Does your approach of alternating LM and TRR converge better than TRR alone?

Ploughman answered 11/11, 2013 at 10:26
Comment from Tachycardia: I am not sure how to define "better". From the cases I have seen in practice, yes: LM itself sometimes stops at better solutions than TRR (both with default control parameters/stopping rules) when the problem is well behaved, but LM often stops abnormally when it hits the boundary. I have not had time recently, but I may still invest time in this issue and report back here.

Using the mkin package, you can find the parameters using the "Port" algorithm (which is also a kind of TRR algorithm, as far as I can tell from its documentation) or the "Marq" algorithm, which uses nls.lm under the hood. Then you can use "normal" starting values or "bad" starting values.

library(mkin)
packageVersion("mkin")

Recent mkin versions can speed up the process considerably, as they compile the models from automatically generated C code if a compiler is available on your system (e.g. r-base-dev installed on Debian/Ubuntu, or Rtools on Windows).

This defines the model:

m <- mkinmod(parent = mkinsub("SFO", "m1"),
             m1 = mkinsub("SFO"),
             use_of_ff = "max")

You can check that the differential equations are correct:

cat(m$diffs, sep = "\n")

Then we fit four variants, Port and LM, with default starting values or with the parent's initial state started at the bad value 0.1:

f.Port    = mkinfit(m, FOCUS_2006_D)
f.Port.M0 = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0))
f.LM      = mkinfit(m, FOCUS_2006_D, method.modFit = "Marq")
f.LM.M0   = mkinfit(m, FOCUS_2006_D, state.ini = c(parent = 0.1, m1 = 0),
                    method.modFit = "Marq")

Then we look at the results:

results <- sapply(list(Port = f.Port, Port.M0 = f.Port.M0, LM = f.LM, LM.M0 = f.LM.M0),
  function(x) round(summary(x)$bpar[, "Estimate"], 5))

which are

                   Port  Port.M0       LM    LM.M0
parent_0       99.59848 99.59848 99.59848 39.52278
k_parent        0.09870  0.09870  0.09870  0.00000
k_m1            0.00526  0.00526  0.00526  0.00000
f_parent_to_m1  0.51448  0.51448  0.51448  1.00000

So we can see that the Port algorithm finds the best solution (to the best of my knowledge) even with bad starting values. The speed issue that one may have with more complicated models is alleviated by the automatic generation of C code.
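
As a rough check of the speed claim, you can time fits with and without the compiled code path. A sketch: solution_type is an argument of mkinfit, "eigen" forces the eigenvalue-based solution computed in R, and whether the default actually uses compiled code depends on your mkin version and toolchain:

## Timings are machine-dependent; quiet = TRUE suppresses fitting messages
system.time(mkinfit(m, FOCUS_2006_D, quiet = TRUE))
system.time(mkinfit(m, FOCUS_2006_D, solution_type = "eigen", quiet = TRUE))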

Contagion answered 19/10, 2016 at 15:10
