R: minpack.lm::nls.lm failed with good results
Asked Answered
L

2

8

I use nls.lm from the minpack.lm package to fit a lot of non linear models.

It often fails after 20 iterations because of a singular gradient matrix at initial parameter estimates.

The problem is when I have a look at the iterations before failling (trace = T) I can see the results was ok.

Reproductible example:

Data:

df <- structure(list(x1 = c(7L, 5L, 10L, 6L, 9L, 10L, 2L, 4L, 9L, 3L, 
11L, 6L, 4L, 0L, 7L, 12L, 9L, 11L, 11L, 0L, 2L, 3L, 5L, 6L, 6L, 
9L, 1L, 7L, 7L, 4L, 3L, 13L, 12L, 13L, 5L, 0L, 5L, 6L, 6L, 7L, 
5L, 10L, 6L, 10L, 0L, 7L, 9L, 12L, 4L, 5L, 6L, 3L, 4L, 5L, 5L, 
0L, 9L, 9L, 1L, 2L, 2L, 13L, 8L, 2L, 5L, 10L, 6L, 11L, 5L, 0L, 
4L, 4L, 8L, 9L, 4L, 2L, 12L, 4L, 10L, 7L, 0L, 4L, 4L, 5L, 8L, 
8L, 12L, 4L, 6L, 13L, 5L, 12L, 1L, 6L, 4L, 9L, 11L, 11L, 6L, 
10L, 10L, 0L, 3L, 1L, 11L, 4L, 3L, 13L, 5L, 4L, 2L, 3L, 11L, 
7L, 0L, 9L, 6L, 11L, 6L, 13L, 1L, 5L, 0L, 6L, 4L, 8L, 2L, 3L, 
7L, 9L, 12L, 11L, 7L, 4L, 10L, 0L, 6L, 1L, 7L, 2L, 6L, 3L, 1L, 
6L, 10L, 12L, 7L, 7L, 6L, 6L, 1L, 7L, 8L, 7L, 7L, 5L, 7L, 10L, 
10L, 11L, 7L, 1L, 8L, 3L, 12L, 0L, 11L, 8L, 5L, 0L, 6L, 3L, 2L, 
2L, 8L, 9L, 2L, 8L, 2L, 13L, 10L, 2L, 12L, 6L, 13L, 2L, 11L, 
1L, 12L, 6L, 7L, 9L, 8L, 10L, 2L, 6L, 0L, 2L, 11L, 2L, 3L, 9L, 
12L, 1L, 11L, 11L, 12L, 4L, 6L, 9L, 1L, 4L, 1L, 8L, 8L, 6L, 1L, 
9L, 8L, 2L, 10L, 10L, 1L, 2L, 0L, 11L, 6L, 6L, 0L, 4L, 13L, 4L, 
8L, 4L, 10L, 9L, 6L, 11L, 8L, 1L, 6L, 5L, 10L, 8L, 10L, 8L, 0L, 
3L, 0L, 6L, 7L, 4L, 3L, 7L, 7L, 8L, 6L, 2L, 9L, 5L, 7L, 7L, 0L, 
7L, 2L, 5L, 5L, 7L, 5L, 7L, 8L, 6L, 1L, 2L, 6L, 0L, 8L, 10L, 
0L, 10L), x2 = c(4L, 6L, 1L, 5L, 4L, 1L, 8L, 9L, 4L, 7L, 2L, 
6L, 9L, 11L, 5L, 1L, 3L, 2L, 2L, 12L, 8L, 9L, 6L, 4L, 4L, 2L, 
9L, 6L, 6L, 6L, 8L, 0L, 0L, 0L, 8L, 10L, 7L, 7L, 4L, 5L, 5L, 
3L, 6L, 3L, 12L, 6L, 1L, 0L, 8L, 6L, 6L, 7L, 8L, 5L, 8L, 11L, 
3L, 2L, 12L, 11L, 10L, 0L, 2L, 8L, 8L, 3L, 7L, 2L, 7L, 10L, 7L, 
8L, 2L, 4L, 7L, 11L, 1L, 8L, 2L, 5L, 11L, 9L, 7L, 5L, 5L, 3L, 
1L, 8L, 4L, 0L, 5L, 0L, 12L, 5L, 9L, 1L, 2L, 0L, 5L, 0L, 2L, 
10L, 9L, 10L, 0L, 8L, 10L, 0L, 6L, 8L, 8L, 7L, 1L, 6L, 10L, 1L, 
5L, 1L, 6L, 0L, 12L, 7L, 13L, 6L, 9L, 2L, 11L, 10L, 5L, 2L, 0L, 
2L, 5L, 6L, 2L, 10L, 4L, 10L, 4L, 9L, 5L, 9L, 11L, 4L, 3L, 1L, 
6L, 3L, 7L, 7L, 10L, 3L, 3L, 6L, 3L, 7L, 4L, 1L, 0L, 1L, 4L, 
11L, 4L, 10L, 0L, 11L, 0L, 3L, 5L, 11L, 5L, 8L, 10L, 9L, 4L, 
3L, 10L, 4L, 10L, 0L, 3L, 9L, 1L, 7L, 0L, 8L, 1L, 11L, 0L, 5L, 
4L, 2L, 2L, 0L, 11L, 6L, 13L, 9L, 1L, 9L, 7L, 3L, 1L, 12L, 2L, 
2L, 1L, 6L, 4L, 2L, 10L, 6L, 10L, 2L, 3L, 4L, 9L, 2L, 5L, 10L, 
0L, 0L, 10L, 9L, 12L, 0L, 7L, 5L, 10L, 6L, 0L, 9L, 4L, 8L, 1L, 
3L, 5L, 2L, 4L, 12L, 4L, 5L, 2L, 5L, 0L, 2L, 10L, 8L, 10L, 7L, 
3L, 8L, 8L, 6L, 3L, 5L, 6L, 11L, 4L, 5L, 4L, 3L, 10L, 6L, 8L, 
6L, 7L, 4L, 8L, 5L, 3L, 7L, 12L, 8L, 4L, 11L, 2L, 3L, 12L, 1L
), x3 = c(1, 1, 1, 1, 3, 1, 0, 3, 3, 0, 3, 2, 3, 1, 2, 3, 2, 
3, 3, 2, 0, 2, 1, 0, 0, 1, 0, 3, 3, 0, 1, 3, 2, 3, 3, 0, 2, 3, 
0, 2, 0, 3, 2, 3, 2, 3, 0, 2, 2, 1, 2, 0, 2, 0, 3, 1, 2, 1, 3, 
3, 2, 3, 0, 0, 3, 3, 3, 3, 2, 0, 1, 2, 0, 3, 1, 3, 3, 2, 2, 2, 
1, 3, 1, 0, 3, 1, 3, 2, 0, 3, 0, 2, 3, 1, 3, 0, 3, 1, 1, 0, 2, 
0, 2, 1, 1, 2, 3, 3, 1, 2, 0, 0, 2, 3, 0, 0, 1, 2, 2, 3, 3, 2, 
3, 2, 3, 0, 3, 3, 2, 1, 2, 3, 2, 0, 2, 0, 0, 1, 1, 1, 1, 2, 2, 
0, 3, 3, 3, 0, 3, 3, 1, 0, 1, 3, 0, 2, 1, 1, 0, 2, 1, 2, 2, 3, 
2, 1, 1, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2, 2, 2, 3, 3, 1, 3, 3, 3, 
0, 2, 2, 2, 1, 1, 1, 0, 0, 3, 2, 3, 1, 2, 1, 0, 2, 3, 3, 3, 3, 
3, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 3, 2, 0, 0, 1, 1, 2, 1, 3, 
1, 0, 0, 3, 3, 2, 2, 1, 2, 1, 3, 2, 3, 0, 0, 2, 3, 0, 0, 0, 1, 
0, 3, 0, 2, 1, 3, 0, 3, 2, 3, 3, 0, 1, 0, 0, 3, 0, 1, 2, 1, 3, 
2, 1, 3, 3, 0, 0, 1, 0, 3, 2, 1), y = c(0.03688, 0.09105, 0.16246, 
0, 0.11024, 0.16246, 0.13467, 0, 0.11024, 0.0807, 0.12726, 0.03934, 
0, 0.0826, 0.03688, 0.06931, 0.1378, 0.12726, 0.12726, 0.08815, 
0.13467, 0.01314, 0.09105, 0.12077, 0.12077, 0.02821, 0.15134, 
0.03604, 0.03604, 0.08729, 0.04035, 0.46088, 0.20987, 0.46088, 
0.06672, 0.24121, 0.08948, 0.07867, 0.12077, 0.03688, 0.02276, 
0.04535, 0.03934, 0.04535, 0.08815, 0.03604, 0.50771, 0.20987, 
0.08569, 0.09105, 0.03934, 0.0807, 0.08569, 0.02276, 0.06672, 
0.0826, 0.1378, 0.02821, 0.03943, 0.03589, 0.04813, 0.46088, 
0.22346, 0.13467, 0.06672, 0.04535, 0.07867, 0.12726, 0.08948, 
0.24121, 0.06983, 0.08569, 0.22346, 0.11024, 0.06983, 0.03589, 
0.06931, 0.08569, 0.04589, 0.03688, 0.0826, 0, 0.06983, 0.02276, 
0.06238, 0.03192, 0.06931, 0.08569, 0.12077, 0.46088, 0.02276, 
0.20987, 0.03943, 0, 0, 0.50771, 0.12726, 0.1628, 0, 0.41776, 
0.04589, 0.24121, 0.01314, 0.03027, 0.1628, 0.08569, 0, 0.46088, 
0.09105, 0.08569, 0.13467, 0.0807, 0.12912, 0.03604, 0.24121, 
0.50771, 0, 0.12912, 0.03934, 0.46088, 0.03943, 0.08948, 0.07103, 
0.03934, 0, 0.22346, 0.03589, 0, 0.03688, 0.02821, 0.20987, 0.12726, 
0.03688, 0.08729, 0.04589, 0.24121, 0.12077, 0.03027, 0.03688, 
0.03673, 0, 0.01314, 0.02957, 0.12077, 0.04535, 0.06931, 0.03604, 
0.36883, 0.07867, 0.07867, 0.03027, 0.36883, 0.03192, 0.03604, 
0.36883, 0.08948, 0.03688, 0.16246, 0.41776, 0.12912, 0.03688, 
0.02957, 0.1255, 0, 0.20987, 0.0826, 0.1628, 0.03192, 0.02276, 
0.0826, 0, 0.04035, 0.04813, 0.03673, 0.1255, 0.1378, 0.04813, 
0.1255, 0.04813, 0.46088, 0.04535, 0.03673, 0.06931, 0.07867, 
0.46088, 0.13467, 0.12912, 0.02957, 0.20987, 0, 0.03688, 0.02821, 
0.22346, 0.41776, 0.03589, 0.03934, 0.07103, 0.03673, 0.12912, 
0.03673, 0.0807, 0.1378, 0.06931, 0.03943, 0.12726, 0.12726, 
0.06931, 0.08729, 0.12077, 0.02821, 0.03027, 0.08729, 0.03027, 
0.22346, 0.03192, 0.12077, 0.15134, 0.02821, 0.06238, 0.04813, 
0.41776, 0.41776, 0.03027, 0.03673, 0.08815, 0.1628, 0.07867, 
0, 0.24121, 0.08729, 0.46088, 0, 0.1255, 0.08569, 0.16246, 0.1378, 
0, 0.12726, 0.1255, 0.03943, 0.12077, 0.02276, 0.04589, 0.06238, 
0.41776, 0.22346, 0.24121, 0.04035, 0.24121, 0.07867, 0.36883, 
0.08569, 0.04035, 0.03604, 0.36883, 0.06238, 0.03934, 0.03589, 
0.11024, 0.02276, 0.03688, 0.36883, 0.24121, 0.03604, 0.13467, 
0.09105, 0.08948, 0.03688, 0.06672, 0.03688, 0.03192, 0.07867, 
0.03943, 0.13467, 0.12077, 0.0826, 0.22346, 0.04535, 0.08815, 
0.16246)), .Names = c("x1", "x2", "x3", "y"), row.names = c(995L, 
1416L, 281L, 1192L, 1075L, 294L, 1812L, 2235L, 1097L, 1583L, 
670L, 1485L, 2199L, 2495L, 1259L, 436L, 803L, 631L, 617L, 2654L, 
1813L, 2180L, 1403L, 911L, 927L, 533L, 2024L, 1517L, 1522L, 1356L, 
1850L, 222L, 115L, 204L, 1974L, 2292L, 1695L, 1746L, 915L, 1283L, 
1128L, 880L, 1467L, 887L, 2665L, 1532L, 267L, 155L, 1933L, 1447L, 
1488L, 1609L, 1922L, 1168L, 1965L, 2479L, 813L, 550L, 2707L, 
2590L, 2373L, 190L, 504L, 1810L, 2007L, 843L, 1770L, 659L, 1730L, 
2246L, 1668L, 1923L, 465L, 1108L, 1663L, 2616L, 409L, 1946L, 
589L, 1277L, 2493L, 2210L, 1662L, 1142L, 1331L, 735L, 430L, 1916L, 
922L, 208L, 1134L, 127L, 2693L, 1213L, 2236L, 240L, 623L, 108L, 
1190L, 9L, 575L, 2268L, 2171L, 2308L, 103L, 1953L, 2409L, 184L, 
1437L, 1947L, 1847L, 1570L, 365L, 1550L, 2278L, 270L, 1204L, 
384L, 1472L, 205L, 2694L, 1727L, 2800L, 1476L, 2229L, 453L, 2630L, 
2426L, 1275L, 523L, 163L, 635L, 1287L, 1349L, 561L, 2261L, 931L, 
2339L, 973L, 2113L, 1229L, 2155L, 2554L, 936L, 892L, 433L, 1560L, 
697L, 1791L, 1755L, 2351L, 720L, 740L, 1558L, 674L, 1736L, 988L, 
321L, 18L, 375L, 959L, 2560L, 1047L, 2429L, 119L, 2468L, 98L, 
773L, 1158L, 2520L, 1216L, 1872L, 2364L, 2094L, 1035L, 826L, 
2374L, 1028L, 2368L, 176L, 895L, 2090L, 399L, 1789L, 179L, 1800L, 
369L, 2568L, 140L, 1207L, 1001L, 518L, 481L, 12L, 2597L, 1474L, 
2749L, 2097L, 379L, 2110L, 1615L, 800L, 423L, 2733L, 626L, 662L, 
421L, 1363L, 898L, 530L, 2315L, 1365L, 2331L, 468L, 768L, 900L, 
2027L, 544L, 1337L, 2376L, 53L, 44L, 2338L, 2075L, 2655L, 78L, 
1782L, 1231L, 2291L, 1379L, 212L, 2212L, 1032L, 1929L, 331L, 
790L, 1226L, 664L, 1018L, 2735L, 916L, 1157L, 590L, 1343L, 7L, 
490L, 2257L, 1853L, 2251L, 1748L, 719L, 1941L, 1885L, 1544L, 
725L, 1294L, 1494L, 2601L, 1077L, 1169L, 979L, 709L, 2282L, 1526L, 
1797L, 1424L, 1690L, 993L, 1979L, 1268L, 730L, 1739L, 2697L, 
1842L, 952L, 2483L, 479L, 864L, 2677L, 283L), class = "data.frame")

Starting value

starting_value <- structure(c(0.177698291502873, 0.6, 0.0761564106440883, 0.05, 
1.9, 1.1, 0.877181493020499, 1.9), .Names = c("F_initial_x2", 
"F_decay_x2", "S_initial_x2", "S_decay_x2", "initial_x1", "decay_x1", 
"initial_x3", "decay_x3"))

NLSLM fail

coef(nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = coef(brute_force),
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  control  = nls.lm.control(maxiter = 200),
  trace    = T))

It.    0, RSS =    1.36145, Par. =   0.177698        0.6  0.0761564       0.05        1.9        1.1   0.877181        1.9
It.    1, RSS =    1.25401, Par. =   0.207931   0.581039  0.0769047  0.0577244    2.01947    1.22911   0.772957    5.67978
It.    2, RSS =    1.19703, Par. =   0.188978   0.604515  0.0722749  0.0792141    2.44179     1.1258    0.96305    8.67253
It.    3, RSS =     1.1969, Par. =   0.160885   0.640958  0.0990201   0.145187     3.5853   0.847158   0.961844    13.2183
It.    4, RSS =    1.19057, Par. =   0.142138   0.685678    0.11792   0.167417    4.27977   0.936981   0.959606    13.2644
It.    5, RSS =    1.19008, Par. =   0.124264   0.757088   0.136277   0.188896    4.76578    0.91274   0.955142    21.0167
It.    6, RSS =    1.18989, Par. =   0.118904   0.798296   0.141951   0.194167    4.93099    0.91529   0.952972     38.563
It.    7, RSS =    1.18987, Par. =   0.115771   0.821874   0.145398   0.197773    5.02251   0.914204   0.949906     38.563
It.    8, RSS =    1.18986, Par. =   0.113793   0.837804   0.147573   0.199943    5.07456   0.914192   0.948289     38.563
It.    9, RSS =    1.18986, Par. =   0.112458   0.848666   0.149033   0.201406    5.11024   0.914099   0.947232     38.563
It.   10, RSS =    1.18986, Par. =   0.111538   0.856282   0.150035   0.202411    5.13491   0.914051   0.946546     38.563
It.   11, RSS =    1.18986, Par. =   0.110889   0.861702    0.15074   0.203118    5.15244   0.914013   0.946076     38.563
It.   12, RSS =    1.18986, Par. =   0.110426   0.865606   0.151243   0.203623    5.16501   0.913986   0.945747     38.563
It.   13, RSS =    1.18986, Par. =   0.110092   0.868441   0.151605   0.203986    5.17412   0.913966   0.945512     38.563
It.   14, RSS =    1.18986, Par. =   0.109849    0.87051   0.151868    0.20425    5.18075   0.913952   0.945343     38.563
It.   15, RSS =    1.18985, Par. =   0.109672   0.872029    0.15206   0.204443    5.18561   0.913941    0.94522     38.563
It.   16, RSS =    1.18985, Par. =   0.109542   0.873147   0.152201   0.204585    5.18918   0.913933   0.945131     38.563
It.   17, RSS =    1.18985, Par. =   0.109446   0.873971   0.152305   0.204689    5.19181   0.913927   0.945065     38.563
Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates

Questions:

  1. Does it make sense to use the best parameters found before the singular gradient matrix problem, ie the one found at Iteration = 17?

  2. If yes is there a way to fetch them? I didn't succeed to save the results when an error occured.

  3. I've noticed that if I reduce the number of maxiter to a number below 17 I still have the same error which appear in the new last iteration, which doesn't make sense to me

Eg with maxiter = 10

It.    0, RSS =    1.36145, Par. =   0.177698        0.6  0.0761564       0.05        1.9        1.1   0.877181        1.9
It.    1, RSS =    1.25401, Par. =   0.207931   0.581039  0.0769047  0.0577244    2.01947    1.22911   0.772957    5.67978
It.    2, RSS =    1.19703, Par. =   0.188978   0.604515  0.0722749  0.0792141    2.44179     1.1258    0.96305    8.67253
It.    3, RSS =     1.1969, Par. =   0.160885   0.640958  0.0990201   0.145187     3.5853   0.847158   0.961844    13.2183
It.    4, RSS =    1.19057, Par. =   0.142138   0.685678    0.11792   0.167417    4.27977   0.936981   0.959606    13.2644
It.    5, RSS =    1.19008, Par. =   0.124264   0.757088   0.136277   0.188896    4.76578    0.91274   0.955142    21.0167
It.    6, RSS =    1.18989, Par. =   0.118904   0.798296   0.141951   0.194167    4.93099    0.91529   0.952972     38.563
It.    7, RSS =    1.18987, Par. =   0.115771   0.821874   0.145398   0.197773    5.02251   0.914204   0.949906     38.563
It.    8, RSS =    1.18986, Par. =   0.113793   0.837804   0.147573   0.199943    5.07456   0.914192   0.948289     38.563
It.    9, RSS =    1.18986, Par. =   0.112458   0.848666   0.149033   0.201406    5.11024   0.914099   0.947232     38.563
It.   10, RSS =    0.12289, Par. =   0.112458   0.848666   0.149033   0.201406    5.11024   0.914099   0.947232     38.563
Error in nlsModel(formula, mf, start, wts) : 
  singular gradient matrix at initial parameter estimates
In addition: Warning message:
In nls.lm(par = start, fn = FCT, jac = jac, control = control, lower = lower,  :
  lmdif: info = -1. Number of iterations has reached `maxiter' == 10.

Do you see any explanation?

Linter answered 15/2, 2016 at 12:12 Comment(0)
S
3

Often when this error occurs, the problem is not the code but the used model. A singular gradient matrix at the initial parameter estimates might indicate that there is not a single unique solution for the model or that the model is overspecified for the data at hand.

To answer your questions:

  1. Yes, that makes sense. The function nlsLM first calls nls.lm which does the iteration. When it reaches the end of the iterations (either because of a best fit or because max.iter), the result is passed on to the function nlsModel. That function does a QR decomposition of the gradient matrix multiplied by the squared weights. And your initial gradient matrix contains a column with only zeros. So while nls.lm can do the iterations, it's only at the next step nlsModel that the problem with the gradient matrix is actually checked and discovered.

  2. There is a way, but that requires you to change the options within R itself, specifically the error option. By setting it to dump.frames, you get a dump of all the environments that exist at the time of error. Those are stored in a list called last.dump and you can use these environments to look for the values you want.

In this case the parameters are returned by a function getPars() that resides inside the environment of the workhorse function nlsModel:

old.opt <- options(error = dump.frames)

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = starting_value,
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

thecoefs <- llast.dump[["nlsModel(formula, mf, start, wts)"]]$getPars()
options(old.opt) # reset to the previous value.

Note that this is NOT the kind of code you want to use in a production environment or to share with colleagues. And it's also not a solution to your problem, because the problem is the model, not the code.

  1. This is another consequence of what I explained in 1. So yes, that's logic.

I've done a very brief test to see if it really is the model, and if I replace the last parameter (decay_x3) by its start value, the model is fitted without problem. I don't know what we're dealing with here, so dropping another parameter might make more sense in the real world, but just to prove that your code is fine:

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- 1.9* x3 )),
  data     = df,
  start    = starting_value[-8],
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0)[-8],
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

exits without errors at iteration 10.


EDIT: I've been looking a bit deeper into it, and based on the data the "extra" solution is basically to kick x3 out of the model. You only have 3 unique values in there, and the initial estimate for the parameter is about 38. So:

> exp(-38*c(1,2,3)) < .Machine$double.eps
[1] TRUE TRUE TRUE

If you compare that to the actual Y values, it's clear that initial_x3 * exp(- decay_x3 * x3 ) doesn't contribute anything to the model, as it is practically 0.

If you manually calculate the gradient as done in nlsModel, you get a matrix that's not of full rank; the last column contains only 0 :

theenv <- list2env( c(df, thecoefs))
thederiv <- numericDeriv(form[[3]], names(starting_value), theenv)
thegrad <- attr(thederiv, "gradient")

And that's what gives you the error. The model is overspecified for the data you have.

The log-transformation that Gabor suggests, prevents that your last estimate becomes so big it forces x3 out of the model. Due to the log transformation, the algorithm doesn't jump to such extreme values very easily. In order to have the same estimates as with the original model, his decay_x3 should be as high as 3.2e16 to specify the same model (exp(38)). So the log transformation protects you from estimates that force the influence of any variable to 0.

Another side effect of the log transformation is that big steps in the value of decay_x3 have only a moderate effect on the model. The estimate Gabor finds, is already a whopping 1.3e7, but after the back transformation that's still a doable value of 16 for decay_x3. Which still makes x3 redundant in the model if you look at :

> exp(-16*c(1,2,3))
[1] 1.125352e-07 1.266417e-14 1.425164e-21

But it doesn't cause the singularity that causes your error.

You can avoid this by setting your upper boundaries, eg:

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = starting_value,
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  upper    = rep(- log(.Machine$double.eps^0.5),8),
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

runs perfectly fine, gives you the same estimates, and again concludes that x3 is redundant.

So whatever way you look at it, x3 has no impact on y, your model is overspecified and can't be fit decently with the data at hand.

Supplementary answered 18/2, 2016 at 11:8 Comment(2)
in this case do you see why there is not a unique solution? Don't you think that's weird that a simple log transformation (see G. Grothendieck proposal) makes nlslm found a single solution? If there is not a single/unique solution, is there a way to make nlslm select randomly a solution?Linter
@psql I've been a bit quick. I meant not a unique solution with the data at hand. In your case x3 is redundant in the model, as shown by both my outcomes and the outcome of Gabor's model - see also my edits. You can't expect any method to give you a solution that makes sense in such a situation.Supplementary
M
4

The underlying problem in the question is that convergence is not being achieved. This can be resolved by transforming the decay parameters using Y = log(X+1) and then transforming them back afterwards using X = exp(Y)-1. Such transformations can beneficially modify the jacobian. Unfortunately, the application of such transformations tends to be largely trial and error. (Also see Note 1.)

ix <- grep("decay", names(starting_value))
fm <- nlsLM( 
   formula   = y ~ (F_initial_x2   * exp(- log(F_decay_x2+1)  * x2) + 
                    S_initial_x2 * exp(- log(S_decay_x2+1) * x2)) *
                    (1 + initial_x1 * exp(- log(decay_x1+1) * x1)) *
                    (1 + initial_x3 * exp(- log(decay_x3+1) * x3 )),
   data     = df,
   start    = replace(starting_value, ix, exp(starting_value[ix]) - 1),
   lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
   control  = nls.lm.control(maxiter = 200),
   trace    = TRUE)

giving a similar residual sum of squares but achieving convergence:

> fm
Nonlinear regression model
  model: y ~ (F_initial_x2 * exp(-log(F_decay_x2 + 1) * x2) + S_initial_x2 *     exp(-log(S_decay_x2 + 1) * x2)) * (1 + initial_x1 * exp(-log(decay_x1 +     1) * x1)) * (1 + initial_x3 * exp(-log(decay_x3 + 1) * x3))
   data: df
F_initial_x2   F_decay_x2 S_initial_x2   S_decay_x2   initial_x1     decay_x1 
   1.092e-01    1.402e+00    1.526e-01    2.275e-01    5.199e+00    1.494e+00 
  initial_x3     decay_x3 
   9.449e-01    1.375e+07 
 residual sum-of-squares: 1.19

Number of iterations to convergence: 38 
Achieved convergence tolerance: 1.49e-08

> replace(coef(fm), ix, log(coef(fm)[ix]+1))
F_initial_x2   F_decay_x2 S_initial_x2   S_decay_x2   initial_x1     decay_x1 
   0.1091735    0.8763253    0.1525997    0.2049852    5.1993194    0.9139096 
  initial_x3     decay_x3 
   0.9448779   16.4368001 

Note 1: After some experimentation I noticed that it was good enough to just apply the transformation on decay_x3.

Note 2: Regarding the comment that you would like something automatic note that a third degree polynomial fit with lm would more consistently not run into trouble and has lower residual sum of squares -- 1.14 vs. 1.19 -- but at the expense of more parameters -- 10 vs. 8.

# lm poly fit
fm.poly <- lm(y ~ poly(x1, x2, degree = 3), df)
deviance(fm.poly) # residual sum of squares
## [1] 1.141398
length(coef(fm.poly)) # no. of coefficients
## [1] 10

# nlsLM fit transforming decay parameters
deviance(fm)
## [1] 1.189855
length(coef(fm))
## [1] 8

Note 3: Here is another model formed by replacing the x3 part with a quadratic polynomial and dropping F_initial_x2 as it becomes redundant. It also has 8 parameters, it converges and it fits the data better than the model in the question (i.e. has lower residual sum of squares).

fm3 <- nlsLM(formula   = y ~ (exp(- F_decay_x2  * x2) + 
              S_initial_x2 * exp(- S_decay_x2 * x2)) *
              (1 + initial_x1 * exp(- decay_x1      * x1)) *
              cbind(1, poly(x3, degree = 2)) %*% c(p1,p2,p3),
          data     = df,
          start    = c(starting_value[-c(1, 7:8)], p1=0, p2=0, p3=0),
          lower = c(0, 0, 0, 0, 0, 0, NA, NA),
          control  = nls.lm.control(maxiter = 200),
          trace    = TRUE)

giving:

> fm3
Nonlinear regression model
  model: y ~ (exp(-F_decay_x2 * x2) + S_initial_x2 * exp(-S_decay_x2 *     x2)) * (1 + initial_x1 * exp(-decay_x1 * x1)) * cbind(1,     poly(x3, degree = 2)) %*% c(p1, p2, p3)
   data: df
  F_decay_x2 S_initial_x2   S_decay_x2   initial_x1     decay_x1           p1 
     3.51614      2.60886      0.26304      8.26244      0.81232      0.09031 
          p2           p3 
    -0.16968      0.53324 
 residual sum-of-squares: 1.019

Number of iterations to convergence: 20 
Achieved convergence tolerance: 1.49e-08

Note 4: nlxb from the nlmrt package converges without doing anything special.

library(nlmrt)
nlxb( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = starting_value,
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

giving:

residual sumsquares =  1.1899  on  280 observations
    after  31    Jacobian and  33 function evaluations
  name            coeff          SE       tstat      pval      gradient    JSingval   
F_initial_x2      0.109175            NA         NA         NA   3.372e-11        15.1  
F_decay_x2      0.876313            NA         NA         NA   -5.94e-12       8.083  
S_initial_x2      0.152598            NA         NA         NA    6.55e-11       2.163  
S_decay_x2      0.204984            NA         NA         NA   4.206e-11      0.6181  
initial_x1       5.19928            NA         NA         NA  -1.191e-12      0.3601  
decay_x1         0.91391            NA         NA         NA   6.662e-13      0.1315  
initial_x3      0.944879            NA         NA         NA   2.736e-12     0.02247  
decay_x3         33.9921            NA         NA         NA  -1.056e-15   2.928e-15  
Massage answered 19/2, 2016 at 22:41 Comment(4)
Could you explain what's the point of this log transformation?Linter
I'd like to understand why it makes a difference, so I can adapt this trick to other problemsLinter
The problem is I need to fit automatically a lot of non linear models so I can't try different transformation each time. I would need a robust way to find a decent fit, even if the model doesn't fit the data perfectly like in this example.Linter
Thanks for spotting this package. Do you know why it's succeed to find a solution without any change? I didn't understand the difference in the package doc. It just says it's more aggresive and reduce the risk of singular gradient error.Linter
S
3

Often when this error occurs, the problem is not the code but the used model. A singular gradient matrix at the initial parameter estimates might indicate that there is not a single unique solution for the model or that the model is overspecified for the data at hand.

To answer your questions:

  1. Yes, that makes sense. The function nlsLM first calls nls.lm which does the iteration. When it reaches the end of the iterations (either because of a best fit or because max.iter), the result is passed on to the function nlsModel. That function does a QR decomposition of the gradient matrix multiplied by the squared weights. And your initial gradient matrix contains a column with only zeros. So while nls.lm can do the iterations, it's only at the next step nlsModel that the problem with the gradient matrix is actually checked and discovered.

  2. There is a way, but that requires you to change the options within R itself, specifically the error option. By setting it to dump.frames, you get a dump of all the environments that exist at the time of error. Those are stored in a list called last.dump and you can use these environments to look for the values you want.

In this case the parameters are returned by a function getPars() that resides inside the environment of the workhorse function nlsModel:

old.opt <- options(error = dump.frames)

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = starting_value,
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

thecoefs <- llast.dump[["nlsModel(formula, mf, start, wts)"]]$getPars()
options(old.opt) # reset to the previous value.

Note that this is NOT the kind of code you want to use in a production environment or to share with colleagues. And it's also not a solution to your problem, because the problem is the model, not the code.

  1. This is another consequence of what I explained in 1. So yes, that's logic.

I've done a very brief test to see if it really is the model, and if I replace the last parameter (decay_x3) by its start value, the model is fitted without problem. I don't know what we're dealing with here, so dropping another parameter might make more sense in the real world, but just to prove that your code is fine:

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- 1.9* x3 )),
  data     = df,
  start    = starting_value[-8],
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0)[-8],
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

exits without errors at iteration 10.


EDIT: I've been looking a bit deeper into it, and based on the data the "extra" solution is basically to kick x3 out of the model. You only have 3 unique values in there, and the initial estimate for the parameter is about 38. So:

> exp(-38*c(1,2,3)) < .Machine$double.eps
[1] TRUE TRUE TRUE

If you compare that to the actual Y values, it's clear that initial_x3 * exp(- decay_x3 * x3 ) doesn't contribute anything to the model, as it is practically 0.

If you manually calculate the gradient as done in nlsModel, you get a matrix that's not of full rank; the last column contains only 0 :

theenv <- list2env( c(df, thecoefs))
thederiv <- numericDeriv(form[[3]], names(starting_value), theenv)
thegrad <- attr(thederiv, "gradient")

And that's what gives you the error. The model is overspecified for the data you have.

The log-transformation that Gabor suggests, prevents that your last estimate becomes so big it forces x3 out of the model. Due to the log transformation, the algorithm doesn't jump to such extreme values very easily. In order to have the same estimates as with the original model, his decay_x3 should be as high as 3.2e16 to specify the same model (exp(38)). So the log transformation protects you from estimates that force the influence of any variable to 0.

Another side effect of the log transformation is that big steps in the value of decay_x3 have only a moderate effect on the model. The estimate Gabor finds, is already a whopping 1.3e7, but after the back transformation that's still a doable value of 16 for decay_x3. Which still makes x3 redundant in the model if you look at :

> exp(-16*c(1,2,3))
[1] 1.125352e-07 1.266417e-14 1.425164e-21

But it doesn't cause the singularity that causes your error.

You can avoid this by setting your upper boundaries, eg:

themod <- nlsLM( 
  formula   = y ~ (F_initial_x2   * exp(- F_decay_x2  * x2) + 
                     S_initial_x2 * exp(- S_decay_x2 * x2)) *
    (1 + initial_x1      * exp(- decay_x1      * x1)) *
    (1 + initial_x3      * exp(- decay_x3      * x3 )),
  data     = df,
  start    = starting_value,
  lower    = c(0, 0, 0, 0, 0, 0, 0, 0),
  upper    = rep(- log(.Machine$double.eps^0.5),8),
  control  = nls.lm.control(maxiter = 200),
  trace    = TRUE)

runs perfectly fine, gives you the same estimates, and again concludes that x3 is redundant.

So whatever way you look at it, x3 has no impact on y, your model is overspecified and can't be fit decently with the data at hand.

Supplementary answered 18/2, 2016 at 11:8 Comment(2)
in this case do you see why there is not a unique solution? Don't you think that's weird that a simple log transformation (see G. Grothendieck proposal) makes nlslm found a single solution? If there is not a single/unique solution, is there a way to make nlslm select randomly a solution?Linter
@psql I've been a bit quick. I meant not a unique solution with the data at hand. In your case x3 is redundant in the model, as shown by both my outcomes and the outcome of Gabor's model - see also my edits. You can't expect any method to give you a solution that makes sense in such a situation.Supplementary

© 2022 - 2024 — McMap. All rights reserved.