Singular gradient error during bootstrapped nls fit to bad data
Asked Answered
W

1

9

I have a dataset containing an independent variable and a set of dependent variables. I'd like to fit a function to each set of independent variables, using a bootstrapped nonlinear least squares procedure. In some cases, the independent variables are 'good quality', i.e. fit the function reasonably well. In other cases, they're noisy.

In all cases, I can use nls() to get an estimate of the parameters. However, when the data are noisy, a bootstrap throws the error Error in nls(...) : singular gradient. I can understand why nls fitting to noisy data would fail, e.g. by failing to converge after too many iterations, but I don't understand why it is a singular gradient error, and why I only get it resampled datasets of poor quality.

Code:

require(ggplot2)
require(plyr)
require(boot)

# Data are in long form: columns are 'enzyme', 'x', and 'y'
enz <- read.table("http://dl.dropbox.com/s/ts3ruh91kpr47sj/SE.txt", header=TRUE)

# Nonlinear formula to fit to data
mmFormula <- formula(y ~ (x*Vmax) / (x + Km))

nls is perfectly capable of fitting the data (even if in some cases, like a, I doubt the model fits the data.

# Use nls to fit mmFormula to the data - this works well enough
fitDf <- ddply(enz, .(enzyme), function(x) coefficients(nls(mmFormula, x, start=list(Km=100, Vmax=0.5))))

# Create points to plot for the simulated fits
xGrid <- 0:200
simFits <- dlply(fitDf, .(enzyme), function(x) data.frame(x=xGrid, y=(xGrid * x$Vmax)/(xGrid + x$Km)))
simFits <- ldply(simFits, identity) 

ggplot() + geom_point(data=enz, aes(x=x, y=y)) + geom_line(data=simFits, aes(x=x, y=y)) + 
  facet_wrap(~enzyme, scales="free_y") + aes(ymin=0)

NLS fits of mmFormula to data

Bootstrapping works fine for the good quality data:

# Function to pass to bootstrap; returns coefficients of nls fit to formula
nlsCoef <- function(df, i) {
  KmGuess <- median(df$x)
  VmaxGuess <- max(df$y)
  dfSamp <- df[i,]
  nlsCoef <- coefficients(nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
}

eBoot <- boot(subset(enz, enzyme=="e"), nlsCoef, R=1000) #No error

But not for the poor quality data

dBoot <- boot(subset(enz, enzyme=="d"), nlsCoef, R=10)
> Error in nls(mmFormula, dfSamp, start = list(Km = KmGuess, Vmax = VmaxGuess)) : 
   singular gradient

What is causing this error? And what should I do about it, given that I'd like to use plyr to simultaneously perform lots of bootstrap simulations?

Weld answered 23/10, 2012 at 14:35 Comment(8)
I would avoid fitting Michaelis-Menten with only three distinct concentration values. However, maybe you could improve the guess for the starting values (in particular KmGuess) by first fitting Lineweaver-Burk using lm.Fasciate
Yeah, I realize the experimental scheme was less-than-optimal. Live and learn. Using Lineweaver-Burke for a starting guess is a good idea. However, I don't think the starting guess is the problem, because a.) the nls fits (without bootstrapping) work fine with relatively bad starting guesses, e.g. Km=100, Vmax=0.5; b.) when I change the bootstrap function to those same starting guesses I get the same error, and c.) I think bad starting guesses usually cause a failure-to-converge error rather than a singular gradient error.Weld
Well, you have some data that doesn't follow the model at all. Sometimes I have been able to solve similar problems (even singular gradient errors) by using different starting values (nls2 can help with that). A different optimization algorithm might also help. But if the data strongly violates the model, it is impossible to fit and that could happen during the bootstrap.Fasciate
But that's the thing I don't get - all of the data can be fit by the model. It is only the resampled data that can't be fit by the model.Weld
Some bootstrap samples are probably impossible to fit. You could rewrite your bootstrap to exclude those. If the number of excluded samples is small, the result might be still valid. But my advise is not to do the bootstrap on this data.Fasciate
Fair enough. The reason that I'd like to do them all is that I want to subsequently develop a criterion for which fits to reject. Also it bothers me that I don't understand the error.Weld
let us continue this discussion in chatWeld
Perhaps you could bootstrap on the residuals to better preserve the x distribution?Kommunarsk
F
3

This allows you to examine what happens:

#modified function
#returns NAs if fit is not sucessfull
#does global assignment to store bootstrap permutations
nlsCoef <- function(df, i) {
  KmGuess <- median(df$x)
  VmaxGuess <- max(df$y)
  dfSamp <- df[i,]
  fit <- NULL
  try(fit <- nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
  if(!is.null(fit)){
    res <- coef(fit)
  } else{
    res <- c(Km=NA,Vmax=NA)
  }

  istore[k,] <<- i
  k <<- k+1
  res
}

n <- 100
istore <- matrix(nrow=n+1,ncol=9)
k <- 1

dat <- subset(enz, enzyme=="d")
dBoot <- boot(dat, nlsCoef, R=n) 

#permutations that create samples that cannot be fitted
nais <- istore[-1,][is.na(dBoot$t[,1]),]

#look at first bootstrap sample 
#that could not be fitted
samp <- dat[nais[1,],]
plot(y~x,data=samp)
fit <- nls(mmFormula, samp, start=list(Km=100, Vmax=0.5))
#error

You can also use a self-starting model:

try(fit <- nls(y ~ SSmicmen(x, Vmax, Km), data = dfSamp))

With that error messages become a bit more informative. E.g., one error is

too few distinct input values to fit a Michaelis-Menten model

which means, that some bootstrap samples contain less then three distinct concentrations. But there are also some other errors:

step factor 0.000488281 reduced below 'minFactor' of 0.000976562

which you might be able to avoid by decreasing minFactor.

The following are nasty. You could try different fitting algorithms or starting values with those:

singular gradient matrix at initial parameter estimates

singular gradient
Fasciate answered 24/10, 2012 at 9:57 Comment(1)
PS: Avoid using subset inside functions. Its help page warns specifically against that. Use [ instead.Fasciate

© 2022 - 2024 — McMap. All rights reserved.