Calculating R^2 for a nonlinear least squares fit

Suppose I have x values, y values, and expected y values f (from some nonlinear best fit curve).

How can I compute the coefficient of determination (R^2)? Note that this function is not a linear model but a nonlinear least squares (nls) fit, so not an lm fit.

Brion answered 25/1, 2013 at 21:35 Comment(7)
It is hard for me to see if you are talking about a linear system of some sort or not. Title says nonlinear and f is "some best fit line"? Maybe a clarification could get you a better answer.Cornia
@Seth, I edited the title based on the text of the question and the comments. I think the OP means "best fit curve", and I suspect that your answer is exactly what they want. Of course the OP is free to roll back my edit if it's wrong, or further edit for clarity ...Kerenkeresan
@BenBolker your edit is correct. Sorry about that.Brion
This sounds more like a statistics question than an R question.Particulate
"how can I ... ?" is an R question. "Should I?" is a statistics question.Kerenkeresan
What about applying the inverse function f^(-1) to y, i.e. f^(-1)(y) = x*, and then fitting a linear regression between x and x*? Shouldn't that be the generalization of R^2 to a non-linear function?Pagas
I added an answer with various links to packages that calculate pseudo R2 values for nls models - I think that answer should be flagged as the correct one, as the currently flagged one merely argues against the use of R2 in general...Fruma

You just use the lm function to fit a linear model:

x <- runif(100)
y <- runif(100)
spam <- summary(lm(x ~ y))
spam$r.squared
# [1] 0.0008532386

Note that R-squared is not defined for non-linear models, or at least it is very tricky. To quote from R-help:

There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.

One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only. You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested this comparison is not terribly meaningful.

So the answer is that you probably don't want to do this in the first place.

If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing/have the same desirable properties as in the linear-model case.
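If you do want to see the mechanics, here is a minimal sketch of the comparison described in the quote above; the exponential model, the mtcars data, and the starting values are illustrative, not from the question:

fit <- nls(mpg ~ a * exp(b * wt), data = mtcars, start = list(a = 50, b = -0.3))
rss <- sum(resid(fit)^2)                       # residual sum of squares of the nls fit
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # residual SS of a constant-only model
1 - rss / tss                                  # an "R^2" -- interpret with care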

Knuth answered 25/1, 2013 at 21:37 Comment(6)
True, except this isn't a linear model :)Brion
Sure there is. You can generalize R^2 to nonlinear in some way. How else could one measure the fit?Brion
The R^2 in the way it is defined for linear models does not have the same meaning for non-linear models, see my answer.Knuth
@flodel, that's the article linked from the answer.Kerenkeresan
I think your answer could be improved if you used my deleted solution for the first part. The OP is not interested in a linear model; instead, he has his own prediction f. Feel free to use it.Tecu
@BenBolker Added a link to various packages reporting pseudo R2 values for nls models in my answer below - do not entirely agree that these can't be useful or can't provide an intuitive property of the fit...Fruma

Sounds like f holds your predicted values. So take one minus the sum of squared distances from them to the actual values, divided by n times the variance of y,

so something like

1 - sum((y - f)^2) / (length(y) * var(y))

should give you a quasi R-squared value, as long as your model is reasonably close to a linear model and n is fairly large.
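For instance, a quick illustrative check on simulated data (the exponential model, sample size, and starting values here are made up):

set.seed(1)                                  # for reproducibility
x <- runif(50)
y <- 2 * exp(1.5 * x) + rnorm(50, sd = 0.3)  # simulated nonlinear data
fit <- nls(y ~ a * exp(b * x), start = list(a = 1, b = 1))
f <- predict(fit)                            # the "expected y values" from the question
1 - sum((y - f)^2) / (length(y) * var(y))    # quasi R-squared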

Cornia answered 25/1, 2013 at 21:43 Comment(2)
note that this is the same as @Flodel's (now deleted) answer, with the tiny exception that var(y) is computed with a denominator of length(y)-1 (unbiased estimate), so the denominator is not quite the same as sum((y-mean(y))^2). I agree with all the cautionary notes here. I also think that if someone asks for rope to hang themselves, it's fine to give it to them (while performing due diligence by asking "are you sure you want to do this?")Kerenkeresan
@Paul but most of us would miss.Residuum

As a direct answer to the question asked (rather than arguing that R2/pseudo R2 values aren't useful), the nagelkerke function in the rcompanion package will report various pseudo R2 values for nonlinear least squares (nls) models, as proposed by McFadden, Cox and Snell, and Nagelkerke, e.g.

library(rcompanion)   # provides nagelkerke() and the BrendonSmall example data
data(BrendonSmall)
# quadratic-plateau model: quadratic in x up to the critical value clx, constant beyond
quadplat = function(x, a, b, clx) {
          ifelse(x  < clx, a + b * x   + (-0.5*b/clx) * x   * x,
                           a + b * clx + (-0.5*b/clx) * clx * clx)}
model = nls(Sodium ~ quadplat(Calories, a, b, clx),
            data = BrendonSmall,
            start = list(a   = 519,
                         b   = 0.359,
                         clx = 2304))
nullfunct = function(x, m){m}   # intercept-only null model
null.model = nls(Sodium ~ nullfunct(Calories, m),
                 data = BrendonSmall,
                 start = list(m = 1346))
nagelkerke(model, null=null.model)

The soilphysics package also reports Efron's pseudo R2 and adjusted pseudo R2 values for nls models, calculated as 1 - RSS/TSS:

pred <- predict(model)
n <- length(pred)
res <- resid(model)
w <- weights(model)
if (is.null(w)) w <- rep(1, n)      # unweighted fit: use unit weights
rss <- sum(w * res ^ 2)             # residual sum of squares
resp <- pred + res                  # recover the observed response values
center <- weighted.mean(resp, w)
r.df <- summary(model)$df[2]        # residual degrees of freedom
int.df <- 1                         # the null (intercept-only) model has one parameter
tss <- sum(w * (resp - center)^2)   # total sum of squares about the weighted mean
r.sq <- 1 - rss/tss                 # Efron's pseudo R2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
out <- list(pseudo.R.squared = r.sq,
            adj.R.squared = adj.r.sq)

which is also the pseudo R2 as calculated by the accuracy function in the rcompanion package. Basically, this R2 measures how much better your fit is than if you had just drawn a flat horizontal line through your data. This can make sense for nls models if your null model allows for an intercept-only model, and it can also make sense for particular other nonlinear models. E.g. for a scam model that uses strictly increasing splines (bs="mpi" in the spline term), the fitted model for the worst possible scenario (e.g. data that were strictly decreasing) would be a flat line, and hence would result in an R2 of zero.

The adjusted R2 then also penalizes models with higher numbers of fitted parameters. Using the adjusted R2 value would already address a lot of the criticisms of the paper linked above, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/ (besides, if one swears by using information criteria for model selection, the question becomes which one to use: AIC, BIC, EBIC, AICc, QIC, etc.).

Just using

r.sq <- max(cor(y,yfitted),0)^2
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df

would, I think, also make sense if you have normal Gaussian errors - i.e. the squared correlation between the observed and fitted y (clipped at zero, so that a negative relationship would imply zero predictive power), adjusted for the number of fitted parameters in the adjusted version. If y and yfitted go in the same direction this would be the R2 and adjusted R2 value as reported for a regular linear model. To me this would make perfect sense at least, so I don't agree with outright rejecting the usefulness of pseudo R2 values for nls models, as the answer above seems to imply.

For non-normal error structures (e.g. if you were using a GAM with non-normal errors) the McFadden pseudo R2 is defined analogously as

1 - residual deviance / null deviance
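For instance, a minimal sketch using a logistic glm (the model and data are purely illustrative):

fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)
1 - fit$deviance / fit$null.deviance    # McFadden's pseudo R2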

See here and here for some useful discussion.

Fruma answered 14/3, 2018 at 14:56 Comment(0)

Another quasi-R-squared for non-linear models is to square the correlation between the actual y-values and the predicted y-values. For linear models this is the regular R-squared.
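For example, a minimal sketch (the model, data and starting values are illustrative, not the OP's):

fit <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))
cor(mtcars$mpg, predict(fit))^2    # squared correlation between actual and predicted y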

Bodi answered 25/1, 2013 at 23:38 Comment(6)
is this actually numerically different from @Seth's/@Flodel's answer? I think they might come out identical, at least up to factors of n-1 vs n (but too lazy to check)Kerenkeresan
@BenBolker, Mine divides by sd(y-hat) and sd(y) where Seth's answer divides by sd(y) twice (var(y)), so when the variability of y-hat and y is similar (high R-squared) they should be pretty similar, but they would probably tend to differ more otherwise. If for some strange reason the fitted model was opposite the true relationship (the model says y increases with x when the truth is that it decreases with x) then Seth's answer would give a negative R-squared while mine would give a value closer to 1; which is more useless in this strange situation is another debate.Bodi
To avoid the pathological example you describe would it not make more sense then to use max(cor(y,yfitted),0)^2 as your pseudo R2? So that it would come out at 0 if the relationship went in the wrong direction?Fruma
@TomWenseleers, I think that would hide the issue rather than aiding understanding. Not using R^2, or at least understanding that it can give silly results, is probably more helpful than just forcing it to fit a specific range. See fortunes 233, 252, 253, and 254.Bodi
Well, R2 just measures how much better you can predict your y values compared to just drawing a flat line through them (which can be OK for nls too, if your nls model allows for an intercept-only model). But your predictions of course also have to go in the right direction, so it would seem to me that you can simply take this into account in the way I mentioned, without really altering the interpretation...Fruma
And clipping the R2 in Seth's answer at zero would have the same effect, and again would seem a logical thing to do...Fruma

As an alternative to this problem, I have used the following procedure several times:

  1. Compute a fit on the data with the nls function.
  2. Using the resulting model, make predictions.
  3. Plot the data against the values predicted by the model (if the model is good, the points should lie near the y = x line, i.e. the bisector).
  4. Compute the R2 of the linear regression of the data on the predicted values, as sketched below.
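
A minimal sketch of this procedure (the nls model, data and starting values are illustrative):

fit <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))  # 1. nls fit
pred <- predict(fit)                                                      # 2. predictions
plot(pred, mtcars$mpg); abline(0, 1)       # 3. points should lie near the y = x line
summary(lm(mtcars$mpg ~ pred))$r.squared   # 4. R2 of the linear regression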

Best wishes to all. Patrick.

Krantz answered 10/1, 2018 at 18:18 Comment(0)

With the modelr package

modelr::rsquare(nls_model, data)

For example:

nls_model <- nls(mpg ~ a / wt + b, data = mtcars, start = list(a = 40, b = 4))

modelr::rsquare(nls_model, mtcars)
# 0.794

This gives essentially the same result as the longer way described by Tom using the rcompanion package.

Longer way with nagelkerke function

library(rcompanion)   # for the nagelkerke function

nullfunct <- function(x, m){m}
null_model <- nls(mpg ~ nullfunct(wt, m),
                  data = mtcars,
                  start = list(m = mean(mtcars$mpg)))

nagelkerke(nls_model, null_model)[2]
# 0.794 or 0.796

Lastly, using predicted values

lm(mpg ~ predict(nls_model), data = mtcars) %>% broom::glance()   # %>% requires magrittr (or dplyr) to be loaded
# 0.795

Like they say, it's only an approximation.

Sloth answered 30/7, 2021 at 0:55 Comment(0)
