trying to display original and fitted data (nls + dnorm) with ggplot2's geom_smooth()
Asked Answered
B

1

13

I am exploring some data, so the first thing I wanted to do was try to fit a normal (Gaussian) distribution to it. This is my first time trying this in R, so I'm taking it one step at a time. First I pre-binned my data:

myhist = data.frame(size = 10:27, counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L) )

qplot(x=size, y=counts, data=myhist)

plot1

Since I want counts, I need to add a normalization factor (N) to scale up the density:

fit = nls(counts ~ N * dnorm(size, m, s), data=myhist, start=c(m=20, s=5, N=sum(myhist$counts)) )   

Then I create the fitted data for display and everything works great:

x = seq(10,30,0.2)
fitted = data.frame(size = x, counts=predict(fit, data.frame(size=x)) )
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_line(data=fitted)

plot2

I got excited when I found this thread which talks about using geom_smooth() to do it all in one step, but I can't get it to work:

Here's what I try... and what I get:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_smooth(method="nls", formula = counts ~ N * dnorm(size, m, s), se=F, start=list(m=20, s=5, N=300, size=10))

Error in method(formula, data = data, weights = weight, ...) : 
  parameters without starting value in 'data': counts

The error seems to indicate that it's trying to fit for the observed variable, counts, but that doesn't make any sense, and it predictably freaks out if I specify a "starting" value for counts too:

fitting parameters ‘m’, ‘s’, ‘N’, ‘size’, ‘counts’ without any variables

Error in eval(expr, envir, enclos) : object 'counts' not found

Any idea what I'm doing wrong? It's not the end of the world, of course, but fewer steps are always better, and you guys always come up with the most elegant solutions to these common tasks.

Thanks in advance!

Jeffrey

Banta answered 7/12, 2010 at 22:3 Comment(2)
You mention that you're exploring the data. Is it necessary that the fit be nls? Using a simple ggplot(myhist, aes(x = size, y = counts)) + geom_point() + geom_smooth() gets you a loess fit (I believe) with less fuss. I do hope somebody will explain how to get nls to work, though... rather mystifying.Holbein
I come (reluctantly) to statistics from the natural sciences, so "explore" tends to mean "time to fit Gaussians!". Seriously, though, I have found some great resources like Ricci's "Fitting Distributions in R" (cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf) and this SO answer on fitdistr() (#4290581), but none of the older stuff uses ggplot2.Banta
H
17

the first error indicates that ggplot2 cannot find the variable 'count', which is used in formula, in data.

Stats take place after mapping, that is, size -> x, and counts -> y.

Here is an example for using nls in geom_smooth:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + 
  geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=F, 
              start=list(m=20, s=5, N=300)) 

The point is that using x and y, instead of size and counts, in the specification of formula.

Haman answered 7/12, 2010 at 23:55 Comment(2)
Thanks, kohske -- that explains it!Banta
Great answer! It looks like geom_smooth's syntax has changed a little since 2010. The equivalent code now (2022) would be: geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=F, method.args = list(start=list(m=20, s=5, N=300))).Crandall

© 2022 - 2024 — McMap. All rights reserved.