Method to extract stat_smooth line fit
Asked Answered
T

5

67

Is there a way to extract the values of the fitted line returned from stat_smooth?

The code I am using looks like this:

p <- ggplot(df1, aes(x=Days, y= Qty,group=Category,color=Category))
p <- p + stat_smooth(method=glm, fullrange=TRUE)+ geom_point())

This new r user would greatly appreciate any guidance.

Tiertza answered 20/3, 2012 at 15:35 Comment(4)
You example code isn't reproducible, can you make it so?Cann
Yes I know, and I will make it so as soon as my R stops saying "Not Responding". I just thought someone might know of the top of their head w/out reproducible code.Tiertza
I believe the answer is no, as AFAIK the smoothers aren't fit+evaluated until the plot is rendered. In general, if you want to manipulate fitted values, etc., you fit the model outside of ggplot and pass the data in to the specific layer you want it in.Oriana
@Oriana Actually you can, though its a bit hacky to get the fitted values out of the ggplot environment.Cann
C
69

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

  • y, predicted value
  • ymin, lower pointwise confidence interval around the mean
  • ymax, upper pointwise confidence interval around the mean
  • se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

Cann answered 20/3, 2012 at 16:24 Comment(6)
You're not kidding when you call that hacky. (You missed the <<- in your code.)Oriana
@Oriana Thanks, fixed now. Though I'm getting a bit suspicious about the numbers that its producing.Cann
@Oriana Got it working now, it neeeds to be in the stat_smooth call, otherwise it just uses the y values of the original data.Cann
@James, Thats great! You mentioned by default it only predicts 80 data points, do you know how to change (increase) this?Tiertza
@Tiertza See ?stat_smooth and the argument n.Oriana
Thank you. Great anwsers guys I really appreciate it. On thing I will point out is that I have discovered is you need to actually call the plot directly to generate the fit. In my case I am looping through some regions and using the pdf and print to store to pdf file after using annotation_custom to add a small plot within the main plot. Seems the print(p) dosent generate the fit, so I addded a simple p (I think this is called "rendering" the plot) and it generated the fit variable in my workspace.Tiertza
U
77

Riffing off of @James example

p <- qplot(hp,wt,data=mtcars) + stat_smooth()

You can use the intermediate stages of the ggplot building process to pull out the plotted data. The results of ggplot_build is a list, one component of which is data which is a list of dataframes which contain the computed values to be plotted. In this case, the list is two dataframes since the original qplot creates one for points and the stat_smooth creates a smoothed one.

> ggplot_build(p)$data[[2]]
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
           x        y     ymin     ymax        se PANEL group
1   52.00000 1.993594 1.149150 2.838038 0.4111133     1     1
2   55.58228 2.039986 1.303264 2.776709 0.3586695     1     1
3   59.16456 2.087067 1.443076 2.731058 0.3135236     1     1
4   62.74684 2.134889 1.567662 2.702115 0.2761514     1     1
5   66.32911 2.183533 1.677017 2.690049 0.2465948     1     1
6   69.91139 2.232867 1.771739 2.693995 0.2244980     1     1
7   73.49367 2.282897 1.853241 2.712552 0.2091756     1     1
8   77.07595 2.333626 1.923599 2.743652 0.1996193     1     1
9   80.65823 2.385059 1.985378 2.784740 0.1945828     1     1
10  84.24051 2.437200 2.041282 2.833117 0.1927505     1     1
11  87.82278 2.490053 2.093808 2.886297 0.1929096     1     1
12  91.40506 2.543622 2.145018 2.942225 0.1940582     1     1
13  94.98734 2.597911 2.196466 2.999355 0.1954412     1     1
14  98.56962 2.652852 2.249260 3.056444 0.1964867     1     1
15 102.15190 2.708104 2.303465 3.112744 0.1969967     1     1
16 105.73418 2.764156 2.357927 3.170385 0.1977705     1     1
17 109.31646 2.821771 2.414230 3.229311 0.1984091     1     1
18 112.89873 2.888224 2.478136 3.298312 0.1996493     1     1
19 116.48101 2.968745 2.531045 3.406444 0.2130917     1     1
20 120.06329 3.049545 2.552102 3.546987 0.2421773     1     1
21 123.64557 3.115893 2.573577 3.658208 0.2640235     1     1
22 127.22785 3.156368 2.601664 3.711072 0.2700548     1     1
23 130.81013 3.175495 2.625951 3.725039 0.2675429     1     1
24 134.39241 3.181411 2.645191 3.717631 0.2610560     1     1
25 137.97468 3.182252 2.658993 3.705511 0.2547460     1     1
26 141.55696 3.186155 2.670350 3.701961 0.2511175     1     1
27 145.13924 3.201258 2.687208 3.715308 0.2502626     1     1
28 148.72152 3.235698 2.721744 3.749652 0.2502159     1     1
29 152.30380 3.291766 2.782767 3.800765 0.2478037     1     1
30 155.88608 3.353259 2.857911 3.848607 0.2411575     1     1
31 159.46835 3.418409 2.938257 3.898561 0.2337596     1     1
32 163.05063 3.487074 3.017321 3.956828 0.2286972     1     1
33 166.63291 3.559111 3.092367 4.025855 0.2272319     1     1
34 170.21519 3.634377 3.165426 4.103328 0.2283065     1     1
35 173.79747 3.712729 3.242093 4.183364 0.2291263     1     1
36 177.37975 3.813399 3.347232 4.279565 0.2269509     1     1
37 180.96203 3.910849 3.447572 4.374127 0.2255441     1     1
38 184.54430 3.977051 3.517784 4.436318 0.2235917     1     1
39 188.12658 4.037302 3.583959 4.490645 0.2207076     1     1
40 191.70886 4.091635 3.645111 4.538160 0.2173882     1     1
41 195.29114 4.140082 3.700184 4.579981 0.2141624     1     1
42 198.87342 4.182676 3.748159 4.617192 0.2115424     1     1
43 202.45570 4.219447 3.788162 4.650732 0.2099688     1     1
44 206.03797 4.250429 3.819579 4.681280 0.2097573     1     1
45 209.62025 4.275654 3.842137 4.709171 0.2110556     1     1
46 213.20253 4.295154 3.855951 4.734357 0.2138238     1     1
47 216.78481 4.308961 3.861497 4.756425 0.2178456     1     1
48 220.36709 4.317108 3.859541 4.774675 0.2227644     1     1
49 223.94937 4.319626 3.851025 4.788227 0.2281358     1     1
50 227.53165 4.316548 3.836964 4.796132 0.2334829     1     1
51 231.11392 4.308435 3.818728 4.798143 0.2384117     1     1
52 234.69620 4.302276 3.802201 4.802351 0.2434590     1     1
53 238.27848 4.297902 3.787395 4.808409 0.2485379     1     1
54 241.86076 4.292303 3.772103 4.812503 0.2532567     1     1
55 245.44304 4.282505 3.754087 4.810923 0.2572576     1     1
56 249.02532 4.269040 3.733184 4.804896 0.2608786     1     1
57 252.60759 4.253361 3.710042 4.796680 0.2645121     1     1
58 256.18987 4.235474 3.684476 4.786473 0.2682509     1     1
59 259.77215 4.215385 3.656265 4.774504 0.2722044     1     1
60 263.35443 4.193098 3.625161 4.761036 0.2764974     1     1
61 266.93671 4.168621 3.590884 4.746357 0.2812681     1     1
62 270.51899 4.141957 3.553134 4.730781 0.2866658     1     1
63 274.10127 4.113114 3.511593 4.714635 0.2928472     1     1
64 277.68354 4.082096 3.465939 4.698253 0.2999729     1     1
65 281.26582 4.048910 3.415849 4.681971 0.3082025     1     1
66 284.84810 4.013560 3.361010 4.666109 0.3176905     1     1
67 288.43038 3.976052 3.301132 4.650972 0.3285813     1     1
68 292.01266 3.936392 3.235952 4.636833 0.3410058     1     1
69 295.59494 3.894586 3.165240 4.623932 0.3550782     1     1
70 299.17722 3.850639 3.088806 4.612473 0.3708948     1     1
71 302.75949 3.804557 3.006494 4.602619 0.3885326     1     1
72 306.34177 3.756345 2.918191 4.594499 0.4080510     1     1
73 309.92405 3.706009 2.823813 4.588205 0.4294926     1     1
74 313.50633 3.653554 2.723308 4.583801 0.4528856     1     1
75 317.08861 3.598987 2.616650 4.581325 0.4782460     1     1
76 320.67089 3.542313 2.503829 4.580796 0.5055805     1     1
77 324.25316 3.483536 2.384853 4.582220 0.5348886     1     1
78 327.83544 3.422664 2.259739 4.585589 0.5661643     1     1
79 331.41772 3.359701 2.128512 4.590891 0.5993985     1     1
80 335.00000 3.294654 1.991200 4.598107 0.6345798     1     1

Knowing a priori where the one you want is in the list isn't easy, but if nothing else you can look at the column names.

It is still better to do the smoothing outside the ggplot call, though.

EDIT:

It turns out replicating what ggplot2 does to make the loess is not as straightforward as I thought, but this will work. I copied it out of some internal functions in ggplot2.

model <- loess(wt ~ hp, data=mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=80)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y = pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")

That gives a plot that looks identical to

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth()

(which is the expanded form of the original p).

Ulani answered 20/3, 2012 at 16:57 Comment(9)
Nice, it's a lot more useful to get the x values too.Cann
Thanks. Rookie curious, why do you say it is better to do smoothing outside the ggplot call? Seems like if you need to produce the plots anyway, it would be repetitive to do a sepreate fit just to extract the fit values.Tiertza
I wouldn't do the fit outside the ggplot call in addition to one inside of it, but rather instead of. I would form the fit data into an appropriate format/dataframe and then plot that. There is an example of that buried in this answer: https://mcmap.net/q/296856/-displaying-dates-on-xaxis-in-readable-format-rUlani
@Tiertza You need far more than the "fit" to evaluate the fitted model and therefore judge whether the "fit" is adequate or not.Moultrie
@BrianDiggs...do you have a example simular to what you referenced above (Mar 20 at 19:19) of how to recreate a loess fit plotthat was created outside of ggplot?Tiertza
@Tiertza I added an example of performing a loess beforehand to the answer.Ulani
thanks..very helpful,I really appreciate it. Last question and I won't bother you for a very long time. If my origional loess tranformed the y variable [i.e. model <- loess(log10(wt) ~ hp, data=mtcars) ] what is the best way to produced the produce the plot in non log10 space? I can get most of it by producing a secondary loess.DF2 data frame with the necessary transformations back to linear space but for some reason I can't get the 95% ci to show up. Regardless, I'm guessing you have a far more efficient way to do it than crating a redundant loess.DF2 data frame.Tiertza
Changing the model line to model <- loess(log10(wt) ~ hp, data=mtcars) and the loess.DF line to loess.DF <- data.frame(x = xseq, y=10^y, ymin=10^ymin, ymax=10^ymax, se = pred$se.fit), I get a plot with a curve with a band around it. Basically, I computed y, ymin, and ymax in log space and then transformed those back.Ulani
This technique rocks. By increasing the sampling (n-365) I can get smoothing data for everyday of a year long timeseries which makes my life super easy for some analysis. Awesome, thanks.Michaeline
C
69

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

  • y, predicted value
  • ymin, lower pointwise confidence interval around the mean
  • ymax, upper pointwise confidence interval around the mean
  • se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

Cann answered 20/3, 2012 at 16:24 Comment(6)
You're not kidding when you call that hacky. (You missed the <<- in your code.)Oriana
@Oriana Thanks, fixed now. Though I'm getting a bit suspicious about the numbers that its producing.Cann
@Oriana Got it working now, it neeeds to be in the stat_smooth call, otherwise it just uses the y values of the original data.Cann
@James, Thats great! You mentioned by default it only predicts 80 data points, do you know how to change (increase) this?Tiertza
@Tiertza See ?stat_smooth and the argument n.Oriana
Thank you. Great anwsers guys I really appreciate it. On thing I will point out is that I have discovered is you need to actually call the plot directly to generate the fit. In my case I am looping through some regions and using the pdf and print to store to pdf file after using annotation_custom to add a small plot within the main plot. Seems the print(p) dosent generate the fit, so I addded a simple p (I think this is called "rendering" the plot) and it generated the fit variable in my workspace.Tiertza
W
20

A more general approach could be to simply use the predict() function to predict any range of values that are interesting.

# define the model
model <- loess(wt ~ hp, data = mtcars)

# predict fitted values for each observation in the original dataset
modelFit <- data.frame(predict(model, se = TRUE))

# define data frame for ggplot
df <- data.frame(cbind(hp = mtcars$hp
          , wt = mtcars$wt
          , fit = modelFit$fit
          , upperBound = modelFit$fit + 2 * modelFit$se.fit
          , lowerBound = modelFit$fit - 2 * modelFit$se.fit
          ))

# build the plot using the fitted values from the predict() function
# geom_linerange() and the second geom_point() in the code are built using the values from the predict() function
# for comparison ggplot's geom_smooth() is also shown
g <- ggplot(df, aes(hp, wt))
g <- g + geom_point()
g <- g + geom_linerange(aes(ymin = lowerBound, ymax = upperBound))
g <- g + geom_point(aes(hp, fit, size = 1))
g <- g + geom_smooth(method = "loess")
g

# Predict any range of values and include the standard error in the output
predict(model, newdata = 100:300, se = TRUE)
Warbeck answered 2/3, 2016 at 9:9 Comment(0)
E
10

If you want to bring in the power of the tidyverse, you can use the "broom" library to add the predicted values from the loess function to your original dataset. This is building on @phillyooo's solution.

library(tidyverse)
library(broom)

# original graph with smoother
ggplot(data=mtcars, aes(hp,wt)) + 
  stat_smooth(method = "loess", span = 0.75)

# Create model that will do the same thing as under the hood in ggplot2
model <- loess(wt ~ hp, data = mtcars, span = 0.75)

# Add predicted values from model to original dataset using broom library
mtcars2 <- augment(model, mtcars)

# Plot both lines 
ggplot(data=mtcars2, aes(hp,wt)) + 
  geom_line(aes(hp, .fitted), color = "red") +
  stat_smooth(method = "loess", span = 0.75)
Essentiality answered 13/11, 2018 at 15:37 Comment(0)
Y
8

Save the graph object and use ggplot_build() or layer_data() to obtain the elements/estimates for the layers. e.g.

pp<-ggplot(mtcars, aes(x=hp, y=wt)) +   geom_point() +   geom_smooth();
ggplot_build(pp)
Yeast answered 14/1, 2019 at 2:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.