Method to extract stat_smooth line fit

Asked 20/3, 2012 at 15:35 Answered 14/1, 2019 at 2:52

Is there a way to extract the values of the fitted line returned from stat_smooth?

The code I am using looks like this:

p <- ggplot(df1, aes(x=Days, y= Qty,group=Category,color=Category))
p <- p + stat_smooth(method=glm, fullrange=TRUE)+ geom_point())

This new r user would greatly appreciate any guidance.

Tiertza answered 20/3, 2012 at 15:35 Comment(4)

You example code isn't reproducible, can you make it so? – Cann 20/3, 2012 at 15:46

Yes I know, and I will make it so as soon as my R stops saying "Not Responding". I just thought someone might know of the top of their head w/out reproducible code. – Tiertza 20/3, 2012 at 15:51

I believe the answer is no, as AFAIK the smoothers aren't fit+evaluated until the plot is rendered. In general, if you want to manipulate fitted values, etc., you fit the model outside of ggplot and pass the data in to the specific layer you want it in. – Oriana 20/3, 2012 at 15:53

@Oriana Actually you can, though its a bit hacky to get the fitted values out of the ggplot environment. – Cann 20/3, 2012 at 16:25

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

y, predicted value
ymin, lower pointwise confidence interval around the mean
ymax, upper pointwise confidence interval around the mean
se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

Cann answered 20/3, 2012 at 16:24 Comment(6)

You're not kidding when you call that hacky. (You missed the <<- in your code.) – Oriana 20/3, 2012 at 16:31

@Oriana Thanks, fixed now. Though I'm getting a bit suspicious about the numbers that its producing. – Cann 20/3, 2012 at 16:36

@Oriana Got it working now, it neeeds to be in the stat_smooth call, otherwise it just uses the y values of the original data. – Cann 20/3, 2012 at 16:46

@James, Thats great! You mentioned by default it only predicts 80 data points, do you know how to change (increase) this? – Tiertza 20/3, 2012 at 17:0

@Tiertza See ?stat_smooth and the argument n. – Oriana 20/3, 2012 at 17:15

Thank you. Great anwsers guys I really appreciate it. On thing I will point out is that I have discovered is you need to actually call the plot directly to generate the fit. In my case I am looping through some regions and using the pdf and print to store to pdf file after using annotation_custom to add a small plot within the main plot. Seems the print(p) dosent generate the fit, so I addded a simple p (I think this is called "rendering" the plot) and it generated the fit variable in my workspace. – Tiertza 20/3, 2012 at 19:0

Riffing off of @James example

p <- qplot(hp,wt,data=mtcars) + stat_smooth()

You can use the intermediate stages of the ggplot building process to pull out the plotted data. The results of ggplot_build is a list, one component of which is data which is a list of dataframes which contain the computed values to be plotted. In this case, the list is two dataframes since the original qplot creates one for points and the stat_smooth creates a smoothed one.

> ggplot_build(p)$data[[2]]
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
           x        y     ymin     ymax        se PANEL group
1   52.00000 1.993594 1.149150 2.838038 0.4111133     1     1
2   55.58228 2.039986 1.303264 2.776709 0.3586695     1     1
3   59.16456 2.087067 1.443076 2.731058 0.3135236     1     1
4   62.74684 2.134889 1.567662 2.702115 0.2761514     1     1
5   66.32911 2.183533 1.677017 2.690049 0.2465948     1     1
6   69.91139 2.232867 1.771739 2.693995 0.2244980     1     1
7   73.49367 2.282897 1.853241 2.712552 0.2091756     1     1
8   77.07595 2.333626 1.923599 2.743652 0.1996193     1     1
9   80.65823 2.385059 1.985378 2.784740 0.1945828     1     1
10  84.24051 2.437200 2.041282 2.833117 0.1927505     1     1
11  87.82278 2.490053 2.093808 2.886297 0.1929096     1     1
12  91.40506 2.543622 2.145018 2.942225 0.1940582     1     1
13  94.98734 2.597911 2.196466 2.999355 0.1954412     1     1
14  98.56962 2.652852 2.249260 3.056444 0.1964867     1     1
15 102.15190 2.708104 2.303465 3.112744 0.1969967     1     1
16 105.73418 2.764156 2.357927 3.170385 0.1977705     1     1
17 109.31646 2.821771 2.414230 3.229311 0.1984091     1     1
18 112.89873 2.888224 2.478136 3.298312 0.1996493     1     1
19 116.48101 2.968745 2.531045 3.406444 0.2130917     1     1
20 120.06329 3.049545 2.552102 3.546987 0.2421773     1     1
21 123.64557 3.115893 2.573577 3.658208 0.2640235     1     1
22 127.22785 3.156368 2.601664 3.711072 0.2700548     1     1
23 130.81013 3.175495 2.625951 3.725039 0.2675429     1     1
24 134.39241 3.181411 2.645191 3.717631 0.2610560     1     1
25 137.97468 3.182252 2.658993 3.705511 0.2547460     1     1
26 141.55696 3.186155 2.670350 3.701961 0.2511175     1     1
27 145.13924 3.201258 2.687208 3.715308 0.2502626     1     1
28 148.72152 3.235698 2.721744 3.749652 0.2502159     1     1
29 152.30380 3.291766 2.782767 3.800765 0.2478037     1     1
30 155.88608 3.353259 2.857911 3.848607 0.2411575     1     1
31 159.46835 3.418409 2.938257 3.898561 0.2337596     1     1
32 163.05063 3.487074 3.017321 3.956828 0.2286972     1     1
33 166.63291 3.559111 3.092367 4.025855 0.2272319     1     1
34 170.21519 3.634377 3.165426 4.103328 0.2283065     1     1
35 173.79747 3.712729 3.242093 4.183364 0.2291263     1     1
36 177.37975 3.813399 3.347232 4.279565 0.2269509     1     1
37 180.96203 3.910849 3.447572 4.374127 0.2255441     1     1
38 184.54430 3.977051 3.517784 4.436318 0.2235917     1     1
39 188.12658 4.037302 3.583959 4.490645 0.2207076     1     1
40 191.70886 4.091635 3.645111 4.538160 0.2173882     1     1
41 195.29114 4.140082 3.700184 4.579981 0.2141624     1     1
42 198.87342 4.182676 3.748159 4.617192 0.2115424     1     1
43 202.45570 4.219447 3.788162 4.650732 0.2099688     1     1
44 206.03797 4.250429 3.819579 4.681280 0.2097573     1     1
45 209.62025 4.275654 3.842137 4.709171 0.2110556     1     1
46 213.20253 4.295154 3.855951 4.734357 0.2138238     1     1
47 216.78481 4.308961 3.861497 4.756425 0.2178456     1     1
48 220.36709 4.317108 3.859541 4.774675 0.2227644     1     1
49 223.94937 4.319626 3.851025 4.788227 0.2281358     1     1
50 227.53165 4.316548 3.836964 4.796132 0.2334829     1     1
51 231.11392 4.308435 3.818728 4.798143 0.2384117     1     1
52 234.69620 4.302276 3.802201 4.802351 0.2434590     1     1
53 238.27848 4.297902 3.787395 4.808409 0.2485379     1     1
54 241.86076 4.292303 3.772103 4.812503 0.2532567     1     1
55 245.44304 4.282505 3.754087 4.810923 0.2572576     1     1
56 249.02532 4.269040 3.733184 4.804896 0.2608786     1     1
57 252.60759 4.253361 3.710042 4.796680 0.2645121     1     1
58 256.18987 4.235474 3.684476 4.786473 0.2682509     1     1
59 259.77215 4.215385 3.656265 4.774504 0.2722044     1     1
60 263.35443 4.193098 3.625161 4.761036 0.2764974     1     1
61 266.93671 4.168621 3.590884 4.746357 0.2812681     1     1
62 270.51899 4.141957 3.553134 4.730781 0.2866658     1     1
63 274.10127 4.113114 3.511593 4.714635 0.2928472     1     1
64 277.68354 4.082096 3.465939 4.698253 0.2999729     1     1
65 281.26582 4.048910 3.415849 4.681971 0.3082025     1     1
66 284.84810 4.013560 3.361010 4.666109 0.3176905     1     1
67 288.43038 3.976052 3.301132 4.650972 0.3285813     1     1
68 292.01266 3.936392 3.235952 4.636833 0.3410058     1     1
69 295.59494 3.894586 3.165240 4.623932 0.3550782     1     1
70 299.17722 3.850639 3.088806 4.612473 0.3708948     1     1
71 302.75949 3.804557 3.006494 4.602619 0.3885326     1     1
72 306.34177 3.756345 2.918191 4.594499 0.4080510     1     1
73 309.92405 3.706009 2.823813 4.588205 0.4294926     1     1
74 313.50633 3.653554 2.723308 4.583801 0.4528856     1     1
75 317.08861 3.598987 2.616650 4.581325 0.4782460     1     1
76 320.67089 3.542313 2.503829 4.580796 0.5055805     1     1
77 324.25316 3.483536 2.384853 4.582220 0.5348886     1     1
78 327.83544 3.422664 2.259739 4.585589 0.5661643     1     1
79 331.41772 3.359701 2.128512 4.590891 0.5993985     1     1
80 335.00000 3.294654 1.991200 4.598107 0.6345798     1     1

Knowing a priori where the one you want is in the list isn't easy, but if nothing else you can look at the column names.

It is still better to do the smoothing outside the ggplot call, though.

EDIT:

It turns out replicating what ggplot2 does to make the loess is not as straightforward as I thought, but this will work. I copied it out of some internal functions in ggplot2.

model <- loess(wt ~ hp, data=mtcars)
xrange <- range(mtcars$hp)
xseq <- seq(from=xrange[1], to=xrange[2], length=80)
pred <- predict(model, newdata = data.frame(hp = xseq), se=TRUE)
y = pred$fit
ci <- pred$se.fit * qt(0.95 / 2 + .5, pred$df)
ymin = y - ci
ymax = y + ci
loess.DF <- data.frame(x = xseq, y, ymin, ymax, se = pred$se.fit)

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth(aes_auto(loess.DF), data=loess.DF, stat="identity")

That gives a plot that looks identical to

ggplot(mtcars, aes(x=hp, y=wt)) +
  geom_point() +
  geom_smooth()

(which is the expanded form of the original p).

Ulani answered 20/3, 2012 at 16:57 Comment(9)

Nice, it's a lot more useful to get the x values too. – Cann 20/3, 2012 at 17:30

Thanks. Rookie curious, why do you say it is better to do smoothing outside the ggplot call? Seems like if you need to produce the plots anyway, it would be repetitive to do a sepreate fit just to extract the fit values. – Tiertza 20/3, 2012 at 19:3

I wouldn't do the fit outside the ggplot call in addition to one inside of it, but rather instead of. I would form the fit data into an appropriate format/dataframe and then plot that. There is an example of that buried in this answer: https://mcmap.net/q/296856/-displaying-dates-on-xaxis-in-readable-format-r – Ulani 20/3, 2012 at 19:19

@Tiertza You need far more than the "fit" to evaluate the fitted model and therefore judge whether the "fit" is adequate or not. – Moultrie 21/3, 2012 at 9:57

@BrianDiggs...do you have a example simular to what you referenced above (Mar 20 at 19:19) of how to recreate a loess fit plotthat was created outside of ggplot? – Tiertza 23/3, 2012 at 19:28

@Tiertza I added an example of performing a loess beforehand to the answer. – Ulani 26/3, 2012 at 17:20

thanks..very helpful,I really appreciate it. Last question and I won't bother you for a very long time. If my origional loess tranformed the y variable [i.e. model <- loess(log10(wt) ~ hp, data=mtcars) ] what is the best way to produced the produce the plot in non log10 space? I can get most of it by producing a secondary loess.DF2 data frame with the necessary transformations back to linear space but for some reason I can't get the 95% ci to show up. Regardless, I'm guessing you have a far more efficient way to do it than crating a redundant loess.DF2 data frame. – Tiertza 26/3, 2012 at 18:54

Changing the model line to model <- loess(log10(wt) ~ hp, data=mtcars) and the loess.DF line to loess.DF <- data.frame(x = xseq, y=10^y, ymin=10^ymin, ymax=10^ymax, se = pred$se.fit), I get a plot with a curve with a band around it. Basically, I computed y, ymin, and ymax in log space and then transformed those back. – Ulani 26/3, 2012 at 19:24

This technique rocks. By increasing the sampling (n-365) I can get smoothing data for everyday of a year long timeseries which makes my life super easy for some analysis. Awesome, thanks. – Michaeline 10/6, 2015 at 9:37

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

y, predicted value
ymin, lower pointwise confidence interval around the mean
ymax, upper pointwise confidence interval around the mean
se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

Cann answered 20/3, 2012 at 16:24 Comment(6)

You're not kidding when you call that hacky. (You missed the <<- in your code.) – Oriana 20/3, 2012 at 16:31

@Oriana Thanks, fixed now. Though I'm getting a bit suspicious about the numbers that its producing. – Cann 20/3, 2012 at 16:36

@Oriana Got it working now, it neeeds to be in the stat_smooth call, otherwise it just uses the y values of the original data. – Cann 20/3, 2012 at 16:46

@James, Thats great! You mentioned by default it only predicts 80 data points, do you know how to change (increase) this? – Tiertza 20/3, 2012 at 17:0

@Tiertza See ?stat_smooth and the argument n. – Oriana 20/3, 2012 at 17:15

A more general approach could be to simply use the predict() function to predict any range of values that are interesting.

# define the model
model <- loess(wt ~ hp, data = mtcars)

# predict fitted values for each observation in the original dataset
modelFit <- data.frame(predict(model, se = TRUE))

# define data frame for ggplot
df <- data.frame(cbind(hp = mtcars$hp
          , wt = mtcars$wt
          , fit = modelFit$fit
          , upperBound = modelFit$fit + 2 * modelFit$se.fit
          , lowerBound = modelFit$fit - 2 * modelFit$se.fit
          ))

# build the plot using the fitted values from the predict() function
# geom_linerange() and the second geom_point() in the code are built using the values from the predict() function
# for comparison ggplot's geom_smooth() is also shown
g <- ggplot(df, aes(hp, wt))
g <- g + geom_point()
g <- g + geom_linerange(aes(ymin = lowerBound, ymax = upperBound))
g <- g + geom_point(aes(hp, fit, size = 1))
g <- g + geom_smooth(method = "loess")
g

# Predict any range of values and include the standard error in the output
predict(model, newdata = 100:300, se = TRUE)

Warbeck answered 2/3, 2016 at 9:9 Comment(0)

If you want to bring in the power of the tidyverse, you can use the "broom" library to add the predicted values from the loess function to your original dataset. This is building on @phillyooo's solution.

library(tidyverse)
library(broom)

# original graph with smoother
ggplot(data=mtcars, aes(hp,wt)) + 
  stat_smooth(method = "loess", span = 0.75)

# Create model that will do the same thing as under the hood in ggplot2
model <- loess(wt ~ hp, data = mtcars, span = 0.75)

# Add predicted values from model to original dataset using broom library
mtcars2 <- augment(model, mtcars)

# Plot both lines 
ggplot(data=mtcars2, aes(hp,wt)) + 
  geom_line(aes(hp, .fitted), color = "red") +
  stat_smooth(method = "loess", span = 0.75)

Essentiality answered 13/11, 2018 at 15:37 Comment(0)

Save the graph object and use ggplot_build() or layer_data() to obtain the elements/estimates for the layers. e.g.

pp<-ggplot(mtcars, aes(x=hp, y=wt)) +   geom_point() +   geom_smooth();
ggplot_build(pp)

Yeast answered 14/1, 2019 at 2:52 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags