ggplot: Extend regression line to predicted value with different linetype

Is there a simple way to extend a dotted line from the end of a solid regression line to a predicted value?

Below is my basic attempt at it:

library(ggplot2)

x = rnorm(10)
y = 5 + x + rnorm(10,0,0.4)

my_lm <- lm(y~x)
summary(my_lm)

my_intercept <- my_lm$coef[1]
my_slope <- my_lm$coef[2]
my_pred = predict(my_lm,data.frame(x = (max(x)+1)))

ggdf <- data.frame( x = c(x,max(x)+1), y = c(y,my_pred), obs_Or_Pred = c(rep("Obs",10),"Pred") )

ggplot(ggdf, aes(x = x, y = y, group = obs_Or_Pred ) ) +
     geom_point( size = 3, aes(colour = obs_Or_Pred) ) + 
     geom_abline( intercept = my_intercept, slope = my_slope, aes( linetype = obs_Or_Pred ) )

This doesn't give the output I'd hoped to see. I've looked at some other answers on SO and haven't seen anything simple. The best I've come up with is:

ggdf2 <- data.frame( x = c(x, max(x), max(x)+1),
                     y = c(y, my_intercept+max(x)*my_slope, my_pred),
                     obs_Or_Pred = c(rep("Obs",10),"Pred","Pred"),
                     show_Data_Point = c(rep(TRUE,10),FALSE,TRUE) )

ggplot(ggdf2, aes(x = x, y = y, group = obs_Or_Pred ) ) +
     geom_point( data = ggdf2[ggdf2[,"show_Data_Point"],] ,size = 3, aes(colour = obs_Or_Pred) ) + 
     geom_smooth( method = "lm", se = FALSE, aes(colour = obs_Or_Pred, linetype=obs_Or_Pred) )
 

This gives the correct output, but I had to include an extra column specifying whether each row should be shown as a data point. Without it, I get the second of these two plots, which has an extra point at the end of the fitted regression line:

[Image: the two plots described above]
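In the `ggdf2` workaround, the extra `Pred` row at `max(x)` is filled in by hand as `my_intercept + max(x)*my_slope`, which is just the fitted value at the last observed x. As a quick sketch (the seed is an assumption added only for reproducibility), `predict()` on the same x gives the same value, and it can return several points on the line in one call, so these helper rows never need to be computed by hand:

```r
set.seed(1)  # assumption: seed added only so the example is reproducible
x <- rnorm(10)
y <- 5 + x + rnorm(10, 0, 0.4)
my_lm <- lm(y ~ x)

my_intercept <- unname(coef(my_lm)[1])
my_slope <- unname(coef(my_lm)[2])

# Fitted value at the last observed x, computed by hand (as in ggdf2) ...
by_hand <- my_intercept + my_slope * max(x)
# ... and by predict(), which evaluates the same fitted line
by_predict <- unname(predict(my_lm, data.frame(x = max(x))))

# The two agree up to floating-point tolerance
print(all.equal(by_hand, by_predict))  # TRUE

# Start of the data, end of the data, and one step beyond, in one call
ends <- predict(my_lm, data.frame(x = c(min(x), max(x), max(x) + 1)))
print(ends)
```

Because the fitted line is straight, the difference between the last two values equals the slope, which is why two endpoints per segment are enough to draw it.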

Is there a simpler way to tell ggplot to predict a single point out from the linear model and draw a dashed line to it?

Alnico answered 20/12, 2017 at 17:4

Comments (5):
Dysgenic: Your method seems straightforward to me.

Alnico: It feels awkward to have to do the predictions in advance and then specify which rows will be shown as points and which as lines. That's fine when there are only a few predictions to make, but if I wanted to do this repeatedly, it would be tedious to do manually.

Dysgenic: I mean, ggplot is a plotting package, not a modeling package. It's perfect at plotting the data you give it. geom_smooth is a nice convenience for simple use cases, but when you want non-standard models/predictions, you shouldn't be surprised that you need to explicitly give it the data you want to plot.

Dysgenic: And no, you shouldn't do it manually; you should write a little helper function that does all the data prep for you.

Alnico: Fair points, Gregor!

You can plot the points using only your actual data and build a prediction data frame to add the lines. Note that max(x) appears twice so that it can be an endpoint of both the Obs line and the Pred line. We also use a shape aesthetic so that we can remove the point marker that would otherwise appear in the legend key for Pred.

# Build prediction data frame
pred_x = c(min(x),rep(max(x),2),max(x)+1)
pred_lines = data.frame(x=pred_x,
                        y=predict(my_lm, data.frame(x=pred_x)),
                        obs_Or_Pred=rep(c("Obs","Pred"), each=2))

ggplot(pred_lines, aes(x, y, colour=obs_Or_Pred, shape=obs_Or_Pred, linetype=obs_Or_Pred)) +
  geom_point(data=data.frame(x,y, obs_Or_Pred="Obs"), size=3) +
  geom_line(linewidth=1) +   # linewidth needs ggplot2 >= 3.4.0; use size=1 on older versions
  scale_shape_manual(values=c(16,NA)) +
  theme_bw()
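Following Gregor's suggestion in the comments, the data prep in this answer can be wrapped in a small helper so it is easy to repeat. This is only a sketch: `extend_fit_lines()` is a hypothetical name, it assumes a simple `lm(y ~ x)` fit whose predictor column is called `x`, and it assumes the prediction point(s) lie beyond `max(x)`. The seed is added only for reproducibility, and `linewidth` assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns the endpoints of the
# solid "Obs" segment (min to max of the observed x) and the dashed "Pred"
# segment (max observed x to the furthest requested x), all taken from the fit.
# Assumes the model's predictor is named `x` and that x_new > max(x_obs).
extend_fit_lines <- function(model, x_obs, x_new) {
  x_end <- max(x_obs)
  # max(x_obs) appears twice so it ends the Obs segment and starts the Pred one
  pred_x <- c(min(x_obs), x_end, x_end, max(x_new))
  data.frame(
    x = pred_x,
    y = unname(predict(model, data.frame(x = pred_x))),
    obs_Or_Pred = rep(c("Obs", "Pred"), each = 2)
  )
}

set.seed(1)  # assumption: fixed seed so the example is reproducible
x <- rnorm(10)
y <- 5 + x + rnorm(10, 0, 0.4)
my_lm <- lm(y ~ x)

pred_lines <- extend_fit_lines(my_lm, x, max(x) + 1)

p <- ggplot(pred_lines, aes(x, y, colour = obs_Or_Pred, shape = obs_Or_Pred,
                            linetype = obs_Or_Pred)) +
  geom_point(data = data.frame(x, y, obs_Or_Pred = "Obs"), size = 3) +
  geom_line(linewidth = 1) +
  scale_shape_manual(values = c(16, NA)) +  # NA hides the marker in the Pred key
  theme_bw()
print(p)
```

Since the fit is a straight line, only the four segment endpoints are computed; for a curved model you would pass a finer grid of x values instead.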

[Image: resulting plot]

Genevagenevan answered 20/12, 2017 at 17:31

Semi-ugly: you can use scale_x_continuous(limits = c(NA, max(x) + 1)) to set the range of x values used for prediction. Plot the predicted line first with fullrange = TRUE, then add the 'observed' line on top. Note that the overplotting isn't rendered perfectly, and you may want to make the observed line slightly thicker.

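d <- data.frame(x, y)   # the x and y vectors from the question

ggplot(d, aes(x, y)) +
  geom_point(aes(color = "obs")) +
  # predicted line first, extended over the full x scale
  geom_smooth(aes(color = "pred", linetype = "pred"), se = FALSE, method = "lm",
              fullrange = TRUE) +
  # observed line on top, slightly thicker than the default to cover the dashes
  geom_smooth(aes(color = "obs", linetype = "obs"), linewidth = 1.05, se = FALSE, method = "lm") +
  scale_linetype_discrete(name = "obs_or_pred") +
  scale_color_discrete(name = "obs_or_pred") +
  scale_x_continuous(limits = c(NA, max(x) + 1))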

[Image: resulting plot]


However, I tend to agree with Gregor: "ggplot is a plotting package, not a modeling package".

Pegues answered 20/12, 2017 at 18:0
