I'm trying to use stat_ecdf() to plot cumulative successes as a function of a rank score created by a predictive model.
#libraries
require(ggplot2)
require(scales)
# fake data for reproducibility
set.seed(123)
n <- 200
df <- data.frame(model_score = rexp(n = n, rate = 1:n),
                 obs_set = sample(c("training", "validation"), n, replace = TRUE))
df$model_rank <- rank(df$model_score)/n
df$target_outcome <- rbinom(n,1,1-df$model_rank)
# Plot Gain Chart using stat_ecdf()
ggplot(subset(df, target_outcome == 1), aes(x = model_rank)) +
  stat_ecdf(aes(colour = obs_set), size = 1) +
  scale_x_continuous(limits = c(0, 1), labels = percent, breaks = seq(0, 1, .1)) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
               colour = "gray", linetype = "longdash", size = 1) +
  ggtitle("Gain Chart")
All I want to do is force the ECDF to start at (0,0) and end at (1,1) so that there are no gaps at the beginning or end of the curve. If possible, I'd like to do it within the syntax of ggplot2, but I'd settle for a clever workaround.
@Henrik this is NOT a duplicate of this question, because I have already defined my limits with scale_x_continuous() and scale_y_continuous(), and adding expand_limits() doesn't do anything. It is not the origin of the PLOT but the endpoints of the stat_ecdf() curve that need to be fixed.
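One possible workaround, offered as a sketch rather than a confirmed answer: compute the ECDF by hand for each obs_set group, append the points (0,0) and (1,1), and draw the result with geom_step() instead of stat_ecdf(). The helper ecdf_with_endpoints() and the names gain and cum_share are hypothetical, introduced only for illustration; they are not part of ggplot2. The sketch assumes every model_rank value lies inside [0, 1], as it does in the fake data above.

```r
library(ggplot2)
library(scales)

# Same fake data as in the question
set.seed(123)
n <- 200
df <- data.frame(model_score = rexp(n = n, rate = 1:n),
                 obs_set = sample(c("training", "validation"), n, replace = TRUE))
df$model_rank <- rank(df$model_score) / n
df$target_outcome <- rbinom(n, 1, 1 - df$model_rank)

hits <- subset(df, target_outcome == 1)

# Hypothetical helper: evaluate the ECDF at each observed value and
# pad it with an explicit start point (lower, 0) and end point (upper, 1).
ecdf_with_endpoints <- function(x, lower = 0, upper = 1) {
  xs <- sort(unique(x))
  data.frame(model_rank = c(lower, xs, upper),
             cum_share  = c(0, ecdf(x)(xs), 1))
}

# Build one padded ECDF per obs_set group and stack them
pieces <- lapply(split(hits, hits$obs_set), function(d) {
  out <- ecdf_with_endpoints(d$model_rank)
  out$obs_set <- as.character(d$obs_set[1])
  out
})
gain <- do.call(rbind, pieces)

p <- ggplot(gain, aes(x = model_rank, y = cum_share, colour = obs_set)) +
  geom_step() +
  # Reference diagonal drawn once, rather than once per data row
  annotate("segment", x = 0, y = 0, xend = 1, yend = 1,
           colour = "gray", linetype = "longdash") +
  scale_x_continuous(limits = c(0, 1), labels = percent, breaks = seq(0, 1, .1)) +
  scale_y_continuous(limits = c(0, 1), labels = percent) +
  xlab("Model Percentile") + ylab("Percent of Target Outcome") +
  ggtitle("Gain Chart")
print(p)
```

Because the padding points sit exactly on the scale limits, they are not dropped as out of bounds, so each curve should run from the lower-left corner to the upper-right corner. The cost is that the ECDF is computed outside of ggplot2 rather than by stat_ecdf().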
+ scale_y_continuous(labels=percent) and don't forget library(scales) – Toxinantitoxin