How to specify the color of lines and points in an ECDF plot with ggplot2

I have a set of data that is tough to visualize, but I think an ECDF with a couple of points and lines added to it will do the trick. I am able to plot things the way that I want; my problem is coloring things correctly.

I have the following code, which puts all of the right lines and points on the plot, but now I would like to color and label everything properly. I've pored over multiple articles and tried a hundred things, but can't get it right. Do I need to format my data differently?

My vision for the legend is something like this:

  • dashed line = b
  • solid line = a
  • red = s
  • blue = d
  • dot = s.mean

Code to generate an example plot:

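library(ggplot2)
library(reshape2)

set.seed(1)  # make the random example reproducible
# stochastic cases: 100 draws each
s.a = rnorm(100)*100
s.b = rnorm(100)*100+50
# deterministic cases: a single value each
d.a = -35
d.b = 20
sdata = data.frame(cbind(s.a,s.b))
ddata = data.frame(cbind(d.a,d.b))
# long format: columns 'variable' (s.a/s.b or d.a/d.b) and 'value'
sdata.m = melt(sdata)
ddata.m = melt(ddata)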

ggplot(sdata.m, aes(x=value, color=variable)) +
  geom_vline(data=ddata.m,
             aes(xintercept = value,
                 color=variable),
             linetype = 2,
             size=2) + 
  stat_ecdf(size=1)+
  labs(title = 'plotTitle',
       color='colorLegendTitle') +
  xlab('xLabel') +
  ylab('yLabel')+
  theme_bw(30) +
  theme(
    legend.position=c(.8, .2),
    legend.box="horizontal",
    text=element_text(family="Times"),
    legend.key.size = unit(1,"cm")) +
  geom_point(x=mean(sdata.m$value[sdata.m$variable=="s.a"]),y=.5,
             size = 5) +
  geom_point(x=mean(sdata.m$value[sdata.m$variable=="s.b"]),y=.5,
             size = 5)

Some context on the data I'm plotting: I have stochastic datasets (s) and deterministic datasets (d). Each stochastic set has hundreds of values, while each deterministic set has only a single value. So in my plot I'm comparing the distribution of the stochastic data (solid lines) and the mean of the stochastic data (dots) with the deterministic values (dashed lines). For both the stochastic and deterministic datasets there are two cases, (a) and (b). I would like all (a) data to share one color and all (b) data to share another.

This seems like it should be easy with aes and color/linetype/geom mappings, but I can't figure it out.
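On the question of whether the data needs a different format: one option, sketched here as an assumption rather than something either answer below uses, is to split each `variable` name into a `type` column (`s` or `d`) and a `case` column (`a` or `b`), so that color can map to the case alone. The helper `split_variable()` and the column names `type` and `case` are hypothetical, and the data is rebuilt with `data.frame()` to mirror the long format that `melt()` produces.

```r
set.seed(1)
# Long-format data shaped like the melted example: 'variable' and 'value'
sdata.m <- data.frame(variable = rep(c("s.a", "s.b"), each = 100),
                      value = c(rnorm(100) * 100, rnorm(100) * 100 + 50))
ddata.m <- data.frame(variable = c("d.a", "d.b"), value = c(-35, 20))

# Hypothetical helper: "s.a" -> type "s", case "a"
split_variable <- function(df) {
  parts <- strsplit(as.character(df$variable), ".", fixed = TRUE)
  df$type <- vapply(parts, `[`, character(1), 1)  # first piece: s or d
  df$case <- vapply(parts, `[`, character(1), 2)  # second piece: a or b
  df
}

sdata.m <- split_variable(sdata.m)
ddata.m <- split_variable(ddata.m)
print(ddata.m)
```

With these columns, mapping `color = case` in both `stat_ecdf()` and `geom_vline()` gives each case a single color across the stochastic and deterministic data, while the constant `linetype = 2` on `geom_vline()` still marks the deterministic values as dashed.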

Thanks in advance.

Thermobarometer answered 10/6, 2013 at 21:57
So in the chart above, you want d.a and s.a to be the same colour and d.b and s.b to be the same colour? (Spectroradiometer)

To get a better legend, put color=variable and linetype=variable inside aes() for both ggplot() and geom_vline(), so there will be only one legend. Then for geom_point() put x and y inside aes(), as well as color="s.mean" and linetype="s.mean"; this ensures that a new level is added to the legend. Now with scale_color_manual() and scale_linetype_manual() you can set the desired colors and linetypes. With guides() and override.aes= you can remove the points from the first four legend entries.

ggplot(sdata.m, aes(x=value, color=variable,linetype=variable))+
  stat_ecdf(size=1)+
  geom_vline(data=ddata.m,
             aes(xintercept = value,color=variable,linetype=variable),
             size=2) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.a"]),
       color="s.mean",linetype="s.mean",y=.5),size = 5) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.b"]),
        color="s.mean",linetype="s.mean",y=.5),size = 5)+
  scale_color_manual(breaks=c("d.a","d.b","s.a","s.b","s.mean"),
                     values=c("blue","blue","red","red","green"))+
  scale_linetype_manual(breaks=c("d.a","d.b","s.a","s.b","s.mean"),
                     values=c(1,2,1,2,0))+
  guides(color=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA,16))))
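A side note on the two `geom_point()` layers above: because they inherit `sdata.m`, each one draws a copy of its point for every row of the data, all overplotted at the same position. A sketch of an alternative, not part of the original answer, is to compute the means into a small data frame with base R's `aggregate()` and hand that to a single `geom_point()` layer. The names `means` and `y` are illustrative, the data is rebuilt with `data.frame()` to mirror the melted example, and the plot is only built if ggplot2 is installed.

```r
set.seed(1)
# Long-format stochastic data shaped like the melted example
sdata.m <- data.frame(variable = rep(c("s.a", "s.b"), each = 100),
                      value = c(rnorm(100) * 100, rnorm(100) * 100 + 50))

# One row per stochastic case: its mean, placed at y = 0.5
means <- aggregate(value ~ variable, data = sdata.m, FUN = mean)
means$y <- 0.5
print(means)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sdata.m, aes(x = value, color = variable)) +
    stat_ecdf() +
    # inherit.aes = FALSE: use only the mappings given here, on 'means';
    # the constant "s.mean" still adds its own level to the color legend
    geom_point(data = means, aes(x = value, y = y, color = "s.mean"),
               size = 5, inherit.aes = FALSE)
  # print(p) draws the plot
}
```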


Posy answered 11/6, 2013 at 5:55
Thanks Didzis! I posted a final solution below with all of the details I was looking for. Thanks for the code AND the explanation. (Thermobarometer)

Didzis gets credit for the answer; I was able to adapt his code and get to the final product I was looking for:

ggplot(sdata.m, aes(x=value, color=variable,linetype=variable,shape=variable))+
  stat_ecdf(size=1)+
  geom_vline(data=ddata.m,
             aes(xintercept = value,color=variable,linetype=variable,shape=variable),
             size=2) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.a"]),
                 color="s.a.mean",linetype="s.a.mean",shape="s.a.mean",
                 y=.5),size = 5) +
  geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="s.b"]),
                 color="s.b.mean",linetype="s.b.mean",shape="s.b.mean",
                 y=.5),size = 5) +
  scale_shape_manual(breaks=c("d.a","d.b","s.a","s.a.mean","s.b","s.b.mean"),
                     values=c(16,16,16,16,16,16)) +
  scale_color_manual(breaks=c("d.a","d.b","s.a","s.a.mean","s.b","s.b.mean"),
                     values=c("blue","red","blue","blue","red","red"))+
  scale_linetype_manual(breaks=c("d.a","d.b","s.a","s.a.mean","s.b","s.b.mean"),
                        values=c(2,2,1,0,1,0))+
  guides(color=guide_legend(override.aes=list(shape=c(NA,NA,NA,16,NA,16))))

A few things I learned:

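  1. When supplying breaks/values to the scale_*_manual() functions, order is important: unnamed values are matched by position, so list them in the same (here alphabetical) order as the levels.
  2. When all aesthetics (linetype/shape/color) are mapped to the same column, `variable`, everything ends up in one legend.
  3. When overriding things with manual scales, you need one manual scale for each mapped aesthetic, with the same breaks, and then you can override the legend keys with guides() if need be.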
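Related to the first point: `scale_*_manual()` also accepts a named `values` vector, in which case each level is matched to its value by name rather than by position, so the order no longer matters. Below is a sketch for the final solution's color and linetype scales, assuming the same six levels; the names `keys`, `cols`, `ltys`, `color_scale`, and `linetype_scale` are illustrative, and the scales are only created if ggplot2 is installed.

```r
# Legend order, as in the final solution's breaks
keys <- c("d.a", "d.b", "s.a", "s.a.mean", "s.b", "s.b.mean")

# Named vectors: each level is paired with its value by name, not position,
# so these deliberately shuffled orders still give the intended mapping
cols <- c(s.b.mean = "red", s.b = "red", d.b = "red",
          s.a.mean = "blue", s.a = "blue", d.a = "blue")
ltys <- c(d.a = 2, d.b = 2, s.a = 1, s.b = 1, s.a.mean = 0, s.b.mean = 0)

# Same pairing as the final solution's positional values
print(cols[keys])
print(ltys[keys])

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  color_scale <- scale_color_manual(breaks = keys, values = cols)
  linetype_scale <- scale_linetype_manual(breaks = keys, values = ltys)
}
```

Adding `color_scale + linetype_scale` to the plot would then stand in for the positional `scale_color_manual()` and `scale_linetype_manual()` calls.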

Thanks again Didzis. Another life saved.

Thermobarometer answered 12/6, 2013 at 15:22
