How to draw multiple CDF plots of vectors with different number of rows

I want to draw the CDF plots of multiple variables in the same graph. The variables have different lengths. To keep things simple, I use the following example code:

library("ggplot2")

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

As we can see, a3 has length 800, which differs from a1 and a2 (both length 1000). When I run the code, I get:

> df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
Error in data.frame(x = c(a1, a2, a3), ggg = gl(3, 1000)) : 
arguments imply differing number of rows: 2800, 3000
> ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) +    scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Error: ggplot2 doesn't know how to deal with data of class function

So, how can I draw the CDF plots of variables of different lengths in the same graph using ggplot2? Any help is appreciated!

Departmentalism answered 17/5, 2014 at 17:14 Comment(0)

ggplot has no trouble at all dealing with different counts in each group. The problem is with your creation of the factor ggg. Use this:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3), ggg=factor(rep(1:3, c(1000,1000,800))))
ggplot(df, aes(x, colour = ggg)) + 
  stat_ecdf()+
  scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

Also, the way you had it set up, coord_cartesian(xlim = c(0, 3)) restricts the view to [0, 3], where each CDF is more or less a straight line, so I left it out here.
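To see why the original gl(3, 1000) call fails, it may help to compare the number of labels with the number of values. The sketch below uses only base R; the list name `samples` is an illustrative assumption, not part of the original code. It also derives the group sizes from the data with lengths(), so the counts do not have to be typed by hand.

```r
# Sample vectors of unequal length (same shapes as in the question).
set.seed(1)
a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

# 'samples' is a hypothetical helper list, used here for illustration only.
samples <- list(AAA = a1, BBB = a2, CCC = a3)

# gl(3, 1000) always yields 3 * 1000 labels, regardless of the data.
length(gl(3, 1000))      # 3000
length(unlist(samples))  # 2800

# Derive the group sizes from the data instead of hard-coding them.
ggg <- factor(rep(names(samples), lengths(samples)), levels = names(samples))
df <- data.frame(x = unlist(samples, use.names = FALSE), ggg = ggg)

# One row per observation, with the right number of labels per group.
table(df$ggg)
```

A data frame built this way can be passed to the ggplot call above. Because the factor levels are already AAA, BBB and CCC, the labels argument of scale_colour_hue should no longer be needed.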

Rodent answered 18/5, 2014 at 0:15 Comment(1)
Brilliant answer! Tobitobiah

It may look as though ggplot wants equal counts in each group, but the error actually comes from data.frame() (see the note on gl() below). As an alternative to using stat_ecdf, you could also just do the calculation yourself:

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = factor(rep(1:3, c(1000,1000,800))))

df <- df[order(df$x), ]
df$ecdf <- ave(df$x, df$ggg, FUN=function(x) seq_along(x)/length(x))

ggplot(df, aes(x, ecdf, colour = ggg)) +
  geom_line() +
  scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

Note that you were using gl() incorrectly: gl(3, 1000) produces 1000 labels for each of the three groups (3000 in total), but you only have 2800 values. Here I've changed it to rep() to get the right number of labels per group.
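As a sanity check, the manual calculation can be compared against base R's ecdf() from the stats package. This is only a sketch: the column name `ecdf_ref` and the seed are assumptions added for illustration.

```r
set.seed(42)
df <- data.frame(
  x   = c(rnorm(1000, 0, 3), rnorm(1000, 1, 4), rnorm(800, 2, 3)),
  ggg = factor(rep(1:3, c(1000, 1000, 800)))
)

# Manual ECDF, as in the answer: sort by x, then rank within each group.
df <- df[order(df$x), ]
df$ecdf <- ave(df$x, df$ggg, FUN = function(x) seq_along(x) / length(x))

# Reference values from stats::ecdf(), also computed per group.
# 'ecdf_ref' is a hypothetical column name used only for this comparison.
df$ecdf_ref <- ave(df$x, df$ggg, FUN = function(x) ecdf(x)(x))

# Compare the two columns up to numerical tolerance.
all.equal(df$ecdf, df$ecdf_ref)
```

The two columns should agree for continuous data such as these. If a group contains tied values, seq_along() assigns each tied observation a different height while ecdf() gives them the same one, so the manual version would differ slightly at the ties.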


Coon answered 17/5, 2014 at 19:16 Comment(2)
And how can I set different line types for a1, a2 and a3? For example, a1 solid, a2 dashed and a3 dotted? Departmentalism
@bangliu If you have a different question, it's best to start a new question rather than asking it in the comments of an existing question. Or you could search this site for other questions about changing the linetype with ggplot. Coon
