Get the data associated with ggplot + stat_ecdf()

I like the stat_ecdf() feature of the ggplot2 package, which I find quite useful for exploring a data series. However, the result is only visual, and I wonder whether it is possible to get the associated table of values and, if so, how.

Please have a look at the following reproducible example:

library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf() # build the cumulative chart
p
attributes(p) # chart attributes
p$data # this is the iris dataset, not the series used to draw the chart

Cluster answered 22/5, 2015 at 12:28 Comment(1)
Take a look at the ecdf function included in base R if you just want to estimate the empirical CDF without plotting it. - Filia
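To illustrate the comment above, here is a minimal sketch of building the table with base R alone. The names `Fn` and `ecdf_table` are only illustrative; `ecdf()` and `knots()` come from the stats package that ships with R.

```r
# Build the empirical CDF of Sepal.Length; ecdf() returns a step function
Fn <- ecdf(iris$Sepal.Length)

# knots() gives the sorted distinct x values where the step function jumps;
# evaluating Fn there gives the cumulative proportion at each value
ecdf_table <- data.frame(x = knots(Fn), y = Fn(knots(Fn)))

head(ecdf_table)
```

The `y` column holds cumulative proportions in (0, 1], so no rescaling is needed.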

We can recreate the data:

# Recreate the ECDF data: evaluate the ECDF at each distinct x value
library(ggplot2)
x_vals <- sort(unique(iris$Sepal.Length))
dat_ecdf <-
  data.frame(x = x_vals,
             y = ecdf(iris$Sepal.Length)(x_vals))
# ecdf() already returns cumulative proportions in [0, 1], so no rescaling is needed

The two plots below should look the same:

#plot using new data
ggplot(dat_ecdf,aes(x,y)) +
  geom_step() +
  xlim(4,8)

#plot with built-in stat_ecdf
ggplot(iris, aes(x = Sepal.Length)) +
  stat_ecdf() +
  xlim(4,8)
Ketch answered 22/5, 2015 at 12:59 Comment(0)

As @krfurlong showed me in this question, the layer_data function in ggplot2 can get you exactly what you're looking for without the need to recreate the data.

library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf()
p.data <- layer_data(p)

The "y" column of p.data contains the ECDF values, and the "x" column holds the Sepal.Length values shown on the x-axis of your plot.
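As a hedged follow-up, the sketch below trims the layer data down to a plain two-column table and cross-checks it against base R's `ecdf()`. It assumes the default `pad = TRUE` of `stat_ecdf()`, which adds points at -Inf and Inf; the names `p_data` and `ecdf_tbl` are only illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length)) + stat_ecdf()
p_data <- layer_data(p)

# Drop the padding points at -Inf / Inf and keep only the x and y columns
ecdf_tbl <- p_data[is.finite(p_data$x), c("x", "y")]

# Sort by x so the table reads from the smallest to the largest value
ecdf_tbl <- ecdf_tbl[order(ecdf_tbl$x), ]

# Cross-check against base R: should be TRUE
all.equal(ecdf_tbl$y, ecdf(iris$Sepal.Length)(ecdf_tbl$x))
```

If you would rather not have the padding rows at all, `stat_ecdf(pad = FALSE)` should avoid them.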

Halt answered 11/2, 2020 at 18:32 Comment(0)