I would like to write a function using ddply that outputs summary statistics based on the names of two columns of the data.frame mat.
mat is a big data.frame with the columns "metric", "length", "species", "tree", ..., "index". index is a factor with 2 levels, "Short" and "Long"; "metric", "length", "species", "tree" and the others are all continuous variables.
Function:
summary1 <- function(arg1,arg2) {
  ...
  ss <- ddply(mat, .(index), function(X) data.frame(
    arg1 = as.list(summary(X$arg1)),
    arg2 = as.list(summary(X$arg2))),
    .parallel = FALSE)
  ss
}
I expect the output to look like this after calling summary1("metric","length"):
Short metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.
....
Long metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu. length.Max.
....
At the moment the function does not produce the desired output. What modification should be made here?
Thanks for your help.
Here is a toy example:
mat <- data.frame(
metric = rpois(10,10), length = rpois(10,10), species = rpois(10,10),
tree = rpois(10,10), index = c(rep("Short",5),rep("Long",5))
)
dput). – Priester
mat<-data.frame(metric=rpois(10,10),length=rpois(10,10),species=rpois(10,10),tree=rpois(10,10),index=c(rep("Short",5),rep("Long",5))) - Thanks – Nerine
data.frame and the variable to split by as well. That way your function will work when you need to use it on a data.frame named Mat or MAT or MyOtherData, etc. – Malraux
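The suggestion in the last comment, passing the data.frame and the split variable in as arguments, might look roughly like the sketch below. The function name summary_by and its argument names are made up for illustration; it assumes plyr is installed, that all column names are given as strings, and that the two value columns contain no NAs (otherwise summary() adds an extra "NA's" entry). The key change from summary1 is X[[arg1]] instead of X$arg1, since X$arg1 looks for a column literally named "arg1".

```r
library(plyr)

# Hypothetical helper: df is the data.frame, group is the name of the
# column to split by, arg1 and arg2 are the names of the value columns.
summary_by <- function(df, group, arg1, arg2) {
  ddply(df, group, function(X) {
    # X[[arg1]] selects the column whose name is stored in arg1.
    s1 <- summary(X[[arg1]])
    s2 <- summary(X[[arg2]])
    # Prefix each statistic with its column name, e.g. "metric.Min.".
    v1 <- setNames(as.numeric(s1), paste(arg1, names(s1), sep = "."))
    v2 <- setNames(as.numeric(s2), paste(arg2, names(s2), sep = "."))
    # One row per group; as.data.frame() makes the names syntactic,
    # so "metric.1st Qu." becomes "metric.1st.Qu.".
    as.data.frame(as.list(c(v1, v2)))
  })
}

# Toy data in the same shape as the example in the question.
mat <- data.frame(
  metric = rpois(10, 10), length = rpois(10, 10), species = rpois(10, 10),
  tree = rpois(10, 10), index = c(rep("Short", 5), rep("Long", 5))
)

res <- summary_by(mat, "index", "metric", "length")
print(res)
```

With the toy data this should give one row per level of index, with the index column followed by six metric.* and six length.* columns.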