In R, how do I generate a dataset consisting of the means of all columns of a data frame?

Asked 25/4, 2012 at 19:51 Answered 25/4, 2012 at 22:25

r dataset simulation distribution replicate

I can generate 20 observations from a uniform distribution with the runif function, runif(n=20), and 100 replicates of the same distribution as follows.

df <- replicate( 100, runif(n=20))

This creates df, a matrix of dimensions [20, 100], which I can convert into a data frame with 100 columns and 20 rows.

How can I generate a new data frame consisting of the means of each column of df?

Thank you for your help.

Expressivity asked 25/4, 2012 at 19:51 Comment(1)

minor point: in R, they're functions, not commands! – Misbecome 26/4, 2012 at 7:29

You can use colMeans.

data <- replicate(100, runif(n=20))
means <- colMeans(data)
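
Since the question asks for a data frame rather than a plain vector, here is a minimal sketch of one way to wrap the result. The names means_long and means_wide, the column labels, and the fixed seed are illustrative assumptions, not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

data  <- replicate(100, runif(n = 20))  # 20 x 100 matrix
means <- colMeans(data)                 # numeric vector of length 100

# Long layout: one row per original column (column names are hypothetical)
means_long <- data.frame(column = seq_along(means), mean = means)

# Wide layout: a single row with 100 columns, mirroring the input's columns
means_wide <- as.data.frame(t(means))
```

The long layout is usually easier to plot or summarise; the wide layout keeps one column per replicate.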

Ton answered 25/4, 2012 at 19:56 Comment(1)

R 2.15+ also includes .colMeans(). According to the note, these are "for use in programming where ultimate speed is required." – Whistler 25/4, 2012 at 20:54
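
As a rough sketch of how the bare-bones variant mentioned in this comment is called: .colMeans() takes the matrix plus its row and column counts explicitly and does not name the result. The variable names below are illustrative.

```r
data <- replicate(100, runif(n = 20))

m <- nrow(data)  # number of rows
n <- ncol(data)  # number of columns

# .colMeans(x, m, n) skips the dimension handling that colMeans() does
fast_means <- .colMeans(data, m, n)

# Both calls should agree on a plain numeric matrix
print(all.equal(fast_means, colMeans(data)))
```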

Generate data:

data <- replicate(100, runif(n=20))

Means of columns, rows:

col_mean <- apply(data, 2, mean)
row_mean <- apply(data, 1, mean)

Standard deviation of columns, rows:

col_sd   <- apply(data, 2, sd)
row_sd   <- apply(data, 1, sd)
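
Base R has no colSds() counterpart to colMeans(), so apply() is the straightforward route for standard deviations. As a hedged sketch, the same column values can also be computed from colMeans() and colSums() directly; the names centered and col_sd_vec are illustrative assumptions.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
data <- replicate(100, runif(n = 20))

# apply() version, as in the answer
col_sd_apply <- apply(data, 2, sd)

# Vectorised version: subtract each column's mean, then apply the
# sample standard deviation formula with the n - 1 denominator
centered   <- sweep(data, 2, colMeans(data))
col_sd_vec <- sqrt(colSums(centered^2) / (nrow(data) - 1))

print(all.equal(col_sd_apply, col_sd_vec))
```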

Ceolaceorl answered 25/4, 2012 at 20:08 Comment(3)

colMeans, rowMeans, colSums, and rowSums will generally perform faster than their apply equivalents, though for most cases, the performance hit will not be a huge deal (obviously depends on the size of your data...). – Indevout 25/4, 2012 at 20:11

check out the help page for ?colMeans for details, but essentially those functions are "written for speed" and do less error checking than the apply functions. I wish I understood the details better myself... – Indevout 25/4, 2012 at 20:17

On a 10000 x 10000 matrix colMeans took ~0.1s, apply ~3.2s. – Ton 25/4, 2012 at 20:27
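
A timing comparison along these lines can be sketched with system.time(). The 2000 x 2000 size is an assumption chosen to keep memory use modest, and the absolute numbers will vary by machine.

```r
# Assumption: a 2000 x 2000 matrix instead of 10000 x 10000, to save memory
x <- matrix(runif(2000 * 2000), ncol = 2000)

t_colmeans <- system.time(m1 <- colMeans(x))
t_apply    <- system.time(m2 <- apply(x, 2, mean))

# Elapsed wall-clock seconds for each approach
print(c(colMeans = t_colmeans[["elapsed"]], apply = t_apply[["elapsed"]]))

# The two approaches should return the same values
print(all.equal(m1, m2))
```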

If I understand correctly: apply(replicate(100,runif(n=20)),2,mean)

Ousley answered 25/4, 2012 at 19:54 Comment(1)

Dear frankc: Thank you very much for your help. I tried your suggestion and it worked like a charm. – Expressivity 25/4, 2012 at 20:01

Building on Nico's answer, you could instead make a single call to runif(), shape the result into a matrix, and then take the colMeans of that. It is faster and gives a result equivalent to the other answers.

library(rbenchmark)
# reasonably fast
f1 <- function() colMeans(replicate(100, runif(20)))
# faster yet
f2 <- function() colMeans(matrix(runif(20 * 100), ncol = 100))

benchmark(f1(), f2(),
          order = "elapsed",
          columns = c("test", "elapsed", "relative"),
          replications = 10000)

#Test results
  test elapsed relative
2 f2()    0.91 1.000000
1 f1()    5.10 5.604396
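
To check the equivalence claim, here is a small sketch that reseeds the generator before each approach. The seed value and the names a and b are arbitrary assumptions. Because matrix() fills column by column, both approaches should consume the random stream in the same order.

```r
set.seed(123)  # assumption: any fixed seed works
a <- replicate(100, runif(20))

set.seed(123)  # reset so the second approach sees the same random stream
b <- matrix(runif(20 * 100), ncol = 100)

print(identical(dim(a), dim(b)))             # same shape
print(all.equal(a, b))                       # same values
print(all.equal(colMeans(a), colMeans(b)))   # hence the same column means
```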

Indevout answered 25/4, 2012 at 22:25 Comment(0)
