In R, how do I generate a dataset consisting of the means of all columns of a data frame?

4

6

I can generate 20 observations from a uniform distribution with the runif function, runif(n=20), and 100 replicates of the same draw as follows.

df <- replicate(100, runif(n=20))

This creates df, a matrix of dimensions [20, 100], which I can convert into a data frame with 100 columns and 20 rows.

How can I generate a new data frame consisting of the means of each column of df?

Thank you for your help.

Expressivity answered 25/4, 2012 at 19:51 Comment(1)
Minor point: in R, they're functions, not commands! - Misbecome
11

You can use colMeans.

data <- replicate(100, runif(n=20))
means <- colMeans(data)
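
Since the question asks for a data frame rather than a plain vector, here is a minimal sketch of one way to wrap the result. The names means_df and column are hypothetical, and the seed is an assumption added only to make the run reproducible.

```r
set.seed(42)  # assumption: fixed seed, only for reproducibility

data  <- replicate(100, runif(n = 20))  # 20 x 100 numeric matrix
means <- colMeans(data)                 # numeric vector of length 100

# One row per original column: its index and its mean
# (means_df and its column names are illustrative, not from the thread)
means_df <- data.frame(column = seq_along(means), mean = means)

# colMeans() also accepts a data frame whose columns are all numeric
df <- as.data.frame(data)
means_from_df <- colMeans(df)
stopifnot(isTRUE(all.equal(unname(means_from_df), means)))

head(means_df)
```

If a single-row layout with 100 columns is preferred, as.data.frame(t(means)) should give that shape instead.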
Ton answered 25/4, 2012 at 19:56 Comment(1)
R 2.15+ also includes .colMeans(). According to the note, these are "for use in programming where ultimate speed is required." - Whistler
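
For reference, a small sketch of how .colMeans() from the comment above might be called. Unlike colMeans(), it expects the matrix dimensions to be passed explicitly. The variable name fast_means and the seed are assumptions for illustration.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

data <- replicate(100, runif(n = 20))

# .colMeans() takes the dimensions explicitly:
# m = number of rows, n = number of columns
fast_means <- .colMeans(data, m = nrow(data), n = ncol(data))

# Should agree with the regular colMeans() result
stopifnot(isTRUE(all.equal(fast_means, colMeans(data))))
```

For a matrix this small the difference is unlikely to matter. The dotted variant is probably only worth considering inside tight loops.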
5

Generate data:

data <- replicate(100, runif(n=20))

Means of columns, rows:

col_mean <- apply(data, 2, mean)
row_mean <- apply(data, 1, mean)

Standard deviations of columns, rows:

col_sd   <- apply(data, 2, sd)
row_sd   <- apply(data, 1, sd)
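
If the data has already been converted to a data frame, as the question describes, apply() will first coerce it back to a matrix. A possible alternative, sketched here with hypothetical variable names, is to iterate over the columns directly with vapply():

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

# 20 rows, 100 numeric columns (named V1..V100 by as.data.frame)
df <- as.data.frame(replicate(100, runif(n = 20)))

# A data frame is a list of columns, so vapply() visits each column once;
# numeric(1) declares that every call must return a single number
col_mean <- vapply(df, mean, numeric(1))
col_sd   <- vapply(df, sd, numeric(1))

# Same values as the apply() version on the underlying matrix
stopifnot(isTRUE(all.equal(unname(col_mean), apply(as.matrix(df), 2, mean),
                           check.attributes = FALSE)))
```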
Ceolaceorl answered 25/4, 2012 at 20:8 Comment(3)
colMeans, rowMeans, colSums, and rowSums will generally run faster than their apply equivalents, though in most cases the performance hit will not be a big deal (it depends on the size of your data). - Indevout
Check out the help page ?colMeans for details, but essentially those functions are "written for speed" and do less error checking than the apply functions. I wish I understood the details better myself... - Indevout
On a 10000 x 10000 matrix colMeans took ~0.1s, apply ~3.2s. - Ton
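
A rough way to try the comparison from the comments on your own machine, sketched with a smaller 2000 x 2000 matrix (an assumption, chosen to keep memory use modest). Timings will vary by hardware, so none are shown here.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility

# Smaller than the 10000 x 10000 case mentioned in the comment
big <- matrix(runif(2000 * 2000), ncol = 2000)

t_colmeans <- system.time(m1 <- colMeans(big))
t_apply    <- system.time(m2 <- apply(big, 2, mean))

# Both approaches should give the same means up to floating-point tolerance
stopifnot(isTRUE(all.equal(m1, m2)))

# Elapsed wall-clock time in seconds for each approach
print(t_colmeans["elapsed"])
print(t_apply["elapsed"])
```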
2

If I understand correctly: apply(replicate(100, runif(n=20)), 2, mean)

Ousley answered 25/4, 2012 at 19:54 Comment(1)
Dear frankc: Thank you very much for your help. I tried your suggestion and it indeed worked like a charm. - Expressivity
2

Building off of Nico's answer, you could instead make a single call to runif(), reshape the result into a matrix, and then take the colMeans of that. The result is equivalent to the other answers, and it ran about 5.6 times faster in the benchmark below.

library(rbenchmark)
# reasonably fast
f1 <- function() colMeans(replicate(100, runif(20)))
# faster yet
f2 <- function() colMeans(matrix(runif(20 * 100), ncol = 100))

benchmark(f1(), f2(),
          order = "elapsed",
          columns = c("test", "elapsed", "relative"),
          replications = 10000)

# Test results
#   test elapsed relative
# 2 f2()    0.91 1.000000
# 1 f1()    5.10 5.604396
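
To check the equivalence claim, here is a small sketch under the assumption that both approaches consume the random number stream in the same order (matrix() fills column by column). With the same seed, the two matrices should then match. The names a and b are hypothetical.

```r
# Build the matrix both ways from the same seed (seed value is arbitrary)
set.seed(123)
a <- replicate(100, runif(20))            # 100 separate calls to runif()

set.seed(123)
b <- matrix(runif(20 * 100), ncol = 100)  # one call, reshaped column-wise

# Same shape, same values, hence the same column means
stopifnot(identical(dim(a), dim(b)))
stopifnot(isTRUE(all.equal(a, b, check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(colMeans(a), colMeans(b))))
```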
Indevout answered 25/4, 2012 at 22:25 Comment(0)
