Is there such "colsd" in R?

Asked 9/7, 2013 at 13:42 Answered 3/11, 2019 at 15:44

I am using in my code colSums but I also need the standard deviation beside the sum. I searched in the internet and found this page which contain only:

colSums
colMeans

http://stat.ethz.ch/R-manual/R-devel/library/base/html/colSums.html

I tried this:

colSd

but I got this error:

Error: could not find function "colSd"

How I can do the same thing but for standard deviation:

colSd

Here is the code:

results <- colSums(x,na.rm=TRUE)#### here I want colsd

Foret answered 9/7, 2013 at 13:42 Comment(4)

No, but if you have a data.frame, try sapply(x, sd) or more general, apply(x, 2, sd). – Gratification 9/7, 2013 at 13:43

stat.ethz.ch/pipermail/r-help/2002-March/019606.html – Cranial 9/7, 2013 at 13:44

I also like the numcolwise function from the plyr package for this type of thing. – Fite 9/7, 2013 at 13:44

Yes, it'd be nice to have such functions. colMeans and colSums are much faster than apply(X, 2, ...) counterparts... – Valval 9/7, 2013 at 13:48

I want to provide a fourth (very similar to @Thomas) approach and some benchmarking:

library("microbenchmark")
library("matrixStats")

colSdApply <- function(x, ...)apply(X=x, MARGIN=2, FUN=sd, ...)
colSdMatrixStats <- colSds

colSdColMeans <- function(x, na.rm=TRUE) {
  if (na.rm) {
    n <- colSums(!is.na(x)) # thanks @flodel
  } else {
    n <- nrow(x)
  }
  colVar <- colMeans(x*x, na.rm=na.rm) - (colMeans(x, na.rm=na.rm))^2
  return(sqrt(colVar * n/(n-1)))
}

colSdThomas <- function(x)sqrt(rowMeans((t(x)-colMeans(x))^2)*((dim(x)[1])/(dim(x)[1]-1)))

m <- matrix(runif(1e7), nrow=1e3)

microbenchmark(colSdApply(m), colSdMatrixStats(m), colSdColMeans(m), colSdThomas(m))

# Unit: milliseconds
#                 expr      min       lq   median       uq      max neval
#        colSdApply(m) 435.7346 448.8673 456.6176 476.8373 512.9783   100
#  colSdMatrixStats(m) 344.6416 357.5439 383.8736 389.0258 465.5715   100
#     colSdColMeans(m) 124.2028 128.9016 132.9446 137.6254 172.6407   100
#       colSdThomas(m) 231.5567 240.3824 245.4072 274.6611 307.3806   100


all.equal(colSdApply(m), colSdMatrixStats(m))
# [1] TRUE
all.equal(colSdApply(m), colSdColMeans(m))
# [1] TRUE
all.equal(colSdApply(m), colSdThomas(m))
# [1] TRUE

Aracelyaraceous answered 9/7, 2013 at 15:0 Comment(5)

@sacvf: I missed the ... in colMeans (see my edit). Now colSdColMeans(x, na.rm=TRUE) should work. – Aracelyaraceous 9/7, 2013 at 15:22

I think to deal with NA you will have to use something like n <- colSums(!is.na(x)). – Amyloid 9/7, 2013 at 20:8

+1 for the benchmarking of different methods. This was useful and new for me. – Shanklin 9/7, 2013 at 21:10

@Amyloid thanks, you are right, I forgot that! (I update my code and the benchmark results.) – Aracelyaraceous 9/7, 2013 at 22:14

In the meantime the benchmark seems to be outdated and matrixStats::colSds 4-10x faster than the rest. – Mores 29/4, 2019 at 8:40

colSds and rowSds are two of many similar functions in the matrixStats package

Broadleaf answered 9/7, 2013 at 13:57 Comment(2)

Those functions in that package aren't actually doing anything fancy; they are just as slow as apply(x, 2, sd). – Sihonn 9/7, 2013 at 14:7

Not anymore. They are all written in C/C++ these days – Nadabas 10/5, 2018 at 6:41

This is the quickest and shortest way to calculate the standard deviation of the columns:

sqrt(diag(cov(data_matrix)))

Since the diagonal of a co-variance matrix consists of the variances of each variable, we do the following:

Calculate the co-variance matrix using cov
Extract the diagonal of the matrix using diag
Take the square root of the diagonal values using sqrt in order to get the standard deviation

I hope that helps :)

Sterol answered 23/1, 2018 at 5:38 Comment(0)

Use the following:

colSd <- function (x, na.rm=FALSE) apply(X=x, MARGIN=2, FUN=sd, na.rm=na.rm)

Shanklin answered 9/7, 2013 at 13:44 Comment(5)

I would do function(x, ...) apply(X = x, ...)). – Gratification 9/7, 2013 at 13:45

I debated that myself, but then sd uses only one optional, so I thought of including that directly. – Shanklin 9/7, 2013 at 13:46

@sacvf, it doesn't matter how many comments you pour in with "my results are NA, any idea why?", we can't help unless we see the data you're having a problem with. You should direct some of your energy into making your question reproducible. – Valval 9/7, 2013 at 14:21

I also tried the solution given by @Henry but I got the smae:Just NA – Foret 9/7, 2013 at 14:28

@savcf: I had disappeared for some time. You can't seriously ask something like 'what you meant if check na.rm'. You should really produce a reproducible example to get a correct solution. – Shanklin 9/7, 2013 at 21:8

I believe I have found a more elegant solution in diag(sqrt(var(data)))

This worked for me to get the standard deviation of each of my columns. However, it does compute a bunch of extra unnecessary covariances (and their square roots) along the way, so it isn't necessarily the most efficient approach. But if your data is small, it works excellently.

EDIT: I just realized that sqrt(diag(var(data))) is probably a bit more efficient, since it drops the unnecessary covariance terms earlier.

Rizal answered 13/4, 2014 at 23:34 Comment(0)

I don't know if these are particularly fast, but why not just use the formulae for SD:

x <- data.frame(y = rnorm(1000,0,1), z = rnorm(1000,2,3))

# If you have a population:
colsdpop <- function(x,...)
     sqrt(rowMeans((t(x)-colMeans(x,...))^2,...))
colsdpop(x)
sd(x$y); sd(x$z) # won't match `sd`

# If you have a sample:
colsdsamp <- function(x)
    sqrt( (rowMeans((t(x)-colMeans(x))^2)*((dim(x)[1])/(dim(x)[1]-1))) )
colsdsamp(x)
sd(x$y); sd(x$z) # will match `sd`

Note: the sample solution won't handle NAs well. One could incorporate something like apply(x,2,function(z) sum(!is.na(z))) into the right-most part of the formula to get an appropriate denominator, but it would get really murky quite quickly.

Queri answered 9/7, 2013 at 14:43 Comment(0)

I usually do column sd's with apply:

x <- data.frame(y = rnorm(20,0,1), z = rnorm(20,2,3))

> apply(x, 2, sd)
        y         z 
0.8022729 3.4700314

Verify:

> sd(x$y)
[1] 0.8022729

> sd(x$z)
[1] 3.470031

You can also do it with dplyr easily:

library(dplyr)
library(magrittr) # for pipes

> x %>% summarize_all(.,sd)
          y        z
1 0.8022729 3.470031

Wolfish answered 10/10, 2019 at 3:59 Comment(0)

You can just use apply function

all.sd <- apply(data, 2,sd)

Stlouis answered 3/11, 2019 at 15:44 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags