Calculating standard deviation of each row

I am trying to use rowSds() to calculate each row's standard deviation so that I can pick the rows with high SDs to graph.

My data frame is called xx and looks like this:

head(xx,1)
     Job     variable 2012-02-23 2012-02-24 2012-02-25 2012-02-27 2012-02-28 2012-02-29 2012-03-01 2012-03-02 2012-03-03 2012-03-05 2012-03-06 2012-03-07 2012-03-08 2012-03-09 2012-03-10 2012-03-12 2012-03-13 2012-03-14
1 A Duration        152        424         NA        499        320        117        211        363         NA        605         76        309        204        185         NA         25        733        500
  2012-03-15 2012-03-16 2012-03-17 2012-03-19 2012-03-20 2012-03-21 2012-03-22 2012-03-23 2012-03-24 2012-03-26 2012-03-27 2012-03-28 2012-03-29 2012-03-30 2012-03-31 2012-04-02 2012-04-03 2012-04-04 2012-04-05 2012-04-06
1        521        601         NA        229        758        421        334        659         NA        419        423        444        289        594         NA        327        533        183        211        235
  2012-04-07 2012-04-09 2012-04-10 2012-04-11 2012-04-12 2012-04-13 2012-04-14 2012-04-16 2012-04-17 2012-04-18 2012-04-19 2012-04-20 2012-04-21 2012-04-23 2012-04-24 2012-04-25 2012-04-26 2012-04-27 2012-04-28 2012-04-30
1         NA        225        419        236        218        188         NA        205        547        153        196        200         NA        259        257        208        302        244         NA        806
  2012-05-01 2012-05-02 2012-05-03 2012-05-04 2012-05-05 2012-05-07 2012-05-08 2012-05-09 2012-05-10 2012-05-11 2012-05-12 2012-05-14 2012-05-15 2012-05-16 2012-05-17 2012-05-18 2012-05-19 2012-05-21 2012-05-22 2012-05-23
1        402        492       1078        440         NA        382        576       1105        511        368         NA        360        381       1152        718        353         NA        408        413        935
  2012-05-24 2012-05-25 2012-05-26 2012-05-28 2012-05-29 2012-05-30 2012-05-31 2012-06-01 2012-06-02 2012-06-04 2012-06-05 2012-06-06 2012-06-07 2012-06-08 2012-06-09 2012-06-11 2012-06-12 2012-06-13 2012-06-14 2012-06-15
1        306        277         NA        253        367        977        557        432         NA        328        521        467        972       1556         NA        386       1394        401        857        857
  2012-06-16 2012-06-18 2012-06-19 2012-06-20 2012-06-21 2012-06-22 2012-06-23 2012-06-25 2012-06-26 2012-06-27 2012-06-28 2012-06-29 2012-06-30 2012-07-02 2012-07-03 2012-07-04 2012-07-05 2012-07-06 2012-07-07 2012-07-09
1         NA       1056        324        329        327        325         NA        341        268        231        245        301         NA        283        365        297        310        260         NA        254
  2012-07-10 2012-07-11 2012-07-12 2012-07-13 2012-07-14 2012-07-16 2012-07-17 2012-07-18 2012-07-19 2012-07-20 2012-07-21 2012-07-23 2012-07-24 2012-07-25 2012-07-26 2012-07-27 2012-07-28 2012-07-30 2012-07-31 2012-08-01
1        283        395        273        273         NA        278        243        210        356        267         NA        442        483        271        327        271         NA        716        598        577
  2012-08-02 2012-08-03 2012-08-06 2012-08-07 2012-08-08 2012-08-09 2012-08-10 2012-08-13 2012-08-14 2012-08-15 2012-08-16 2012-08-17 2012-08-20 2012-08-21 2012-08-22 2012-08-23 2012-08-24 2012-08-27 2012-08-28 2012-08-29
1        345        403        318        522        333        259        404        244        240        288        245         22        738        530        390        648        294        403        381        724
  2012-08-30 2012-08-31 2012-09-03 2012-09-04 2012-09-05 2012-09-06 2012-09-07 2012-09-10 2012-09-11 2012-09-12 2012-09-13 2012-09-14 2012-09-17 2012-09-18 2012-09-19 2012-09-20 2012-09-21 2012-09-24 2012-09-25 2012-09-26
1        740        575        558        785        883        501        901        500        285        174        562       1047        603        990        289        173        253        512        236        278
  2012-09-27 2012-09-28 2012-10-01 2012-10-02 2012-10-03 2012-10-04 2012-10-05 2012-10-08 2012-10-09 2012-10-10 2012-10-11
1        173        277        217        291        197        308        124        387        369        250        242

I am trying to calculate each row's standard deviation and assign it to a column named sd:

xx$sd<-rowSds(xx)

I get this error:

Error in apply(na.omit(as.matrix(x), ...), 1, FUN, ...) : 
  error in evaluating the argument 'X' in selecting a method for function 'apply': Error in na.omit(as.matrix(x), ...) : 
  error in evaluating the argument 'object' in selecting a method for function 'na.omit': Error in `colnames<-`(`*tmp*`, value = c("2012-02-23", "2012-02-24", "2012-02-25",  : 
  length of 'dimnames' [2] not equal to array extent

Any ideas how I can omit NAs when calculating the SD? Is my syntax correct?

Avertin asked 12/10, 2012 at 14:55

You can use the apply and transform functions:

set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10       SD
1  NA 12 17 18 19 16 12 13 20  14 3.041381
2  14 12 13 13 14 18 16 17 20  10 3.020302
3  11 19 NA 12 19 19 19 20 12  20 3.865805
4  10 11 20 12 15 17 18 17 18  12 3.496029
5  12 15 NA 14 20 18 16 11 14  18 2.958040
6  19 11 10 20 13 14 17 16 10  16 3.596294
7  14 16 17 15 10 11 15 15 11  16 2.449490
8  NA 10 15 19 19 12 15 15 19  14 3.201562
9  11 NA NA 20 20 14 14 17 14  19 3.356763
10 15 13 14 15 NA 13 15 NA 15  12 1.195229

From ?apply you can see the ... argument, which passes optional arguments on to FUN; in this case you can use na.rm=TRUE to omit NA values.
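
To see the ... forwarding in isolation, here is a minimal sketch on a small hand-made matrix (the values are invented for illustration, not taken from the question) whose row SDs are easy to check by hand. Passing na.rm = TRUE through apply's ... behaves the same as wrapping sd in an anonymous function:

```r
# Small matrix with known values; row 2 contains an NA (illustration data only)
m <- matrix(c(2, 4, 4, 4,
              1, NA, 3, 5),
            nrow = 2, byrow = TRUE)

# na.rm = TRUE is forwarded by apply() to sd() through apply's ... argument
sd_forwarded <- apply(m, 1, sd, na.rm = TRUE)

# Equivalent: call sd() explicitly inside an anonymous function
sd_wrapped <- apply(m, 1, function(row) sd(row, na.rm = TRUE))

# Without na.rm, any row that contains an NA gives NA
sd_default <- apply(m, 1, sd)

print(sd_forwarded)
# [1] 1 2
```

Row 1 (2, 4, 4, 4) has mean 3.5 and sample variance 3/3 = 1; row 2 keeps 1, 3, 5 after dropping the NA, giving variance 8/2 = 4 and SD 2.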

Using rowSds from the matrixStats package also requires setting na.rm=TRUE to omit NAs:

library(matrixStats)
transform(X, SD=rowSds(as.matrix(X), na.rm=TRUE)) # same result as before; recent matrixStats needs a matrix, not a data.frame
Caravan answered 12/10, 2012 at 15:04
@Jilber, what if the first two columns are text? I just applied this function and got an "x must be numeric" error. (Avertin)
sd only works on numeric vectors. If the first two columns are text, omit them with your.data.frame[, -c(1, 2)] and pass the result to either rowSds or apply. (Caravan)
FYI, since matrixStats (>= 2.12.0), data frames are not supported, only matrices. Thus you need rowSds(as.matrix(X), na.rm=TRUE) for the above to work. (Upsilon)
For a base R approach that is much faster than calling apply, see my answer here. (Tenace)
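
Combining the advice in these comments (drop the text columns, work on a matrix) with a vectorized base R computation, here is a minimal sketch on toy data shaped like the question's xx. The data values and the helper row_sd are hypothetical, written only for illustration; this is not necessarily the code in the linked answer. It ends by ordering rows by SD, which is the question's stated goal of picking high-SD rows to graph:

```r
# Toy data shaped like the question's xx: two text columns, then numeric
# date columns (values invented for illustration)
xx <- data.frame(
  Job          = c("A", "B", "C"),
  variable     = c("Duration", "Duration", "Duration"),
  `2012-02-23` = c(152, 10, 5),
  `2012-02-24` = c(424, 12, NA),
  `2012-02-25` = c(NA, 14, 7),
  check.names  = FALSE  # keep the date column names as-is
)

# Hypothetical helper: vectorized row-wise sample SD that ignores NAs
row_sd <- function(m) {
  n   <- rowSums(!is.na(m))                 # non-missing values per row
  mu  <- rowMeans(m, na.rm = TRUE)          # per-row mean
  ss  <- rowSums((m - mu)^2, na.rm = TRUE)  # mu is recycled down each column
  out <- sqrt(ss / (n - 1))
  out[n < 2] <- NA_real_                    # sd() also gives NA for < 2 values
  out
}

# Keep only numeric columns (drops Job and variable), then convert to a matrix
num <- as.matrix(xx[, sapply(xx, is.numeric), drop = FALSE])
xx$sd <- row_sd(num)

# Rows ordered by decreasing SD, e.g. to pick the most variable ones to plot
top <- xx[order(xx$sd, decreasing = TRUE), c("Job", "sd")]
```

Because rowSums and rowMeans work on the whole matrix at once instead of calling sd once per row, this style of computation tends to scale better than apply on large inputs; rowSds(num, na.rm = TRUE) from matrixStats should give the same values.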

This also works, based on this answer:

library(dplyr)

set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))

# names of the columns to include in each row's SD
vars_to_sum = grep("X", names(X), value=TRUE)
X %>% 
  group_by(row_number()) %>%
  do(data.frame(., 
                SD = sd(unlist(.[vars_to_sum]), na.rm=TRUE)))

...which appends a couple of row-number columns, so it is probably better to add your own row IDs explicitly for grouping:

X %>% 
  mutate(ID = row_number()) %>%
  group_by(ID) %>%
  do(data.frame(., SD = sd(unlist(.[vars_to_sum]), na.rm=TRUE)))

This syntax also lets you specify which columns to use.
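
Since dplyr 1.0.0, do() is superseded. A minimal sketch of the same per-row SD using rowwise() and c_across(), assuming dplyr 1.0.0 or later is installed; starts_with("X") plays the role of vars_to_sum and keeps the column choice explicit, and no extra row-number columns are added:

```r
library(dplyr)

set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace = TRUE), ncol = 10))

res <- X %>%
  rowwise() %>%                                              # one group per row
  mutate(SD = sd(c_across(starts_with("X")), na.rm = TRUE)) %>%
  ungroup()                                                  # drop rowwise grouping
```

The result has the ten original columns plus SD, and the SD values should match apply(X, 1, sd, na.rm = TRUE).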

Calciferol answered 2/4, 2020 at 23:27