standard deviation on dataframe does not work

I get an unexpected (to me, at least) error when calculating a standard deviation. The idea [*] is to recode every missing value as 1 and everything else as 0, then extract the variables that have some, but not all, values missing before computing correlations. I attempt that extraction step with sd(), but it fails. Why?

library(VIM)
data(sleep) # dataset with missing values

x = as.data.frame(abs(is.na(sleep))) # converts all NA to 1, otherwise 0
y = x[which(sd(x) > 0)] # attempt to extract variables with missing values

Error in is.data.frame(x) : 
(list) object cannot be coerced to type 'double'

# convert to double    
z = as.data.frame(apply(x, 2, as.numeric))
y = z[which(sd(z) > 0)]

Error in is.data.frame(x) : 
(list) object cannot be coerced to type 'double'

[*] R in Action, Robert Kabacoff

Agglomeration answered 5/6, 2014 at 10:42 Comment(0)

Calling sd() on a data frame has been defunct since R 3.0.0:

> ## Build a db of all R news entries.
> db <- news()
> ## sd
> news(grepl("sd", Text), db=db)
Changes in version 3.0.3:

PACKAGE INSTALLATION

    o   The new field SysDataCompression in the DESCRIPTION file allows
        user control over the compression used for sysdata.rda objects in
        the lazy-load database.

Changes in version 3.0.0:

DEPRECATED AND DEFUNCT

    o   mean() for data frames and sd() for data frames and matrices are
        defunct.

Use sapply(x, sd) instead.
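
Applied to the question's code, that could look like the sketch below. It uses a small made-up data frame (`df`, with hypothetical columns `a`, `b`, `c`) in place of VIM's `sleep` data, so it runs without installing any package:

```r
# Toy data frame standing in for the sleep data (values are made up)
df <- data.frame(
  a = c(1, NA, 3, 4),   # some values missing
  b = c(1, 2, 3, 4),    # nothing missing
  c = c(NA, NA, 5, 6)   # some values missing
)

# 1 where a value is missing, 0 otherwise
x <- as.data.frame(abs(is.na(df)))

# Column-wise standard deviations, returned as a named numeric vector
sds <- sapply(x, sd)

# Keep only the indicator columns that vary,
# i.e. variables with some but not all values missing
y <- x[which(sds > 0)]
names(y)
# [1] "a" "c"
```

Column `b` is dropped because its indicator is constant, so its standard deviation is 0.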

Schleswigholstein answered 5/6, 2014 at 10:45 Comment(5)
Thanks Joshua. These are pretty important functions and it breaks some of the code that I have. :-( - Agglomeration
@Henk: yeah, it caused problems for quite a few CRAN packages at the time. - Schleswigholstein
@Agglomeration You can define your own mean.data.frame and sd.data.frame functions easily if you don't want to go through your legacy code and change it. - Flambeau
Does anyone else notice that using sapply(x, sd) makes the code go much, much slower? Is there any faster alternative to this method? - Levigate
@Reilstein: much slower compared to what? Your comment really should be a new question, but make sure you create a reproducible example and include some benchmarks to show that it's slower compared to some other method. - Schleswigholstein
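
On the suggestion in the comments to define your own functions for legacy code, here is a minimal sketch, not a drop-in replacement for the old behaviour. One caveat: mean() is an S3 generic, so a mean.data.frame method gets dispatched, but sd() in the stats package is an ordinary function, so a function named sd.data.frame would not be called automatically. The sketch therefore masks sd() with a wrapper. The wrapper and the toy data frame `df` are illustrative assumptions:

```r
# mean() is an S3 generic, so this method is used for data frames
mean.data.frame <- function(x, ...) sapply(x, mean, ...)

# sd() is not generic: mask it with a wrapper that handles data frames
# column by column and defers to stats::sd() for everything else
sd <- function(x, na.rm = FALSE) {
  if (is.data.frame(x)) {
    sapply(x, stats::sd, na.rm = na.rm)
  } else {
    stats::sd(x, na.rm = na.rm)
  }
}

# Toy data (made up for illustration)
df <- data.frame(u = c(1, 2, 3), v = c(2, 4, 6))

mean(df)
# u v
# 2 4
sd(df)
# u v
# 1 2
```

Masking a base function affects every later call to sd() in the session, so for new code calling sapply(x, sd) directly is probably the cleaner choice.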
