assign to is.na(clinical.trial$age)

Asked 14/6, 2017 at 12:13 Answered 28/10, 2017 at 21:36

I am looking at the code from here which has this at the beginning:

## generate data for medical example 
clinical.trial <-
    data.frame(patient = 1:100,
               age = rnorm(100, mean = 60, sd = 6),
               treatment = gl(2, 50,
                 labels = c("Treatment", "Control")),
               center = sample(paste("Center", LETTERS[1:5]), 100,
                               replace = TRUE))

## set some ages to NA (missing) 
is.na(clinical.trial$age) <- sample(1:100, 20)

I cannot understand this last line. The LHS is a vector of all FALSE values. The RHS is a vector of 20 numbers sampled from 1:100. I don't understand this kind of assignment. How does it result in clinical.trial$age getting some NA values? Does this kind of assignment have a name? My best guess is that the boolean vector on the LHS gets numbers assigned to it with recycling.

Outsole asked 14/6, 2017 at 12:13 Comment(5)

The LHS is a vector of all FALSE values (since no NA is present). – Penetralia 14/6, 2017 at 12:20

Interesting! So if x <- 1:3, then is.na(x) <- 2 seems to have the same effect as x[2] <- NA. – Catholicize 14/6, 2017 at 12:21

The is.na<- behavior is described in the respective help. But I agree that this usage is far from "intuitive"... – Cyanide 14/6, 2017 at 12:21

Is anyone aware of other functions that offer this functionality (colnames(), class(), ...)? I would be interested in understanding why this is done (apparently only for some functions, not all), rather than in restating the existing documentation. – Ebner 14/6, 2017 at 12:32

What do you mean by "that functionality"? An assignment variant of a function? It's just convenient. Most important example is `[<-` (subset assignment). – Nailbrush 14/6, 2017 at 12:40

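is.na(x) <- value is evaluated as x <- 'is.na<-'(x, value). The expression is.na(x) on the LHS is never computed as a logical vector; the result of the 'is.na<-' call is assigned back to x.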

You can think of 'is.na<-'(x, value) as 'assign NA to x, at position value'.

A perhaps more intuitive phrasing would be a hypothetical function like assign_NA(to = x, pos = value).
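To make the translation concrete, here is a minimal sketch on a small made-up vector (x and y are illustrative names, not from the question). It applies the replacement syntax to one copy and the explicit function call to the other, then compares them:

```r
x <- c(10, 20, 30, 40)
y <- x

# Replacement-call syntax, as in the question
is.na(x) <- c(2, 4)

# The explicit form: call the function `is.na<-` and rebind the result
y <- `is.na<-`(y, value = c(2, 4))

identical(x, y)  # TRUE
is.na(x)         # FALSE TRUE FALSE TRUE
```

Note that the explicit call only returns the modified vector; it is the assignment back to y that makes the change stick.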

Regarding other similar functions, here are the ones in the base package:

x <- as.character(lsf.str("package:base"))
x[grep('<-', x)]
#>  [1] "$<-"                     "$<-.data.frame"         
#>  [3] "@<-"                     "[[<-"                   
#>  [5] "[[<-.data.frame"         "[[<-.factor"            
#>  [7] "[[<-.numeric_version"    "[<-"                    
#>  [9] "[<-.data.frame"          "[<-.Date"               
#> [11] "[<-.factor"              "[<-.numeric_version"    
#> [13] "[<-.POSIXct"             "[<-.POSIXlt"            
#> [15] "<-"                      "<<-"                    
#> [17] "attr<-"                  "attributes<-"           
#> [19] "body<-"                  "class<-"                
#> [21] "colnames<-"              "comment<-"              
#> [23] "diag<-"                  "dim<-"                  
#> [25] "dimnames<-"              "dimnames<-.data.frame"  
#> [27] "Encoding<-"              "environment<-"          
#> [29] "formals<-"               "is.na<-"                
#> [31] "is.na<-.default"         "is.na<-.factor"         
#> [33] "is.na<-.numeric_version" "length<-"               
#> [35] "length<-.factor"         "levels<-"               
#> [37] "levels<-.factor"         "mode<-"                 
#> [39] "mostattributes<-"        "names<-"                
#> [41] "names<-.POSIXlt"         "oldClass<-"             
#> [43] "parent.env<-"            "regmatches<-"           
#> [45] "row.names<-"             "row.names<-.data.frame" 
#> [47] "row.names<-.default"     "rownames<-"             
#> [49] "split<-"                 "split<-.data.frame"     
#> [51] "split<-.default"         "storage.mode<-"         
#> [53] "substr<-"                "substring<-"            
#> [55] "units<-"                 "units<-.difftime"

They all work the same way in the sense that fun(x) <- val is evaluated as x <- 'fun<-'(x, val). Beyond that, each one behaves like any normal function.
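Since these are ordinary functions, you can write your own. A minimal sketch, assuming a made-up replacement function named second<- (not part of base R) that overwrites the second element of a vector:

```r
# Hypothetical replacement function; the name `second<-` is invented here.
# The final argument of a replacement function must be called `value`.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # return the modified object so it can be assigned back
}

v <- 1:5
second(v) <- 99L  # evaluated as v <- `second<-`(v, value = 99L)
v                 # 1 99 3 4 5
```

No function called second() needs to exist for this to work; only the 'second<-' form is looked up.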

R Language Definition: 3.4.4 Subset assignment

Glomma answered 14/6, 2017 at 12:24 Comment(5)

See also cran.r-project.org/doc/manuals/r-release/…. The language definition is worth reading. – Nailbrush 14/6, 2017 at 13:0

@GGamba, I don't see any function like assign_NA() so I guess you were just trying to explain the functionality. But I don't get the original code. Is there an alternative way to do this that is not cryptic and just uses basic simple R? It seems strange to put this code in a tutorial on a basic function like table() – Outsole 14/6, 2017 at 13:25

@Outsole Please follow the link I've provided in my comment above. This is "basic simple R". You are using this every time when you do something like x[1] <- 0. – Nailbrush 14/6, 2017 at 13:34

@Roland, in x[1]<-0 , a value is being replaced. In is.na(clinical.trial$age) <- sample(1:100, 20), the LHS is not the age variable. The LHS is a boolean vector. So that is being replaced with numbers. I don't see how this will affect the age variable. – Outsole 14/6, 2017 at 14:29

I'm talking about syntax here, not about what these functions do. They have only assignment in common. The language definition clearly explains how this syntax is interpreted by the parser. Other than the syntax, which is a core part of the language, as I tried to explain with the example [<-, there is nothing special here. You can easily define your own fun<-. – Nailbrush 15/6, 2017 at 4:58

The help tells us that:

(xx <- c(0:4)) 
is.na(xx) <- c(2, 4)
xx                     #> 0 NA  2 NA  4

So,

is.na(xx) <- 1

behaves more like

set NA at position 1 on variable xx

Crural answered 14/6, 2017 at 12:27 Comment(1)

I can see (xx <- c(0:4)) is.na(xx) <- c(2, 4) in the help but I have no idea why this works or what this kind of assignment is called. – Outsole 14/6, 2017 at 12:37

@matt, to respond to your question asked above in the comments, here's an alternative way to do the same assignment that I think is easier to follow :-)

clinical.trial$age[sample(1:100, 20)] <- NA
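A quick way to convince yourself that the two forms agree is to apply both to the same positions. This is only a sketch: the seed is arbitrary, and age, pos, a and b are illustrative names standing in for the column from the question.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
age <- rnorm(100, mean = 60, sd = 6)
pos <- sample(1:100, 20)  # draw the positions once and reuse them

a <- age
is.na(a) <- pos  # the form used in the tutorial

b <- age
b[pos] <- NA     # plain subset assignment

identical(a, b)  # TRUE
sum(is.na(a))    # 20
```

The positions are sampled once and stored, because calling sample() separately for each version would pick different rows.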

Downtrodden answered 28/10, 2017 at 21:36 Comment(0)
