Boolean addition in R data frame produces a boolean instead of an integer

If I try to create a new column in an R dataframe by adding 3 boolean expressions in one step, it results in a boolean rather than an integer. If I use an intermediate step to first create columns for the 3 boolean expressions, I can add them up and get an integer. I don't understand why the two sets of code produce different results.

#The input is a dataframe with 3 variables that are sometimes missing
#and sometimes not.
subjid <- c(101,102,103,104,105,106,107,108)
var1 <- c(1,2,3,4,NaN,NaN,NaN,NaN)
var2 <- c(1,2,NaN,NaN,5,6,NaN,NaN)
var3 <- c(1,NaN,3,NaN,5,NaN,7,NaN)
df <- data.frame(subjid, var1, var2, var3)
df
  subjid var1 var2 var3
1    101    1    1    1
2    102    2    2  NaN
3    103    3  NaN    3
4    104    4  NaN  NaN
5    105  NaN    5    5
6    106  NaN    6  NaN
7    107  NaN  NaN    7
8    108  NaN  NaN  NaN
#This code was intended to count how many of the 3 variables were nonmissing
#But it produces an unexpected result
df$nonmissing_count_a <- !is.na(df$var1) + !is.na(df$var2) + !is.na(df$var3)
table(df$nonmissing_count_a)
FALSE  TRUE
    5     3
#This code is intended to obtain the same count of nonmissing variables
#And it works as expected
df$var1_nonmissing <- !is.na(df$var1)
df$var2_nonmissing <- !is.na(df$var2)
df$var3_nonmissing <- !is.na(df$var3)
df$nonmissing_count_b <- df$var1_nonmissing + df$var2_nonmissing + df$var3_nonmissing
table(df$nonmissing_count_b)
0 1 2 3
1 3 3 1
Lancinate answered 4/5, 2024 at 17:21

This happens because of operator precedence (see ?Syntax). Wrap each negation in parentheses:

table((!is.na(df$var1)) + (!is.na(df$var2)) + (!is.na(df$var3)))

0 1 2 3 
1 3 3 1

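Addition (+) has higher precedence than negation (!), so the unparenthesized expression is parsed as !(is.na(df$var1) + !(is.na(df$var2) + !is.na(df$var3))). The outermost operation is a negation, which always returns a logical vector rather than a count.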
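A minimal sketch of the same effect on toy data may make this easier to see. The vectors a and b are hypothetical and not part of the question; the point is only that the unparenthesized form ends in a negation while the parenthesized form ends in an addition.

```r
# Hypothetical toy vectors, chosen only for illustration
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)

# Without parentheses, `+` binds tighter than `!`,
# so this is parsed as !(a + (!b))
unparenthesized <- !a + !b

# With parentheses, each negation is applied before the addition
parenthesized <- (!a) + (!b)

class(unparenthesized)                    # "logical"
class(parenthesized)                      # "integer"
identical(unparenthesized, !(a + (!b)))   # TRUE
```

Adding two logical vectors coerces them to integers, which is why the parenthesized version gives a count.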

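Keep in mind that what you actually want is a sum of 1s and 0s (numeric). Making the conversion explicit also avoids the problem, because each negation is evaluated inside its own function call: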

table(as.numeric(!is.na(df$var1)) + 
      as.numeric(!is.na(df$var2)) + 
      as.numeric(!is.na(df$var3)))

0 1 2 3 
1 3 3 1

Alternatively, use rowSums on the logical matrix returned by is.na, dropping the subjid column with [,-1]:

table(rowSums(!is.na(df[,-1])))

0 1 2 3 
1 3 3 1
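If the column positions might change, one possible variation is to select the value columns by name and store the count back in the data frame. This is only a sketch: value_cols and nonmissing_count are names made up for this example, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Rebuild the data frame from the question
df <- data.frame(
  subjid = c(101, 102, 103, 104, 105, 106, 107, 108),
  var1   = c(1, 2, 3, 4, NaN, NaN, NaN, NaN),
  var2   = c(1, 2, NaN, NaN, 5, 6, NaN, NaN),
  var3   = c(1, NaN, 3, NaN, 5, NaN, 7, NaN)
)

# Hypothetical helper name: the columns to check for missingness
value_cols <- c("var1", "var2", "var3")

# is.na() on a data frame returns a logical matrix (NaN counts as missing);
# rowSums() then counts the TRUE values in each row
df$nonmissing_count <- rowSums(!is.na(df[value_cols]))

table(df$nonmissing_count)
```

Here the negation is applied to a single is.na() call inside the rowSums() argument, so no precedence issue arises.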
Gramicidin answered 4/5, 2024 at 18:0

To complement Andre's answer (the one to accept!), the order of precedence for an expression may be checked with lobstr::ast:

(In the first tree the root is `!`, so the whole expression is a negation and yields a logical. In the second tree `+` sits at the root, so the addition is evaluated last.)

# version in the question
lobstr::ast(!is.na(df$var1) + !is.na(df$var2) + !is.na(df$var3))
#> █─`!` 
#> └─█─`+` 
#>   ├─█─is.na 
#>   │ └─█─`$` 
#>   │   ├─df 
#>   │   └─var1 
#>   └─█─`!` 
#>     └─█─`+` 
#>       ├─█─is.na 
#>       │ └─█─`$` 
#>       │   ├─df 
#>       │   └─var2 
#>       └─█─`!` 
#>         └─█─is.na 
#>           └─█─`$` 
#>             ├─df 
#>             └─var3

# Andre's answer
lobstr::ast((!is.na(df$var1)) + (!is.na(df$var2)) + (!is.na(df$var3)))
#> █─`+` 
#> ├─█─`+` 
#> │ ├─█─`(` 
#> │ │ └─█─`!` 
#> │ │   └─█─is.na 
#> │ │     └─█─`$` 
#> │ │       ├─df 
#> │ │       └─var1 
#> │ └─█─`(` 
#> │   └─█─`!` 
#> │     └─█─is.na 
#> │       └─█─`$` 
#> │         ├─df 
#> │         └─var2 
#> └─█─`(` 
#>   └─█─`!` 
#>     └─█─is.na 
#>       └─█─`$` 
#>         ├─df 
#>         └─var3

Created on 2024-05-04 with reprex v2.1.0
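If lobstr is not installed, a rough base R check is to quote the expression and look at its first element, which is the function called at the root of the parse tree. This is a sketch on placeholder symbols a, b and c rather than the question's columns; quote() never evaluates them, so they do not need to exist.

```r
# quote() captures the expression without evaluating it,
# so a, b and c are placeholder symbols only
expr_unparenthesized <- quote(!a + !b + !c)
expr_parenthesized   <- quote((!a) + (!b) + (!c))

# The first element of a call is the function at the root of the tree
root_unparenthesized <- as.character(expr_unparenthesized[[1]])
root_parenthesized   <- as.character(expr_parenthesized[[1]])

root_unparenthesized  # "!"
root_parenthesized    # "+"
```

A root of `!` means the final operation is a negation, which matches the first lobstr tree above.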

Selfsame answered 4/5, 2024 at 18:42
Comment from Lancinate: Thank you both. I had numerous workarounds, but I asked because I knew there must be some fundamental concept in R that I was missing that would cause me pain later. I didn't realize that "!" sat where it does in the order of operations.
