Summing lots of Vectors; row-wise or elementwise, but ignoring NA values

I am trying to create a new vector that is the sum of 35 other vectors. The problem is that there are lots of NA values, but for this particular use, I want to treat those as zeros. Adding the vectors won't work, because if any of the 35 vectors contain an NA, the result is NA. Here is the example of the problem:

col1<-c(NA,1,2,3)
col2<-c(1,2,3,NA)
col3<-c(NA,NA,2,3)
Sum<-col1+col2+col3
Sum
# [1] NA NA  7 NA

I want the result to be 1, 3, 7, 6.
I suppose I could create new versions of each of the vectors in which I replace the NA with a 0, but that would be a lot of work when applied to 35 vectors. Is there a simple function that will help me out?

Karrikarrie answered 20/11, 2013 at 19:22 Comment(0)

You could also use the rowSums function:

rowSums(cbind(col1, col2, col3), na.rm = TRUE)
#[1] 1 3 7 6

?rowSums   # also has colSums described on same help page
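
For the 35-vector case in the question, typing every name into cbind gets tedious. A minimal sketch, assuming the vectors have first been collected in a list (the name vecs is hypothetical, not from the question), binds them all at once with do.call:

```r
# Hypothetical list standing in for the 35 vectors; all must have the same length.
vecs <- list(
  col1 = c(NA, 1, 2, 3),
  col2 = c(1, 2, 3, NA),
  col3 = c(NA, NA, 2, 3)
)

# do.call passes each list element to cbind as a separate argument,
# producing a matrix with one column per vector.
m <- do.call(cbind, vecs)

# Row-wise sums with NA values dropped, so they contribute nothing to the total.
total <- rowSums(m, na.rm = TRUE)
print(total)
```

On the example data this gives the 1, 3, 7, 6 asked for. Note that a row consisting only of NA values sums to 0 with na.rm = TRUE, which matches the "treat NA as zero" requirement.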
Dachy answered 20/11, 2013 at 19:49 Comment(2)
Nice! That way I don't have to tell apply that the function applies to a row. - Karrikarrie
It's also a lot faster than apply on large data. - Dachy

Put them in a matrix first:

apply(cbind(col1, col2, col3), 1, sum, na.rm = TRUE)
# [1] 1 3 7 6

You can read about each function here using R's built-in documentation: ?apply, ?cbind.

cbind stands for "column bind": it takes several vectors or arrays and binds them "by column" into a single array:

cbind(col1,col2,col3)
     col1 col2 col3
[1,]   NA    1   NA
[2,]    1    2   NA
[3,]    2    3    2
[4,]    3   NA    3

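apply, well, applies a function (sum in this case) to either the rows or the columns of a matrix; the second argument selects which, with 1 meaning rows and 2 meaning columns. Any extra arguments are passed through to the function, which is how na.rm = TRUE reaches sum so that the NA values are dropped.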
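
To make the second argument of apply concrete, here is a small sketch on the same data that runs sum over rows and then over columns. The variable names m, row_totals and col_totals are only for illustration:

```r
m <- cbind(col1 = c(NA, 1, 2, 3),
           col2 = c(1, 2, 3, NA),
           col3 = c(NA, NA, 2, 3))

# MARGIN = 1: call sum once per row; na.rm = TRUE is forwarded to sum.
row_totals <- apply(m, 1, sum, na.rm = TRUE)

# MARGIN = 2: call sum once per column instead.
col_totals <- apply(m, 2, sum, na.rm = TRUE)

print(row_totals)  # one value per row: 1 3 7 6
print(col_totals)  # one value per column: col1 = 6, col2 = 6, col3 = 5
```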

Improvident answered 20/11, 2013 at 19:25 Comment(3)
This works! I seem to have asked a fairly basic question, but if you could add a link or two that will help explain why this code works, I would be grateful. - Karrikarrie
What does the 1 represent in the apply? I mean the 1 that appears between the list of columns and the sum function. - Karrikarrie
Aha! Rows, not columns. (By the way, I did try reading that article before asking, but I was having trouble figuring it out.) - Karrikarrie

For a tidyverse answer, use rowwise() from dplyr. sum() is normally a summary function that collapses all of its input into a single value; after rowwise(), it is evaluated once per row instead. You can then pass it several columns together with na.rm = TRUE:

library(dplyr)  # provides tibble(), rowwise(), mutate(), select(), pull() and %>%

t <- tibble(col1, col2, col3)
t %>% rowwise() %>% mutate(sum = sum(col1, col2, col3, na.rm = TRUE))

Or, if you prefer, without the pipes:

t2 <- rowwise(t)
t2 <- mutate(t2, sum = sum(col1, col2, col3, na.rm = TRUE))

To extract that last column of the table, you can do:

select(t2, sum)

Or, if you want it as a vector:

pull(t2, sum)
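
With 35 columns, naming each one inside sum() gets long. As a possible alternative, not part of the answer above, rowSums can be combined with across() so that no column is named individually. This sketch assumes dplyr 1.0 or later, and the names df and result are hypothetical:

```r
library(dplyr)

df <- tibble(
  col1 = c(NA, 1, 2, 3),
  col2 = c(1, 2, 3, NA),
  col3 = c(NA, NA, 2, 3)
)

# across(everything()) hands all columns to rowSums as one data frame,
# so rowwise() is not needed and NA values are dropped by na.rm = TRUE.
result <- mutate(df, sum = rowSums(across(everything()), na.rm = TRUE))

# Extract the new column as a plain numeric vector.
pull(result, sum)
```

If only some columns should be summed, everything() can be swapped for a narrower selection such as starts_with("col").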
Delciedelcina answered 22/11, 2023 at 0:49 Comment(0)
