How to omit NA values while pasting numerous column values together?
Asked Answered
K

2

6

I have a data frame dd2 with hundreds of columns and what I need to do is paste all these column values together omitting any NA values. If I do something like this

apply(dd2, 1, paste, collapse=",")

it actually includes NAs as "NA" string. I want to avoid that. I could also do as shown below, but this would expect me to work for each individual column at a time to get the result.

result <- cbind(    
    dd2, 
    combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = ",")
)

Is there any efficient way to do it? Here is the sample data:

dd2 <- structure(c("A", "B", "C", "D", "E", "AK2", "HFM1", NA, "TRR", 
"RTT", NA, "PPT", "TRR", "RTT", NA, "PPT", NA, NA, "GGT", NA), .Dim = c(5L, 
4L), .Dimnames = list(NULL, c("sample_id", "plant", "animal", 
"more")))
Kaleena answered 23/12, 2015 at 4:59 Comment(3)
Your sample data is not a data.frame. With "dd2", are you just trying to paste together all of the columns except the first?Dayna
@AnandaMahto I want to paste select column values together omitting any NAs. For example, I want to paste dd2[,wanted.columns] .Kaleena
Maybe something like this would work too: melt(as.data.table(dd2), measure.vars = c("plant", "animal"), na.rm = TRUE)[, toString(value), by = .(sample_id, more)].Dayna
K
7

You could try na.omit() to omit the values, then paste. Also, you could use toString(), as it is the equivalent of paste(..., collapse = ", ").

apply(dd2, 1, function(x) toString(na.omit(x)))
# [1] "A, AK2, PPT"      "B, HFM1, PPT"     "C, TRR"          
# [4] "D, TRR, RTT, GGT" "E, RTT"   

If you have specific columns you are using then

apply(dd2[, cols], 1, function(x) toString(na.omit(x)))
Kuehl answered 23/12, 2015 at 5:11 Comment(5)
Is there a way to specify a different separator? (ex. "" instead of ,)?Kenn
@Kenn - not in toString. You would have to use paste.Kuehl
Thank you @RichScriven. The problem with paste is the NA cells. I could use na.omit as well, but then I get problems with the different number of rows. It should have an option to just ignore the NA cells.Kenn
stringr::str_replace_na(c(NA, "abc", "def"), replacement=''). Generally, refer to stringr for anything more advanced than simple concatenations.Outbrave
@Outbrave - Okay, but there is a little more to it than that. stringr::str_replace_na(dd2, replacement='') does not produce the desired result. Base R's replace could be used to do exactly the same thing.Kuehl
C
3

dd2 is a matrix in which case using apply as suggested by @Rich Scriven is more appropriate. If it is a dataframe you can use tidyr::unite

dd2 <- data.frame(dd2)
tidyr::unite(dd2, result, plant, animal, more, na.rm = TRUE, sep = ',')

#  sample_id      result
#1         A     AK2,PPT
#2         B    HFM1,PPT
#3         C         TRR
#4         D TRR,RTT,GGT
#5         E         RTT

To combine all the columns you can use everything().

tidyr::unite(dd2, result, dplyr::everything(), na.rm = TRUE, sep = ',')

#         result
#1     A,AK2,PPT
#2    B,HFM1,PPT
#3         C,TRR
#4 D,TRR,RTT,GGT
#5         E,RTT
Constitution answered 28/1, 2021 at 3:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.