suppress NAs in paste()
F

15

66

Regarding the bounty

Ben Bolker's paste2 solution produces "" when the strings being pasted contain NAs in the same position. Like this,

> paste2(c("a","b", "c", NA), c("A","B", NA, NA))
[1] "a, A" "b, B" "c"    ""

The fourth element is "" instead of NA. I would like this instead,

[1] "a, A" "b, B" "c"  NA     

I'm offering up this small bounty for anyone who can fix this.

Original question

I've read the help page ?paste, but I don't understand how to have R ignore NAs. I do the following,

foo <- LETTERS[1:4]
foo[4] <- NA
foo
[1] "A" "B" "C" NA
paste(1:4, foo, sep = ", ")

and get

[1] "1, A"  "2, B"  "3, C"  "4, NA"

What I would like to get,

[1] "1, A" "2, B" "3, C" "4"

I could do it like this,

sub(', NA$', '', paste(1:4, foo, sep = ", "))
[1] "1, A" "2, B" "3, C" "4"

but that seems like a detour.

Fragrance answered 2/12, 2012 at 21:12 Comment(4)
If you have a recurring need, you can implement your own paste2(..., sep, collapse, na.rm = FALSE) with an na.rm argument, for example.Appendectomy
@agstudy, how do I do that?Fragrance
stringr::str_replace_na(c(NA, "abc", "def"), replacement="") -- 2018 wayGrime
replacing the NA with an empty string won't work with paste. From paste(1:4, stringr::str_replace_na(foo, replacement=""), sep=", ") you get "1, A" "2, B" "3, C" "4, "Voter
T
48

For the purpose of a "true-NA": Seems the most direct route is just to modify the value returned by paste2 to be NA when the value is ""

 paste3 <- function(...,sep=", ") {
     L <- list(...)
     L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
     ret <-gsub(paste0("(^",sep,"|",sep,"$)"),"",
                 gsub(paste0(sep,sep),sep,
                      do.call(paste,c(L,list(sep=sep)))))
     is.na(ret) <- ret==""
     ret
     }
 val<- paste3(c("a","b", "c", NA), c("A","B", NA, NA))
 val
#[1] "a, A" "b, B" "c"    NA    
Threepiece answered 28/3, 2013 at 2:23 Comment(2)
Is this intended paste3(c("a", "b", "c", NA), c("A", "B", NA, NA), sep = "|") returns [1] "|a|||A|" "|b|||B|" "|c|||" "|||" ? In contrast paste(c("a", "b", "c", NA), c("A", "B", NA, NA), sep = "|") returns [1] "a|A" "b|B" "c|NA" "NA|NA"Selfseeker
It's not intended. If you need to use "|" as a separator, then realize that it was being used inside the function to act as a logical OR in a regex pattern. So there ought to be a trap to that particular separator and alternate processing.Threepiece
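
One possible way to handle the "|" separator raised in the comments above is to quote the separator so that the regex engine treats it literally. The sketch below is not part of the original answer: paste3_safe is a hypothetical name, it relies on PCRE's \Q...\E quoting (hence perl = TRUE), and it assumes sep contains no backslashes. It also collapses runs of repeated separators, which should cover several NAs in a row.

```r
# Hypothetical variant of paste3; assumes sep contains no backslashes
paste3_safe <- function(..., sep = ", ") {
  # Replace NA with "" in every input vector
  L <- lapply(list(...), function(x) { x[is.na(x)] <- ""; x })
  ret <- do.call(paste, c(L, list(sep = sep)))
  q <- paste0("\\Q", sep, "\\E")  # sep quoted as a literal for PCRE
  # Collapse runs of separators left behind by the removed NAs
  ret <- gsub(paste0("(", q, ")+"), sep, ret, perl = TRUE)
  # Trim a leading or trailing separator
  ret <- gsub(paste0("^", q, "|", q, "$"), "", ret, perl = TRUE)
  # Turn all-NA positions into a true NA
  is.na(ret) <- ret == ""
  ret
}

res_pipe <- paste3_safe(c("a", "b", "c", NA), c("A", "B", NA, NA), sep = "|")
res_pipe
# [1] "a|A" "b|B" "c"   NA
```

Quoting the separator keeps the rest of paste3 unchanged; whether that is preferable to a row-wise approach without regexes is a matter of taste.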
G
58

I know this question is many years old, but it's still the top google result for r paste na. I was looking for a quick solution to what I assumed was a simple problem, and was somewhat taken aback by the complexity of the answers. I opted for a different solution, and am posting it here in case anyone else is interested.

bar <- apply(cbind(1:4, foo), 1, 
        function(x) paste(x[!is.na(x)], collapse = ", "))
bar
[1] "1, A" "2, B" "3, C" "4"

In case it isn't obvious, this will work on any number of vectors with NAs in any positions.
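
As a small illustration of the "any number of vectors" claim, here is the same one-liner applied to three vectors with NAs in different positions. The vectors foo, bar and baz are made up for this example (and replace the foo from the question); the last position is NA everywhere, to show that it yields "".

```r
# Example inputs (assumed for illustration only)
foo <- c("A", "B", "C", NA, NA)
bar <- c(NA, 2:4, NA)
baz <- c("a", NA, "c", "d", NA)

# cbind() builds a character matrix; each row is pasted without its NAs
res <- apply(cbind(foo, bar, baz), 1,
             function(x) paste(x[!is.na(x)], collapse = ", "))
res
# [1] "A, a"    "B, 2"    "C, 3, c" "4, d"    ""
```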

IMHO, the advantage of this over the existing answers is legibility. It's a one-liner, which is always nice, and it doesn't rely on a bunch of regexes and if/else statements which may trip up your colleagues or future self. Erik Shilts' answer mostly shares these advantages, but assumes there are only two vectors and that only the last of them contains NAs.

My solution doesn't satisfy the requirement in your edit, because my project has the opposite requirement. However, you can easily solve this by adding a second line borrowed from 42-'s answer:

is.na(bar) <- bar == ""
Goles answered 9/3, 2018 at 20:6 Comment(1)
This worked for me. Very simple. I wish this was shipped with paste.Mebane
C
50

I found a tidyverse solution to this question (unite() from tidyr), which is rather elegant in my opinion.

library(tidyr)
foo <- LETTERS[1:4] 
foo[4] <- NA 
df <- data.frame(foo, num = 1:4)
df %>% unite(., col = "New.Col",  num, foo, na.rm=TRUE, sep = ",")
>    New.Col
  1:     1,A
  2:     2,B
  3:     3,C
  4:       4
Calendar answered 23/12, 2019 at 14:12 Comment(6)
Why loading data.table and all tidyverse packages if it is only about unite?Circumambulate
Because I just wanted to make sure it works and I am using both dplyr and data.table. I rather load more than necessary than have non-reproducible code.Calendar
Perfect. This is a great one-line solution to replace the previous clunky (but required without dplyr) ones.Oodles
Once again dplyr provides an elegant solutionAppendicular
FYI: unite is in tidyr package, so no need to load data.table & the entire tidyverse for this if you need to keep your dependencies tidy.Caddric
I've removed the unnecessary dependencies. :-)Calendar
T
16

A function that follows up on @ErikShilt's answer and @agstudy's comment. It generalizes the situation slightly by allowing sep to be specified and handling cases where any element (first, last, or intermediate) is NA. (It might break if there are multiple NA values in a row, or in other tricky cases ...) By the way, note that this situation is described exactly in the second paragraph of the Details section of ?paste, which indicates that at least the R authors are aware of the situation (although no solution is offered).

paste2 <- function(...,sep=", ") {
    L <- list(...)
    L <- lapply(L,function(x) {x[is.na(x)] <- ""; x})
    gsub(paste0("(^",sep,"|",sep,"$)"),"",
                gsub(paste0(sep,sep),sep,
                     do.call(paste,c(L,list(sep=sep)))))
}
foo <- c(LETTERS[1:3],NA)
bar <- c(NA,2:4)
baz <- c("a",NA,"c","d")
paste2(foo,bar,baz)
# [1] "A, a"    "B, 2"    "C, 3, c" "4, d"   

This doesn't handle @agstudy's suggestions of (1) incorporating the optional collapse argument; (2) making NA-removal optional by adding an na.rm argument (and setting the default to FALSE to make paste2 backward compatible with paste). If one wanted to make this more sophisticated (i.e. remove multiple sequential NAs) or faster it might make sense to write it in C++ via Rcpp (I don't know much about C++'s string-handling, but it might not be too hard -- see convert Rcpp::CharacterVector to std::string and Concatenating strings doesn't work as expected for a start ...)
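
A minimal sketch of how the optional collapse argument mentioned above might be added. This is an assumption about one reasonable design, not the answer's own code: paste2_collapse is a hypothetical name, and sep is still interpreted as a regex, as in paste2.

```r
# Hypothetical extension of paste2 with a collapse argument
paste2_collapse <- function(..., sep = ", ", collapse = NULL) {
  # Replace NA with "" in every input vector
  L <- lapply(list(...), function(x) { x[is.na(x)] <- ""; x })
  ret <- do.call(paste, c(L, list(sep = sep)))
  # Same clean-up as paste2: doubled, leading and trailing separators
  ret <- gsub(paste0(sep, sep), sep, ret)
  ret <- gsub(paste0("(^", sep, "|", sep, "$)"), "", ret)
  # Collapse only after each element has been cleaned
  if (is.null(collapse)) ret else paste(ret, collapse = collapse)
}

foo <- c(LETTERS[1:3], NA)
bar <- c(NA, 2:4)
baz <- c("a", NA, "c", "d")
collapsed <- paste2_collapse(foo, bar, baz, collapse = "; ")
collapsed
# [1] "A, a; B, 2; C, 3, c; 4, d"
```

Collapsing after the clean-up matters: if collapse were passed straight to paste(), the ^ and $ anchors would only match the ends of the single collapsed string.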

Theriot answered 2/12, 2012 at 22:52 Comment(2)
I think you change your do.call by do.call(paste,c(L,list(sep=sep,collapse=collapse))))) and you get your collapse argumentAppendectomy
yes, it's not hard -- I just didn't bother (yet). Feel free to edit if you like (oops, you can't -- requires 2000 rep -- sorry.)Theriot
B
13

As Ben Bolker mentioned, the above approaches may fall over if there are multiple NAs in a row. I tried a different approach that seems to overcome this.

paste4 <- function(x, sep = ", ") {
  x <- gsub("^\\s+|\\s+$", "", x) 
  ret <- paste(x[!is.na(x) & !(x %in% "")], collapse = sep)
  is.na(ret) <- ret == ""
  return(ret)
  }

The second line strips out extra whitespace introduced when concatenating text and numbers. The above code can be used to concatenate multiple columns (or rows) of a dataframe using the apply command, or repackaged to first coerce the data into a dataframe if needed.
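
For instance, row-wise use with apply might look like the sketch below. The data frame df and its columns are invented for illustration, and paste4 is repeated so the snippet runs on its own. Because apply() converts the data frame to a character matrix, the numeric column gets padded with spaces, which is what the whitespace-stripping line takes care of.

```r
paste4 <- function(x, sep = ", ") {
  x <- gsub("^\\s+|\\s+$", "", x)
  ret <- paste(x[!is.na(x) & !(x %in% "")], collapse = sep)
  is.na(ret) <- ret == ""
  return(ret)
}

# Example data (assumed): two character columns and one numeric column
df <- data.frame(a = c("a", "b", "c", NA),
                 b = c("A", "B", NA, NA),
                 n = c(1, 2, NA, 10),
                 stringsAsFactors = FALSE)

# Apply paste4 to every row; unname() drops any row labels
res4 <- unname(apply(df, 1, paste4))
res4
# [1] "a, A, 1" "b, B, 2" "c"       "10"
```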

EDIT

After a few more hours' thought, I think the following code incorporates the suggestions above to allow specification of the collapse and na.rm options.

paste5 <- function(..., sep = " ", collapse = NULL, na.rm = FALSE) {
  # Without na.rm, fall back to plain paste()
  if (!na.rm) {
    return(paste(..., sep = sep, collapse = collapse))
  }
  # Paste the non-NA elements of x; return NA if nothing is left
  paste.na <- function(x, sep) {
    x <- gsub("^\\s+|\\s+$", "", x)
    ret <- paste(na.omit(x), collapse = sep)
    is.na(ret) <- ret == ""
    ret
  }
  df <- data.frame(..., stringsAsFactors = FALSE)
  ret <- apply(df, 1, FUN = function(x) paste.na(x, sep))

  if (is.null(collapse)) {
    ret
  } else {
    paste.na(ret, sep = collapse)
  }
}

As above, na.omit(x) can be replaced with x[!is.na(x) & !(x %in% "")] to also drop empty strings if desired. Note that using collapse with na.rm = TRUE returns a string without any "NA", though this could be changed by replacing the last line of code with paste(ret, collapse = collapse).

nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
mnth <- month.abb
nth[4:5] <- NA
mnth[5:6] <- NA

paste5(mnth, nth)
[1] "Jan 1st"  "Feb 2nd"  "Mar 3rd"  "Apr NA"   "NA NA"    "NA 6th"   "Jul 7th"  "Aug 8th"  "Sep 9th"  "Oct 10th" "Nov 11th" "Dec 12th"

paste5(mnth, nth, sep = ": ", collapse = "; ", na.rm = T)
[1] "Jan: 1st; Feb: 2nd; Mar: 3rd; Apr; 6th; Jul: 7th; Aug: 8th; Sep: 9th; Oct: 10th; Nov: 11th; Dec: 12th"

paste3(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8))
[1] "a, A, 1, 5" "b, B, 2, 6" "c, , 7"     "4, 8" 

paste5(c("a","b", "c", NA), c("A","B", NA, NA), c(1,2,NA,4), c(5,6,7,8), sep = ", ", na.rm = T)
[1] "a, A, 1, 5" "b, B, 2, 6" "c, 7"       "4, 8" 
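
The difference between na.omit(x) and the stricter filter mentioned above can be seen on a small vector. The vector x here is made up purely for illustration.

```r
# Example input containing both an empty string and an NA
x <- c("a", "", NA, "b")

# na.omit() drops only the NA, so the empty string still takes a slot
keep_empty <- paste(na.omit(x), collapse = ", ")
keep_empty
# [1] "a, , b"

# The stricter filter drops the NA and the empty string
drop_empty <- paste(x[!is.na(x) & !(x %in% "")], collapse = ", ")
drop_empty
# [1] "a, b"
```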
Bauske answered 20/7, 2015 at 4:16 Comment(0)
W
7

You can use ifelse, a vectorized if-else construct to determine if a value is NA and substitute a blank. You'll then use gsub to strip out the trailing ", " if it isn't followed by any other string.

gsub(", $", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ", "))

Your answer is correct. There isn't a better way to do it. This issue is explicitly mentioned in the paste documentation in the Details section.

Witten answered 2/12, 2012 at 21:21 Comment(6)
Thank you for responding to my question, but your code still leaves me with "4, ", what I'm looking for is "4".Fragrance
@EricFail, sorry I didn't notice the lack of ", " on the last element. Your answer is the correct one.Witten
That solves my question as is, thanks. So, there is no way to change the behaviour of paste()?Fragrance
@EricFail paste is fine the way it is. You want to use it to do something non-standard so it makes sense that you should have to do a little more work to specify the behavior that you want. The way it currently works is fine IMO.Peoples
@Dason, I'm not saying paste is not fine, I am simply trying to solve an issue I thought other people would also have. In my "real" example I have a lot of variables that I am trying to combine into one vector. I guess there is no short cut to solve this problem. regardless, thankful for the responses!Fragrance
if the above code would work for x as x = gsub(",$", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ",")) then in order to get rid of the trailing "," in it just repeat it once again x = gsub(",$", "", x) It worked for me to get rid of that annoying "4," last commaHayner
B
6

If working with df or tibbles using tidyverse, I use mutate_all or mutate_at with str_replace_na before paste or unite to avoid pasting NAs.

library(tidyverse)
new_df <- df  %>%
mutate_all(~str_replace_na(., "")) %>%
mutate(combo_var = paste0(var1, var2, var3))

OR

new_df <- df  %>%
mutate_at(c('var1', 'var2'), ~str_replace_na(., "")) %>%
mutate(combo_var = paste0(var1, var2))
Bourg answered 14/1, 2020 at 0:20 Comment(1)
unite has the option to set na.rm = TRUE, so you could skip the mutate steps and call something like df %>% unite('col3', col1:col2, sep = ' ', na.rm = TRUE). It's worth noting that the default sep argument is an underscore. Also worth noting that na.rm = TRUE might not work as expected if one or more of the columns being combined is numeric.Tread
F
3

This can be achieved in a single line. For example,

vec<-c("A","B",NA,"D","E")
res<-paste(vec[!is.na(vec)], collapse=',' )
print(res)
[1] "A,B,D,E"
Fib answered 3/8, 2021 at 20:7 Comment(0)
G
2

Or remove the NAs after paste with str_replace_all

library(stringr)
data[["1"]] <- str_replace_all(data[["1"]], "NA", "")
Geraint answered 25/4, 2019 at 10:5 Comment(1)
There are other answers that address the OP's question, and they were posted many years ago. When posting an answer, please make sure you add either a new solution, or a substantially better explanation, especially when answering older questions.Tandi
B
1

A variant of Joe's solution (https://mcmap.net/q/266501/-suppress-nas-in-paste) that respects both sep and collapse and returns NA when all values are NA is:

paste_missing <- function(..., sep=" ", collapse=NULL) {
  ret <-
    apply(
      X=cbind(...),
      MARGIN=1,
      FUN=function(x) {
        if (all(is.na(x))) {
          NA_character_
        } else {
          paste(x[!is.na(x)], collapse = sep)
        }
      }
    )
  if (!is.null(collapse)) {
    paste(ret, collapse=collapse)
  } else {
    ret
  }
}
Blankbook answered 13/9, 2019 at 15:8 Comment(0)
N
1

Here is a solution that behaves more like paste and handles more edge cases than current solutions (empty strings, "NA" strings, more than 2 arguments, use of collapse argument...).

paste2 <- function(..., sep = " ", collapse = NULL, na.rm = FALSE){
  # in default case, use paste 
  if(!na.rm) return(paste(..., sep = sep, collapse = collapse))
  # cbind is convenient to recycle, it warns though so use suppressWarnings
  dots <- suppressWarnings(cbind(...))
  res <- apply(dots, 1, function(...) {
    if(all(is.na(c(...)))) return(NA)
    do.call(paste, as.list(c(na.omit(c(...)), sep = sep)))
  })
  if(is.null(collapse)) res else
   paste(na.omit(res), collapse = collapse)
}

# behaves like `paste()` by default
paste2(c("a","b", "c", NA), c("A","B", NA, NA))
#> [1] "a A"   "b B"   "c NA"  "NA NA"

# trigger desired behavior by setting `na.rm = TRUE` and `sep = ","`
paste2(c("a","b", "c", NA), c("A","B", NA, NA), sep = ",", na.rm = TRUE)
#> [1] "a,A" "b,B" "c"   NA

# handles edge cases
paste2(c("a","b", "c", NA, "", "",   ""),
       c("a","b", "c", NA, "", "", "NA"),
       c("A","B",  NA, NA, NA, "",   ""), 
       sep = ",", na.rm = TRUE)
#> [1] "a,a,A" "b,b,B" "c,c"   NA      ","     ",,"    ",NA,"

Created on 2019-10-01 by the reprex package (v0.3.0)

Nims answered 1/10, 2019 at 14:28 Comment(0)
C
1

Small overview of the tidyverse solutions:

library(tidyverse)
dat <- tibble(x = c("a", "b", NA,  NA),
              y = c("A",  NA, NA, "D"))

### str_c()
### missing values are "infectious"
dat %>% 
  mutate(z = str_c(x, y))

### str_c() and str_replace_na()
### difficult syntax
dat %>% 
  mutate(across(c(x, y), ~ str_replace_na(.x, replacement = ""), .names = "{.col}r"),
         z = str_c(xr, yr))

### unite()
### unintuitive to use something different than mutate()...
dat %>% 
  unite(col = "z", x, y, sep = "", remove = FALSE, na.rm = TRUE)

### User defined function paste2()
paste2 <- function(x, sep = "") {paste(x[!is.na(x)], collapse = sep)}
dat %>% 
  rowwise() %>% 
  mutate(z = paste2(c(x, y)))

Add the following to the end of the pipe if the result should be NA when all elements are NA

mutate(z = if_else(z == "", NA, z))
Chrysalis answered 13/6, 2023 at 14:11 Comment(0)
H
0

Updating @Erik Shilts' solution to get rid of the trailing comma:

x = gsub(",$", "", paste(1:4, ifelse(is.na(foo), "", foo), sep = ","))

Then, to get rid of any remaining trailing ",", just repeat it once more:

x <- gsub(",$", "", x)
Hayner answered 2/1, 2021 at 21:20 Comment(0)
G
0

This works for me

library(stringr)
library(dplyr)  # if_else() comes from dplyr, not stringr

foo <- LETTERS[1:4]
foo[4] <- NA
foo
# [1] "A" "B" "C" NA 

if_else(!is.na(foo),
    str_c(1:4, str_replace_na(foo, ""), sep = ", "),
    str_c(1:4, str_replace_na(foo, ""), sep = "")
    )
# [1] "1, A" "2, B" "3, C" "4"
Gusgusba answered 30/6, 2021 at 19:28 Comment(0)
P
0

Using glue worked for me. Make sure you set .na = "" inside the glue() call:

library(dplyr)
head(iris) %>%
  mutate(description = glue::glue("This {Species} has a petal length of {Petal.Length}", .na = ""))
Polyvalent answered 27/4, 2024 at 0:47 Comment(0)
