Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

What are the differences between these functions, which all seem very similar?

Jacquard answered 2/11, 2018 at 12:3 Comment(0)
  • stri_join, stri_c, and stri_paste come from the stringi package and are pure aliases of one another.

  • str_c comes from stringr and is just stringi::stri_join with the parameter ignore_null hardcoded to TRUE, whereas stringi::stri_join sets it to FALSE by default. stringr::str_join is a deprecated alias for str_c.

see:

library(stringi)
identical(stri_join, stri_c)
# [1] TRUE
identical(stri_join, stri_paste)
# [1] TRUE

library(stringr)
str_c
# function (..., sep = "", collapse = NULL) 
# {
#   stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
# }
# <environment: namespace:stringr>

stri_join is very similar to base::paste with a few differences enumerated below:


1. sep = "" by default

So by default it behaves like paste0, with the difference that paste0 has no sep argument while stri_join keeps one.

identical(paste0("a","b")        , stri_join("a","b"))
# [1] TRUE
identical(paste("a","b")         , stri_join("a","b",sep=" "))
# [1] TRUE
identical(paste("a","b", sep="-"), stri_join("a","b", sep="-"))
# [1] TRUE

str_c will behave just like stri_join here.


2. Behavior with NA

If any of the pasted values is NA, stri_join returns NA for that element, while paste converts NA to the string "NA".

paste0(c("a","b"),c("c",NA))
# [1] "ac"  "bNA"
stri_join(c("a","b"),c("c",NA))
# [1] "ac" NA

str_c will behave just like stri_join here as well
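
If the paste-style result is wanted from stringi, one option is to turn missing values into the literal string "NA" before joining. This is only a minimal sketch: it assumes stringi's stri_replace_na helper (not mentioned above) with its default replacement, and the variable names are made up for illustration.

```r
library(stringi)

x <- c("a", "b")
y <- c("c", NA)

# Default: a missing value makes the corresponding result element NA
na_result <- stri_join(x, y)

# Replace NA by the string "NA" first to mimic paste0()
paste_like <- stri_join(x, stri_replace_na(y))

# Compare with base R
identical(paste_like, paste0(x, y))
```

Whether this is preferable to keeping the NA is a design choice: propagating NA is usually safer, since a literal "NA" in the output is easy to overlook.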


3. Behavior with length 0 arguments

When a length 0 value is encountered, character(0) is returned, unless ignore_null is set to TRUE, in which case the value is ignored. This differs from paste, which converts the length 0 value to "" and therefore leaves 2 consecutive separators in the output.

stri_join("a",NULL, "b")  
# [1] character(0)
stri_join("a",character(0), "b")  
# [1] character(0)

paste0("a",NULL, "b")
# [1] "ab"
stri_join("a",NULL, "b", ignore_null = TRUE)
# [1] "ab"
str_c("a",NULL, "b")
# [1] "ab"

paste("a",NULL, "b") # produces double space!
# [1] "a  b" 
stri_join("a",NULL, "b", ignore_null = TRUE, sep = " ")
# [1] "a b"
str_c("a",NULL, "b", sep = " ")
# [1] "a b"
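
The NULL handling matters most when a piece of the string is optional. As a rough sketch (make_name is a hypothetical helper, not part of stringi or stringr), ignore_null = TRUE lets an absent argument simply drop out:

```r
library(stringi)

# Hypothetical helper: build a file name with an optional suffix
make_name <- function(base, suffix = NULL, ext = ".txt") {
  # ignore_null = TRUE drops the NULL instead of returning character(0)
  stri_join(base, suffix, ext, ignore_null = TRUE)
}

without_suffix <- make_name("report")
with_suffix    <- make_name("report", "_final")

# For contrast, the stri_join default returns an empty vector
default_result <- stri_join("report", NULL, ".txt")
```

Here without_suffix should be "report.txt" and with_suffix "report_final.txt", while default_result has length 0.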

4. stri_join warns more

paste(c("a","b"),c("c","d","e"))
# [1] "a c" "b d" "a e"
paste("a","b", sep = c(" ","-"))
# [1] "a b"

stri_join(c("a","b"),c("c","d","e"), sep = " ")
# [1] "a c" "b d" "a e"
# Warning message:
#   In stri_join(c("a", "b"), c("c", "d", "e"), sep = " ") :
#   longer object length is not a multiple of shorter object length
stri_join("a","b", sep = c(" ","-"))
# [1] "a b"
# Warning message:
#   In stri_join("a", "b", sep = c(" ", "-")) :
#   argument `sep` should be one character string; taking the first one
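
Because stri_join signals these conditions as ordinary R warnings, they can be intercepted with base R condition handling. The following is a sketch under that assumption; strict_join is a hypothetical wrapper introduced only for illustration.

```r
library(stringi)

# Capture the recycling warning message instead of printing it
msg <- tryCatch(
  stri_join(c("a", "b"), c("c", "d", "e"), sep = " "),
  warning = function(w) conditionMessage(w)
)

# Hypothetical strict wrapper: any warning raised by stri_join becomes an error
strict_join <- function(...) {
  withCallingHandlers(
    stri_join(...),
    warning = function(w) {
      stop("stri_join warned: ", conditionMessage(w), call. = FALSE)
    }
  )
}

ok  <- strict_join("a", "b")
bad <- tryCatch(strict_join(c("a", "b"), c("c", "d", "e")), error = function(e) e)
```

A wrapper like this can be useful in scripts where silent recycling would hide a length mismatch in the input data.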

5. stri_join is faster

microbenchmark::microbenchmark(
  stringi = stri_join(rep("a",1000000),rep("b",1000),"c",sep=" "),
  base    = paste(rep("a",1000000),rep("b",1000),"c")
)

# Unit: milliseconds
#    expr       min       lq      mean    median       uq      max neval cld
# stringi  88.54199  93.4477  97.31161  95.17157  96.8879 131.9737   100  a 
# base    166.01024 169.7189 178.31065 171.30910 176.3055 215.5982   100   b
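
Before reading much into the timings, it is worth checking that the two calls really return the same thing. A small sketch with shorter inputs than above (the sizes are arbitrary) and base R's system.time instead of microbenchmark; timings are machine and version dependent, so no numbers are shown:

```r
library(stringi)

x <- rep("a", 100000)
y <- rep("b", 1000)   # recycled exactly 100 times, so no warning

res_stringi <- stri_join(x, y, "c", sep = " ")
res_base    <- paste(x, y, "c")

# Both calls should produce the same character vector
same <- identical(res_stringi, res_base)
same

# Rough single-run timings
system.time(stri_join(x, y, "c", sep = " "))
system.time(paste(x, y, "c"))
```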