How to substring every element in a vector of strings?

I have the vector:

v <- c("godzilla", "jurassic", "googly")

I want the first 3 letters of every element in this vector. I would like to end up with:

# "god"   "jur"   "goo"

I have already tried using apply, but it didn't work. What should I do?

Topaz answered 18/6, 2014 at 17:49 Comment(1)
See, @agstudy, you got answer-bombed by Gavin :-) - Melbamelborn
B
13

One option is substring():

> substring(v, first = 1, last = 3)
[1] "god" "jur" "goo"

Another is substr(), which is also part of base R:

> substr(v, start = 1, stop = 3)
[1] "god" "jur" "goo"

Note that the argument names differ: substring() takes first and last, whereas substr() takes start and stop.

As both of these functions are vectorised, there is no need for apply() and friends here.
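As a small sketch of what vectorisation gives you here (the vector `w` and the window offsets are made up for illustration): substr() copes with short and missing elements on its own, and substring() also recycles over its position arguments.

```r
v <- c("godzilla", "jurassic", "googly")

# One call handles the whole vector, no loop or apply() needed
first3 <- substr(v, 1, 3)

# Illustrative input: elements shorter than 3 characters come back whole,
# and NA stays NA
w <- c("ab", NA, "abcdef")
short_ok <- substr(w, 1, 3)

# substring() is also vectorised over first/last, giving sliding windows
windows <- substring("godzilla", 1:3, 3:5)
```

Printing `first3` should give the same three prefixes as in the answer above.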

Brachiate answered 18/6, 2014 at 17:53 Comment(1)
How would one apply this to vectors within the rows of a data.frame? When I apply to a column by df2 <- df1 %>% mutate(y = substr(x, 0, 4)), I end up with the beginning of the concatenated list and the first characters of the first element of the list, e.g. "c("12" "c("24" "c("34". - Northampton
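One possible reading of the comment above is that x is a list-column, so substr() first coerces each list element to a single string and then cuts that. A hedged base R sketch under that assumption (the data frame and its values are invented for illustration) applies substr() inside each element instead:

```r
# Hypothetical data: each row of x holds a character vector
df1 <- data.frame(id = 1:2)
df1$x <- list(c("123456", "abcdef"), c("243546", "ghijkl"))

# Run substr() on every vector in the list-column separately;
# the result is again a list-column with one vector per row
df1$y <- lapply(df1$x, substr, 1, 4)
```

Each element of `df1$y` then holds the first four characters of the corresponding strings in `df1$x`.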
A
10

Just for fun, you can also use a regular expression here:

sub('(^.{3}).*','\\1',v)
[1] "god" "jur" "goo"

This is another vectorized solution.
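If the prefix length has to vary, the pattern can be assembled at run time. A minimal sketch, assuming the length lives in a variable called `len` (a name chosen here for illustration):

```r
v <- c("godzilla", "jurassic", "googly")
len <- 3L  # illustrative prefix length

# Build the pattern "^(.{3}).*" from len, then keep only the captured group
pattern <- sprintf("^(.{%d}).*", len)
prefix <- sub(pattern, "\\1", v)
```

Strings with fewer than `len` characters do not match the pattern and are returned unchanged. For plain prefixes, substr(v, 1, len) remains the shorter spelling.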

Album answered 18/6, 2014 at 17:59 Comment(2)
Powerful, yet simple! +1 - Gca
Though this gets ugly if the length (3) is variable: sub(paste0('(^.{', len, '}).*'),'\\1',v) vs. substr(v, 1, len)... - Rattray
G
3

@Gavin Simpson's answer is the right way to go, but if you want to use apply() and friends here, you can try the following:

> sapply(strsplit(v, ""), function(x) paste0(x[1:3], collapse=""))
[1] "god" "jur" "goo"
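One caveat with indexing by x[1:3]: a string shorter than three characters yields NA entries, which paste0() then writes out as the text "NA". A hedged variant, using head() and vapply() rather than the original sapply() call, avoids that (the extra element "ab" is added purely for illustration):

```r
v <- c("godzilla", "jurassic", "googly", "ab")
chars <- strsplit(v, "")

# Original approach: x[1:3] pads short strings with NA
with_index <- sapply(chars, function(x) paste0(x[1:3], collapse = ""))

# head() returns at most 3 characters and never pads;
# vapply() additionally checks that each result is a single string
with_head <- vapply(chars, function(x) paste(head(x, 3), collapse = ""),
                    character(1))
```

Here `with_index[4]` is "abNA", while `with_head[4]` is "ab".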
Gca answered 18/6, 2014 at 17:58 Comment(3)
Gah. I want to award points for curiosity, but wouldn't want anyone to actually use this answer! - Janson
Yeah, this is just for fun. - Gca
This is the best answer to me; it helps you understand how functional programming works in R. - Homocercal
S
2

A stringr option is str_sub:

library(stringr)
str_sub(v, 1, 3)
#[1] "god" "jur" "goo"

And str_sub_all (available in stringr 1.5.0 and later) extracts multiple substrings from each string:

str_sub_all(v, c(1, 2), c(3, 4))
# [[1]]
# [1] "god" "odz"
# 
# [[2]]
# [1] "jur" "ura"
# 
# [[3]]
# [1] "goo" "oog"
Shoeshine answered 5/1, 2023 at 13:42 Comment(0)
