R: How do you apply grep() in lapply()

I would like to use grep() inside lapply() in R, but I am not very comfortable with lapply(). I understand that lapply() takes a list, applies a function to each member, and returns a list. For instance, let x be a list with 2 members.

> x<-strsplit(docs$Text," ")
> 
> x
[[1]]
 [1] "I"         "lovehttp"  "my"        "mum."      "I"         "love"     
 [7] "my"        "dad."      "I"         "love"      "my"        "brothers."

[[2]]
 [1] "I"         "live"      "in"        "Eastcoast" "now."      "Job.I"    
 [7] "used"      "to"        "live"      "in"        "WestCoast."  

I would like to use grep() to remove words containing http. So I tried:

> lapply(x,grep(pattern="http",invert=TRUE, value=TRUE))

But it does not work and it says

Error in grep(pattern = "http", invert = TRUE, value = TRUE) : 
argument "x" is missing, with no default

So, I tried

> lapply(x,grep(pattern="http",invert=TRUE, value=TRUE,x))

But it says

Error in match.fun(FUN) : 
'grep(pattern = "http", invert = TRUE, value = TRUE, x)' is not a 
function, character or symbol

Any help would be appreciated, thanks!

Kiowa answered 14/3, 2016 at 5:36 Comment(3)
You need to pass grep() the data you want it to work on. Gribble
@TimBiegeleisen Initially, I wanted to remove every word containing http, so "lovehttp" would be removed because it contains "http". If instead I want to remove just "http" and keep "love", is that possible? Kiowa
You need to figure out what answer you want. You are changing the requirements now. Flannelette

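This can be done in one line. lapply() expects the function itself as its second argument, not a call to it; any further arguments are passed through to grep():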

lst <- lapply(lst, grep, pattern="http", value=TRUE, invert=TRUE)

#lst
#[[1]]
# [1] "I"         "my"        "mum."      "I"         "love"      "my"        "dad."      "I"         "love"      "my"        "brothers."
#
#[[2]]
# [1] "I"          "live"       "in"         "Eastcoast"  "now."       "Job.I"      "used"       "to"         "live"       "in"         "WestCoast."
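For comparison, the attempts in the question fail because grep(...) is evaluated before lapply() ever receives it, so lapply() gets either an error or a result rather than a function. A minimal sketch of the equivalent anonymous-function form follows; it assumes lst is the two-vector list from the question (rebuilt here so the block runs on its own), and the names kept and same are only illustrative.

```r
# Sample data rebuilt from the output shown in the question
lst <- list(
  c("I", "lovehttp", "my", "mum.", "I", "love", "my", "dad.",
    "I", "love", "my", "brothers."),
  c("I", "live", "in", "Eastcoast", "now.", "Job.I", "used", "to",
    "live", "in", "WestCoast.")
)

# Pass a function, not a call: lapply() invokes it once per list element
kept <- lapply(lst, function(words) {
  grep("http", words, value = TRUE, invert = TRUE)
})

# Same result as handing grep and its extra arguments to lapply() directly
same <- lapply(lst, grep, pattern = "http", value = TRUE, invert = TRUE)
identical(kept, same)  # TRUE
```

The anonymous-function form is longer, but it makes explicit which argument receives each list element.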

If you want to remove only the pattern itself and keep the rest of the word (as discussed in the comments), use gsub instead of grep:

lapply(lst, gsub, pattern="http", replacement="")
#[[1]]
# [1] "I"         "love"      "my"        "mum."      "I"         "love"      "my"        "dad."      "I"         "love"      "my"        "brothers."
#
#[[2]]
# [1] "I"          "live"       "in"         "Eastcoast"  "now."       "Job.I"      "used"       "to"         "live"       "in"         "WestCoast."
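One thing to watch with the gsub approach: a word that consists only of the pattern becomes an empty string rather than disappearing. The sketch below uses a small hypothetical vector (words is not from the question) and fixed = TRUE, which treats "http" as a literal string instead of a regular expression.

```r
# Hypothetical example vector, including a word that is exactly "http"
words <- c("I", "lovehttp", "http", "my", "mum.")

# Strip the literal substring "http" from every element
stripped <- gsub("http", "", words, fixed = TRUE)

# Drop any elements that are now empty strings
cleaned <- stripped[nzchar(stripped)]
cleaned
# [1] "I"    "love" "my"   "mum."
```

Whether empty strings should be dropped depends on what the cleaned text is used for, so treat the last step as optional.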
Corybantic answered 14/3, 2016 at 6:26 Comment(0)

The following function removes every entry containing the substring http from a vector; applying it with lapply cleans each vector in your list:

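repx <- function(x) {
    # indices of the elements that contain "http"
    y <- grep("http", x)
    # start by keeping everything, then mark the matches for removal
    vec <- rep(TRUE, length(x))
    vec[y] <- FALSE
    x <- x[vec]
    return(x)
}

# repx already takes a single vector, so it can be passed to lapply directly
lapply(lst, repx)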
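The same logical-mask idea can be written more compactly with grepl(), which returns TRUE or FALSE for each element directly. This is a sketch rather than a replacement for repx; drop_http is a hypothetical name, and the list is rebuilt from the question's data so the block is self-contained.

```r
# Sample data matching the list in the question
lst <- list(
  c("I", "lovehttp", "my", "mum.", "I", "love", "my", "dad.",
    "I", "love", "my", "brothers."),
  c("I", "live", "in", "Eastcoast", "now.", "Job.I", "used", "to",
    "live", "in", "WestCoast.")
)

# Keep only the elements where the literal string "http" is NOT found
drop_http <- function(x) x[!grepl("http", x, fixed = TRUE)]

result <- lapply(lst, drop_http)
lengths(result)
# [1] 11 11
```

Negating the grepl() result avoids building the index vector by hand and still behaves correctly when nothing matches.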

Data:

x1 <- c("I", "lovehttp", "my", "mum.", "I", "love", "my", "dad.", "I", "love", "my", "brothers.")
x2 <- c("I", "live", "in", "Eastcoast", "now.", "Job.I", "used", "to", "live", "in", "WestCoast.")
lst <- list(x1, x2)
Flannelette answered 14/3, 2016 at 5:43 Comment(0)
