Delete line containing one of multiple strings
F

4

10

I have a text file and I want to remove all lines containing the words: facebook, youtube, google, amazon, dropbox, etc.

I know how to delete lines containing a single string with sed:

sed '/facebook/d' myfile.txt

I don't want to run this command once for each string, though. Is there a way to combine all the strings into one command?

Feeder answered 11/6, 2013 at 17:14 Comment(1)
Urgh, so many questions like this rank high on Google, and all the answers assume a short list. His list finishes with "etc.", so are any of the answers below reasonable if that blacklist is 100,000 words long? - Medrano
H
16

Try this:

sed '/facebook\|youtube\|google\|amazon\|dropbox/d' myfile.txt

From GNU's sed manual:

regexp1\|regexp2

Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension.
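Because `\|` is a GNU extension, the command above may not work with other sed implementations. A minimal sketch of two alternatives, assuming a small sample file (`sample.txt` and its contents are made up for illustration): several `-e` expressions, which plain POSIX sed accepts, and `-E` for extended regular expressions, which both GNU and BSD sed support.

```shell
# Hypothetical sample input, for illustration only.
printf '%s\n' 'visit facebook today' 'plain line' 'google it' 'another line' > sample.txt

# One delete command per word; no alternation needed.
sed -e '/facebook/d' -e '/google/d' sample.txt > out_multi.txt

# Extended regex: | is alternation without a backslash.
sed -E '/facebook|google/d' sample.txt > out_ere.txt

cat out_ere.txt
```

Both commands should leave the same lines in this sample; which form reads better is mostly a matter of taste for a short list.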

Hypoglossal answered 11/6, 2013 at 17:16 Comment(0)
C
9
grep -vf wordsToExcludeFile myfile.txt

"wordsToExcludeFile" should contain the words you don't want, one per line.

If you need to save the result back to the same file, then add this to the command:

 > myfile.new && mv myfile.new myfile.txt
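A sketch of the full round trip, assuming hypothetical file contents. One detail worth knowing: grep exits with status 1 when it selects no lines, so with `&&` the `mv` is skipped if every line was filtered out. The variant below treats only a status above 1 (a real error) as failure.

```shell
# Hypothetical word list and input file, for illustration only.
printf '%s\n' facebook youtube google > wordsToExcludeFile
printf '%s\n' 'a facebook link' 'keep me' 'a youtube link' > myfile.txt

# grep exit status: 0 = some lines kept, 1 = no lines kept, >1 = error.
status=0
grep -vf wordsToExcludeFile myfile.txt > myfile.new || status=$?
if [ "$status" -le 1 ]; then
    mv myfile.new myfile.txt   # replace the original, even if now empty
else
    rm -f myfile.new           # leave the original untouched on error
fi

cat myfile.txt
```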
Colt answered 11/6, 2013 at 17:37 Comment(1)
The dot is a regular expression "wildcard" character: it matches any one character. The regex "facebook.com" will match the string "facebook-com", but it will also match the string "facebook.com". That's a long-winded way of saying yes, it will work. Did you see unexpected results? - Colt
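If the entries should be matched literally, so that a dot means only a dot, grep's `-F` option treats each line of the list as a fixed string rather than a regex. A minimal sketch, with made-up file names and contents:

```shell
# Hypothetical list and input, for illustration only.
printf '%s\n' 'facebook.com' > literalWords.txt
printf '%s\n' 'see facebook.com' 'see facebook-com' 'unrelated' > hosts.txt

# Regex matching: the dot matches any character, so both "facebook" lines go.
grep -vf literalWords.txt hosts.txt > regex_result.txt

# Fixed-string matching: only the line with a literal "facebook.com" goes.
grep -vFf literalWords.txt hosts.txt > fixed_result.txt

cat fixed_result.txt
```

Fixed-string matching is also usually the cheaper option for a very long list, though that is worth measuring on your own data.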
J
6

With awk

awk '!/facebook|youtube|google|amazon|dropbox/' myfile.txt > filtered.txt
Jemmy answered 11/6, 2013 at 17:39 Comment(1)
And with gawk, I could match whole words only: "gawk '!/\<facebook\>|\<youtube\>|\<google\>|\<amazon\>|\<dropbox\>/' myfile.txt > filtered.txt" - Slunk
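The `\<` and `\>` word-boundary operators are GNU extensions, so other awk implementations may not accept them. A comparable sketch with grep, assuming whole-word matching is what you want, uses the `-w` option, which GNU and BSD grep both support (the sample file and its contents are hypothetical):

```shell
# Hypothetical sample input, for illustration only.
printf '%s\n' 'google it' 'googled it' 'nothing here' > words_sample.txt

# Without -w, "google" also matches inside "googled".
grep -vE 'facebook|google' words_sample.txt > substring_result.txt

# With -w, a match must be a whole word, so "googled it" is kept.
grep -vwE 'facebook|google' words_sample.txt > word_result.txt

cat word_result.txt
```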
C
0

To remove lines containing any of several strings with grep:

grep -vE "facebook|youtube|google|amazon|dropbox" file.txt > filtered.txt

For case-insensitive removal, add -i:

grep -viE "facebook|youtube|google|amazon|dropbox" file.txt > filtered_case_insensitive.txt
Calvin answered 13/7, 2024 at 19:6 Comment(0)
