Extract only whole word using grep
M

3

17

I've got a big text file. I need to extract all the lines that contain the exact word "DUSP1". Here is an example of the lines:

9606    ENSP00000239223 DUSP1   BLAST
9606    ENSP00000239223 DUSP1-001 Ensembl

I want to retrieve the first line but not the second one.

I tried several commands, such as:

grep -E "^DUSP1"
grep '\<DUSP1\>'
grep '^DUSP1$'
grep -w DUSP1

But none of them seem to work. Which option should I use?

Metatarsus answered 12/7, 2013 at 13:27 Comment(2)
How exactly is the "exact word" defined? Your 3rd example would only find lines consisting of nothing but the word "DUSP1". So do you want lines matching "^DUSP1[[:space:]]+"? - Claribelclarice
Could you provide sample file content? The 2nd, 3rd and 4th commands work for me. - Drogheda
B
19

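The problem you are facing is that grep treats a dash (-) as a word delimiter, so DUSP1-001 still contains DUSP1 as a whole word.

Try this command:

grep '\sDUSP1\s' file

to ensure that there is whitespace on both sides of your word.
Or use word boundaries, though these also treat the dash as a boundary and will still match DUSP1-001: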

grep '\bDUSP1\b' file
Backplate answered 12/7, 2013 at 13:29 Comment(0)
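
A minimal sketch of the whitespace approach, using the two sample lines from the question. The variable names (`sample`, `word_matches`, `space_matches`) are made up for illustration, and the pattern uses POSIX `[[:space:]]` plus line anchors instead of `\s`, on the assumption that DUSP1 may also sit at the very start or end of a line:

```shell
# The two sample lines from the question (spaces assumed as separators).
sample='9606    ENSP00000239223 DUSP1   BLAST
9606    ENSP00000239223 DUSP1-001 Ensembl'

# -w counts "-" as a non-word character, so both lines are reported.
word_matches=$(printf '%s\n' "$sample" | grep -c -w 'DUSP1')

# Require whitespace or a line edge on each side of the word instead.
space_matches=$(printf '%s\n' "$sample" | grep -E '(^|[[:space:]])DUSP1([[:space:]]|$)')

echo "lines matched by -w: $word_matches"
echo "lines matched by the whitespace pattern:"
echo "$space_matches"
```

With this input, only the BLAST line should come back from the second command, while `-w` reports both lines.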
K
32

If you want to grep exactly the whole word, you can use word boundaries like this:

grep '\bDUSP1\b'

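The \b anchors match at the transition between a word character (a letter, digit or underscore) and anything else, so the pattern matches DUSP1 only when it is not embedded in a longer run of word characters.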

Knight answered 5/5, 2015 at 10:33 Comment(6)
This should be the accepted answer, there are not always spaces before and after (what about when it's the last word?). - Lingonberry
This is great. It also matches setting=DUSP1 and my/folder/to/DUSP1, but not DUSP123. - Sacramentalism
I had to use double quotes for the Windows version of GNU grep. Single quotes did not work. - Hoch
Yep, this should be the answer with the big green checkmark. :D - Cardsharp
This still greps DUSP1-001 for me. - Cheerless
No, this does not work. It still greps words with DUSP1-xxx. - Dumfries
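
The last two comments follow from how \b works: the dash is not a word character, so DUSP1-001 has a boundary right after DUSP1. One possible workaround, sketched here as an assumption rather than as part of the answer, is to spell out the allowed neighbours and add the dash to the set of characters that count as part of a word. The names `lines`, `pattern` and `strict_count` are hypothetical:

```shell
# Test inputs taken from the comments above.
lines='setting=DUSP1
my/folder/to/DUSP1
DUSP123
DUSP1-001'

# Neighbours must be a line edge or a character that is not
# a letter, digit, underscore or dash.
pattern='(^|[^[:alnum:]_-])DUSP1([^[:alnum:]_-]|$)'

# Print the matching lines, then count them.
printf '%s\n' "$lines" | grep -E "$pattern"
strict_count=$(printf '%s\n' "$lines" | grep -c -E "$pattern")
echo "matching lines: $strict_count"
```

This keeps the matches on setting=DUSP1 and my/folder/to/DUSP1 while rejecting both DUSP123 and DUSP1-001.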
N
2

Adding to what sputpick said, it could either be that or:

grep '\sDUSP1$' file

if DUSP1 is at the end of the line.

Nettlesome answered 3/11, 2014 at 17:50 Comment(0)
