Regular expression to match a word or its prefix
N

5

173

I want to match a regular expression on a whole word.

In the following example I am trying to match s or season but what I have matches s, e, a, o and n.

[s|season]

How do I make a regular expression to match a whole word?

Neoterize answered 23/8, 2013 at 12:5 Comment(1)
Use (season|s) instead. [season] matches any of s, e, a, o, n. - Postfix
G
205

Square brackets define a character class, so you are actually matching any one of: s, |, s (again), e, a, s (again), o and n.

Use parentheses instead for grouping:

(s|season)

or non-capturing group:

(?:s|season)

Note: A non-capturing group tells the engine that it doesn't need to store the matched text, whereas a capturing group does store it. For small patterns either works. For heavy-duty use, first check whether you need the captured text; if you don't, prefer the non-capturing group so the engine doesn't spend memory storing something you will never use.
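To make the difference concrete, here is a minimal JavaScript sketch (JavaScript is my choice for illustration, the answer itself is engine-neutral). The \b word boundaries and the sample string "season 2" are assumptions added for the demo, not part of the patterns above.

```javascript
// Capturing group: the text matched inside (...) is stored at index 1.
const capturing = /\b(s|season)\b/.exec("season 2");
console.log(capturing[0]);     // "season" (whole match)
console.log(capturing[1]);     // "season" (captured group)
console.log(capturing.length); // 2

// Non-capturing group: same match, but nothing is stored for the group.
const nonCapturing = /\b(?:s|season)\b/.exec("season 2");
console.log(nonCapturing[0]);     // "season"
console.log(nonCapturing.length); // 1
```

Both patterns match the same text; the only observable difference is whether the result carries the extra group entry.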

Gaillardia answered 23/8, 2013 at 12:7 Comment(3)
Yeah, I had realised that. The non-capturing group was what I needed. I thought using () would always match; knowing there is an option not to match is handy. Thank you. - Neoterize
You misunderstood that. The ?: inside a group (making it non-capturing) just means that you can't refer to the matched text with $1, $2 and so on. If you want an expression not to be matched, what you need is ^. - Muggins
@NMGodA1b2c3d4 You're welcome! Do you mean an option not to match, or not to capture (there is a difference)? If you don't want to match any of these, you'd use (?! ... ) instead, meaning (?!s|season) in this case. - Gaillardia
P
167

Use this live online example to test your pattern: https://regex101.com/r/cU5lC2/1

Matching any whole word on the command line.

I'll be using the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through the preg_match function.

Start phpsh, put some content into variables, and match on a word.

el@apollo:~/foo$ phpsh

php> $content1 = 'badger';
php> $content2 = '1234';
php> $content3 = '$%^&';

php> echo preg_match('(\w+)', $content1);
1

php> echo preg_match('(\w+)', $content2);
1

php> echo preg_match('(\w+)', $content3);
0

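The preg_match function uses the PCRE engine within PHP to test the variables $content1, $content2 and $content3 against the \w+ pattern. (In '(\w+)' the outer parentheses act as the pattern delimiters that preg_match requires.)

$content1 and $content2 contain at least one word character (\w matches letters, digits and underscore); $content3 does not.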
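For readers without phpsh, roughly the same check can be sketched in JavaScript. This is an analogue of the PHP session above, not part of it; the names samples and hasWordChars are made up for illustration.

```javascript
// Same three inputs as the PHP session.
const samples = ["badger", "1234", "$%^&"];

// \w+ needs at least one word character: a letter, a digit or an underscore.
const hasWordChars = samples.map((text) => /\w+/.test(text));

console.log(hasWordChars); // [ true, true, false ]
```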

Match specific words on the command line without word boundaries:

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(dart|fart)', $gun1);
1

php> echo preg_match('(dart|fart)', $gun2);
1

php> echo preg_match('(dart|fart)', $gun3);
1

php> echo preg_match('(dart|fart)', $gun4);
0

Variables $gun1 and $gun2 contain the string dart or fart, so those matches are correct. But $gun3 contains darty and still matches, and that is the problem. So on to the next example.

Match specific words on the command line with word boundaries:

Word boundaries can be matched explicitly with \b. For a visual analysis of what the word boundary is doing, see http://jex.im/regulex (source: https://github.com/JexCheng/regulex). Example:

el@apollo:~/foo$ phpsh

php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';

php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1

php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0

php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0

The \b asserts that we have a word boundary, making sure " dart " is matched, but " darty " isn't.
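If the word list is not known in advance, the same \b idea can be wrapped in a small helper. Below is a sketch in JavaScript; escapeRegExp and wholeWordRegex are hypothetical names invented here, not built-in functions. It assumes every word starts and ends with a word character, otherwise \b will not sit where you expect.

```javascript
// Hypothetical helper: escape regex metacharacters so each word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build /\b(?:word1|word2)\b/ from a list of words.
function wholeWordRegex(words) {
  return new RegExp("\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b");
}

const gunPattern = wholeWordRegex(["dart", "fart"]);
const inputs = ["dart gun", "fart gun", "darty gun", "unicorn gun"];
const results = inputs.map((text) => gunPattern.test(text));

console.log(results); // [ true, true, false, false ]
```

The pattern is built without the g flag on purpose, so repeated test() calls do not carry state from one input to the next.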

Pye answered 6/1, 2014 at 17:31 Comment(4)
Upvoted because I needed the \b char, and didn't know it! - Caretaker
Why isn't this thorough explanation the answer? - Quillon
Because the person who posted the question selected the first answer that came in, and didn't bother to switch to mine when my vastly superior answer came in later. You can ask the questioner via comment under the question to change their answer selection to this one, and it would improve the value of this page to people who land on it. - Pye
I upvoted because you used the word "fart" in your example...and I needed the \w+ ;) - Clayborne
C
3

I tested these examples in JS. The simplest solution is to put the word you need inside / /:

var reg = /cat/;
reg.test('some cat here'); // test 1: true
reg.test('acatb');         // test 2: true (also matches inside another word)

Now, if you need this specific word with boundaries, not embedded in other letters or symbols, use the \b marker:

var reg = /\bcat\b/;
reg.test('acatb');         // test 1: false
reg.test('have cat here'); // test 2: true

JS also has the exec() method, which returns a result object. It helps, for example, to get the position (index) of the word.

var matchResult = /\bcat\b/.exec("good cat good");
console.log(matchResult.index); // 5

If we need to get all matched words in a string, sentence or text, we can use the g modifier (global match):

"cat good cat good cat".match(/\bcat\b/g).length
// 3 
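If you want the position of every occurrence and not just the count, one option is String.prototype.matchAll (ES2020), which needs the g flag. A short sketch; the variable names are only for illustration:

```javascript
const text = "cat good cat good cat";

// matchAll yields one match object per occurrence; each carries its index.
const positions = Array.from(text.matchAll(/\bcat\b/g), (m) => m.index);

console.log(positions); // [ 0, 9, 18 ]
```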

Now the last one: I need not one specific word, but any of several. We use the | sign, which means choice (or). Group the alternatives so that \b applies to both of them:

"bad dog bad".match(/\b(?:cat|dog)\b/g).length
// 1
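
One thing to watch with |: it has the lowest precedence, so without a group \bcat|dog\b is read as "\bcat OR dog\b". A small sketch of the difference; the sample string is made up to trigger it:

```javascript
const ungrouped = /\bcat|dog\b/g;   // parsed as (\bcat) OR (dog\b)
const grouped = /\b(?:cat|dog)\b/g; // boundaries apply to both alternatives

const sample = "catalog bulldog dog";
const ungroupedHits = sample.match(ungrouped);
const groupedHits = sample.match(grouped);

console.log(ungroupedHits); // [ "cat", "dog", "dog" ]
console.log(groupedHits);   // [ "dog" ]
```

The ungrouped form picks up "cat" from "catalog" and "dog" from "bulldog", which is usually not what you want for whole words.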
Costotomy answered 12/11, 2016 at 13:33 Comment(0)
M
2

[ ] defines a character class: any single character listed inside it will match. [012] will match 0 or 1 or 2, and [0-2] behaves the same.

What you want is a group to define an or-statement. Use (s|season) for your case.
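
A caveat about the order of alternatives, sketched in JavaScript (the same holds for other backtracking engines such as PCRE): alternatives are tried left to right, so without anchors or word boundaries the shorter s wins. The \b variant is my addition for comparison.

```javascript
// Unanchored: the first alternative that succeeds is taken.
const shortFirst = "season".match(/(s|season)/)[0];
const longFirst = "season".match(/(season|s)/)[0];
console.log(shortFirst); // "s"
console.log(longFirst);  // "season"

// With word boundaries the order no longer matters for whole words.
const bounded = "season".match(/\b(s|season)\b/)[0];
console.log(bounded); // "season"
```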

By the way, watch out: metacharacters in a normal regex (or inside a group) behave differently from those inside a character class. A character class is like a sub-language. [$A] will only match $ or A, nothing else. No escaping is needed for the dollar here.
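
A quick way to convince yourself, sketched in JavaScript with made-up test strings:

```javascript
// Outside a class, $ is an end-of-input anchor; inside [...] it is a literal dollar sign.
const dollarHit = /[$A]/.test("cost: $5"); // literal $ found
const letterHit = /[$A]/.test("Apple");    // A found
const noHit = /[$A]/.test("banana");       // neither $ nor uppercase A
console.log(dollarHit, letterHit, noHit);  // true true false

// Likewise, | inside a class is a literal pipe, which is why [s|season] misbehaves.
const pipeHit = "x|y".match(/[s|season]/)[0];
console.log(pipeHit); // "|"
```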

Muggins answered 23/8, 2013 at 12:9 Comment(0)
C
0
  • If you are treating 's' as a word on its own, you can use:

    \bs\b|\bseason\b

  • If you also want an 's' that appears inside a word, you can use:

    s|\bseason\b
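
Here is how the two patterns behave in JavaScript on a made-up sentence. Note one side effect of the second pattern: because the bare s alternative is tried first, it also wins at the start of "season", so the \bseason\b branch never gets a chance.

```javascript
const standalone = /\bs\b|\bseason\b/g; // "s" only as a word of its own
const anywhere = /s|\bseason\b/g;       // any "s", even inside other words

const sentence = "s is short for season in lists";
const standaloneHits = sentence.match(standalone);
const anywhereHits = sentence.match(anywhere);

console.log(standaloneHits);      // [ "s", "season" ]
console.log(anywhereHits.length); // 7
console.log(anywhereHits.every((hit) => hit === "s")); // true
```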

Citrange answered 1/9, 2022 at 10:59 Comment(0)
