Oracle REGEXP_LIKE and word boundaries
Asked Answered
G

3

40

I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected.

select 1 from dual
where regexp_like('DOES TEST WORK HERE','TEST');

But I want to match on word boundaries as well. So, adding the "\b" characters gives this query

select 1 from dual
where regexp_like('DOES TEST WORK HERE','\bTEST\b');

Running this returns zero rows. Any ideas?

Gathering answered 27/9, 2011 at 10:32 Comment(1)
That's weird. I can't get it to work, either... For example, select regexp_replace('DOES TEST WORK HERE','\bTEST\b','X') from dual; returns DOES TEST WORK HERE... It works if you use \W, but that's not the same as \b :PPhallicism
K
67

I believe you want to try

 select 1 from dual 
  where regexp_like ('does test work here', '(^|\s)test(\s|$)');

because the \b does not appear on this list: Perl-influenced Extensions in Oracle Regular Expressions

The \s makes sure that test starts and ends in a whitespace. This is not sufficient, however, since the string test could also appear at the very start or end of the string being matched. Therefore, I use the alternative (indicated by the |) ^ for start of string and $ for end of string.

Update (after 3 years+)... As it happens, I needed this functionality today, and it appears to me, that even better a regular expression is (^|\s|\W)test($|\s|\W) (The missing \b regular expression special character in Oracle).

Kadi answered 27/9, 2011 at 10:47 Comment(4)
Thanks for that. I found lots of resources around the net (e.g. psoug.org/snippet/… ) that suggested you can. I actually want to match the beginning or end of a string, or a "non-word" character in my case - so I switched \W in place of \s.Gathering
Yes, it seems that Oracle chose not to support \b although this is a rather standard regular expression token.Lysis
Oracle's regular expressions use the POSIX ERE standard (with some enhancements such as backreferences) which doesn't support word boundaries.Provocation
In your updated regexp the \s character class is redundant, as you are including \W (a character class which is a superset of \s).Enrika
G
8

The shortest regex that can check for a whole word in Oracle is

(^|\W)test($|\W)

See the regex demo.

Details

  • (^|\W) - a capturing group matching either
    • ^ - start of string
    • | - or
    • \W - a non-word char
  • test - a word
  • ($|\W) - a capturing group matching either
    • $ - end of string
    • | - or
    • \W - a non-word char.

Note that \W matches any chars but letters, digits and _. If you want to match a word that can appear in between _ (underscores), you need a bit different pattern:

(^|[^[:alnum:]])test($|[^[:alnum:]])

The [^[:alnum:]] negated bracket expression matches any char but alphanumeric chars, and matches _, so, _test_ will be matched with this pattern.

See this regex demo.

Gunpowder answered 21/9, 2018 at 10:10 Comment(0)
B
1

In general, I would stick with René's solution, the exception being when you need the match to be zero-length. ie You don't want to actually capture the non-word character at the beginning/end.

For example, if our string is test test then (\b)test(\b) will match twice but (^|\s|\W)test($|\s|\W) will only match the first occurrence. At least, that's certainly the case if you try to use regexp_substr.

Example

SELECT regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 1, 'i'), regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 2, 'i') FROM dual;

Returns

test |NULL

Bismuthic answered 11/2, 2015 at 14:10 Comment(1)
But the OP's point is that zero length word boundary \b doesn't work in Oracle. As explained in comments on other answers. And since there's no lookahead support, we can't find overlapping matches either... So we could never find two "test" matches in the string "some string testtest some string".Mohair

© 2022 - 2024 — McMap. All rights reserved.