Regex word boundary alternative

I was using the standard \b word boundary. However, it doesn't quite deal with the dot (.) character the way I want it to.

So the following regex:

\b(\w+)\b

will match cats and dogs as separate words inside cats.dogs if I have a string that says cats and dogs don't make cats.dogs.
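A minimal PHP sketch that reproduces this behaviour, using the sample sentence from the question. \b sits between any word character and any non-word character, so the dot in cats.dogs acts as a boundary and both halves come back as separate matches:

```php
<?php
// Sample sentence from the question.
$str = "cats and dogs don't make cats.dogs";

// "." is a non-word character, so \b matches on both sides of it and
// "cats.dogs" is split into two separate matches.
preg_match_all('/\b(\w+)\b/', $str, $m);

print_r($m[1]);
```

The apostrophe in don't is also a non-word character, so it is split into don and t for the same reason.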

I need a word boundary alternative that will match a whole word only if:

  1. it does not contain the dot (.) character
  2. it is surrounded by at least one space ( ) character on each side

Any ideas?!

P.S. I need this for PHP.

Irretrievable asked 28/12, 2012 at 18:53

You could try using (?<=\s) before and (?=\s) after the word in place of \b, to ensure that there is a space before and after it. You might also want to allow for the word being at the start or end of the string, by using (?<=\s|^) and (?=\s|$) instead.
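A sketch of the start/end-aware version from this answer, applied to the sample sentence from the question (the print_r call is only for illustration):

```php
<?php
$str = "cats and dogs don't make cats.dogs";

// (?<=\s|^) : the word must be preceded by whitespace or the start of the string.
// (?=\s|$)  : the word must be followed by whitespace or the end of the string.
// Both assertions are zero-width, so the spaces themselves are not consumed.
$pattern = '/(?<=\s|^)(\w+)(?=\s|$)/';

preg_match_all($pattern, $str, $m);
print_r($m[1]);
```

Only cats, and, dogs and make are matched. Note that don't is skipped as well, since \w does not match the apostrophe; if such words should count, the \w+ part would need to be widened, which this sketch does not attempt.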

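This automatically excludes "words" with a . in them, because the part before the dot is not followed by a space and the part after it is not preceded by one. However, it also excludes a word at the end of a sentence, since there is no space between it and the full stop.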
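The full-stop limitation can be seen with a hypothetical sentence whose last word is followed by a period:

```php
<?php
// Hypothetical sentence whose last word is followed by a full stop.
$str = "cats chase dogs.";

preg_match_all('/(?<=\s|^)(\w+)(?=\s|$)/', $str, $m);

// "dogs" is dropped: it is followed by "." rather than whitespace or the end.
print_r($m[1]);
```

Here only cats and chase are matched.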

Cathepsin answered 28/12, 2012 at 18:57
Thanks. Is there any way I could include words at the beginning and end of a sentence as well? I might not need it, but it would be good to know. (Irretrievable)

What you are trying to match can be done easily with array and string functions.

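$str = "cats and dogs don't make cats.dogs"; // sample string from the question
$parts = explode(' ', $str); // split on single spaces
// keep only non-empty tokens that contain no dot
$res = array_filter($parts, function ($e) {
    return $e !== "" && strpos($e, ".") === false;
});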
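A self-contained version of the explode/array_filter approach, assuming the sample sentence from the question. array_filter keeps the original array keys, so array_values is used here (an addition, not part of the answer) to renumber the result from 0:

```php
<?php
$str = "cats and dogs don't make cats.dogs";

// Split on single spaces, then drop empty tokens and tokens containing a dot.
$parts = explode(' ', $str);
$res = array_filter($parts, function ($e) {
    return $e !== "" && strpos($e, ".") === false;
});

// array_filter preserves keys; array_values reindexes them from 0.
$res = array_values($res);
print_r($res);
```

This yields cats, and, dogs, don't and make. Unlike the \w+ regexes, it keeps don't, because the filter only rejects dots. It also splits on a literal space only, so tabs or newlines would stay inside tokens.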

I recommend this approach because it saves time; spending hours hunting for a good regex solution is quite unproductive.

Ungual answered 28/12, 2012 at 19:09
I need this as part of another regex, as the first step of a preg_replace call, so it won't quite work for what I need to do. (Irretrievable)
Then it's better to ask about what you actually need. There might be a better solution than a regex. (Ungual)
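For the preg_replace use case raised in the comments: the lookarounds from the first answer are zero-width, so they can sit inside a larger pattern without consuming the surrounding spaces. A minimal sketch, where wrapping each qualifying word in square brackets is a hypothetical replacement chosen just for illustration:

```php
<?php
$str = "cats and dogs don't make cats.dogs";

// The lookbehind/lookahead require whitespace (or a string edge) around the
// word but do not consume it, so adjacent words can each be matched.
$pattern = '/(?<=\s|^)(\w+)(?=\s|$)/';

// Hypothetical replacement: wrap each standalone, dot-free word in brackets.
$out = preg_replace($pattern, '[$1]', $str);

echo $out, "\n";
```

Words containing a dot (cats.dogs) or an apostrophe (don't) are left untouched.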
