Replace "whole word" when "word" starts with colon (\b is not working as intended)

Asked 13/3, 2012 at 10:53 Answered 20/5, 2024 at 22:29

Solved php regex string replace word-boundary

str_replace() matches :Name twice in :Name :Name_en (once on its own and once inside :Name_en), so I want to match whole words only. I wanted to switch to preg_replace() because of this answer.

$str = ":Name :Name_en";
echo $str . chr(10);
$str = preg_replace('/\b' . ':Name' . '\b/i', '"Test"', $str);
echo $str;

But this doesn't work because of the colon. No replacement takes place. How should the regex be adjusted?

\b is the word boundary. But I think a colon doesn't belong to such a word boundary.

Nutt answered 13/3, 2012 at 10:53 Comment(4)

You first need to tell us what your definition of "word" is. – Francklyn 13/3, 2012 at 10:55

For me the whole word is :Name, :Name_en and so on. For RegExp I don't know. – Nutt 13/3, 2012 at 11:12

That's not a definition, it's an example. – Francklyn 13/3, 2012 at 11:16

Definition: It begins with a colon, followed by a string consisting of letters [a-zA-Z], underscore and numbers. It can be terminated by a space or a comma. – Nutt 13/3, 2012 at 11:20

You don't need the word boundary on the start of your string:

$str = preg_replace('/:Name\b/i', '"Test"', $str);
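As a rough sketch of what this does on the question's sample string: the trailing \b requires that :Name be followed by a non-word character or the end of the string, so :Name_en is left alone. The second pattern is an illustrative assumption, not part of the original answer; it follows the asker's own definition of a placeholder (a colon followed by letters, digits, or underscores) using preg_quote() and a negative lookahead. The variable names are hypothetical.

```php
<?php
$str = ":Name :Name_en";

// Trailing \b only: ':Name' must be followed by a non-word character
// or by the end of the string.
$simple = preg_replace('/:Name\b/i', '"Test"', $str);
echo $simple . PHP_EOL; // "Test" :Name_en

// Hypothetical stricter variant: quote the placeholder and refuse a match
// when another letter, digit, or underscore follows it.
$placeholder = ':Name';
$pattern = '/' . preg_quote($placeholder, '/') . '(?![A-Za-z0-9_])/';
$strict = preg_replace($pattern, '"Test"', $str);
echo $strict . PHP_EOL; // "Test" :Name_en
```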

Anguish answered 13/3, 2012 at 10:56 Comment(3)

Sometimes it's so easy. Do you have a good link where RegExp like /, \b, /i are easily explained? – Nutt 13/3, 2012 at 11:17

Not off-hand :) I thought the php.net documentation was pretty good – Anguish 13/3, 2012 at 11:26

For those looking for a good link - cheatography.com/davechild/cheat-sheets/regular-expressions – Pastoralist 28/10, 2014 at 11:35

If you want to replace multiple keywords held in an associative array, such as a dictionary of placeholders, you can match them all with one regex pattern and look each match up in a callback:

$words = array(
    "_saudation_" => "Hello",
    "_animal_" => "cat",
    "_animal_sound_" => "MEooow"
);
$source = " _saudation_! My Animal is a _animal_ and it says _animal_sound_ ,  _no_match_";
        
echo preg_replace_callback(
    "/\b_(\w*)_\b/",
    function ($match) use ($words) {
        if (isset($words[$match[0]])) {
            return $words[$match[0]];
        } else {
            return $match[0];
        }
    },
    $source
);

Returns: Hello! My Animal is a cat and it says MEooow , _no_match_

Notice that although "_no_match_" has no translation, the regex still matches it, and the callback preserves its original value.
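The same callback idea could be adapted to the colon-prefixed placeholders from the question. This is only a sketch: the dictionary contents and the sample sentence are made up for illustration. Because \w+ is greedy, :Name_en is consumed as a whole and never mistaken for :Name.

```php
<?php
// Hypothetical dictionary keyed by colon-prefixed placeholders.
$words = [
    ':Name'    => 'Joseph',
    ':Name_en' => 'Joe',
];
$source = "Hello :Name, also known as :Name_en (:Unknown)";

$result = preg_replace_callback(
    '/:\w+/', // a colon followed by one or more word characters
    function ($match) use ($words) {
        // Fall back to the original text when there is no translation.
        return $words[$match[0]] ?? $match[0];
    },
    $source
);
echo $result; // Hello Joseph, also known as Joe (:Unknown)
```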

Hafiz answered 3/5, 2014 at 0:12 Comment(1)

This allows multiple replacements with different keys – Hafiz 19/9, 2015 at 21:58

function removeCommonWords($input) {

    // EEEEEEK Stop words
    $commonWords = array('a', 'able', 'about', 'above', 'abroad', 'according', 'accordingly', 'across', 'actually', 'adj', 'after', 'afterwards', 'again', 'against', 'ago', 'ahead', 'ain\'t', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'alongside', 'already', 'also', 'although', 'always', 'am', 'amid', 'amidst', 'among', 'amongst', 'an', 'and', 'another', 'any', 'anybody', 'anyhow', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'apart', 'appear', 'appreciate', 'appropriate', 'are', 'aren\'t', 'around', 'as', 'a\'s', 'aside', 'ask', 'asking', 'associated', 'at', 'available', 'away', 'awfully', 'b', 'back', 'backward', 'backwards', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'begin', 'behind', 'being', 'believe', 'below', 'beside', 'besides', 'best', 'better', 'between', 'beyond', 'both', 'brief', 'but', 'by', 'c', 'came', 'can', 'cannot', 'cant', 'can\'t', 'caption', 'cause', 'causes', 'certain', 'certainly', 'changes', 'clearly', 'c\'mon', 'co', 'co.', 'com', 'come', 'comes', 'concerning', 'consequently', 'consider', 'considering', 'contain', 'containing', 'contains', 'corresponding', 'could', 'couldn\'t', 'course', 'c\'s', 'currently', 'd', 'dare', 'daren\'t', 'definitely', 'described', 'despite', 'did', 'didn\'t', 'different', 'directly', 'do', 'does', 'doesn\'t', 'doing', 'done', 'don\'t', 'down', 'downwards', 'during', 'e', 'each', 'edu', 'eg', 'eight', 'eighty', 'either', 'else', 'elsewhere', 'end', 'ending', 'enough', 'entirely', 'especially', 'et', 'etc', 'even', 'ever', 'evermore', 'every', 'everybody', 'everyone', 'everything', 'everywhere', 'ex', 'exactly', 'example', 'except', 'f', 'fairly', 'far', 'farther', 'few', 'fewer', 'fifth', 'first', 'five', 'followed', 'following', 'follows', 'for', 'forever', 'former', 'formerly', 'forth', 'forward', 'found', 'four', 'from', 'further', 'furthermore', 'g', 'get', 'gets', 'getting', 'given', 'gives', 'go', 'goes', 'going', 'gone', 'got', 
'gotten', 'greetings', 'h', 'had', 'hadn\'t', 'half', 'happens', 'hardly', 'has', 'hasn\'t', 'have', 'haven\'t', 'having', 'he', 'he\'d', 'he\'ll', 'hello', 'help', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'here\'s', 'hereupon', 'hers', 'herself', 'he\'s', 'hi', 'him', 'himself', 'his', 'hither', 'hopefully', 'how', 'howbeit', 'however', 'hundred', 'i', 'i\'d', 'ie', 'if', 'ignored', 'i\'ll', 'i\'m', 'immediate', 'in', 'inasmuch', 'inc', 'inc.', 'indeed', 'indicate', 'indicated', 'indicates', 'inner', 'inside', 'insofar', 'instead', 'into', 'inward', 'is', 'isn\'t', 'it', 'it\'d', 'it\'ll', 'its', 'it\'s', 'itself', 'i\'ve', 'j', 'just', 'k', 'keep', 'keeps', 'kept', 'know', 'known', 'knows', 'l', 'last', 'lately', 'later', 'latter', 'latterly', 'least', 'less', 'lest', 'let', 'let\'s', 'like', 'liked', 'likely', 'likewise', 'little', 'look', 'looking', 'looks', 'low', 'lower', 'ltd', 'm', 'made', 'mainly', 'make', 'makes', 'many', 'may', 'maybe', 'mayn\'t', 'me', 'mean', 'meantime', 'meanwhile', 'merely', 'might', 'mightn\'t', 'mine', 'minus', 'miss', 'more', 'moreover', 'most', 'mostly', 'mr', 'mrs', 'much', 'must', 'mustn\'t', 'my', 'myself', 'n', 'name', 'namely', 'nd', 'near', 'nearly', 'necessary', 'need', 'needn\'t', 'needs', 'neither', 'never', 'neverf', 'neverless', 'nevertheless', 'new', 'next', 'nine', 'ninety', 'no', 'nobody', 'non', 'none', 'nonetheless', 'noone', 'no-one', 'nor', 'normally', 'not', 'nothing', 'notwithstanding', 'novel', 'now', 'nowhere', 'o', 'obviously', 'of', 'off', 'often', 'oh', 'ok', 'okay', 'old', 'on', 'once', 'one', 'ones', 'one\'s', 'only', 'onto', 'opposite', 'or', 'other', 'others', 'otherwise', 'ought', 'oughtn\'t', 'our', 'ours', 'ourselves', 'out', 'outside', 'over', 'overall', 'own', 'p', 'particular', 'particularly', 'past', 'per', 'perhaps', 'placed', 'please', 'plus', 'possible', 'presumably', 'probably', 'provided', 'provides', 'q', 'que', 'quite', 'qv', 'r', 'rather', 'rd', 're', 'really', 
'reasonably', 'recent', 'recently', 'regarding', 'regardless', 'regards', 'relatively', 'respectively', 'right', 'round', 's', 'said', 'same', 'saw', 'say', 'saying', 'says', 'second', 'secondly', 'see', 'seeing', 'seem', 'seemed', 'seeming', 'seems', 'seen', 'self', 'selves', 'sensible', 'sent', 'serious', 'seriously', 'seven', 'several', 'shall', 'shan\'t', 'she', 'she\'d', 'she\'ll', 'she\'s', 'should', 'shouldn\'t', 'since', 'six', 'so', 'some', 'somebody', 'someday', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhat', 'somewhere', 'soon', 'sorry', 'specified', 'specify', 'specifying', 'still', 'sub', 'such', 'sup', 'sure', 't', 'take', 'taken', 'taking', 'tell', 'tends', 'th', 'than', 'thank', 'thanks', 'thanx', 'that', 'that\'ll', 'thats', 'that\'s', 'that\'ve', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'there\'d', 'therefore', 'therein', 'there\'ll', 'there\'re', 'theres', 'there\'s', 'thereupon', 'there\'ve', 'these', 'they', 'they\'d', 'they\'ll', 'they\'re', 'they\'ve', 'thing', 'things', 'think', 'third', 'thirty', 'this', 'thorough', 'thoroughly', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'till', 'to', 'together', 'too', 'took', 'toward', 'towards', 'tried', 'tries', 'truly', 'try', 'trying', 't\'s', 'twice', 'two', 'u', 'un', 'under', 'underneath', 'undoing', 'unfortunately', 'unless', 'unlike', 'unlikely', 'until', 'unto', 'up', 'upon', 'upwards', 'us', 'use', 'used', 'useful', 'uses', 'using', 'usually', 'v', 'value', 'various', 'versus', 'very', 'via', 'viz', 'vs', 'w', 'want', 'wants', 'was', 'wasn\'t', 'way', 'we', 'we\'d', 'welcome', 'well', 'we\'ll', 'went', 'were', 'we\'re', 'weren\'t', 'we\'ve', 'what', 'whatever', 'what\'ll', 'what\'s', 'what\'ve', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'where\'s', 'whereupon', 'wherever', 'whether', 'which', 'whichever', 'while', 'whilst', 'whither', 'who', 
'who\'d', 'whoever', 'whole', 'who\'ll', 'whom', 'whomever', 'who\'s', 'whose', 'why', 'will', 'willing', 'wish', 'with', 'within', 'without', 'wonder', 'won\'t', 'would', 'wouldn\'t', 'x', 'y', 'yes', 'yet', 'you', 'you\'d', 'you\'ll', 'your', 'you\'re', 'yours', 'yourself', 'yourselves', 'you\'ve', 'z', 'zero');

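    // Quote each word so that entries such as 'co.' and 'inc.' are matched literally.
    return preg_replace('/\b(' . implode('|', array_map('preg_quote', $commonWords)) . ')\b/i', '', $input);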
}

$s = "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.";

echo $s . "<br>";
echo removeCommonWords($s);
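A smaller sketch of the same idea, assuming a short illustrative stop list instead of the full array above. The function name removeStopWords() and its parameters are hypothetical. It quotes each word for the / delimiter and collapses the whitespace left behind by the removed words.

```php
<?php
function removeStopWords(string $input, array $stopWords): string
{
    // Escape regex metacharacters (and the '/' delimiter) in every word.
    $alternatives = array_map(fn($w) => preg_quote($w, '/'), $stopWords);
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/i';

    // Remove whole-word matches, then squeeze the leftover whitespace.
    $stripped = preg_replace($pattern, '', $input);
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

$cleaned = removeStopWords(
    'It has survived not only five centuries',
    ['it', 'has', 'not', 'only', 'five']
);
echo $cleaned; // survived centuries
```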

Schug answered 14/7, 2020 at 22:3 Comment(0)

If you're using PHP 5+ you can still use str_replace().

$str = ":Name :Name_en";
echo $str . chr(10);

// The final int limits the function to a single replace.
$str = str_replace(':Name', '"Test"', $str, 1);

echo $str;

Output: (Demo)

:Name :Name_en

Fatal error: Uncaught Error: str_replace(): Argument #4 ($count) could not be passed by reference
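The error arises because the fourth parameter of str_replace() is an output variable passed by reference, not a limit. A minimal sketch of the difference, assuming the same sample string: str_replace() reports how many replacements it made, while preg_replace() accepts a limit as its fourth argument.

```php
<?php
$str = ":Name :Name_en";

// str_replace(): the 4th argument must be a variable; it receives the
// number of replacements that were performed.
$all = str_replace(':Name', '"Test"', $str, $count);
echo $all . " ($count replacements)" . PHP_EOL; // "Test" "Test"_en (2 replacements)

// preg_replace(): the 4th argument limits the number of replacements.
$first = preg_replace('/' . preg_quote(':Name', '/') . '/', '"Test"', $str, 1);
echo $first . PHP_EOL; // "Test" :Name_en
```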

Lema answered 13/3, 2012 at 11:3 Comment(7)

But which item does it replace? The first one, the last one, ...? In my application it could be that this changes ... – Nutt 13/3, 2012 at 11:7

The first one. str_replace is sequential. – Lema 13/3, 2012 at 11:35

That doesn't seem to be what the documentation says: count: If passed, this will be set to the number of replacements performed. So it is a variable reference, not a constant – Bestead 7/10, 2013 at 13:13

@Bestead In the notes "Because str_replace() replaces left to right, it might replace a previously inserted value when doing multiple replacements". I assume relates to search/replace with arrays, so you could be right. I'll have to run some tests when I have a chance and find out if it applies to the 'subject' also. – Lema 9/10, 2013 at 10:28

FWIW, the problem with str_replace is that a simple replacement of 'name' would also match in fullname or firstname. This is where regex is a better option, since you can use word boundaries. – Greenquist 28/8, 2017 at 3:47

@Greenquist You are correct, but that is why you would use unique tags that wouldn't appear in a normal sentence. ":Name" isn't great, but it is basic. Maybe "{{#NAME#}}". I guess it depends on the complexity of the system. – Lema 2/12, 2017 at 11:9

@Greenquist I'll also add that I have nothing against regex. I'm just giving an alternative, as people get regex obsessed. If it isn't required and you're not comfortable with the syntax, you're better off just using a replace. – Lema 2/12, 2017 at 11:11

You can include the trailing space in the search string, since words in a sentence are usually separated by a space.

$str = ":Name :Name_en";
echo $str;

// The trailing space in the search string keeps ':Name_en' from matching;
// the replacement restores that space.
$str = str_replace(':Name ', 'Test ', $str);

echo $str;

Forbidding answered 12/9, 2018 at 7:22 Comment(0)

Rather than matching all whole words and searching the dictionary for a potential match, use the dictionary to generate a pipe-delimited subpattern for the regex operation. This will eliminate fruitless callback executions.

Because the regex pattern is delimited with #, it is not necessary to add the second parameter to the preg_quote() call. The same convenience is not afforded when the pattern is delimited with / because # is escaped by default by preg_quote() but / is not.
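To illustrate the point about delimiters, here is a small sketch, assuming PHP 7.3 or later (the version in which preg_quote() started escaping #). The sample term is made up.

```php
<?php
$term = 'a/b#c';

// Without a delimiter argument, '/' is left alone, while '#' is escaped
// on PHP 7.3 and later.
$default = preg_quote($term);

// Passing the delimiter explicitly escapes it as well.
$forSlash = preg_quote($term, '/');

echo $default . PHP_EOL;  // a/b\#c
echo $forSlash . PHP_EOL; // a\/b\#c
```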

\b is a word boundary marker. One is placed on either side of the subpattern. As long as the search terms begin and end with an alphanumeric character or an underscore, the \b will enforce that characters just outside of the match MUST be non-word characters (or there may be NO character on either side).

Notice that 2019 is not replaced by "next birthdayage": 20 and 19 are only substituted with next birthday and age where they are whole "words", which is why 2019 becomes year.

Code: (Demo)

$lookup = [
    'Name'  => 'Joseph',
    'Name_en' => 'Joe',
    'Name_es' => 'Jose',
    '19' => 'age',
    '2019' => 'year',
    '20' => 'next birthday',
];
$str = "Name : Name_en, 2019 19 20";

$subpattern = implode('|', array_map('preg_quote', array_keys($lookup)));
echo preg_replace_callback(
    "#\b($subpattern)\b#",
    fn($m) => $lookup[$m[0]],
    $str
);
// Joseph : Joe, year age next birthday

If you don't want to go through the toil of building a subpattern, then you can simply search for whole words and check each encounter. If there is no match in the translation array, then just replace the encounter with its original self.

Code: (Demo)

$str = "Name : Name_en, outlier 2019 19 20";

echo preg_replace_callback(
    "#\b\w+\b#",
    fn($m) => $lookup[$m[0]] ?? $m[0],
    $str
);
// Joseph : Joe, outlier year age next birthday

As for your misunderstanding with your provided example, a : is not an alphanumeric or underscore character -- it represents a "non-word" character. When you write a \b before :, then that word boundary marker enforces that the character before the colon MUST be a "word" character. If any boundary is desired at all, it will be a non-word boundary. Here's a demonstration: (Demo)

$needle = ':Name';
$str = ":Name :Name_en";

echo "Leading word boundary\t\t" . preg_replace("#\b$needle\b#", '"Test"', $str);
echo "\n---\n";
echo "No leading word boundary\t" . preg_replace("#$needle\b#", '"Test"', $str);
echo "\n---\n";
echo "Leading non-word boundary\t" . preg_replace("#\B$needle\b#", '"Test"', $str);

Output:

Leading word boundary       :Name :Name_en
---
No leading word boundary    "Test" :Name_en
---
Leading non-word boundary   "Test" :Name_en

Callahan answered 20/5, 2024 at 22:29 Comment(0)
