Replace "whole word" when "word" starts with colon (\b is not working as intended)

Since str_replace() matches :Name twice in :Name :Name_en, I want to match whole words only. I wanted to switch to preg_replace() because of this answer.

$str = ":Name :Name_en";
echo $str . chr(10);
$str = preg_replace('/\b' . ':Name' . '\b/i', '"Test"', $str);
echo $str;

But this doesn't work because of the colon. No replacement takes place. How should the regex be adjusted?

\b is the word boundary, but I think a colon doesn't count as part of a word, so \b doesn't match before it.
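To see why, it helps to list where \b actually matches in the sample string. A minimal sketch using preg_match_all() with PREG_OFFSET_CAPTURE:

```php
<?php
// List every position where \b matches in the sample string.
$str = ":Name :Name_en";
preg_match_all('/\b/', $str, $matches, PREG_OFFSET_CAPTURE);

// Each match is an empty string; index 1 of each entry is its byte offset.
$offsets = array_column($matches[0], 1);
echo implode(', ', $offsets), PHP_EOL; // 1, 5, 7, 14
```

No boundary sits at offset 0 or 6, directly before either colon, so any pattern that starts with \b: cannot match in this string.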

Nutt asked 13/3, 2012 at 10:53 Comment(4)
You first need to tell us what your definition of "word" is.Francklyn
For me the whole word is :Name, :Name_en and so on. For RegExp I don't know.Nutt
That's not a definition, it's an example.Francklyn
Definition: It begins with a colon, followed by a string consisting of letters [a-zA-Z], underscore and numbers. It can be terminated by a space or a comma.Nutt
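That definition translates fairly directly into a character class plus a lookahead for the terminator. A minimal sketch that extracts every placeholder; treating the end of the string as a valid terminator is an assumption beyond the stated definition:

```php
<?php
// A colon, then one or more letters, digits or underscores, which must be
// followed by a space, a comma, or the end of the string (assumed terminator).
$pattern = '/:[A-Za-z0-9_]+(?=[ ,]|$)/';

$str = ":Name :Name_en, :Age1";
preg_match_all($pattern, $str, $matches);

echo implode(' | ', $matches[0]), PHP_EOL; // :Name | :Name_en | :Age1
```

Because the character class is greedy, :Name_en is always captured whole; it can never stop early at :Name, since the lookahead would then see the underscore.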

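You don't need the word boundary at the start of your pattern. A : is a non-word character, so a \b in front of it would require a word character immediately before the colon: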

$str = preg_replace('/:Name\b/i', '"Test"', $str);
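The same idea can be wrapped in a reusable helper. This is a minimal sketch: the function name replacePlaceholder() is hypothetical, it assumes PHP 7.4+ (arrow functions) and that each placeholder ends in a word character. Unlike the answer it omits the /i flag, so :name is left alone.

```php
<?php
/**
 * Replace one placeholder (e.g. ":Name") only where it is a whole word,
 * i.e. not followed by another word character, as in /:Name\b/.
 * Hypothetical helper; assumes the placeholder ends in a word character.
 */
function replacePlaceholder(string $subject, string $placeholder, string $replacement): string
{
    // preg_quote() escapes the colon and any other metacharacters.
    // No \b is placed before the placeholder because ":" is a non-word character.
    $pattern = '/' . preg_quote($placeholder, '/') . '\b/';

    // A callback returns $replacement verbatim, so "$1" or a backslash in it
    // is not treated as a back-reference.
    return preg_replace_callback($pattern, fn(array $m): string => $replacement, $subject);
}

echo replacePlaceholder(':Name :Name_en', ':Name', '"Test"'), PHP_EOL; // "Test" :Name_en
echo replacePlaceholder(':Name_en :Name', ':Name', '"Test"'), PHP_EOL; // :Name_en "Test"
```

Using preg_replace_callback() instead of preg_replace() is a deliberate choice here: the replacement text is inserted exactly as given.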
Anguish answered 13/3, 2012 at 10:56 Comment(3)
Sometimes it's so easy. Do you have a good link where RegExp like /, \b, /i are easily explained?Nutt
Not off-hand :) I thought the php.net documentation was pretty goodAnguish
For those looking for a good link - cheatography.com/davechild/cheat-sheets/regular-expressionsPastoralist

If you want to replace multiple keywords stored in an associative array (something like a dictionary of placeholders), you can use preg_replace_callback() to look up each match:

$words = array(
    "_saudation_" => "Hello",
    "_animal_" => "cat",
    "_animal_sound_" => "MEooow"
);
$source = " _saudation_! My Animal is a _animal_ and it says _animal_sound_ ,  _no_match_";
        
echo preg_replace_callback(
    "/\b_(\w*)_\b/",
    function ($match) use ($words) {
        if (isset($words[$match[0]])) {
            return $words[$match[0]];
        } else {
            return $match[0];
        }
    },
    $source
);
    

Returns: Hello! My Animal is a cat and it says MEooow , _no_match_

Notice that although _no_match_ has no translation, it is still matched by the regex; the callback simply returns its original value.
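The same dictionary approach carries over to the question's colon-prefixed placeholders. A minimal sketch, assuming PHP 7.4+ for the arrow function and using a hypothetical $words map:

```php
<?php
// Hypothetical placeholder map using the question's ":Name" syntax.
$words = [
    ':Name'    => 'Joseph',
    ':Name_en' => 'Joe',
];
$source = ':Name, also called :Name_en, has no :Nickname';

// ":\w+" is greedy, so ":Name_en" is always captured whole and looked up
// as a single key; unknown placeholders fall back to their original text.
$result = preg_replace_callback(
    '/:\w+/',
    fn(array $m): string => $words[$m[0]] ?? $m[0],
    $source
);

echo $result, PHP_EOL; // Joseph, also called Joe, has no :Nickname
```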

Hafiz answered 3/5, 2014 at 0:12 Comment(1)
This allows multiple replacements with different keysHafiz
function removeCommonWords($input) {

    // Stop words: common English words to remove
    $commonWords = array('a', 'able', 'about', 'above', 'abroad', 'according', 'accordingly', 'across', 'actually', 'adj', 'after', 'afterwards', 'again', 'against', 'ago', 'ahead', 'ain\'t', 'all', 'allow', 'allows', 'almost', 'alone', 'along', 'alongside', 'already', 'also', 'although', 'always', 'am', 'amid', 'amidst', 'among', 'amongst', 'an', 'and', 'another', 'any', 'anybody', 'anyhow', 'anyone', 'anything', 'anyway', 'anyways', 'anywhere', 'apart', 'appear', 'appreciate', 'appropriate', 'are', 'aren\'t', 'around', 'as', 'a\'s', 'aside', 'ask', 'asking', 'associated', 'at', 'available', 'away', 'awfully', 'b', 'back', 'backward', 'backwards', 'be', 'became', 'because', 'become', 'becomes', 'becoming', 'been', 'before', 'beforehand', 'begin', 'behind', 'being', 'believe', 'below', 'beside', 'besides', 'best', 'better', 'between', 'beyond', 'both', 'brief', 'but', 'by', 'c', 'came', 'can', 'cannot', 'cant', 'can\'t', 'caption', 'cause', 'causes', 'certain', 'certainly', 'changes', 'clearly', 'c\'mon', 'co', 'co.', 'com', 'come', 'comes', 'concerning', 'consequently', 'consider', 'considering', 'contain', 'containing', 'contains', 'corresponding', 'could', 'couldn\'t', 'course', 'c\'s', 'currently', 'd', 'dare', 'daren\'t', 'definitely', 'described', 'despite', 'did', 'didn\'t', 'different', 'directly', 'do', 'does', 'doesn\'t', 'doing', 'done', 'don\'t', 'down', 'downwards', 'during', 'e', 'each', 'edu', 'eg', 'eight', 'eighty', 'either', 'else', 'elsewhere', 'end', 'ending', 'enough', 'entirely', 'especially', 'et', 'etc', 'even', 'ever', 'evermore', 'every', 'everybody', 'everyone', 'everything', 'everywhere', 'ex', 'exactly', 'example', 'except', 'f', 'fairly', 'far', 'farther', 'few', 'fewer', 'fifth', 'first', 'five', 'followed', 'following', 'follows', 'for', 'forever', 'former', 'formerly', 'forth', 'forward', 'found', 'four', 'from', 'further', 'furthermore', 'g', 'get', 'gets', 'getting', 'given', 'gives', 'go', 'goes', 'going', 'gone', 'got', 
'gotten', 'greetings', 'h', 'had', 'hadn\'t', 'half', 'happens', 'hardly', 'has', 'hasn\'t', 'have', 'haven\'t', 'having', 'he', 'he\'d', 'he\'ll', 'hello', 'help', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'here\'s', 'hereupon', 'hers', 'herself', 'he\'s', 'hi', 'him', 'himself', 'his', 'hither', 'hopefully', 'how', 'howbeit', 'however', 'hundred', 'i', 'i\'d', 'ie', 'if', 'ignored', 'i\'ll', 'i\'m', 'immediate', 'in', 'inasmuch', 'inc', 'inc.', 'indeed', 'indicate', 'indicated', 'indicates', 'inner', 'inside', 'insofar', 'instead', 'into', 'inward', 'is', 'isn\'t', 'it', 'it\'d', 'it\'ll', 'its', 'it\'s', 'itself', 'i\'ve', 'j', 'just', 'k', 'keep', 'keeps', 'kept', 'know', 'known', 'knows', 'l', 'last', 'lately', 'later', 'latter', 'latterly', 'least', 'less', 'lest', 'let', 'let\'s', 'like', 'liked', 'likely', 'likewise', 'little', 'look', 'looking', 'looks', 'low', 'lower', 'ltd', 'm', 'made', 'mainly', 'make', 'makes', 'many', 'may', 'maybe', 'mayn\'t', 'me', 'mean', 'meantime', 'meanwhile', 'merely', 'might', 'mightn\'t', 'mine', 'minus', 'miss', 'more', 'moreover', 'most', 'mostly', 'mr', 'mrs', 'much', 'must', 'mustn\'t', 'my', 'myself', 'n', 'name', 'namely', 'nd', 'near', 'nearly', 'necessary', 'need', 'needn\'t', 'needs', 'neither', 'never', 'neverf', 'neverless', 'nevertheless', 'new', 'next', 'nine', 'ninety', 'no', 'nobody', 'non', 'none', 'nonetheless', 'noone', 'no-one', 'nor', 'normally', 'not', 'nothing', 'notwithstanding', 'novel', 'now', 'nowhere', 'o', 'obviously', 'of', 'off', 'often', 'oh', 'ok', 'okay', 'old', 'on', 'once', 'one', 'ones', 'one\'s', 'only', 'onto', 'opposite', 'or', 'other', 'others', 'otherwise', 'ought', 'oughtn\'t', 'our', 'ours', 'ourselves', 'out', 'outside', 'over', 'overall', 'own', 'p', 'particular', 'particularly', 'past', 'per', 'perhaps', 'placed', 'please', 'plus', 'possible', 'presumably', 'probably', 'provided', 'provides', 'q', 'que', 'quite', 'qv', 'r', 'rather', 'rd', 're', 'really', 
'reasonably', 'recent', 'recently', 'regarding', 'regardless', 'regards', 'relatively', 'respectively', 'right', 'round', 's', 'said', 'same', 'saw', 'say', 'saying', 'says', 'second', 'secondly', 'see', 'seeing', 'seem', 'seemed', 'seeming', 'seems', 'seen', 'self', 'selves', 'sensible', 'sent', 'serious', 'seriously', 'seven', 'several', 'shall', 'shan\'t', 'she', 'she\'d', 'she\'ll', 'she\'s', 'should', 'shouldn\'t', 'since', 'six', 'so', 'some', 'somebody', 'someday', 'somehow', 'someone', 'something', 'sometime', 'sometimes', 'somewhat', 'somewhere', 'soon', 'sorry', 'specified', 'specify', 'specifying', 'still', 'sub', 'such', 'sup', 'sure', 't', 'take', 'taken', 'taking', 'tell', 'tends', 'th', 'than', 'thank', 'thanks', 'thanx', 'that', 'that\'ll', 'thats', 'that\'s', 'that\'ve', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'thence', 'there', 'thereafter', 'thereby', 'there\'d', 'therefore', 'therein', 'there\'ll', 'there\'re', 'theres', 'there\'s', 'thereupon', 'there\'ve', 'these', 'they', 'they\'d', 'they\'ll', 'they\'re', 'they\'ve', 'thing', 'things', 'think', 'third', 'thirty', 'this', 'thorough', 'thoroughly', 'those', 'though', 'three', 'through', 'throughout', 'thru', 'thus', 'till', 'to', 'together', 'too', 'took', 'toward', 'towards', 'tried', 'tries', 'truly', 'try', 'trying', 't\'s', 'twice', 'two', 'u', 'un', 'under', 'underneath', 'undoing', 'unfortunately', 'unless', 'unlike', 'unlikely', 'until', 'unto', 'up', 'upon', 'upwards', 'us', 'use', 'used', 'useful', 'uses', 'using', 'usually', 'v', 'value', 'various', 'versus', 'very', 'via', 'viz', 'vs', 'w', 'want', 'wants', 'was', 'wasn\'t', 'way', 'we', 'we\'d', 'welcome', 'well', 'we\'ll', 'went', 'were', 'we\'re', 'weren\'t', 'we\'ve', 'what', 'whatever', 'what\'ll', 'what\'s', 'what\'ve', 'when', 'whence', 'whenever', 'where', 'whereafter', 'whereas', 'whereby', 'wherein', 'where\'s', 'whereupon', 'wherever', 'whether', 'which', 'whichever', 'while', 'whilst', 'whither', 'who', 
'who\'d', 'whoever', 'whole', 'who\'ll', 'whom', 'whomever', 'who\'s', 'whose', 'why', 'will', 'willing', 'wish', 'with', 'within', 'without', 'wonder', 'won\'t', 'would', 'wouldn\'t', 'x', 'y', 'yes', 'yet', 'you', 'you\'d', 'you\'ll', 'your', 'you\'re', 'yours', 'yourself', 'yourselves', 'you\'ve', 'z', 'zero');

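    // Escape each word so that the "." in entries like 'co.' and 'inc.' is literal
    $quoted = array_map(function ($word) { return preg_quote($word, '/'); }, $commonWords);

    return preg_replace('/\b(' . implode('|', $quoted) . ')\b/i', '', $input);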
}

$s = "It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.";

echo $s . "<br>";
echo removeCommonWords($s);
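The removals above leave doubled spaces behind. A minimal sketch of a variant that also tidies the whitespace, assuming PHP 7.4+; removeWords() and its short word list are hypothetical. Note that an entry ending in punctuation, such as 'co.', still only matches when a word character follows the dot, because a \b after a non-word character like . requires a word character right after it.

```php
<?php
// Hypothetical helper: strip whole words from a list, then tidy the spacing
// that the removals leave behind.
function removeWords(string $input, array $words): string
{
    // Escape each word so the "." in an entry like "co." cannot match an
    // arbitrary character (unescaped, "co." would also strip "cow").
    $quoted = array_map(fn(string $w): string => preg_quote($w, '/'), $words);
    $stripped = preg_replace('/\b(?:' . implode('|', $quoted) . ')\b/i', '', $input);

    // Collapse runs of whitespace left by the removed words and trim the ends.
    return trim(preg_replace('/\s{2,}/', ' ', $stripped));
}

echo removeWords('It has survived not only five centuries', ['it', 'has', 'not', 'only', 'five']), PHP_EOL;
// survived centuries
```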
Schug answered 14/7, 2020 at 22:3 Comment(0)

If you're using PHP 5+ you can still use str_replace().

$str = ":Name :Name_en";
echo $str . chr(10);

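// Intended to limit the function to a single replace, but the 4th argument of
// str_replace() is $count (passed by reference), not a limit, so this call fails.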
$str = str_replace(':Name', '"Test"', $str, 1);

echo $str;

Output: (Demo)

:Name :Name_en

Fatal error: Uncaught Error: str_replace(): Argument #4 ($count) could not be passed by reference
Lema answered 13/3, 2012 at 11:3 Comment(7)
But which item does it replace? The first one, the last one, ...? In my application it could be that this changes ...Nutt
The first one. str_replace is sequential.Lema
That doesn't seem to be what the documentation says: count: If passed, this will be set to the number of replacements performed.. So it is a variable reference, not a constantBestead
@Bestead In the notes "Because str_replace() replaces left to right, it might replace a previously inserted value when doing multiple replacements". I assume relates to search/replace with arrays, so you could be right. I'll have to run some tests when I have a chance and find out if it applies to the 'subject' also.Lema
FWIW, the problem with str_replace is that simple replaces such as replacing 'name' would match in fullname or firstname. this is where regex is a better option since you can use word boundaries.Greenquist
@Greenquist You are correct, but that is why you would use unique tags that wouldn't be used in a normal sentence. ":Name" isn't great but it is basic. Maybe "{{#NAME#}}". I guess it depends on the complexity of the system.Lema
@Greenquist I'll also add I have nothing wrong with regex. Just giving an alternative, as people get regex obsessed. But if it isn't required and you're not comfortable with the syntax, you're better off just using a replace.Lema
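If replacing only the first occurrence really is the goal, a minimal sketch (hypothetical helper name replaceFirst(), PHP 7+) can combine strpos() and substr_replace(); it also shows what the fourth argument of str_replace() actually receives. As the comments above warn, the result depends on the order of the placeholders in the string.

```php
<?php
// Hypothetical helper: replace only the first occurrence of $search.
function replaceFirst(string $search, string $replace, string $subject): string
{
    $pos = strpos($subject, $search);
    if ($pos === false) {
        return $subject; // nothing to replace
    }
    return substr_replace($subject, $replace, $pos, strlen($search));
}

echo replaceFirst(':Name', '"Test"', ':Name :Name_en'), PHP_EOL; // "Test" :Name_en
// Order matters: if :Name_en comes first, its prefix is replaced instead.
echo replaceFirst(':Name', '"Test"', ':Name_en :Name'), PHP_EOL; // "Test"_en :Name

// The 4th argument of str_replace() must be a variable; it receives the count.
$count = 0;
$all = str_replace(':Name', '"Test"', ':Name :Name_en', $count);
echo $all, ' (', $count, ' replacements)', PHP_EOL; // "Test" "Test"_en (2 replacements)
```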

You can include a space in the search string; words in a sentence are commonly separated by spaces.

$str = ":Name :Name_en";
echo $str . chr(10);

// The trailing space in the search string keeps :Name_en from matching;
// the replacement keeps the space so the words stay separated.
$str = str_replace(':Name ', 'Test ', $str);

echo $str;
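A quick check of where the trailing-space search falls short: it only matches when a space follows the placeholder, so one followed by a comma or sitting at the end of the string is left unreplaced. A minimal sketch, assuming PHP 7.4+:

```php
<?php
// Searching for ":Name " (with a trailing space) only works when a space follows.
$cases = [':Name :Name_en', ':Name, :Name_en', ':Name_en :Name'];

$results = array_map(fn(string $s): string => str_replace(':Name ', '"Test" ', $s), $cases);

foreach ($results as $result) {
    echo $result, PHP_EOL;
}
// "Test" :Name_en
// :Name, :Name_en   (followed by a comma: no replacement)
// :Name_en :Name    (end of string: no replacement)
```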
Forbidding answered 12/9, 2018 at 7:22 Comment(0)

Rather than matching all whole words and searching the dictionary for a potential match, use the dictionary to generate a pipe-delimited subpattern for the regex operation. This will eliminate fruitless callback executions.

Because the regex pattern is delimited with #, it is not necessary to pass the second (delimiter) parameter to preg_quote(). The same convenience is not afforded when the pattern is delimited with /, because preg_quote() escapes # by default (as of PHP 7.3) but does not escape /.
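To make the delimiter point concrete, here is what preg_quote() returns with and without the delimiter argument. This sketch assumes PHP 7.3 or later, where # is escaped by default.

```php
<?php
// Without a delimiter argument, "#" is escaped (PHP >= 7.3) but "/" is not.
$default = preg_quote('a#b/c');
// Passing "/" as the delimiter escapes it as well.
$slash = preg_quote('a#b/c', '/');

echo $default, PHP_EOL; // a\#b/c
echo $slash, PHP_EOL;   // a\#b\/c
```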

\b is a word boundary assertion; here one is placed on either side of the subpattern. As long as every search term begins and ends with a word character (a letter, digit, or underscore), the \b assertions ensure that the characters just outside the match are non-word characters (or that there is no character at all, at the start or end of the string).

Notice that 2019 is not replaced by "agenext birthday": age and next birthday can only be substituted where 19 and 20 are whole "words", which is why 2019 becomes year.

Code: (Demo)

$lookup = [
    'Name'  => 'Joseph',
    'Name_en' => 'Joe',
    'Name_es' => 'Jose',
    '19' => 'age',
    '2019' => 'year',
    '20' => 'next birthday',
];
$str = "Name : Name_en, 2019 19 20";

$subpattern = implode('|', array_map('preg_quote', array_keys($lookup)));
echo preg_replace_callback(
    "#\b($subpattern)\b#",
    fn($m) => $lookup[$m[0]],
    $str
);
// Joseph : Joe, year age next birthday

If you don't want to go through the toil of building a subpattern, then you can simply search for whole words and check each encounter. If there is no match in the translation array, then just replace the encounter with its original self.

Code: (Demo)

$str = "Name : Name_en, outlier 2019 19 20";

echo preg_replace_callback(
    "#\b\w+\b#",
    fn($m) => $lookup[$m[0]] ?? $m[0],
    $str
);

As for why your example fails: a : is not an alphanumeric or underscore character; it is a "non-word" character. Placing \b before the : therefore requires the character before the colon to be a "word" character. If you want any assertion before the colon at all, it should be a non-word boundary (\B). Here's a demonstration: (Demo)

$needle = ':Name';
$str = ":Name :Name_en";

echo "Leading word boundary\t\t" . preg_replace("#\b$needle\b#", '"Test"', $str);
echo "\n---\n";
echo "No leading word boundary\t" . preg_replace("#$needle\b#", '"Test"', $str);
echo "\n---\n";
echo "Leading non-word boundary\t" . preg_replace("#\B$needle\b#", '"Test"', $str);

Output:

Leading word boundary       :Name :Name_en
---
No leading word boundary    "Test" :Name_en
---
Leading non-word boundary   "Test" :Name_en
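The difference between the last two variants only shows up when a word character sits directly before the colon. When the character after the assertion (here the colon) is a non-word character, \B holds only if the character before it is also a non-word character or the string starts there, which is the same condition as the lookbehind (?<!\w). A minimal sketch with a hypothetical subject:

```php
<?php
$needle = ':Name';
$subject = 'x:Name :Name :Name_en';

$patterns = [
    'none'       => '#' . $needle . '\b#',
    'non-word'   => '#\B' . $needle . '\b#',
    'lookbehind' => '#(?<!\w)' . $needle . '(?!\w)#',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = preg_replace($pattern, '"Test"', $subject);
    echo str_pad($label, 12), $results[$label], PHP_EOL;
}
// none        x"Test" "Test" :Name_en
// non-word    x:Name "Test" :Name_en
// lookbehind  x:Name "Test" :Name_en
```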
Callahan answered 20/5 at 22:29 Comment(0)
