Add a symbol after every word in a string
C

5

9

Given a string, I want an array of strings containing words, each preceded by any non-word characters.

Example input string:

one "two" (three) -four-

The words in the string may be anything, even gibberish, with any amount of punctuation or symbols.

What I would like to see:

array:
one
 "two
" (three
) -four
-

Essentially, for each match the last thing is a word, preceded by anything left over from the previous match.

I will be using this in PHP. I have tried various combinations of preg_match_all() and preg_split(), with patterns containing many variations of "\w", "\b", "[^\w]" and so on.

The Bigger Picture

How can I place a * after each word in the string for searching purposes?

Conference answered 18/2, 2013 at 17:45 Comment(2)
In your "What I would like to see" part, is the quote after four supposed to be on the next line? – Ifc
You are correct; the quotes seem to have been altered when I posted. I have fixed it now, and hopefully they came through correctly this time. – Conference
I
10

If you just want to add an asterisk after each "word" you could do this:

<?php
$test = 'one "two" (three) -four-';

echo preg_replace('/(\w+)/', '$1*', $test);
?>

http://phpfiddle.org/main/code/8nr-bpb

Ifc answered 18/2, 2013 at 17:53 Comment(1)
This also works splendidly! I will probably use this one, as it saves me from subsequently looping through the matches. – Conference
W
7

You can use a negative lookahead to split on word boundaries, like this:

$array = preg_split( '/(?!\w)\b/', 'one "two" (three) -four-');

Calling print_r($array); gives exactly the desired output:

Array
(
    [0] => one
    [1] =>  "two
    [2] => " (three
    [3] => ) -four
    [4] => -
)
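
For the bigger-picture goal of appending an asterisk to each word, the split pieces can be rejoined in a loop. This is only a minimal sketch, assuming the same sample string; the variable names $pieces and $result are illustrative and not part of the original answer.

```php
<?php
$test = 'one "two" (three) -four-';

// Split at the end of every word (a word boundary not followed by a word character)
$pieces = preg_split('/(?!\w)\b/', $test);

$result = '';
foreach ($pieces as $piece) {
    // Append an asterisk only when the piece ends in a word character
    $result .= preg_match('/\w$/', $piece) ? $piece . '*' : $piece;
}

echo $result; // one* "two*" (three*) -four*-
```

The final piece ("-") contains no word, so it is left unchanged.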
Washwoman answered 18/2, 2013 at 17:50 Comment(0)
W
-1

Here is an example of how to find a word with regex in PHP.

<?php
    $subject = "abcdef";
    $pattern = '/^def/';
    preg_match($pattern, substr($subject, 3), $matches, PREG_OFFSET_CAPTURE);
    print_r($matches);
?>
Wornout answered 18/2, 2013 at 17:52 Comment(1)
This answer appears to completely ignore the question requirements and sample data. – Hoey
A
-1

An alternative

[^\w]*(\b\w*\b)?
----- ----------
 |        |
 |        |-> Matches a word 0 or 1 time
 |-> Matches 0 to many characters except [a-zA-Z0-9_]

You need to match!
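
If matching is preferred over splitting, one possible sketch with preg_match_all() follows. Note that the pattern here is an adaptation, not the one shown above: it requires at least one character per match so that no empty strings are returned, and it uses a second branch to pick up any non-word characters left at the end of the string.

```php
<?php
$test = 'one "two" (three) -four-';

// First branch: optional non-word characters followed by a word.
// Second branch: non-word characters remaining at the very end of the string.
preg_match_all('/\W*\w+|\W+\z/', $test, $matches);

print_r($matches[0]);
```

For the sample string this should produce the same five elements as the question asks for.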

Admonition answered 18/2, 2013 at 18:0 Comment(1)
This snippet, which implies the use of preg_match_all(), erroneously returns empty strings in the matches array. 3v4l.org/aW64G – Hoey
H
-1

"How can I place a * after each word in the string for searching purposes?"

You don't need to capture anything or use backreferences. Match one or more word characters, then forget what you matched with \K. The zero-width position is where you inject your asterisk.

Code: (Demo)

$test = 'one "two" (three) -four-';

echo preg_replace('/\w+\K/', '*', $test);

The same advice applies if you want to split the string. (Demo)

var_export(
    preg_split('/\w+\K/', $test)
);
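
One thing to watch for: when the input ends in a word character, splitting on a zero-width position at the end of the string can leave a zero-length final element. A hedged sketch of guarding against that with the PREG_SPLIT_NO_EMPTY flag is shown here; the input string and the variable names are made up for illustration.

```php
<?php
// Hypothetical input that ends in a word character
$input = 'one (two) three';

// PREG_SPLIT_NO_EMPTY drops any zero-length pieces from the result
$pieces = preg_split('/\w+\K/', $input, -1, PREG_SPLIT_NO_EMPTY);

var_export($pieces);
```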

As an extension of the requirements to show how word boundaries work...

  • var_export(preg_split('/\b/', $test)); will split before and after each sequence of word characters. Demo
  • echo preg_replace('/\b/', '*', $test); will add an asterisk before and after each sequence of word characters. Demo
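
To make the difference concrete, here is a small side-by-side sketch using the same sample string; the variable names $both and $endOnly are illustrative only.

```php
<?php
$test = 'one "two" (three) -four-';

// \b matches at both the start and the end of each word
$both = preg_replace('/\b/', '*', $test);

// \w+\K matches only at the end of each word
$endOnly = preg_replace('/\w+\K/', '*', $test);

echo $both, PHP_EOL;    // *one* "*two*" (*three*) -*four*-
echo $endOnly, PHP_EOL; // one* "two*" (three*) -four*-
```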
Hoey answered 1/11, 2023 at 22:17 Comment(1)
This provably correct answer should not have a negative score. I'd love to read the justification for the dv. – Hoey
