Deep (infinite) split words using regex
Let's say I have:

$line = "{This is my {sentence|words} I wrote.}";

Output:

This is my sentence I wrote.  
This is my words I wrote.  

The regex should also handle several groups, including deeply nested ones, and produce every combination. For example:

$line = "{This is my {sentence|words} I wrote on a {sunny|cold} day.}";

Output:

This is my sentence I wrote on a sunny day.  
This is my sentence I wrote on a cold day.  
This is my words I wrote on a sunny day.
This is my words I wrote on a cold day.  

My first thought was to use explode, as in the code below, but the result was not what I need:

$res = explode("|", $line);  

Any advice? Thank you.

EDIT: Something in these lines:

$line = "{This is my {sentence|words} I wrote on a {sunny|cold} day.}";
$regex = '/\{[^{}]*\}/'; // explicit / delimiters; braces escaped so they match literally
$match = [];

preg_match($regex, $line, $match);

var_dump($match);  

As already said, the number of groups and the nesting depth are unbounded, so there is no fixed limit; something loop-based would be appropriate.
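A loop of that kind could look roughly like the sketch below. It is only one possible approach, not a tested library routine: expandGroups is a hypothetical helper name, and the code assumes the braces are balanced and that | only ever appears as a separator. It repeatedly takes a string from a work queue, finds one innermost {...} group (a group with no braces inside it), and queues one copy of the string per alternative. Because nested alternatives can be reached through more than one path, duplicates are removed at the end.

```php
<?php
// Hypothetical helper: expands every {a|b} group, innermost first, using a work queue.
function expandGroups(string $line): array
{
    $queue = [$line];
    $done = [];

    while ($queue) {
        $current = array_shift($queue);

        // Match one group that contains no further braces, i.e. an innermost group.
        if (!preg_match('/\{([^{}]*)\}/', $current, $m, PREG_OFFSET_CAPTURE)) {
            $done[] = $current; // nothing left to expand
            continue;
        }

        // $m[0] holds the whole group and its byte offset, $m[1][0] the text between the braces.
        [$group, $offset] = $m[0];
        foreach (explode('|', $m[1][0]) as $choice) {
            $queue[] = substr_replace($current, $choice, $offset, strlen($group));
        }
    }

    // Nested alternatives can be produced more than once; keep each sentence once.
    return array_values(array_unique($done));
}
```

Calling expandGroups() on the second $line from the question should give the four sentences listed there. With deep nesting this approach does some redundant work, and the deduplication also collapses alternatives that are intentionally identical.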

Fuqua answered 30/3, 2016 at 13:36 Comment(4)
I think you can create a function that replaces the first match of /{[^{}]*}/ and returns the match and its index. Then, while the return value is not -1, you keep exploding by |. You will also need an array to collect each new sentence (built by inserting one of the exploded values at the index returned by the function). - Ishmaelite
$res = explode("|", $line, 2); - Thirtytwomo
I guess the number of nested levels can be any, right? Like $line = "{This is my {sentence|words} I wrote on a {{very|not so} sunny|{freezing|rather} cold} day.}"; - Monroy
@WiktorStribiżew indeed. - Fuqua

Check this out. I accomplished it by replacing your patterns with %s and using vsprintf, then recursively looping through the matches.

I put a lot of comments in the code...understanding recursion is usually quite a mind job.

Here is a working example.

$line = "{This is my {sentence|statement} I {wrote|typed} on a {hot|cold} {day|night}.}";
$matches = getMatches($line);
printWords([], $matches, $line);


// function to find patterns in the line. Takes $line by reference to replace pattern matches with a vsprintf placeholder
function getMatches(&$line) {
    // remove beginning and trailing brackets on the main sentence
    $line = trim($line, '{}'); 

    // initialize variable that will hold the list of pattern matches
    $matches = null;

    // look for an opening curly brace and skip everything until the ending curly brace
    $pattern = '/\{[^}]+\}/';

    // find all matches and put them in $matches
    preg_match_all($pattern, $line, $matches);

    // preg_match_all nests one level deeper than we need
    $matches = $matches[0];

    // replace all matches with a %s placeholder
    $line = preg_replace($pattern, '%s', $line);

    // split each of the matches by vertical pipe
    foreach ($matches as $index => $match) {
        $matches[$index] = explode('|', trim($match, '{}'));
    }

    return $matches;
}


// recursive function. $args will be used as the second argument to vsprintf
function printWords(array $args, array $matches, $line) {
    // get the first element in the array of $matches, remove it from the array
    $current = array_shift($matches);

    // keep track of the current $args index for this recursive iteration
    $currentArgIndex = count($args);

    // loop through each of the words in the current set of matches
    foreach ($current as $word) {
        // update $args and set the vsprintf argument at this iteration's position to the next word in the set of words
        $args[$currentArgIndex] = $word;

        if (!empty($matches)) {
            // repeat this process (recursively) until we are at the end of the list of matches
            printWords($args, $matches, $line);
        } else {
            // if this is the last match in the line, echo the sentence with all args from previous recursive iterations added
            echo vsprintf($line, $args) . '<br />';
        }
    }
}

Outputs:

This is my sentence I wrote on a hot day.
This is my sentence I wrote on a hot night.
This is my sentence I wrote on a cold day.
This is my sentence I wrote on a cold night.
This is my sentence I typed on a hot day.
This is my sentence I typed on a hot night.
This is my sentence I typed on a cold day.
This is my sentence I typed on a cold night.
This is my statement I wrote on a hot day.
This is my statement I wrote on a hot night.
This is my statement I wrote on a cold day.
This is my statement I wrote on a cold night.
This is my statement I typed on a hot day.
This is my statement I typed on a hot night.
This is my statement I typed on a cold day.
This is my statement I typed on a cold night.
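One caveat with the vsprintf approach: a literal % in the sentence would be read as a format specifier. A minimal sketch of a workaround, assuming the outer braces have already been trimmed, is to double every % before the groups are replaced with %s. toFormatString is a hypothetical helper name, not part of the code above.

```php
<?php
// Hypothetical helper: build a vsprintf-safe format string from a line with {a|b} groups.
function toFormatString(string $line): string
{
    // Double literal percent signs so vsprintf prints them instead of parsing them.
    $escaped = str_replace('%', '%%', $line);

    // Replace each non-nested group with a %s placeholder.
    return preg_replace('/\{[^{}]+\}/', '%s', $escaped);
}

// vsprintf(toFormatString('I am 100% {sure|certain}.'), ['sure'])
// returns "I am 100% sure."
```

The alternatives themselves do not need escaping, because vsprintf does not interpret its arguments as format strings.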
Nucleonics answered 30/3, 2016 at 21:54 Comment(3)
Nice solution, although this doesn't work for nested levels. - Shy
@JamesBuck: You are correct. And while the OP did use the word "nested" in his question, I didn't see any examples that suggested he actually needed that, nor can I really think of an example when you would need nested levels for this type of problem. So I figured he didn't really mean "nested". - Nucleonics
@Nucleonics I didn't. But I'm still curious about a way to handle nested values using regex, e.g. for a line like this: { It is { raining { and streets are wet } | snowing { and streets are { slippy | white }}}. Tomorrow will be nice { weather | walk }. } and the output should be like this. I can figure out how to deal with the spaces, but not with the nested levels. - Fuqua
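For nested groups like the one in the last comment, a single regex is awkward because the | separators have to be matched to the right brace depth. One alternative, sketched here under the assumption that braces are balanced and that { } | never appear as literal text, is a small recursive parser instead of a regex. expandTemplate and parseAlternatives are hypothetical names introduced only for this illustration.

```php
<?php
// Hypothetical entry point: returns every sentence a nested {a|b} template can produce.
function expandTemplate(string $template): array
{
    $pos = 0;
    return parseAlternatives($template, $pos);
}

// Parses alternatives separated by | until a closing brace or the end of the string.
function parseAlternatives(string $s, int &$pos): array
{
    $results = [];   // expansions of the alternatives already finished
    $current = ['']; // expansions of the alternative being read
    $len = strlen($s);

    while ($pos < $len) {
        // Append a run of ordinary characters to every expansion built so far.
        $run = strcspn($s, '{|}', $pos);
        if ($run > 0) {
            $literal = substr($s, $pos, $run);
            $pos += $run;
            foreach ($current as $i => $prefix) {
                $current[$i] = $prefix . $literal;
            }
            continue;
        }

        $ch = $s[$pos];
        if ($ch === '}') {
            break; // the caller skips the closing brace
        }
        $pos++;

        if ($ch === '|') {
            // Finish this alternative and start the next one.
            $results = array_merge($results, $current);
            $current = [''];
        } else {
            // Opening brace: expand the nested group, then combine it with every prefix.
            $inner = parseAlternatives($s, $pos);
            if ($pos < $len) {
                $pos++; // skip the matching closing brace
            }
            $combined = [];
            foreach ($current as $prefix) {
                foreach ($inner as $choice) {
                    $combined[] = $prefix . $choice;
                }
            }
            $current = $combined;
        }
    }

    return array_merge($results, $current);
}
```

For the line in the comment above this should yield six sentences: three variants of the first sentence times two of the second. The parser keeps whitespace exactly as written, so the spacing still has to be normalized separately, for example with preg_replace('/\s+/', ' ', trim($sentence)).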
