Get all nested curly braces

Asked 27/4, 2013 at 23:58 Answered 30/4, 2013 at 8:21

Solved php regex preg-match preg-match-all

Is it possible to get all the content inside nested curly braces from a string? For example:

The {quick} brown fox {jumps {over the} lazy} dog

So I need:

quick
over the
jumps {over the} lazy

Preferably in this order, starting from the most deeply nested.

Crosby answered 27/4, 2013 at 23:58 Comment(2)

Build a simple parser or try using recursive regex. – Rounded 28/4, 2013 at 0:4

There's a comment in the PHP docs that may lead you to what you need: us3.php.net/manual/en/regexp.reference.recursive.php#102748 – Jobholder 28/4, 2013 at 2:19

Solution

The regex below will allow you to grab the content of all the nested curly braces. Note that this assumes that the nested curly braces are balanced; otherwise, it is hard to define what the answer should be.

(?=\{((?:[^{}]++|\{(?1)\})++)\})

The result will be in capturing group 1.
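As a minimal sketch of how this could be used from PHP, assuming the sample string from the question and `/` as the pattern delimiter (the variable names `$subject` and `$contents` are only illustrative):

```php
<?php
// Sample input taken from the question.
$subject = 'The {quick} brown fox {jumps {over the} lazy} dog';

// The look-ahead pattern from this answer; group 1 holds the content of one brace pair.
$pattern = '/(?=\{((?:[^{}]++|\{(?1)\})++)\})/';

// Every match is zero-width, so preg_match_all() keeps scanning forward
// and also finds the pairs nested inside an earlier match.
preg_match_all($pattern, $subject, $matches);

$contents = $matches[1];
print_r($contents);
```

With this input, `$contents` should hold `quick`, `jumps {over the} lazy` and `over the`, in that order.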


Note that the order is not the one requested in the question, though. Results appear in the order of their opening curly brace {, so the content of the outermost pair is listed before the content nested inside it.
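If the order from the question matters, one possible workaround (not part of the regex itself) is to ask for offsets and sort the captures by where they end, which corresponds to the order of the closing braces. This is a sketch under the assumption that PHP 7 or later is available for the `<=>` operator; `$groups` and `$ordered` are illustrative names:

```php
<?php
$subject = 'The {quick} brown fox {jumps {over the} lazy} dog';
$pattern = '/(?=\{((?:[^{}]++|\{(?1)\})++)\})/';

// PREG_OFFSET_CAPTURE returns every capture as [text, byte offset].
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

$groups = $matches[1];

// Sort by the position where each capture ends, i.e. by its closing brace.
usort($groups, function ($a, $b) {
    return ($a[1] + strlen($a[0])) <=> ($b[1] + strlen($b[0]));
});

// Keep only the captured text, dropping the offsets.
$ordered = array_column($groups, 0);
print_r($ordered);
```

A nested pair always closes before the pair that encloses it, so inner content comes out ahead of the content around it.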

Explanation

Ignore the zero-width positive look-ahead (?=pattern) for now and focus on the pattern inside it, which is:

\{((?:[^{}]++|\{(?1)\})++)\}

The part between the 2 literal curly braces, ((?:[^{}]++|\{(?1)\})++), matches 1 or more instances of either:

a non-empty sequence of non-curly-brace characters [^{}]++, or
a block enclosed by {}, matched recursively, which may itself contain other non-curly-brace sequences or other blocks.

The inner part alone can match text that doesn't contain {}, which we don't need. Therefore, we make sure a match is a block enclosed by {} by adding the literal curly braces at both ends: \{((?:[^{}]++|\{(?1)\})++)\}.

Since we want the content inside all the nested curly braces, we need to prevent the engine from consuming the text. That's where the zero-width positive look-ahead comes into play.

It is not very efficient, since the engine redoes the match for every nested pair of braces, but I doubt there is any other general regex solution that handles it efficiently.

Ordinary procedural code can handle everything efficiently in one pass, and is recommended if you expect your requirements to grow in the future.

Denote answered 28/4, 2013 at 12:1 Comment(1)

This is brilliant. I had to slightly modify it as per my requirement and it helped me resolve my issue. Thank you! – Jayjaycee 24/8, 2022 at 7:12

A simple one-pass solution without regular expressions:

$str = 'The {quick} brown fox {jumps {over the} lazy} dog';

$result = parseCurlyBrace($str);

echo '<pre>' . print_r($result,true) . '</pre>';

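function parseCurlyBrace($str) {

  $length = strlen($str);
  $stack  = array();   // offsets of every '{' that is still open
  $result = array();

  for ($i = 0; $i < $length; $i++) {

     if ($str[$i] === '{') {
        $stack[] = $i;
     } elseif ($str[$i] === '}' && !empty($stack)) {
        // Close the most recent '{'; a stray '}' with nothing open is skipped.
        $open = array_pop($stack);
        $result[] = substr($str, $open + 1, $i - $open - 1);
     }
  }

  return $result;
}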

Freddiefreddy answered 30/4, 2013 at 8:21 Comment(2)

Much better than your deleted solution. +1 – Denote 30/4, 2013 at 8:24

Thank you, I could not sleep well with my previous horror in mind :-) – Megacycle 30/4, 2013 at 8:25

You can try this:

$subject = 'The {quick} brown fox {jumps {over the} lazy} dog';

function nestor($subject) {
    $result = array();
    // Each top-level {...} block is captured in "nested"; plain text leaves it empty.
    preg_match_all('~[^{}]+|\{(?<nested>(?R)*)\}~', $subject, $matches);

    foreach ($matches['nested'] as $match) {
        if ($match != "") {
            $result[] = $match;
            // Recurse into the captured content to reach the deeper levels.
            $nesty = nestor($match);
            if ($nesty) {
                $result = array_merge($result, $nesty);
                // $result[] = $nesty; // to preserve the hierarchy
            }
        }
    }
    return $result;
}

print_r(nestor($subject));

The pattern used here matches nested structures but can't capture at a depth greater than 1. That is why the nestor function is applied recursively to each match.

You can also explore another approach, using this pattern with the \G anchor:

$subject = 'The {quick} brown fox {jumps {over the}{ fat} lazy} dog';
$pattern = '~[^{}]++|\G\{(?<nested>(?R)*+)\}~';
preg_match_all($pattern, $subject, $matches/*, PREG_SET_ORDER*/);
print_r($matches);

If you look at the result, you can easily work out rules for determining the nesting depth of each element.

Sapient answered 28/4, 2013 at 2:46 Comment(0)

You can do it in a hacky, ugly way as follows:

1) Search for all matches of the regex \{([^{}]*)\}

2) Search for all matches of the regex \{([^{}]*\{[^{}]*\}[^{}]*)\} (as you can see, this can be dynamically constructed)

3) Search for all matches of the regex \{([^{}]*\{[^{}]*\{[^{}]*\}[^{}]*\}[^{}]*)\}... (keep dynamically constructing it larger until you get no matches)

The capture group, marked by the ()s just inside the outer pair of {}s, lets you grab only the content between the braces instead of the whole regex match.
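The steps above could be automated roughly as follows. This is only a sketch: `buildDepthPattern` and `collectByDepth` are hypothetical helper names, the braces are escaped, and the character classes are written as `[^{}]` so that each level matches exactly one nesting depth. It handles a single chain of nesting per block, as the patterns above do; a block containing two sibling blocks is not matched.

```php
<?php
// Hypothetical helper: builds the pattern for one nesting depth
// (depth 1 = a brace pair with no braces inside it).
function buildDepthPattern(int $depth): string
{
    $inner = '[^{}]*';
    for ($i = 1; $i < $depth; $i++) {
        // Wrap the previous level in one more pair of literal braces.
        $inner = '[^{}]*\{' . $inner . '\}[^{}]*';
    }
    return '~\{(' . $inner . ')\}~';
}

// Hypothetical helper: tries depth 1, 2, 3, ... until a depth yields no match.
function collectByDepth(string $subject): array
{
    $result = [];
    for ($depth = 1; ; $depth++) {
        if (!preg_match_all(buildDepthPattern($depth), $subject, $matches)) {
            break; // no match (or a regex error) at this depth: stop
        }
        $result = array_merge($result, $matches[1]);
    }
    return $result;
}

print_r(collectByDepth('The {quick} brown fox {jumps {over the} lazy} dog'));
```

For the sample string this collects `quick` and `over the` at depth 1, then `jumps {over the} lazy` at depth 2, and stops at depth 3.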

Stripteaser answered 28/4, 2013 at 0:1 Comment(0)
