Is it possible to get all the content in nested curly braces from a string? For example:
The {quick} brown fox {jumps {over the} lazy} dog
So I need:
- quick
- over the
- jumps {over the} lazy
Preferably in this order, starting from the most deeply nested.
The regex below will allow you to grab the content of all the nested curly braces. Note that this assumes that the nested curly braces are balanced; otherwise, it is hard to define what the answer should be.
(?=\{((?:[^{}]++|\{(?1)\})++)\})
The result will be in capturing group 1.
Note that the order is not the one requested in the question, though. The results come out in the order in which the opening curly braces { appear, which means that the content of an outer pair is printed before the content of any pair nested inside it.
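As a quick check of that ordering, here is a minimal PHP sketch that runs the pattern through preg_match_all and reads capturing group 1. The function name matchNestedBraces is made up for this illustration, and the input is assumed to have balanced braces.

```php
<?php
// Hypothetical helper (the name is not from the answer): returns the content
// of every balanced {...} block, ordered by the position of its opening brace.
function matchNestedBraces(string $subject): array
{
    $pattern = '~(?=\{((?:[^{}]++|\{(?1)\})++)\})~';
    preg_match_all($pattern, $subject, $matches);
    return $matches[1]; // capturing group 1 holds the content
}

print_r(matchNestedBraces('The {quick} brown fox {jumps {over the} lazy} dog'));
// Order of results: "quick", "jumps {over the} lazy", "over the"
```

To get the innermost content first, as the question asks, the results would still need to be reordered afterwards.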
Let us ignore the zero-width positive look-ahead (?=pattern) for now and focus on the pattern inside, which is:
\{((?:[^{}]++|\{(?1)\})++)\}
The part between the 2 literal curly braces, ((?:[^{}]++|\{(?1)\})++), matches 1 or more instances of either:
- [^{}]++, a non-empty sequence of non-curly-brace characters, or
- \{(?1)\}, a block enclosed by {}, matched recursively, which may contain many other non-curly-brace sequences or other blocks.
This inner pattern alone can match text that doesn't contain {}, which we don't need. Therefore, we make sure a match is a block enclosed by {} by adding the pair of literal curly braces at both ends: \{((?:[^{}]++|\{(?1)\})++)\}
Since we want the content inside all the nested curly braces, we need to prevent the engine from consuming the text. That's where the zero-width positive look-ahead comes into play.
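To illustrate the point about consuming text, the sketch below runs the same inner pattern with and without the look-ahead wrapper. The variable names are chosen for this example only.

```php
<?php
$subject = 'The {quick} brown fox {jumps {over the} lazy} dog';
$block = '\{((?:[^{}]++|\{(?1)\})++)\}';

// Without the look-ahead, each match consumes the whole block, so the search
// resumes after the closing brace and never visits the inner pair.
preg_match_all('~' . $block . '~', $subject, $consuming);

// With the look-ahead, every match is zero-width, so the search also
// tries the positions inside a block it has already reported.
preg_match_all('~(?=' . $block . ')~', $subject, $zeroWidth);

print_r($consuming[1]); // "quick", "jumps {over the} lazy"
print_r($zeroWidth[1]); // "quick", "jumps {over the} lazy", "over the"
```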
The look-ahead approach is not very efficient, since the engine redoes the match for every nested pair of braces, but I doubt there is any other general solution with regex that can handle it efficiently.
Normal code can handle everything efficiently in one pass, and is recommended if you are going to extend your requirement in the future.
A simple one-pass solution without regular expressions:
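$str = 'The {quick} brown fox {jumps {over the} lazy} dog';
$result = parseCurlyBrace($str);
echo '<pre>' . print_r($result, true) . '</pre>';

function parseCurlyBrace($str) {
    $length = strlen($str);
    $stack = array();   // offsets of the opening braces that are still unclosed
    $result = array();  // filled in closing order, so inner content comes before outer
    for ($i = 0; $i < $length; $i++) {
        if ($str[$i] == '{') {
            $stack[] = $i;
        } elseif ($str[$i] == '}' && $stack) {
            // A stray '}' with no matching '{' is skipped instead of popping an empty stack.
            $open = array_pop($stack);
            $result[] = substr($str, $open + 1, $i - $open - 1);
        }
    }
    return $result;
}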
You can try this:
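$subject = 'The {quick} brown fox {jumps {over the} lazy} dog';

function nestor($subject) {
    $result = array(); // start from an array; appending to false is deprecated as of PHP 8.1
    // Each match is either a run of non-brace text or one balanced {...} block;
    // "nested" holds the content of that block (empty for plain text).
    preg_match_all('~[^{}]+|\{(?<nested>(?R)*)\}~', $subject, $matches);
    foreach ($matches['nested'] as $match) {
        if ($match != "") {
            $result[] = $match;
            // Recurse into the captured content to reach the deeper levels.
            $nesty = nestor($match);
            if ($nesty)
                $result = array_merge($result, $nesty);
            // $result[]=$nesty; // to preserve the hierarchy
        }
    }
    return $result;
}

print_r(nestor($subject));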
The pattern used here matches nested structures but only captures the content of the outermost level. That is why the nestor function is applied recursively to each match.
You can also explore another approach, using a pattern with the \G anchor:
$subject = 'The {quick} brown fox {jumps {over the}{ fat} lazy} dog';
$pattern = '~[^{}]++|\G\{(?<nested>(?R)*+)\}~';
preg_match_all($pattern, $subject, $matches/*, PREG_SET_ORDER*/);
print_r($matches);
If you look at the result, you can work out rules to determine the nesting depth of each element.
You can do it in a hacky, ugly way as follows:
1) Search for all matches of the regex \{([^{}]*)\}
2) Search for all matches of the regex \{([^{}]*\{[^{}]*\}[^{}]*)\}
(as you can see, this can be dynamically constructed; [^{}] keeps a match from starting at an outer { and ending at the first })
3) Search for all matches of the regex \{([^{}]*\{[^{}]*\{[^{}]*\}[^{}]*\}[^{}]*)\}
... (keep dynamically constructing it larger until you get no matches)
The capture group, denoted by the parentheses () just inside the outer pair of {}, lets you grab only the captured content instead of the whole regex match.
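The dynamic construction described in the steps above could be sketched in PHP roughly as follows. The function names buildDepthPattern and extractByDepth, and the $maxDepth safety limit, are assumptions made for this example. The character class [^{}] keeps each piece from running across a brace.

```php
<?php
// Hypothetical helper: builds the pattern for a block with exactly $depth
// levels of braces (1 = a block with no braces inside it).
function buildDepthPattern(int $depth): string
{
    $inner = '[^{}]*';
    for ($i = 1; $i < $depth; $i++) {
        // Wrap the previous level in one more pair of literal braces.
        $inner = '[^{}]*\{' . $inner . '\}[^{}]*';
    }
    return '~\{(' . $inner . ')\}~';
}

// Hypothetical driver: tries depth 1, 2, 3, ... and stops at the first depth
// that yields no matches (or at $maxDepth, an arbitrary safety limit).
function extractByDepth(string $subject, int $maxDepth = 20): array
{
    $found = array();
    for ($depth = 1; $depth <= $maxDepth; $depth++) {
        if (!preg_match_all(buildDepthPattern($depth), $subject, $m)) {
            break;
        }
        $found = array_merge($found, $m[1]); // group 1 = content inside the outer braces
    }
    return $found;
}

print_r(extractByDepth('The {quick} brown fox {jumps {over the} lazy} dog'));
// Order of results: "quick", "over the", "jumps {over the} lazy"
```

Because the patterns are tried from the shallowest depth upward, the innermost content comes first. A pattern of this shape allows only one nested block per level, so input such as {a {b}{c} d} would be reported only partially.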