A

21

429

I need a regular expression to select all the text between two outer brackets.

Example:
START_TEXT(text here(possible text)text(possible text(more text)))END_TXT
          ^                                                      ^

Result:
(text here(possible text)text(possible text(more text)))

Authentic answered 13/2, 2009 at 15:49 Comment(2)

This question is very poor because it's not clear what it's is asking. All of the answers interpreted it differently. @Authentic can you please clarify the question? – Koroseal 17/12, 2012 at 18:25

Answered in this post: #6331565 – Teeterboard 6/12, 2013 at 22:47

L

181

Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in more detail in this answer to a previous question. The gist is to write code which scans through the string keeping a counter of the open parentheses which have not yet been matched by a closing parenthesis. When that counter returns to zero, then you know you've reached the final closing parenthesis.
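A minimal Python sketch of the counter scan described above. The function name `first_balanced_group` and its parameters are made up for illustration; this is one possible reading of the idea, not the code from the linked answer.

```python
def first_balanced_group(text, open_ch="(", close_ch=")"):
    """Return the first balanced open_ch...close_ch substring, or None."""
    depth = 0    # open delimiters not yet matched by a closing one
    start = -1   # index of the outermost opening delimiter
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                # Counter is back at zero: this is the final closing delimiter.
                return text[start:i + 1]
    return None  # nothing found, or the group was never closed


sample = "START_TEXT(text here(possible text)text(possible text(more text)))END_TXT"
print(first_balanced_group(sample))
```

Stray closing parentheses that appear before any opening one are ignored here; whether that is the right behaviour depends on the input you expect.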

Lhary answered 13/2, 2009 at 15:55 Comment(9)

I was toying with this idea but thought I might be able to do it with RegExp. Will go back to my original plan. Thanks everyone – Authentic 13/2, 2009 at 16:25

.NET's implementation has Balancing Group Definitions (msdn.microsoft.com/en-us/library/…), which allow this sort of thing. – Fearfully 13/6, 2010 at 4:8

I disagree that regular expressions are the wrong tool for this for a few reasons. 1) Most regular expression implementations have a workable if not perfect solution for this. 2) Often you are trying to find balanced pairs of delimiters in a context where other criteria well suited to regular expressions are also in play. 3) Often you are handing a regular expression into some API that only accepts regular expressions and you have no choice. – Mestas 2/5, 2014 at 3:31

Here's a Javascript implementation of Frank's algorithm – Guiltless 23/11, 2014 at 11:0

Regex is the RIGHT tool for the job. This answer is not right. See rogal111's answer. – Shooter 26/12, 2015 at 2:48

Regex recursion certainly can and should be used in this scenario. – Ensconce 7/3, 2016 at 20:28

Absolutely agree with the answer. Although there are some implementations of recursion in regexp, they are equal to finite-state machines and are not supposed to work with nested structures, but Context Free Grammars do this. Look at Chomsky's hierarchy of Formal Grammars. – Landrum 20/4, 2016 at 10:52

Frank is right, Context free grammars cannot be described by regular expressions. That's the key point to this answer. – Medic 18/7, 2017 at 22:7

Language purists are correctly arguing that a Chomsky regular language specifically precludes recursion. That doesn't mean that regexps have to be regular. Sure, the label is kinda wrong, but the syntax of the language is good enough to let people completely address use cases which are otherwise 90% solved by the truly regular subset of modern regexps. – Thimerosal 19/8, 2021 at 4:23

O

272

I want to add this answer for quick reference. Feel free to update.

.NET Regex using balancing groups:

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

Where c is used as the depth counter.

Demo at Regexstorm.com

PCRE using a recursive pattern:

\((?:[^)(]+|(?R))*+\)

Demo at regex101; Or without alternation:

\((?:[^)(]*(?R)?)*+\)

Demo at regex101; Or unrolled for performance:

\([^)(]*+(?:(?R)[^)(]*)*+\)

Demo at regex101; The pattern is pasted at (?R) which represents (?0).

Perl, PHP, Notepad++, R: perl=TRUE, Python: PyPI regex module with (?V1) for Perl behaviour.
(the new version of PyPI regex package already defaults to this → DEFAULT_VERSION = VERSION1)

Ruby using subexpression calls:

With Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\)

Demo at Rubular; Ruby 1.9 only supports capturing group recursion:

(\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3)

JavaScript API :: XRegExp.matchRecursive

XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

Java: An interesting idea using forward references by @jaytea.

Without recursion up to 3 levels of nesting:
(JS, Java and other regex flavors)

To prevent runaway backtracking on unbalanced input, * is used on the innermost [^)(] only.

\((?:[^)(]|\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\))*\)

Demo at regex101; Or unrolled for better performance (preferred).

\([^)(]*(?:\([^)(]*(?:\([^)(]*(?:\([^)(]*\)[^)(]*)*\)[^)(]*)*\)[^)(]*)*\)

Demo at regex101; Deeper nesting needs to be added as required.

// JS-Snippet to generate pattern
function generatePattern()
{
  // Set max depth & pattern type
  let d = document.getElementById("maxDepth").value;
  let t = document.getElementById("patternType").value;
  
  // Pattern variants: 0=default, 1=unrolled (more efficient)
  let p = [['\\((?:[^)(]|',')*\\)'], ['\\([^)(]*(?:','[^)(]*)*\\)']];
  
  // Generate and display the pattern
  console.log(p[t][0].repeat(d) + '\\([^)(]*\\)' + p[t][1].repeat(d));
} generatePattern();

Max depth = <input type="text" id="maxDepth" size="1" value="3"> 
<select id="patternType" onchange="generatePattern()">
  <option value="0">default pattern</option>
  <option value="1" selected>unrolled pattern</option>
</select>
<input type="submit" onclick="generatePattern()" value="generate!">
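For readers not working in a browser, the generator above can be ported to Python roughly as follows. The function name `nested_paren_pattern` is an assumption of this sketch; the pattern fragments are copied from the JS snippet, and Python's built-in `re` module accepts the result because it uses no recursion or possessive quantifiers.

```python
import re


def nested_paren_pattern(max_depth, unrolled=True):
    """Build a pattern matching parentheses nested up to max_depth levels deep."""
    if unrolled:
        head, tail = r"\([^)(]*(?:", r"[^)(]*)*\)"   # unrolled variant
    else:
        head, tail = r"\((?:[^)(]|", r")*\)"          # default variant
    # Wrap the innermost, non-nested group max_depth times.
    return head * max_depth + r"\([^)(]*\)" + tail * max_depth


pattern = nested_paren_pattern(3)
print(pattern)
print(re.search(pattern, "3+(5+((2+3-(5))+1))+3").group())
```

As with the hand-written patterns, input nested more deeply than `max_depth` is not matched as a whole; the engine will instead find a shallower inner group.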

Reference - What does this regex mean?

Optometer answered 8/2, 2016 at 13:37 Comment(17)

When you repeat a group with a possessive quantifier, it's useless to make that group atomic since all backtracking positions in that group are deleted at each repetition. So writing (?>[^)(]+|(?R))*+ is the same as writing (?:[^)(]+|(?R))*+. Same thing for the next pattern. About the unrolled version, you can put a possessive quantifier here: [^)(]*+ to prevent backtracking (in case there's no closing bracket). – Deadeye 24/6, 2019 at 21:3

About the Ruby 1.9 pattern, instead of making the repeated group atomic (that has a limited interest when there are many nested parenthesis (...(..)..(..)..(..)..(..)..)) in the subject string), you can use a simple non-capturing group and enclose all in an atomic group: (?>(?:[^)(]+|\g<1>)*) (this behaves exactly like a possessive quantifier). In Ruby 2.x, the possessive quantifier is available. – Deadeye 24/6, 2019 at 21:13

@CasimiretHippolyte Thank you! I adjusted the PCRE patterns and for Ruby 1.9, do you mean the whole pattern to be like this? Please feel free to update yourself. I understand what you mean, but not sure if there is much improvement. – Optometer 25/6, 2019 at 9:13

Thanks for the JavaScript example that doesn't use recursion. I was able to use this in vbScript which has similar limitations. – Royster 9/9, 2020 at 1:33

Thank you, this answer is amazing, all the effort with possible demos allowed me to pick the one I use (regex101.org). I'm very grateful for your work – Teddy 13/9, 2020 at 17:20

you might want this in combination with other strings as well, for example: if (youWantToLookForEnclosingBrackets) (?<brackets>{([^{}]|(?&brackets))*}) (regex101.com/r/pDvixX/1) – Unlike 27/4, 2021 at 14:15

In case anyone needs a curly bracket version of this for .NET: \{(?>\{(?<c>)|[^{}]+|\}(?<-c>))*(?(c)(?!))\} – Harriman 20/7, 2021 at 19:8

For the recursion, instead of \((?:[^)(]+|(?R))*+\) I would recommend (\((?:[^)(]+|(?1))*+\)) (or ?2, ?3, etc, depending on what number group it is). ?R always recurses back to the very beginning of the expression. Which, if you're using this alone, is fine. But for example, if you're finding logical comparisons following an if statement, if \((?:[^)(]+|(?R))*+\) won't match anything because the if would also have to be repeated to match, not just the parentheses. if (\((?:[^)(]+|(?1))*+\)) however, will only check for if once and then recursively check the first group. – Mcginley 25/4, 2022 at 20:52

Hey @Trashman, thanks for your comment! Yes, the php samples which I put, are for the full pattern (?0) else if you need to use this inside like you said, recurse the desired group. – Optometer 25/4, 2022 at 21:21

Thanks @bobblebubble Unfortunately, I just found an issue with my previous comment. If you're trying to reference the group inside the parentheses in a replacement using $1, $2, etc, the numbers keep going up based on the nesting, making them basically unusable. To combat that, I had to put the recursion inside another group and put that inside a branch reset group, so the (hopefully) final expression is: (\s[^#]if) ((?|\(([^)(]+|(?2))*+\)))([^{;]*); and then for replacement you would use something like $1 $2{$4;}. $3 will change with each recursion, but $2 will retain all of it. – Mcginley 25/4, 2022 at 21:46

Hey @Mcginley I see. Depending on the requirements, I would also think of using non-capturing groups or even an atomic group, which is also non-capturing, and drop the possessive quantifier. Could imagine something like (\s[^#]if) (\((?>[^)(]+|(?2))*\))([^{;]*); if you need all 3 captures or just \s[^#]if (\((?>[^)(]+|(?1))*\))[^{;]*; or without capturing the parenthesis: \s[^#]if (\(((?>[^)(]+|(?1))*)\))[^{;]*; well, many ideas :) – Optometer 26/4, 2022 at 12:14

@bobblebubble good point. Why capture the 3rd group at all if I throw it out? There's always many ways to skin the same cat with RegEx. – Mcginley 27/4, 2022 at 14:55

I had been using this PCRE solution using ?R but found it does not work with (*SKIP)(*FAIL) where the ?1 method by @Manish does. I created an example here: regex101.com/r/xkVzVP/1 – Royster 29/9, 2022 at 15:14

Javascript example fails for this string: 3+(5+((2+3-(5))+1))+3 – Myrilla 27/10, 2022 at 1:34

@AlbertRenshaw Your string is three levels deep nested, this was only for max two... I just reworked this section to support max 3 levels. See this demo at regex101. – Optometer 25/11, 2022 at 14:47

How do you capture the match itself? I've tried replacing the outermost non-capturing group to a capturing one, I will match the smallest parentheses substring: regex101.com/r/VA17kk/1 – Militant 4/4, 2023 at 13:50

@Militant You can just wrap it into a capturing group: (\((?:[^)(]+|(?R))*+\)) – Optometer 5/4, 2023 at 18:26

G

144

You can use regex recursion:

\(([^()]|(?R))*\)

Glance answered 8/11, 2013 at 16:22 Comment(13)

An example would be really useful here, I can't get this to work for things like "(1, (2, 3)) (4, 5)". – Irreformable 15/10, 2014 at 0:1

@AndyHayden this is because "(1, (2, 3)) (4, 5)" has two groups separated by a space. Use my regexp with the global flag: /\(([^()]|(?R))*\)/g. Here is an online test: regex101.com/r/lF0fI1/1 – Glance 23/10, 2014 at 9:45

I asked a question about this last week https://mcmap.net/q/25901/-recursive-pattern-in-regex – Irreformable 23/10, 2014 at 17:20

In .NET 4.5 I get the following error for this pattern: Unrecognized grouping construct. – Eclecticism 28/6, 2015 at 0:16

Awesome! This is a great feature of regex. Thank you for being the only one to actually answer the question. Also, that regex101 site is sweet. – Shooter 26/12, 2015 at 2:47

Very nice answer. Notepad++ 6.8.8 supports this. – Recognizance 15/2, 2016 at 21:44

How would your expression with global flag (the one you gave an online test) be used with c++ <regex> ???? I am trying to break a string up into chunks in respect to their parenthesis. – Glasswork 8/4, 2016 at 3:19

@Eclecticism You need PCRE to be able to use the recursive feature in this expression. A more detailed explanation here. – Pyro 14/11, 2016 at 22:45

Good answer - This regex is made vastly more efficient by changing it to: \(([^()]+|(?R))*\) – Disinterest 2/11, 2018 at 3:24

As a side note, this is kind of misnamed because a real regex isn't recursive. – Empower 12/12, 2018 at 14:19

No, bobblebubble's answer is the best and the PCRE regex to match nested parentheses given there is more efficient. – Dagoba 14/6, 2019 at 17:10

I had been using this PCRE solution using ?R but found it does not work with (*SKIP)(*FAIL) where the ?1 method by @Manish does. I created an example here: regex101.com/r/xkVzVP/1 – Royster 29/9, 2022 at 15:15

@AndyHayden it works for me, with PCRE, check this demo. – Cleavable 15/1 at 9:15

F

34

[^\(]*(\(.*\))[^\)]*

[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.
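To see what the expression above does and does not do, here is a small sketch using Python's `re` module. The sample strings are taken from the question and from a comment on this answer; the variable names are illustrative only.

```python
import re

# The expression from this answer, unchanged.
pattern = re.compile(r"[^\(]*(\(.*\))[^\)]*")

nested = "START_TEXT(text here(possible text)text(possible text(more text)))END_TXT"
print(pattern.match(nested).group(1))

# Two sibling groups: the greedy .* runs from the first "(" to the last ")".
siblings = "text(text)text(text)text"
print(pattern.match(siblings).group(1))
```

The first call yields the outer bracketed text from the question. The second shows the limitation: both sibling groups and the text between them come back as one capture, because nothing here counts brackets.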

Foscalina answered 13/2, 2009 at 15:51 Comment(3)

The bracket inside the class does not need to be escaped, since inside a class it is not a metacharacter. – Pogy 13/2, 2009 at 15:59

This expr fails against something like "text(text)text(text)text" returning "(text)text(text)". Regular expressions can't count brackets. – Marvelofperu 13/2, 2009 at 16:2

@ChristianKlauser this /\(([^()]|(?R))*\)/g works great, check this demo. – Cleavable 15/1 at 9:20

M

27

This answer explains the theoretical limitation of why regular expressions are not the right tool for this task.

Regular expressions can not do this.

Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, an FSA can remember only the current state; it has no information about the previous states.

In the above diagram, S1 and S2 are two states, where S1 is both the starting and the final state. So if we try the string 0110, the transitions go as follows:

      0     1     1     0
-> S1 -> S2 -> S2 -> S2 ->S1

In the above steps, when we are at second S2 i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01 as it can only remember the current state and the next input symbol.

In the above problem, we need to know the number of opening parentheses seen so far, which means it has to be stored somewhere. Since an FSA cannot do that, a regular expression cannot be written.

However, an algorithm can be written to do this task. Such algorithms generally fall under Pushdown Automata (PDA). A PDA is one level above an FSA: it has an additional stack to store extra information. A PDA can solve the above problem because we can 'push' each opening parenthesis onto the stack and 'pop' one off whenever we encounter a closing parenthesis. If the stack is empty at the end, the opening and closing parentheses match; otherwise they do not.
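The push/pop idea can be sketched in Python with a plain list as the stack. The function name `is_balanced` and the `pairs` parameter are assumptions of this example, added so the same code can also check square brackets or braces.

```python
def is_balanced(text, pairs=None):
    """Return True if every opening delimiter in text is closed in order."""
    pairs = pairs or {"(": ")"}          # opening -> expected closing delimiter
    closers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in pairs:
            stack.append(pairs[ch])      # push: remember which closer we expect
        elif ch in closers:
            # pop: fail on an empty stack or on the wrong kind of closer
            if not stack or stack.pop() != ch:
                return False
    return not stack                     # leftovers mean unclosed openers


print(is_balanced("(text here(possible text)text)"))
print(is_balanced("((unclosed)"))
```

For a single delimiter pair the stack only ever holds copies of the same symbol, so a simple integer counter would do the same job; the stack becomes necessary once several kinds of brackets are mixed.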

Morbidezza answered 21/9, 2017 at 2:16 Comment(5)

Push and pop are possible in regexp #17004299 regular-expressions.info/balancing.html – Wnw 23/8, 2018 at 19:35

There are several answers here which prove that it IS possible. – Cinch 20/9, 2018 at 10:48

@Wnw This answer talks about regular expressions from a theoretical perspective. Many regex engines nowadays do not rely only on this theoretical model and use some additional memory to do the job! – Morbidezza 3/6, 2019 at 2:7

@JiříHerník: those are not regular expressions in the strict sense: not defined as regular expressions by Kleene. Some regular expression engines indeed have implemented some extra capabilities, making them parse more than only regular languages. – Benoite 10/6, 2019 at 21:27

This one should be the accepted answer. Unfortunately many "developers" do not have a proper Comp Sc/Eng education and are unaware of such topics as the Halting problem, the Pumping lemma, etc... – Tidal 13/5, 2022 at 23:0

P

26

(?<=\().*(?=\))

If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible^(*).

This regex just returns the text between the first opening and the last closing parentheses in your string.

^(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not commonly available.
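A short sketch of the lookaround expression from this answer, run with Python's `re` module (which supports fixed-width lookbehind). The variable names and the second sample string are illustrative assumptions.

```python
import re

# The expression from this answer: text after a "(" and before a ")".
between = re.compile(r"(?<=\().*(?=\))")

nested = "START_TEXT(text here(possible text)text(possible text(more text)))END_TXT"
print(between.search(nested).group())

# With sibling groups, the match still spans first "(" to last ")".
print(between.search("text(text)text(text)text").group())
```

The outer parentheses themselves are excluded from the match, and the second call returns `text)text(text`, which illustrates that the expression pairs the first opening with the last closing parenthesis rather than matching brackets.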

Piece answered 13/2, 2009 at 15:54 Comment(6)

What do the "<=" and "=" signs mean? What regexp engine is this expression targeting? – Marvelofperu 13/2, 2009 at 15:58

This is look-around, or more correctly "zero width look-ahead/look-behind assertions". Most modern regex engines support them. – Piece 13/2, 2009 at 16:1

According to the OP's example, he wants to include the outermost parens in the match. This regex throws them away. – Humility 15/2, 2009 at 5:9

@Alan M: You are right. But according to the question text, he wants everything between the outermost parens. Pick your choice. He said he'd been trying for hours, so didn't even consider "everything including the outermost parens" as the intention, because it is so trivial: "(.*)". – Piece 15/2, 2009 at 10:29

Also, if you allow for recursive regular expressions, this is not "impossible." Adding "impossible" to StackOverflow questions without qualification when it is possible makes for bad reading. I would suggest adding a caveat for recursion, or discussing grammars. – Cumae 12/1, 2015 at 7:47

@ghayes The answer is from 2009. That is a long time ago; regular expression engines that allow some form of recursion have been more uncommon than they are now (and they still are pretty uncommon). I'll mention it in my answer. – Piece 12/1, 2015 at 7:54

M

14

It is actually possible to do it using .NET regular expressions, but it is not trivial, so read carefully.

You can read a nice article here. You also may need to read up on .NET regular expressions. You can start reading here.

Angle brackets <> were used because they do not require escaping.

The regular expression looks like this:

<
[^<>]*
(
    (
        (?<Open><)
        [^<>]*
    )+
    (
        (?<Close-Open>>)
        [^<>]*
    )+
)*
(?(Open)(?!))
>

Magician answered 23/9, 2011 at 18:22 Comment(0)

D

12

I was also stuck in this situation when dealing with nested patterns, and regular expressions are the right tool to solve such problems.

/(\((?>[^()]+|(?1))*\))/

Dews answered 13/6, 2020 at 18:22 Comment(3)

As a user looking for help on a similar topic, I have no idea what that regex does specifically and how I can use it to apply it to my own problem. Perhaps this is a good answer but given the nature of regex being cryptic, I would have to look up every part of it just to see if this would help me. Given that there are so many answers with this type of "solution", I don't think I will. – Braxton 16/3, 2022 at 0:25

regex101.com is a good explainer tool to explain this regex. – Disestablish 28/7, 2022 at 13:24

This PCRE solution using ?1 works with (*SKIP)(*FAIL) when the ?R method posted by bobble bubble does not. Use this solution if you need to find things that ARE NOT in parenthesis. I created an example here: regex101.com/r/xkVzVP/1 – Royster 29/9, 2022 at 15:22

W

6

This is the definitive regex:

\(
(?<arguments> 
(  
  ([^\(\)']*) |  
  (\([^\(\)']*\)) |
  '(.*?)'

)*
)
\)

Example:

input: ( arg1, arg2, arg3, (arg4), '(pip' )

output: arg1, arg2, arg3, (arg4), '(pip'

note that the '(pip' is correctly managed as string. (tried in regulator: http://sourceforge.net/projects/regulator/)

Wnw answered 15/5, 2012 at 7:53 Comment(1)

I like this technique if there's no nesting or you only care about the innermost group. It doesn't rely on recursion. I was able to use it to extract an argument that contained parenthesis. I made a working example at Regex101 – Royster 17/10, 2020 at 3:24

B

5

The regular expression using Ruby (version 1.9.3 or above):

/(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on rubular

Blunge answered 21/8, 2013 at 8:38 Comment(0)

A

5

I have written a little JavaScript library called balanced to help with this task. You can accomplish this by doing

balanced.matches({
    source: source,
    open: '(',
    close: ')'
});

You can even do replacements:

balanced.replacements({
    source: source,
    open: '(',
    close: ')',
    replace: function (source, head, tail) {
        return head + source + tail;
    }
});

Here's a more complex and interactive example JSFiddle.

Arronarrondissement answered 2/8, 2014 at 8:15 Comment(0)

C

5

Adding to bobble bubble's answer, there are other regex flavors where recursive constructs are supported.

Lua

Use %b() (%b{} / %b[] for curly braces / square brackets):

for s in string.gmatch("Extract (a(b)c) and ((d)f(g))", "%b()") do print(s) end (see demo)

Raku (former Perl6):

Non-overlapping multiple balanced parentheses matches:

my regex paren_any { '(' ~ ')' [ <-[()]>+ || <&paren_any> ]* }
say "Extract (a(b)c) and ((d)f(g))" ~~ m:g/<&paren_any>/;
# => (｢(a(b)c)｣ ｢((d)f(g))｣)

Overlapping multiple balanced parentheses matches:

say "Extract (a(b)c) and ((d)f(g))" ~~ m:ov:g/<&paren_any>/;
# => (｢(a(b)c)｣ ｢(b)｣ ｢((d)f(g))｣ ｢(d)｣ ｢(g)｣)

See demo.

Python re non-regex solution

See poke's answer for How to get an expression between balanced parentheses.

Java customizable non-regex solution

Here is a customizable solution allowing single character literal delimiters in Java:

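import java.util.ArrayList;
import java.util.List;

public class BalancedSubstrings {

    // Returns every top-level substring delimited by markStart and markEnd.
    public static List<String> getBalancedSubstrings(String s, Character markStart,
                                     Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;               // current nesting depth
        int lastOpenDelimiter = -1;  // start index of the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenDelimiter = (includeMarkers ? i : i + 1);
                }
            }
            else if (c == markEnd) {
                if (level == 1) {
                    // Depth is about to return to zero: the group is complete.
                    subTreeList.add(s.substring(lastOpenDelimiter, (includeMarkers ? i + 1 : i)));
                }
                if (level > 0) level--;  // ignore stray closing delimiters
            }
        }
        return subTreeList;
    }
}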

Sample usage:

String s = "some text(text here(possible text)text(possible text(more text)))end text";
List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
System.out.println("Balanced substrings:\n" + balanced);
// => [(text here(possible text)text(possible text(more text)))]
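For comparison, the same scan can be written in Python. This is a sketch ported from the Java method above; the name `balanced_substrings` and its keyword arguments are this example's own, not part of any library.

```python
def balanced_substrings(text, open_ch="(", close_ch=")", include_markers=True):
    """Return every top-level open_ch...close_ch substring of text."""
    results = []
    level = 0    # current nesting depth
    start = -1   # start index of the current top-level group
    for i, ch in enumerate(text):
        if ch == open_ch:
            level += 1
            if level == 1:
                start = i if include_markers else i + 1
        elif ch == close_ch:
            if level == 1:
                end = i + 1 if include_markers else i
                results.append(text[start:end])
            if level > 0:
                level -= 1   # stray closing delimiters are ignored
    return results


print(balanced_substrings("Extract (a(b)c) and ((d)f(g))"))
print(balanced_substrings("Extract (a(b)c) and ((d)f(g))", include_markers=False))
```

Like the Java version, it only reports top-level groups; nested groups such as `(b)` stay inside their parent.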

Clypeate answered 13/5, 2016 at 10:40 Comment(1)

See an online Java demo for a proof it works with multiple matches. – Dagoba 8/11, 2017 at 13:11

A

2

"""
Here is a simple python program showing how to use regular
expressions to write a paren-matching recursive parser.

This parser recognises items enclosed by parens, brackets,
braces and <> symbols, but is adaptable to any set of
open/close patterns.  This is where the re package greatly
assists in parsing. 
"""

import re


# The pattern below recognises a sequence consisting of:
#    1. Any characters not in the set of open/close strings.
#    2. One of the open/close strings.
#    3. The remainder of the string.
# 
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included.  However quotes are not ignored inside
# quotes.  More logic is needed for that....


pat = re.compile(r"""
    ( .*? )
    ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                           \' | \" | BEGIN | END | $ )
    ( .* )
    """, re.X)

# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.

matching = { "(" : ")",
             "[" : "]",
             "{" : "}",
             "<" : ">",
             '"' : '"',
             "'" : "'",
             "BEGIN" : "END" }

# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.

def matchnested(s, term=""):
    lst = []
    while True:
        m = pat.match(s)

        if m.group(1) != "":
            lst.append(m.group(1))

        if m.group(2) == term:
            return lst, m.group(3)

        if m.group(2) in matching:
            item, s = matchnested(m.group(3), matching[m.group(2)])
            lst.append(m.group(2))
            lst.append(item)
            lst.append(matching[m.group(2)])
        else:
            raise ValueError("After <<%s %s>> expected %s not %s" %
                             (lst, s, term, m.group(2)))

# Unit test.

if __name__ == "__main__":
    for s in ("simple string",
              """ "double quote" """,
              """ 'single quote' """,
              "one'two'three'four'five'six'seven",
              "one(two(three(four)five)six)seven",
              "one(two(three)four)five(six(seven)eight)nine",
              "one(two)three[four]five{six}seven<eight>nine",
              "one(two[three{four<five>six}seven]eight)nine",
              "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
              "ERROR testing ((( mismatched ))] parens"):
        print("\ninput", s)
        try:
            lst, s = matchnested(s)
            print("output", lst)
        except ValueError as e:
            print(str(e))
    print("done")

Aurify answered 1/9, 2016 at 5:40 Comment(3)

Thank you for this very valuable example code. ChatGPT is still horrible in creating such code. If you don't mind, I would like to publish it as a GitHub gist. The only change I've made to your original code in order to make it more robust is to allow the matching terms as a function parameter (therefore a separate function for recursive calls): gist.github.com/dmikushin/a47e1444c7a32f208db4d8579ce0a13d – Epizoon 4/12, 2023 at 14:37

I know I am being a putz for saying this, but I am uncomfortable with the Phrase "Published by Gene Olson". I think you should take credit by saying "Adapted from ...". – Aurify 6/12, 2023 at 1:11

I would have written your version to use two functions. One would generate the pattern and the dictionary. The other would match the text. I prefer not to repeatedly generate the same pattern in functions that might be called many times. This is especially true in recursive functions. Pattern generation is an expensive operation. – Aurify 6/12, 2023 at 1:32

N

1

The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.

If you need to match matching nested brackets, then you need something more than regular expressions. - see @dehmann

If it's just first open to last close see @Zach

Decide what you want to happen with:

abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code needs to match in this case.
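To make the two readings concrete, here is a small Python sketch that applies both to the sample line above. The helper names `first_to_last` and `first_balanced` are invented for this example.

```python
def first_to_last(text):
    """First opening parenthesis through the last closing one."""
    start, end = text.find("("), text.rfind(")")
    return text[start:end + 1] if 0 <= start < end else None


def first_balanced(text):
    """First opening parenthesis through its matching closing one."""
    depth, start = 0, -1
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


text = "abc ( 123 ( foobar ) def ) xyz ) ghij"
print(first_to_last(text))    # includes the unmatched ")" after xyz
print(first_balanced(text))   # stops at the matching ")"
```

On this input the first reading returns `( 123 ( foobar ) def ) xyz )` and the second returns `( 123 ( foobar ) def )`, so the choice matters as soon as the input can be unbalanced.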

Neese answered 13/2, 2009 at 15:58 Comment(2)

This is not an answer. – Humility 23/11, 2015 at 5:45

Yes, a request to change the question should be given as a comment. – Grams 16/12, 2015 at 10:32

I

1

You need the first and last parentheses. Use something like this:

str.indexOf('('); - it will give you first occurrence

str.lastIndexOf(')'); - last one

So you need a string between,

// substring's end index is exclusive, so + 1 keeps the closing parenthesis
String searchedString = str.substring(str.indexOf('('), str.lastIndexOf(')') + 1);

Isochronal answered 8/7, 2016 at 14:8 Comment(0)

Z

0

This does not fully address the OP's question, but I thought it may be useful to those coming here to search for a nested-structure regexp:

Parse parmeters from function string (with nested structures) in javascript

Match structures like:

matches brackets, square brackets, parentheses, single and double quotes

Here you can see generated regexp in action

/**
 * get param content of function string.
 * only params string should be provided without parentheses
 * WORK even if some/all params are not set
 * @return [param1, param2, param3]
 */
exports.getParamsSAFE = (str, nbParams = 3) => {
    const nextParamReg = /^\s*((?:(?:['"([{](?:[^'"()[\]{}]*?|['"([{](?:[^'"()[\]{}]*?|['"([{][^'"()[\]{}]*?['")}\]])*?['")}\]])*?['")}\]])|[^,])*?)\s*(?:,|$)/;
    const params = [];
    while (str.length) { // this is to avoid a BIG performance issue in javascript regexp engine
        str = str.replace(nextParamReg, (full, p1) => {
            params.push(p1);
            return '';
        });
    }
    return params;
};

Zarger answered 2/6, 2019 at 13:58 Comment(0)

V

0

Because JS regex doesn't support recursive matching, I couldn't make balanced-parentheses matching work with it.

So here is a simple JavaScript for-loop version that turns a "method(arg)" string into an array:

push(number) map(test(a(a()))) bass(wow, abc)
$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)

const parser = str => {
  let ops = []
  let method, arg
  let isMethod = true
  let open = []

  for (const char of str) {
    // skip whitespace
    if (char === ' ') continue

    // append method or arg string
    if (char !== '(' && char !== ')') {
      if (isMethod) {
        (method ? (method += char) : (method = char))
      } else {
        (arg ? (arg += char) : (arg = char))
      }
    }

    if (char === '(') {
      // nested parenthesis should be a part of arg
      if (!isMethod) arg = (arg || '') + char
      isMethod = false
      open.push(char)
    } else if (char === ')') {
      open.pop()
      // check end of arg
      if (open.length < 1) {
        isMethod = true
        ops.push({ method, arg })
        method = arg = undefined
      } else {
        arg += char
      }
    }
  }

  return ops
}

// const test = parser(`$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)`)
const test = parser(`push(number) map(test(a(a()))) bass(wow, abc)`)

console.log(test)

the result is like

[ { method: 'push', arg: 'number' },
  { method: 'map', arg: 'test(a(a()))' },
  { method: 'bass', arg: 'wow,abc' } ]

[ { method: '$$', arg: 'groups' },
  { method: 'filter',
    arg: '{type:\'ORGANIZATION\',isDisabled:{$ne:true}}' },
  { method: 'pickBy', arg: '_id,type' },
  { method: 'map', arg: 'test()' },
  { method: 'as', arg: 'groups' } ]

Vibraharp answered 20/10, 2019 at 11:29 Comment(0)

B

0

While so many answers mention this in some form by saying that regex does not support recursive matching and so on, the primary reason for this lies in the roots of the Theory of Computation.

Language of the form {a^nb^n | n>=0} is not regular. Regex can only match things that form part of the regular set of languages.
