Brief
Input Corrections
First of all, your input is incorrect as there's an extra parenthesis (as shown below)
(F(i(r(s)t))) ((S)(e)((c)(o))n)d) (((((((Third)))))))
^
Making appropriate modifications to either include or exclude the additional parenthesis, one might end up with one of the following strings:
Extra parenthesis removed
(F(i(r(s)t))) ((S)(e)((c)(o))n)d (((((((Third)))))))
^
Additional parenthesis added to match extra closing parenthesis
((F(i(r(s)t))) ((S)(e)((c)(o))n)d) (((((((Third)))))))
^
Regex Capabilities
Second of all, this is really only truly possible in regex flavours that include the recursion capability since any other method will not properly match opening/closing brackets (as seen in the OP's solution, it matches the extra parenthesis from the incorrect input as noted above).
This means that for regex flavours that do not currently support recursion (Java, Python, JavaScript, etc.), recursion (or attempts at mimicking recursion) in regular expressions is not possible.
Input
Considering the original input is actually invalid, we'll use the following inputs to test against.
(F(i(r(s)t))) ((S)(e)((c)(o))n)d) (((((((Third)))))))
(F(i(r(s)t))) ((S)(e)((c)(o))n)d (((((((Third)))))))
((F(i(r(s)t))) ((S)(e)((c)(o))n)d) (((((((Third)))))))
Testing against these inputs should yield the following results:
- INVALID (no match)
- VALID (match)
- VALID (match)
Code
There are multiple ways of matching nested groups. The solutions provided below all depend on regex flavours that include recursion capabilities (e.g. PCRE).
See regex in use here
Using DEFINE block
(?(DEFINE)
(?<value>[^()\r\n]+)
(?<groupVal>(?&group)|(?&value))
(?<group>(?&value)*\((?&groupVal)\)(?&groupVal)*)
)
^(?&group)$
Note: This regex uses the flags gmx
Without DEFINE block
See regex in use here
^(?<group>
(?<value>[^()\r\n]+)*
\((?<groupVal>(?&group)|(?&value))\)
(?&groupVal)*
)$
Note: This regex uses the flags gmx
Without x modifier (one-liner)
See regex in use here
^(?<group>(?<value>[^()\r\n]+)*\((?<groupVal>(?&group)|(?&value))\)(?&groupVal)*)$
Without named (groups & references)
See regex in use here
^(([^()\r\n]+)*\(((?1)|(?2))\)(?3)*)$
Note: This is the shortest possible method that I could come up with.
Explanation
I'll explain the last regex as it's a simplified and minimal example of all the other regular expressions above it.
^
Assert position at the start of the line
(([^()\r\n]+)*\(((?1)|(?2))\)(?3)*)
Capture the following into capture group 1
([^()\r\n]+)*
Capture the following into capture group 2 any number of times
[^()\r\n]+
Match any character not present in the set ()\r\n
one or more times
\(
Match a left/opening parenthesis character (
literally
((?1)|(?2))
Capture either of the following into capture group 3
(?1)
Recurse the first subpattern (1)
(?2)
Recurse the second subpattern (2)
\)
Match a right/closing parenthesis character )
literally
(?3)*
Recurse the third subpattern (3) any number of times
$
Assert position at the end of the line
since we all know that regular expressions are not supposed to be used to match these things
is utter bologny. – Kylix\((?:[^()]++|(?R))*\)
– Kylix(?(DEFINE)(?<G2>.*)(?<G1>.*\)(?!.*\k<G2>).*))(?=\()(?:(?=.*?\((?!.*?\k<G1>)(?&G1))(?=.*?\)(?!.*?\k<G2>)(?&G2)).)+?.*?(?=\k<G1>)[^(]*(?=\k<G2>$)
but it does not find any. – Kylixboost doesn't support forward references
, I just changed my version so it does. You can try your regex's using it www.regexformat.com. It's a heavily modified boost engine which is now almost Perl. After some testing, it seems your regex exhibits strange behavior when the otherwise normal matches are put on different lines. Example(F(i(r(s)t))) ((S)(e)((c)(o))(n)d) (((((((Third)))))))\n
matches the entire line. – Kylix(F(i(r(s)t)))\n ((S)(e)((c)(o))(n)d) (((((((Third)))))))
matches((S)(e)((c)(o))(n)d)
and(((((((Third)))))))
independently but matches((S)(e)((c)(o))(n)d) (((((((Third)))))))
if there are any newlines after it, and always the last line. If you change to use Dot-All it matches any independently and works as it should. To me, this behavior is typical of unbounded assertions and is proof of flawed recursive simulation. Or, maybe not, but something you should check. – Kylix.*
parts to potentially match line terminators in order for the method to work correctly with multiple lines. I could force this by changing every.
to[\s\S]
in the example (and\2$
to\2\z
to supersede the 'm' modifier), but I chose to keep it as simple as possible since it is already quite difficult to understand. – Legion