Regex: capturing groups within capture groups

Intro

(you can skip to What if... if you get bored with intros)

This question is not specific to VBScript (I just happened to use it in this case): I'm looking for a solution that applies to regular expressions in general (editors included).

This started when I wanted to create an adaptation of Example 4 where 3 capture groups are used to split data across 3 cells in MS Excel. I needed to capture one entire pattern and then, within it, capture 3 other patterns. However, in the same expression, I also needed to capture another kind of pattern and again capture 3 other patterns within it (yeah I know... but before pointing the nutjob finger, please finish reading).

I first thought of Named Capturing Groups, then I realized that I should not «mix named and numbered capturing groups», since it «is not recommended because flavors are inconsistent in how the groups are numbered».

Then I looked into VBScript SubMatches and «non-capturing» groups and I got a working solution for a specific case:

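' Create the regex object once (late binding, so no library reference is needed)
Set regEx = CreateObject("VBScript.RegExp")

strPattern = "(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"

With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = strPattern
End With

For Each C In Myrange
    strInput = C.Value

    Set rgxMatches = regEx.Execute(strInput)

    ' Execute returns an empty collection when nothing matches,
    ' so the "not matched" case is handled before the loop
    If rgxMatches.Count = 0 Then
        C.Offset(0, 1) = "(Not matched)"
    End If

    For Each mtx In rgxMatches
        If mtx.SubMatches(0) <> "" Then
            ' 1st parent pattern matched: capture groups 1 to 3
            C.Offset(0, 1) = mtx.SubMatches(0)
            C.Offset(0, 2) = mtx.SubMatches(1)
            C.Offset(0, 3) = mtx.SubMatches(2)
        ElseIf mtx.SubMatches(3) <> "" Then
            ' 2nd parent pattern matched: capture groups 4 to 6
            C.Offset(0, 1) = mtx.SubMatches(3)
            C.Offset(0, 2) = mtx.SubMatches(4)
            C.Offset(0, 3) = mtx.SubMatches(5)
        End If
    Next
Next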

Here's a demo of the regex in Rubular. For these inputs:

124;12;3
my id1:213 my id2:232 my word:ins4yanrgx
:8587459 :18254182540215 :dcpt
0;1;2

It returns the first 2 cells with numbers and the 3rd with a number or a word. Basically, I used a non-capturing group containing 2 "parent" patterns ("parents" = broad patterns inside which I want to detect other sub-patterns). If the 1st parent pattern has a matching sub-pattern (1st capture group), I place its value and the remaining captured groups of this pattern in the 3 cells. If not, I check whether the 4th capture group (belonging to the 2nd parent pattern) was matched, and place it and the remaining sub-patterns in the same 3 cells.
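Since the question is not tied to VBScript, here is the same logic as a minimal Python sketch. The function name `split_cells` is only an illustrative assumption; the pattern and the sample lines are the ones from above. It picks groups 1 to 3 when the first parent matched, and groups 4 to 6 otherwise.

```python
import re

# Same two "parent" alternatives as the VBScript pattern above.
PATTERN = re.compile(
    r"(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
)


def split_cells(text):
    """Return the three captured values for one line, or None if nothing matches.

    Hypothetical helper, named here for illustration only.
    """
    m = PATTERN.search(text)
    if m is None:
        return None
    groups = m.groups()  # six slots; the alternative that did not match holds None
    # Groups 1-3 belong to the first parent, groups 4-6 to the second.
    return groups[0:3] if groups[0] is not None else groups[3:6]


samples = [
    "124;12;3",
    "my id1:213 my id2:232 my word:ins4yanrgx",
    ":8587459 :18254182540215 :dcpt",
    "0;1;2",
]
for line in samples:
    print(line, "->", split_cells(line))
```

In Python a group that did not take part in the match is `None` rather than an empty string, so the check is `is not None` instead of `<> ""`.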

What if...

Instead of having something like this:

(?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))

Something like this could be possible:

(#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))

Where (#:, instead of creating a non-capturing group, would create a "parent" numbered capture group. This way I could do something similar to Example 4:

C.Offset(0, 1) = regEx.Replace(strInput, "#$1")
C.Offset(0, 2) = regEx.Replace(strInput, "#$2")
C.Offset(0, 3) = regEx.Replace(strInput, "#$3")

It would search parent patterns until it finds a match in a child pattern (the first match would be returned and, ideally, wouldn't search the remaining ones).
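The behaviour described above can be approximated without any new syntax by keeping the parents as separate patterns and trying them in order. This is only a sketch of the idea in Python, not an existing regex feature; `PARENTS` and `child` are made-up names.

```python
import re

# Each entry plays the role of one hypothetical "(#: ... )" parent.
# Children are numbered from 1 inside their own parent.
PARENTS = [
    re.compile(r"^(\d+);(\d+);(\d+)$"),
    re.compile(r"^.*:(\d+)\s.*:(\d+).*:(\w+)$"),
    re.compile(r"what(ever)"),
]


def child(text, n):
    """Emulate the hypothetical "#$n": child n of the first parent that matches."""
    for parent in PARENTS:
        m = parent.search(text)
        if m is not None:
            # Stop at the first matching parent; later parents are never tried.
            return m.group(n) if n <= parent.groups else None
    return None


print(child("124;12;3", 2))
print(child(":8587459 :18254182540215 :dcpt", 3))
print(child("whatever", 1))
```

Because the loop returns as soon as one parent matches, the remaining parents are not searched, which is the "ideally" part of the wish above.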

Is there something like this already? Or am I missing something in regex that already allows this?

Other possible variations:

refer to the parent and child pattern directly, e.g.: #2$3 (this would be equivalent to $6 in my example);
create as many capturing groups as necessary within others (I guess this would be more complex, but also the most interesting part), e.g.: with a regex (same syntax) like (#:^_(?:(#:(\d+):\w+-(\d))|(#:\w+:(\d+)-(\d+)))_$)|(#:^\w+:\s+(#:(\w+);\d-(\d+))$) and fetching ##$1 in patterns like:

_123:smt-4_ would return: 123
_ott:432-10_ would return: 432
yant: special;3-45235 would return: special
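For the first variation (#2$3 being equivalent to $6), the translation from a parent/child pair to an ordinary group number can be computed from the group counts of the preceding parents. A rough Python sketch under that assumption follows; `flat_index` and `lookup` are hypothetical names, and the nested ##$1 case is not covered.

```python
import re

# The three parents from the "What if..." example, in alternation order.
PARENTS = [
    r"^(\d+);(\d+);(\d+)$",
    r"^.*:(\d+)\s.*:(\d+).*:(\w+)$",
    r"what(ever)",
]

# Ordinary alternation inside a non-capturing group, numbered 1 to 7.
COMBINED = re.compile("(?:" + "|".join(PARENTS) + ")")


def flat_index(parent, child):
    """Translate the hypothetical "#parent$child" into an ordinary group number."""
    # Count the capture groups that come before the chosen parent.
    offset = sum(re.compile(p).groups for p in PARENTS[: parent - 1])
    return offset + child


def lookup(text, parent, child):
    """Return the requested child, or None if that parent did not match."""
    m = COMBINED.search(text)
    return m.group(flat_index(parent, child)) if m else None


print(flat_index(2, 3))
print(lookup(":8587459 :18254182540215 :dcpt", 2, 3))
```

Here `flat_index(2, 3)` gives 6, matching the $6 mentioned in the first variation.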

Please tell me if you notice any mistakes or flaws in this logic; I will edit ASAP.

# (?|^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever)()())

(?|
     ^
     ( \d+ )            # (1)
     ;
     ( \d+ )            # (2)
     ;
     ( \d+ )            # (3)
     $
  |
     ^ .* :
     ( \d+ )            # (1)
     \s .* :
     ( \d+ )            # (2)
     .* :
     ( \w+ )            # (3)
     $
  |
     what
     ( ever )           # (1)
     ( )                # (2)
     ( )                # (3)
)

# (?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))

(?:
     ^
     ( \d+ )            # (1)
     ;
     ( \d+ )            # (2)
     ;
     ( \d+ )            # (3)
     $
  |
     ^ .* :
     ( \d+ )            # (4)
     \s .* :
     ( \d+ )            # (5)
     .* :
     ( \w+ )            # (6)
     $
  |
     what
     ( ever )           # (7)
)

# (#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))

   (                    # (1 start)
        \#: ^
        ( \d+ )         # (2)
        ;
        ( \d+ )         # (3)
        ;
        ( \d+ )         # (4)
        $
   )                    # (1 end)
|
   (                    # (5 start)
        \#: ^ .* :
        ( \d+ )         # (6)
        \s .* :
        ( \d+ )         # (7)
        .* :
        ( \w+ )         # (8)
        $
   )                    # (5 end)
|
   (                    # (9 start)
        \#:what
        ( ever )        # (10)
   )                    # (9 end)
