Regex - Repeating Capturing Group
T

2

19

I'm trying to figure out how I can repeat a capture group on the comma-separated values in the following URL string:

id=1,2;name=user1,user2,user3;city=Oakland,San Francisco,Seattle;zip=94553,94523;

I'm using the RegExp below, which returns the results I want except for the values. They're dynamic (there could be 2, 3, 4, etc. users in the URL parameter), and I was wondering if I could create a capture group for each value instead of getting user1,user2,user3 as one capture group.

RegExp: (^|;|:)(\w+)=([^;]+)*

Here is a live demo of it online using RegExp

Example Output:

  • Group1 - (semi-colon,colon)
  • Group2 - (key ie. id,name,city,zip)
  • Group3 - (value1)
  • Group4 - (value2) *if exists
  • Group5 - (value3) *if exists
  • Group6 - (value4) *if exists

etc... based on the dynamic values like I explained before.

Question: What's wrong with my expression? I'm using the * to loop over repeated patterns.

Toby answered 17/4, 2017 at 23:50 Comment(6)
What is your expected output? I think this could be done without the use of a regexp. - Orthopedics
Do you expect a result like: { "id": ["1", "2"], "name": ["user1", "user2", "user3"], "city": ["Oakland", "San Francisco", "Seattle"], "zip": ["94553", "94523"] }? - Orthopedics
@ibrahimmahrir I gave example output above. The values are dynamic, like user1,user2,etc., so basically I want each value in its own capture group. - Toby
No! I'm talking about the final output, not the output of the regex. How do you want the data to look at the end? - Orthopedics
Is this what you're trying to do? regex101.com/r/2HQ8dv/2 - Maddux
@Maddux not at all. - Toby
D
26

Regex doesn't support what you're trying to do. When the engine enters the capturing group a second time, it overwrites what it had captured the first time. Consider a simple example (thanks regular-expressions.info): /(abc|123)+/ used on 'abc123'. It will match "abc" then see the plus and try again, matching the "123". The final capturing group in the output will be "123".
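The overwrite is easy to observe in JavaScript. This is only a small sketch of the example above; the variable names are made up for illustration.

```javascript
// A repeated capturing group keeps only its final iteration.
const match = 'abc123'.match(/(abc|123)+/);

const wholeMatch = match[0];  // the full text the pattern consumed
const lastCapture = match[1]; // only the last thing group 1 captured

console.log(wholeMatch);  // abc123
console.log(lastCapture); // 123 (the earlier "abc" was overwritten)
```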

This happens no matter what pattern you try, and any limit you set simply changes when the regex will accept the string. Consider /^(abc|123){2}$/ (anchored so that the whole string has to match). This accepts 'abc123' with the capturing group as "123" but not 'abc123abc'. Putting a capturing group inside another doesn't work either. When you create a capturing group, it's like creating a variable. It can only have one value, and subsequent values overwrite the previous one. You'll never be able to have more capturing groups than you have pairs of parentheses (you can definitely have fewer, though).
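A short sketch of both points, assuming the patterns are anchored with ^ and $ so that the whole string has to match (variable names are illustrative only):

```javascript
// A quantifier limit changes which strings are accepted,
// not how many values the group can hold.
const exactlyTwo = /^(abc|123){2}$/;
const accepted = exactlyTwo.exec('abc123');    // matches, group 1 is '123'
const rejected = exactlyTwo.exec('abc123abc'); // null: three repetitions

// Nesting adds one more group, not one group per repetition.
const nested = /^((abc|123)+)$/.exec('abc123abc');

console.log(accepted[1]); // 123
console.log(rejected);    // null
console.log(nested[1]);   // abc123abc (outer group: the whole run)
console.log(nested[2]);   // abc (inner group: last repetition only)
```

Two pairs of parentheses give two groups, however many times the inner one repeats.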

A possible fix then would be to split the string on ';', then each of those on '=', then the right-hand side of those on ','. After flattening each entry, that would get you [['id', '1', '2'], ['name', 'user1', ...], ['city', ...], ['zip', ...]].

That comes out to be:

function parseParams(str) {
  // split() treats a string argument literally, so use a regex to split on ';' or ':'
  var afterSplit = str.split(/[;:]/);
  afterSplit.pop(); // final semicolon creates empty string
  for (var i = 0; i < afterSplit.length; i++) {
    afterSplit[i] = afterSplit[i].split('='); // [key, 'value1,value2,...']
    afterSplit[i][1] = afterSplit[i][1].split(','); // optionally, you can flatten the array from here to get something nicer
  }
  return afterSplit;
}
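If the object shape suggested in the comments on the question is what you are after, the same split-based idea can build it directly. This is a minimal sketch, not a drop-in solution: parseToObject is a hypothetical name, and it assumes every non-empty piece looks like key=value1,value2.

```javascript
// Hypothetical helper: builds { key: [values...] } by splitting, no capture groups needed.
function parseToObject(str) {
  const result = {};
  for (const pair of str.split(';')) {
    if (pair === '') continue; // the trailing ';' leaves an empty piece
    const [key, values] = pair.split('=');
    if (values === undefined) continue; // skip pieces without '='
    result[key] = values.split(',');
  }
  return result;
}

const parsed = parseToObject(
  'id=1,2;name=user1,user2,user3;city=Oakland,San Francisco,Seattle;zip=94553,94523;'
);
console.log(parsed.name); // [ 'user1', 'user2', 'user3' ]
console.log(parsed.city); // [ 'Oakland', 'San Francisco', 'Seattle' ]
```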
Defrayal answered 18/4, 2017 at 4:42 Comment(1)
Although capturing groups don't repeat, in some cases you can simply duplicate the capturing group. For instance, say I'm parsing source code and I want to match a class declaration to get the implemented interfaces: Class X implements A, B, C, D. You can create the capture group (?:,\s+([^\s,]+))? (matches zero or one time) and repeat it: (?:,\s+([^\s,]+))?(?:,\s+([^\s,]+))?(?:,\s+([^\s,]+))? will now match up to 3 additional interfaces. In Python it's even easier because you can write pattern = r'(?:,\s+([^\s,]+))?' * 3, etc. - Provenience
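In JavaScript the same duplication can be done with String.prototype.repeat. The sketch below is an illustration only: interfacesOf and the leading implements\s+ part are assumptions, and each name is matched with [^\s,]+ so that it stops before the following comma.

```javascript
// One optional ", Name" group, written as a string so it can be duplicated.
const optionalItem = '(?:,\\s+([^\\s,]+))?';

// First name is required; up to three more are captured by the duplicated group.
const pattern = new RegExp('implements\\s+([^\\s,]+)' + optionalItem.repeat(3));

// Hypothetical helper: returns the captured names, dropping unused groups.
function interfacesOf(declaration) {
  const m = declaration.match(pattern);
  if (m === null) return [];
  return m.slice(1).filter((name) => name !== undefined);
}

console.log(interfacesOf('Class X implements A, B, C, D')); // [ 'A', 'B', 'C', 'D' ]
console.log(interfacesOf('Class Y implements A, B'));       // [ 'A', 'B' ]
```

The limit is fixed when the pattern is built: a fifth interface would simply be left uncaptured.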
H
3

Capturing Group Repeated

String: !abc123def! regex: /!((abc|123|def)+)!/

Matches:

Group 1: abc123def

Group 2: def

source: https://www.regular-expressions.info/captureall.html
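A rough sketch of how this could be applied to the string from the question: wrap the repetition in an outer group to capture the whole value list, then split that capture on commas. The pattern pairPattern is an assumption for illustration, not something taken from the linked page.

```javascript
// The example above: group 1 spans every repetition, group 2 keeps only the last.
const demo = '!abc123def!'.match(/!((abc|123|def)+)!/);
console.log(demo[1], demo[2]); // abc123def def

// Same idea on the question's string.
const url = 'id=1,2;name=user1,user2,user3;city=Oakland,San Francisco,Seattle;zip=94553,94523;';
const pairPattern = /(\w+)=(([^,;]+,?)+)/g; // 1: key, 2: whole list, 3: last value only

const pairs = [];
for (const m of url.matchAll(pairPattern)) {
  pairs.push({ key: m[1], values: m[2].split(','), lastValue: m[3] });
}

console.log(pairs[1]);
// { key: 'name', values: [ 'user1', 'user2', 'user3' ], lastValue: 'user3' }
```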

Harborage answered 28/11, 2019 at 21:2 Comment(0)
