Getting match of Group with Asterisk?

How can I get the content for a group with an asterisk?

For example, I'd like to parse a comma-separated list, e.g. 1,2,3,4,5.

private static final String LIST_REGEX = "^(\\d+)(,\\d+)*$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);

public static void main(String[] args) {
    final String list = "1,2,3,4,5";
    final Matcher matcher = LIST_PATTERN.matcher(list);
    System.out.println(matcher.matches());
    for (int i = 0, n = matcher.groupCount(); i <= n; i++) { // groupCount() excludes group 0
        System.out.println(i + "\t" + matcher.group(i));
    }
}

And the output is

true
0   1,2,3,4,5
1   1
2   ,5

How can I get every single entry, i.e. 1, 2, 3, ...?

I am looking for a general solution; this is only an illustrative example.
Imagine a more complicated regex like ^\\[(\\d+)(,\\d+)*\\]$ to match a list like [1,2,3,4,5].

Deposal answered 15/9, 2014 at 23:19 Comment(1)
For your second example, the easiest approach is probably to use a regex to extract what is between the [] and then use split. Doing the whole job with a regex would be much less efficient. - Threlkeld

You can use String.split().

for (String segment : "1,2,3,4,5".split(","))
    System.out.println(segment);

Or you can capture repeatedly with find():

Pattern pattern = Pattern.compile("(\\d+),?");
Matcher m = pattern.matcher("1,2,3,4,5");
while (m.find())  // group(1) is only valid after a successful find()
    System.out.println(m.group(1));

For the second example you added, you can do a similar match:

// \\d+ (not \\d) so that a multi-digit first entry such as 12 is also accepted
for (String segment : "!!!!![1,2,3,4,5] //"
                          .replaceFirst("^\\D*(\\d+(?:,\\d+)*)\\D*$", "$1")
                          .split(","))
    System.out.println(segment);

I made an online code demo. I hope this is what you wanted.


how can I get all the matches (zero, one or more) for an arbitrary group with an asterisk (xyz)*? [The group is repeated and I would like to get every repeated capture.]

No, you cannot. The article "Regex Capture Groups and Back-References" explains why:

The Returned Value for a Given Group is the Last One Captured

Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
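
The behaviour described in the quote can be checked in Java with a small sketch like the one below. The class name LastCaptureDemo and the helper groupAfterMatch are hypothetical names chosen for illustration; only the java.util.regex calls are standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {

    // Hypothetical helper: returns the text held by the given group after a
    // full match, or null if the input does not match or the group did not
    // take part in the match.
    public static String groupAfterMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // The quoted example: group 1 is matched four times, only the last survives.
        System.out.println(groupAfterMatch("([A-Z]_)+", "A_B_C_D_", 1)); // D_

        // The regex from the question: group 2 keeps only the last ",\d+".
        System.out.println(groupAfterMatch("^(\\d+)(,\\d+)*$", "1,2,3,4,5", 2)); // ,5
    }
}
```

So the intermediate values ",2", ",3" and ",4" are not available from the Matcher once the match has completed.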

Engelbert answered 15/9, 2014 at 23:21 Comment(6)
Thanks. For this example your solution is the easiest way, but mine is more of a general question. - Deposal
@Deposal What's the general question? Is it that you want to match digits between commas from a list, or that you want to match all digits? - Engelbert
The general question is how to deal with groups that have an asterisk: (xyz)*? By definition, the regex xyz can appear zero, one or more times, and I'd like to get all the matches. In the list example above that means I want to get all these matches: 1, 2, 3, ... for (,\\d+)*. I voted +1 because other people may search for this specific problem, but I am not :) - Deposal
@Deposal You cannot make a quantified group capture every repetition. When a capturing group is repeated in a match, only the last repetition is remembered. I'm going to update my answer to explain why. - Engelbert
None of the examples is generic, and even the second example breaks the OP example. It not only splits "1,2,3,4,5" but it also (incorrectly) splits "12345" and "1,2,3,4,5,". I need something that requires the separators to be present, but only in between elements. I cannot use split() either because my 'values' contain the separator itself. Toy example with three values: "#5,6,#2,-5,#33,2" - Tragicomedy
@MarkJeronimus Ask a new question instead of necroing old threads. You have a clear option: instead of splitting by a separator that isn't separating the values, match the values with a matcher whose pattern reflects the structure of your data. I have no idea what structure your data has (looking at your toy example I would categorize it as bad data, without knowing the specs), so you will need to write the correct pattern for your data. - Engelbert

I assume you may be looking for something like the following; it handles both of your examples.

private static final String LIST_REGEX = "^\\[?(\\d+(?:,\\d+)*)\\]?$";
private static final Pattern LIST_PATTERN = Pattern.compile(LIST_REGEX);

public static void main(String[] args) {
    final String list = "[1,2,3,4,5]";
    final Matcher matcher = LIST_PATTERN.matcher(list);

    // matches() must succeed before group(1) can be read
    final boolean matched = matcher.matches();
    System.out.println(matched);
    if (!matched) {
        return;
    }

    // Line 0: the whole list without the brackets
    int i = 0;
    System.out.println(i + "\t" + matcher.group(1));

    // Split the captured list into its entries
    for (String x : matcher.group(1).split(",")) {
        i++;
        System.out.println(i + "\t" + x);
    }
}

Output

true
0   1,2,3,4,5
1   1
2   2
3   3
4   4
5   5
Ur answered 15/9, 2014 at 23:24 Comment(1)
I voted +1 because other people may search for this specific problem and your solution for it. But I don't accept the answer because I want to know how I can get all the matches (zero, one or more) for an arbitrary group with an asterisk (xyz)*. - Deposal
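
For the general case raised in this comment, one possible workaround in Java is to separate validation from extraction: check the whole input with matches(), then walk over the repetitions with find() using a pattern for a single element. The following is only a sketch under that assumption; RepeatedGroupDemo, repeatedCaptures, and the two patterns are hypothetical names and values chosen for the bracketed list from the question, not an established API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {

    // Hypothetical helper: 'whole' validates the complete input, 'element'
    // describes one repetition and must define capturing group 1.
    // Returns an empty list when the input is not valid.
    public static List<String> repeatedCaptures(Pattern whole, Pattern element, String input) {
        List<String> result = new ArrayList<>();
        if (!whole.matcher(input).matches()) {
            return result;
        }
        Matcher m = element.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // one entry per repetition
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed patterns for the [1,2,3,4,5] example from the question.
        Pattern whole = Pattern.compile("^\\[\\d+(?:,\\d+)*\\]$");
        Pattern element = Pattern.compile("(\\d+)");

        System.out.println(repeatedCaptures(whole, element, "[1,2,3,4,5]"));  // [1, 2, 3, 4, 5]
        System.out.println(repeatedCaptures(whole, element, "[1,2,3,4,5,]")); // []
    }
}
```

Because the whole-input pattern runs first, a trailing separator is rejected instead of being silently split. The approach relies on the element pattern finding exactly the repetitions when scanned left to right over valid input, which has to be checked for each data format.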

© 2022 - 2024 — McMap. All rights reserved.