Get repeated matches with preg_match_all()

I'm trying to get all substrings matched by a repeated (quantified) group:

$list = '1,2,3,4';
preg_match_all('|\d+(,\d+)*|', $list, $matches);
print_r($matches);

As expected, this example returns only the last iteration of the repeated group in [1]:

Array
(
    [0] => Array
        (
            [0] => 1,2,3,4
        )

    [1] => Array
        (
            [0] => ,4
        )

)

However, I would like to get all strings matched by (,\d+), to get something like:

Array
(
    [0] => ,2
    [1] => ,3
    [2] => ,4
)

Is there a way to do this with a single function such as preg_match_all()?

Homeric answered 5/7, 2011 at 8:36 Comment(6)
Different language, but same answer as stackoverflow.com/questions/6571106: you can't, but you can easily split on the comma. - Riebling
@Kobi: thank you for the link. From what they say, there are solutions in some languages. Is there any hope for PHP, or is that a definitive answer? - Homeric
[0] => ,2 is not possible with PHP. Is ,2 a string or is it a number? - Algol
No. As far as I know, PHP has no support for multiple captures of the same group, if you insist on a whole-regex solution. - Riebling
As already suggested, explode(...) is the better option here. You could do preg_match_all('|(\d+)|', $list, $matches);, but there is no guarantee that the input string is a comma-delimited string of numbers! - Thespian
Thank you Kobi. If you posted this as an answer, I would accept it :-) - Homeric

According to Kobi (see comments above):

PHP has no support for captures of the same group

Therefore this question has no solution.

Homeric answered 21/7, 2011 at 8:58 Comment(0)

It's true that PHP (or, more precisely, PCRE) doesn't store the values of repeated capturing groups for later access (see the PCRE docs):

If a capturing subpattern is matched repeatedly, it is the last portion of the string that it matched that is returned.

But in most cases the \G token does the job. \G either 1) matches at the beginning of the input string (like \A, or ^ when the m modifier is not set) or 2) matches at the position where the previous match ended. That said, you can use it as follows:

preg_match_all('/^\d+|\G(?!^)(,?\d+)\K/', $list, $matches);


Or, if the capturing group doesn't matter:

preg_match_all('/\G,?\d+/', $list, $matches);

With this second pattern, $matches will hold:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => ,2
            [2] => ,3
            [3] => ,4
        )

)
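
To get exactly the list the question asks for, the first element can simply be dropped. A minimal sketch, assuming the sample string from the question ($tail is just an illustrative name):

```php
<?php
$list = '1,2,3,4'; // sample input from the question

// \G ties every match to the end of the previous one, so the matches
// form one contiguous chain that starts at offset 0.
preg_match_all('/\G,?\d+/', $list, $matches);

// Drop the leading "1" to keep only the ",N" pieces.
$tail = array_slice($matches[0], 1);

print_r($tail);
```

array_slice() reindexes the result, so $tail starts at key 0.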

Note: the benefit of using \G over the other answers (explode(), the lookbehind solution, or just preg_match_all('/,?\d+/', ...)) is that you can validate that the input string has exactly the desired format ^\d+(,\d+)*$ while extracting the matches:

preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
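
Wrapped in a function, the validating pattern might be used as in the sketch below. The helper name extractNumbers() and the malformed sample string are my own, purely for illustration:

```php
<?php
// Hypothetical helper: returns the numbers, or an empty array when the
// whole string is not of the form \d+(,\d+)*.
function extractNumbers(string $list): array
{
    // First alternative: only at the start, and only if the lookahead
    // confirms the entire string is well-formed.
    // Second alternative: continue right after the previous match.
    $pattern = '/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/';
    preg_match_all($pattern, $list, $matches);

    return $matches[0];
}

$valid   = extractNumbers('1,2,3,4'); // well-formed input
$invalid = extractNumbers('1,2,x,4'); // malformed input

var_dump($valid, $invalid);
```

Because the lookahead fails on malformed input and \G cannot match anywhere except where a previous match ended, a bad string yields no matches at all rather than a partial list.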
Disseminule answered 8/3, 2019 at 13:32 Comment(0)

A lookbehind is one way to do the job:

$list = '1,2,3,4';
preg_match_all('|(?<=\d),\d+|', $list, $matches);
print_r($matches);

All the ,\d+ matches end up in group 0.

Output:

Array
(
    [0] => Array
        (
            [0] => ,2
            [1] => ,3
            [2] => ,4
        )
)
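
One caveat worth keeping in mind: the lookbehind only checks the single character before each comma, so it does not verify the overall format. A small sketch with a made-up malformed string (the input 'a1,2;3,4' is an assumption for illustration):

```php
<?php
$messy = 'a1,2;3,4'; // hypothetical input that is NOT a clean number list

// Every ",digits" preceded by a digit is picked up, wherever it occurs.
preg_match_all('|(?<=\d),\d+|', $messy, $matches);

print_r($matches[0]);
```

Here the pattern still returns ",2" and ",4", even though the string as a whole is not a comma-separated list of numbers.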
Santos answered 18/6, 2014 at 7:10 Comment(0)

Splitting is only an option when the separator character isn't itself used in the patterns to match. I had a situation where a badly formatted comma-separated line had to be parsed into any of a number of known options.

For example: options '1,2', '2', '2,3' with subject '1,2,3'.

Splitting on ',' results in '1', '2', and '3', of which only one ('2') is a valid match. This happens because the separator is also part of the options.

The naïve regex would be something like '~^(1,2|2|2,3)(?:,(1,2|2|2,3))*$~i', but this runs into the problem of same-group captures.

My "solution" was to just expand the regex to match the maximum number of matches possible: '~^(1,2|2|2,3)(?:,(1,2|2|2,3))?(?:,(1,2|2|2,3))?$~i' (if more options were available, just repeat the '(?:,(1,2|2|2,3))?' bit). This does result in empty string results for "unused" matches.

It's not the cleanest solution, but works when you have to deal with badly formatted input data.
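
The expanded pattern does not have to be typed out by hand. A rough sketch of generating it, where $maxItems (the assumed upper bound on items per line) and the subject '1,2,2,3' are hypothetical values chosen for illustration:

```php
<?php
$options  = ['1,2', '2', '2,3'];
$maxItems = 3; // assumed upper bound on how many options one line can hold

// Quote each option so it is matched literally, then join as alternatives.
$quoted = array_map(function ($o) { return preg_quote($o, '~'); }, $options);
$alt    = implode('|', $quoted);

// One mandatory group followed by ($maxItems - 1) optional groups.
$pattern = '~^(' . $alt . ')'
         . str_repeat('(?:,(' . $alt . '))?', $maxItems - 1)
         . '$~i';

$found = [];
if (preg_match($pattern, '1,2,2,3', $m)) {
    // Skip the full match and discard groups that captured nothing.
    $captures = array_slice($m, 1);
    $found = array_values(array_filter($captures, function ($s) {
        return $s !== '';
    }));
}

print_r($found);
```

For this subject the regex has to backtrack from '2' to '2,3' for the second group, which is exactly the case that plain splitting on ',' cannot handle.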

Warmblooded answered 18/9, 2011 at 9:17 Comment(0)

Why not just:

$ar = explode(',', $list);
print_r($ar);
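
As noted in the comments on the question, explode() alone gives no guarantee about the input format. One possible way to combine it with a format check is sketched below; the function name splitNumberList() is hypothetical:

```php
<?php
// Hypothetical helper: explode only when the whole string is a
// comma-separated list of integers, otherwise return an empty array.
function splitNumberList(string $list): array
{
    // The D modifier stops $ from matching before a trailing newline.
    if (!preg_match('/^\d+(?:,\d+)*$/D', $list)) {
        return [];
    }

    return explode(',', $list);
}

print_r(splitNumberList('1,2,3,4'));
print_r(splitNumberList('1,,2'));
```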
Unloosen answered 5/7, 2011 at 8:57 Comment(1)
The example above is a simplification; the regexp is actually more complicated than that. I know how to do it the verbose way, I'm just curious to know whether there is a shorter path to the solution. - Homeric

From http://www.php.net/manual/en/regexp.reference.repetition.php :

When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.
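
A quick way to see that rule in action, as a minimal sketch using the sample string from the question (the variable names are mine, not from the manual):

```php
<?php
$list = '1,2,3,4'; // sample input from the question

// Group 1 is repeated by the * quantifier, so only its final
// iteration is kept once the match is complete.
preg_match('/^\d+(,\d+)*$/', $list, $m);

$whole = $m[0]; // the entire match
$last  = $m[1]; // the text from the last iteration of (,\d+)

var_dump($whole, $last);
```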

Also similar thread:

How to get all captures of subgroup matches with preg_match_all()?

Pneumonoultramicroscopicsilicovolcanoconiosis answered 17/6, 2014 at 17:19 Comment(1)
These "hints" could have been a comment under the question. - Snail
