Capture an indeterminate number of delimited values inside a square-bracketed placeholder in a string

I have the following regex:

\[([^ -\]]+)( - ([^ -\]]+))+\]

This successfully matches the following:

[abc - def - ghi - jkl]

But the resulting match array is:

Array
(
    [0] => [abc - def - ghi - jkl]
    [1] => abc
    [2] =>  - jkl
    [3] => jkl
)

What I need is something like this:

Array
(
    [0] => [abc - def - ghi - jkl]
    [1] => abc
    [2] =>  - def
    [3] => def
    [4] =>  - ghi
    [5] => ghi
    [6] =>  - jkl
    [7] => jkl
)

In C# I can do this by looking at each group's Captures collection. How can I do that in PHP?
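For reference, a minimal sketch that reproduces the behaviour described above. PCRE, the engine behind PHP's preg_* functions, keeps only the text from the last iteration of a quantified capture group, and preg_match() offers no per-iteration list like .NET's Group.Captures. The pattern is the question's own; the variable names are illustrative.

```php
<?php
$str = "[abc - def - ghi - jkl]";

// The question's pattern. The group ( - ([^ -\]]+)) is quantified with +,
// so it runs once per " - value" chunk, but every iteration overwrites
// groups 2 and 3; only the final iteration's text is kept.
preg_match('/\[([^ -\]]+)( - ([^ -\]]+))+\]/', $str, $m);

print_r($m);
// Groups 2 and 3 end up as " - jkl" and "jkl", as shown in the question.
```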

Bellerophon asked 12/4, 2011 at 23:35
You realize that the - in the character class specifies a range, and your expression ' -\]' covers every character from \x20 to \x5D. Thus [^ -\]] is the same thing as [^ !"#$%&'()*+,\-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\\]]. You need to escape the dash! – Camel
Can you show us the regex syntax you would use in C# for this task? From a glance at the docs, the syntax looks nearly identical to that of the PCRE engine PHP uses. If you had trouble, it would be interesting to dissect the differences. – Etheridge
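A small sketch of the range problem pointed out in the first comment, assuming PHP 7+ (PCRE). With the unescaped dash, [^ -\]] also rejects uppercase letters and digits, because they fall inside \x20-\x5D; escaping the dash limits the exclusions to space, - and ]. The test tokens are made up for illustration.

```php
<?php
// Original class: " -\]" is read as the range \x20-\x5D (space through "]").
$rangeClass = '/^[^ -\]]+$/';
// Escaped dash: only space, "-" and "]" are excluded.
$escapedClass = '/^[^ \-\]]+$/';

foreach (['abc', 'ABC', '123'] as $token) {
    printf(
        "%s: range=%d escaped=%d\n",
        $token,
        preg_match($rangeClass, $token),
        preg_match($escapedClass, $token)
    );
}
// abc: range=1 escaped=1
// ABC: range=0 escaped=1
// 123: range=0 escaped=1
```

The question's sample happens to work only because lowercase letters (\x61-\x7A) lie outside the accidental range.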

This is not a job for a regex alone. Match against \[([^\]]*)\], then explode the first capture on " - ".

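<?php
  $str = "[abc - def - ghi - jkl]";
  // Capture everything between the square brackets
  if (preg_match('/\[([^\]]*)\]/', $str, $re)) {
      // Split the captured text on the " - " delimiter
      $strs = explode(' - ', $re[1]);
      print_r($strs);
  }
?>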
Garrett answered 12/4, 2011 at 23:43
Yeah, you are right; I'm not sure why I was overcomplicating it so much. Still, I would like to know whether this can be done in PHP the way it can in C#, where a group exposes its "Captures". – Bellerophon
Warning: split() was DEPRECATED in PHP 5.3.0 and REMOVED in PHP 7.0.0. Alternatives to that function include preg_split(), explode() and str_split(). – Rhombic
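The same match-then-explode idea extends to strings containing several placeholders by switching to preg_match_all(). A minimal sketch, assuming the values themselves never contain " - " or "]"; the input string is hypothetical.

```php
<?php
// Hypothetical input with more than one placeholder.
$str = "[abc - def] text [ghi - jkl - mno]";

// Collect the inner text of every [...] placeholder into $m[1].
preg_match_all('/\[([^\]]*)\]/', $str, $m);

// Split each inner text on the " - " delimiter.
$groups = array_map(function (string $inner): array {
    return explode(' - ', $inner);
}, $m[1]);

print_r($groups);
```

Each element of $groups holds the values of one placeholder, in order, so the number of values per placeholder is unbounded.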

Assuming the tokens in your sample string never contain spaces and are alphanumeric:

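<?php
    // \w already covers digits, and "|" inside [...] is a literal pipe,
    // not alternation, so "/([\w|\d])+/" simplifies to "/\w+/".
    $pattern = '/\w+/';
    $string = "[abc - 123 - def - 456 - ghi - 789 - jkl]";
    preg_match_all($pattern, $string, $matches);
    print_r($matches[0]);
?>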

Output:

Array
(
    [0] => abc
    [1] => 123
    [2] => def
    [3] => 456
    [4] => ghi
    [5] => 789
    [6] => jkl
)
Paloma answered 12/4, 2011 at 23:56
Yeah, this works too, thanks. But I'm looking to match and replace on a string like "[a - b - c] [a] [a - b - f]", so that [a] doesn't get replaced but the others do. I solved the issue with preg_replace_callback. Thanks anyway! – Bellerophon
@carlosdubusm: You should edit your question to include the actual string you're matching against. Otherwise, the answers you get may not work for you. :) – Paloma
This answer does not produce a dynamic number of captures from a delimited substring between the [ and ] markers; the posted pattern merely matches runs of word characters anywhere in the string. The \d alternative is pointless because \d is already covered by \w. – Gwenngwenneth
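The preg_replace_callback() solution mentioned in the comments was not posted, so the following is only a hypothetical sketch of that approach. It assumes values contain no spaces, hyphens or ], and the replacement (joining the values with "/") is invented purely for illustration.

```php
<?php
$text = "[a - b - c] [a] [a - b - f]";

// Require at least one " - " delimiter, so a single-value "[a]" never matches.
$pattern = '/\[([^\]\s-]+(?: - [^\]\s-]+)+)\]/';

$result = preg_replace_callback($pattern, function (array $m): string {
    // $m[1] is the text between the brackets; split it on the delimiter.
    $values = explode(' - ', $m[1]);
    // Hypothetical replacement: join the values with "/".
    return implode('/', $values);
}, $text);

echo $result, "\n"; // a/b/c [a] a/b/f
```

Because the callback receives the whole bracketed text, the number of values per placeholder does not need to be known in advance.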

preg_match_all() fills the $matches array with the regex groups starting at index 1; index 0 holds the full-pattern matches (this is the default PREG_PATTERN_ORDER layout). If you want only the second group, for example, use $matches[2].

Example:

$matches = array();
// Group 1 captures "He", group 2 captures the following word
preg_match_all(
    '/(He)\w+ (\w+)/',
    "Hello world\n Hello Sunshine",
    $matches
);
var_dump($matches);

Result:

array(3) {
  [0] =>
  array(2) {
    [0] =>
    string(11) "Hello world"
    [1] =>
    string(14) "Hello Sunshine"
  }
  [1] =>
  array(2) {
    [0] =>
    string(2) "He"
    [1] =>
    string(2) "He"
  }
  [2] =>
  array(2) {
    [0] =>
    string(5) "world"
    [1] =>
    string(8) "Sunshine"
  }
}

P.S. I'm posting this answer for the question's title, after a Google search led me here; this was the information I was looking for on this topic.
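If you would rather have each match's groups side by side instead of one array per group, preg_match_all() also accepts the PREG_SET_ORDER flag. A short sketch using the same pattern and input as above:

```php
<?php
$matches = [];
preg_match_all('/(He)\w+ (\w+)/', "Hello world\n Hello Sunshine", $matches, PREG_SET_ORDER);

foreach ($matches as $set) {
    // $set[0] is the full match; $set[1] and $set[2] are its groups.
    echo $set[0], ' => ', $set[2], "\n";
}
// Hello world => world
// Hello Sunshine => Sunshine
```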

Whensoever answered 5/9, 2014 at 0:32
This answer does not produce a dynamic number of captures from a delimited substring between the [ and ] markers. – Gwenngwenneth

To group your matches, use parentheses. E.g.:

$string = 'bob';
preg_match('/bob/', $string, $matches);

$matches will be ['bob']

preg_match('/(b)(o)(b)/', $string, $matches);

$matches will be ['bob','b','o','b']

Landers answered 21/4, 2017 at 13:4
Your $matches will actually be ['bob', 'b', 'o', 'b']. – Cosper
This answer does not produce a dynamic number of captures from a delimited substring between the [ and ] markers. – Gwenngwenneth

To match an indeterminate number of delimited values inside a square-bracketed placeholder, either match the opening of the placeholder and use a lookahead to validate the rest of it, or continue from where the previous match ended (using the \G anchor) followed by the delimiting substring; then simply match the sought values.

Code:

$text = 'foo [abc - def - ghi - jkl] bar';
$regex = <<<REGEX
/                  
(?:                #start a non-capturing group
   \[              #match a left square brace
   (?=[a-z -]+])   #lookahead for the completion of a valid placeholder expression
   |               #or
   \G(?!^)         #continue from end position of last match and not the start of the string
   \s-\s           #match a whitespace, hyphen then a whitespace
)                  #close the non-capturing group
\K                 #forget any matched characters up to this position
[a-z]+             #match one or more lowercase ascii letters
/x
REGEX;
if (preg_match_all($regex, $text, $match)) {
    var_export($match[0]);
}

Output:

array (
  0 => 'abc',
  1 => 'def',
  2 => 'ghi',
  3 => 'jkl',
)
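A possible extension, not part of the original answer: if a string contains several placeholders and the values must stay grouped per placeholder, the opening \[ can be wrapped in a capture group so that each match reveals whether it started a new placeholder. A minimal sketch, assuming the same lowercase-letters-only values; the input string is hypothetical.

```php
<?php
$text = 'foo [abc - def] bar [ghi - jkl - mno]';

// Same idea as above, written on one line; group 1 is set only when the
// match begins at a "[" (a new placeholder), not when it continues via \G.
$regex = '/(?:(\[)(?=[a-z -]+])|\G(?!^)\s-\s)\K[a-z]+/';

preg_match_all($regex, $text, $sets, PREG_SET_ORDER);

$groups = [];
foreach ($sets as $set) {
    // An unmatched trailing group is omitted or empty, so test with empty().
    if (!empty($set[1])) {
        $groups[] = [];
    }
    $groups[count($groups) - 1][] = $set[0];
}

print_r($groups);
```

Since \G(?!^) cannot succeed on the very first match, the first set always opens a placeholder and $groups is never indexed while empty.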
Gwenngwenneth answered 24/6 at 23:53
