It may be easier to understand if the subpatterns are declared as individual, self-describing variables and the pattern is built via interpolation. Also, to avoid any messy cleanup of the $matches array, my pattern only populates fullstring matches (no capture groups), so you only need to access the first element of the matches array.
\K means "forget the previously matched characters"; in other words, "restart the fullstring match from here".
\G means "match from the start of the input string or from the point where the previous match left off".
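To see the two metacharacters in isolation, here is a minimal sketch. The input strings and variable names are made up for illustration and are not part of the original question.

```php
<?php
// \K: "key=" must be matched, but it is dropped from the reported match.
preg_match('/key=\K\d+/', 'key=42', $afterK);
echo $afterK[0], "\n"; // 42

// \G: every match must start exactly where the previous one ended,
// so the chain stops at "x" and the trailing "4" is never reached.
preg_match_all('/\G\d+,?/', '1,2,3,x,4', $chained);
echo json_encode($chained[0]), "\n"; // ["1,","2,","3,"]

// Without \G the engine is free to skip ahead and pick up the "4" as well.
preg_match_all('/\d+,?/', '1,2,3,x,4', $unanchored);
echo json_encode($unanchored[0]), "\n"; // ["1,","2,","3,","4"]

// \G(?!^) refuses to match at the very start of the string, so with no
// earlier match to continue from, nothing matches at all.
$count = preg_match_all('/\G(?!^)-\K\d+/', '-1-2', $none);
echo $count, "\n"; // 0
```

The last case is why the main pattern needs the "subject" branch: it supplies the first match that \G(?!^) can then continue from.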
The lookahead that follows the match of the "subject" of the sentence ensures that only a fully valid "sentence" will qualify.
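As a rough sketch of that validation step on its own, the snippet below applies only the subject-plus-lookahead part of the pattern. The names $value and $subjectOnly are introduced here for illustration; the subpatterns are the same as in the full pattern, just written inline.

```php
<?php
// Assumption: same subpatterns as the answer, written out inline for brevity.
$value       = '(?:dip-)?\d+';
$predicate   = "-is-$value(?:-or-$value)*$";
$subjectOnly = "/^[a-z\d-]+(?=$predicate)/";

foreach (['accuracy-is-5-or-15', 'bad-format-is-5-or-', 'package'] as $input) {
    // The lookahead checks the remainder of the string without consuming it,
    // so a successful match contains only the subject.
    $ok = preg_match($subjectOnly, $input, $m);
    echo $input, ' => ', $ok ? $m[0] : '(no match)', "\n";
}
// accuracy-is-5-or-15 => accuracy
// bad-format-is-5-or- => (no match)
// package => (no match)
```

Because the subject is rejected when the rest of the string is malformed, the \G branch never gets a starting point and the whole string yields no matches.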
Code:
$tests = [
    'package',
    'accuracy-is-5',
    'accuracy-is-5-or-15',
    'accuracy-is-5-or-15-or-20',
    'package-is-dip-8-or-dip-4-or-dip-16',
    'bad-format',
    'bad-format-is-',
    'bad-format-is-5-or-',
];

$noun = '(?:dip-)?\d+';     // valid value subpattern
$verb = '-is-';             // literal -is- subpattern
$conjunction = '-or-';      // literal -or- subpattern
$subject = "^[a-z\d-]+";    // match leading word(s)
$predicate = "$verb$noun(?:$conjunction$noun)*$";  // the valid remainder of the string (used inside the lookahead)
$continue = '\G(?!^)';      // continue from point of last match, but not the start of the string

foreach ($tests as $test) {
    if (preg_match_all("/(?:$subject(?=$predicate)|$continue(?:$verb|$conjunction)\K$noun)/", $test, $m)) {
        echo json_encode($m[0]) . "\n";
    }
}
Output:
["accuracy","5"]
["accuracy","5","15"]
["accuracy","5","15","20"]
["package","dip-8","dip-4","dip-16"]
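If you need the subject and the values separately, one option is to wrap the pattern in a small function. This is only a sketch: parseSentence() and its return shape are hypothetical names chosen for illustration, not something from the question.

```php
<?php
// Hypothetical helper: returns null for an invalid string, otherwise the
// subject and its list of values.
function parseSentence(string $input): ?array
{
    $noun = '(?:dip-)?\d+';
    $verb = '-is-';
    $conjunction = '-or-';
    $subject = '^[a-z\d-]+';
    $predicate = "$verb$noun(?:$conjunction$noun)*$";
    $continue = '\G(?!^)';
    $pattern = "/(?:$subject(?=$predicate)|$continue(?:$verb|$conjunction)\K$noun)/";

    if (!preg_match_all($pattern, $input, $m)) {
        return null; // no match (or a regex error)
    }

    // \G(?!^) cannot match at the start of the string, so the first
    // fullstring match is always the subject; the rest are the values.
    $values = $m[0];
    $name = array_shift($values);

    return ['subject' => $name, 'values' => $values];
}

echo json_encode(parseSentence('package-is-dip-8-or-dip-4-or-dip-16')), "\n";
// {"subject":"package","values":["dip-8","dip-4","dip-16"]}
var_dump(parseSentence('bad-format-is-')); // NULL
```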