Greedy behaviour of grep
Asked Answered
D

1

5

I thought that in regular expressions, the "greediness" applies to quantifiers rather than matches as a whole. However, I observe that

grep -E --color=auto 'a+(ab)?' <(printf "aab")

returns aab rather than aab.

The same applies to sed. On the other hand, in pcregrep and other tools, it is really the quantifier that is greedy. Is this a specific behaviour of grep?

N.B. I checked both grep (BSD grep) 2.5.1-FreeBSD and grep (GNU grep) 3.1

Desiderative answered 2/12, 2019 at 11:14 Comment(3)
Related: Greedy vs. Reluctant vs. Possessive QuantifiersShf
ok, I understood the question better now, echo 'aab' | grep -P 'a+(ab)?' highlights aa whereas echo 'aab' | grep -E 'a+(ab)?' highlights aab meaning it optional ab matched even though it wasn't required.. I think it is because of longest match wins.. for example, echo 'aab' | grep -E 'a+|a+b' highlights aab because that's the longest match whereas echo 'aab' | grep -P 'a+|a+b' highlights aa because in PCRE, alternation precedence is left to right for matches starting from same locationJulesjuley
yes, I had read somewhere about longest match too. But it conflicted "greedy quantifier". I did not realize this was yet another point where POSIX regex and some others differ.Desiderative
R
6

In the description of term matched, POSIX states that

The search for a matching sequence starts at the beginning of a string and stops when the first sequence matching the expression is found, where "first" is defined to mean "begins earliest in the string". If the pattern permits a variable number of matching characters and thus there is more than one such sequence starting at that point, the longest such sequence is matched.

This statement clearly anwers your question. The string aab contains two substrings beginning at the same position matching the ERE a+(ab)?; these are aa and aab. The latter is the longest, thus it's matched.

Rickets answered 2/12, 2019 at 12:7 Comment(3)
So, the statement that quantifiers are greedy by default in regex does not apply to POSIX regex, but only to PCRE based tools?Desiderative
@JosephStack I don't know anything about PCRE, but probably yesRickets
thanks. Many places where I read that "quantifiers are greedy" did not make clear to what regex flavour it applies to...Desiderative

© 2022 - 2024 — McMap. All rights reserved.