How to do a non-greedy match in grep?

Asked 12/6, 2010 at 4:43 Answered 24/3, 2020 at 13:12

Solved regex shell command-line grep regex-greedy

239

I want to grep the shortest match and the pattern should be something like:

<car ... model=BMW ...>
...
...
...
</car>

... means any character and the input is multiple lines.

Quite answered 12/6, 2010 at 4:43 Comment(1)

https://mcmap.net/q/17499/-regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – Saving 12/6, 2010 at 4:49

387

You're looking for a non-greedy (or lazy) match. To make a quantifier non-greedy in a regular expression, add the modifier ? directly after it. For example, change .* to .*?.

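grep's default (basic) syntax and its extended (-E) syntax have no non-greedy quantifiers, but where it is available (for example GNU grep built with PCRE support) you can use grep -P to get Perl-compatible syntax.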
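A minimal sketch of the difference, assuming a grep that supports -P and using a made-up sample line. The -o flag prints only the matched text, which is what makes the difference visible:

```shell
# Sample line with two possible end points (hypothetical data).
line='start a end b end'

# Greedy: .* runs to the last "end" on the line.
greedy=$(printf '%s\n' "$line" | grep -oP 'start.*end')

# Lazy: .*? stops at the first "end". Needs a grep built with PCRE (-P).
lazy=$(printf '%s\n' "$line" | grep -oP 'start.*?end')

printf 'greedy: %s\nlazy:   %s\n' "$greedy" "$lazy"
```

Without -o both commands print the same whole line, since grep only reports which lines contain a match.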

Unmeant answered 12/6, 2010 at 4:47 Comment(11)

"You will also need the dot all modifier so that the dot matches new lines." This answer is the top result for "grep dot all modifier" ... what is it? – Oestrin 15/9, 2011 at 10:22

eegg: dot all modifier is also known as multiline. It's a modifier that changes the "." match behavior to include newlines (normally it doesn't). There's no such modifier in grep, but there is in pcregrep. – Macassar 7/5, 2012 at 22:1

Correction: In most of the regex flavors that support it, the mode that allows . to match newlines is called DOTALL or single-line mode; Ruby is the only one that calls it multiline. In the other flavors, multiline is the mode that allows the anchors (^ and $) to match at line boundaries. Ruby has no equivalent mode because in Ruby they always work that way. – Tricho 8/9, 2012 at 21:40

-P was a complete new one on me, I've been happily grepping away for years, and only using -E ... so many wasted years! - Note to self: Re-read Man pages as a (even more!) regular thing, you never digest enough switches and options. – Remanent 15/8, 2013 at 2:43

On some platforms (like Mac OS X) grep does not support -P, but if you use egrep you can use the .*? pattern to achieve the same result. egrep -o 'start.*?end' text.html – Deuteronomy 21/2, 2014 at 16:5

As an extension to @Deuteronomy comment, Mac OS X does not support -P but -E would call egrep hence the suggested .*? works just fine. – Liquidate 15/12, 2014 at 7:12

This answer and comments have so much useful info. Thanks. – Avron 12/2, 2015 at 19:16

I can't tell the difference of using .* or .*?, until I use the -o option – Porcine 1/4, 2016 at 7:7

man grep says: This is highly experimental and grep -P may warn of unimplemented features. Why not use Perl itself? Something like perl -ne 'print if /match/' – Munich 2/5, 2017 at 10:59

This worked perfectly on OSX but sadly doesn't (fails silently - continues to greedy match) on alpine linux alpine:3.7. Does anyone know of an alternative? – Ardell 21/8, 2018 at 16:10

I use MacOS. I have both grep and ggrep (ggrep installed with Homebrew). By running these 4 commands, you can see why I prefer using ggrep -P vs grep -E. echo "part_1:part_2:" | grep -E '^.*?:' --color=always, echo "part_1:part_2:" | grep -E -o '^.*?:' --color=always, echo "part_1:part_2:" | ggrep -P '^.*?:' --color=always, echo "part_1:part_2:" | ggrep -P -o '^.*?:' --color=always. – Fanya 9/7, 2020 at 22:38

Actually, .*? only works in Perl syntax. I am not sure what the equivalent grep extended regexp syntax would be. Fortunately you can use Perl syntax with grep, so grep -P would work, but grep -E (which is the same as egrep) would not work: it would be greedy.

Whitechapel answered 25/4, 2011 at 1:26 Comment(5)

grep -P does not work in GNU grep 2.9 -- just tried it (it doesn't error, it just silently doesn't apply the ?). Interestingly, neither does the negated class, e.g.: env|grep '[^\=]*\=' – Gooding 24/10, 2011 at 22:56

There's no grep -P option or pgrep command in Darwin/OS X 10.8 Mountain Lion, but egrep works great. – Front 16/8, 2013 at 19:13

There's a pgrep command on my OS X 10.9 box, but it's a completely different program whose purpose is to "find or signal processes by name". – Tatyanatau 11/7, 2014 at 13:23

@robertotomás Responding to a 6-year old comment here, but....I thought this as well and then realized I was getting multiple non-greedy matches. For instance, on a color terminal you can see that ` echo "bbbbb" | grep -P 'b.*?b'` returns 2 matches. – Deliverance 1/11, 2017 at 0:27

In 2023 on macOS Monterey 12.6 grep -E worked, made it so it uses the perl regex format. – Reign 5/1, 2023 at 21:39


For non-greedy match in grep you could use a negated character class. In other words, try to avoid wildcards.

For example, to fetch all links to jpeg files from the page content, you'd use:

grep -o '"[^" ]\+\.jpg"'

To deal with multiple lines, pipe the input through xargs first. For performance, use ripgrep.
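A small sketch of the negated-class approach on a made-up HTML fragment. It uses -E with + in place of the \+ above, which is an assumption made for portability, not part of the original answer:

```shell
# Hypothetical page fragment.
html='<img src="a.jpg"> <img src="b.jpg"> <a href="c.png">'

# Negated class: the match cannot cross a quote or a space,
# so each link is reported separately.
links=$(printf '%s\n' "$html" | grep -oE '"[^" ]+\.jpg"')

# Wildcard, for contrast: .* runs from the first quote to the last .jpg"
greedy=$(printf '%s\n' "$html" | grep -oE '".*\.jpg"')

printf '%s\n' "$links"
printf '%s\n' "$greedy"
```

The negated class needs no Perl syntax at all, so it should also work with a grep that lacks -P.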

Harp answered 8/5, 2015 at 18:53 Comment(1)

Never thought use it like this. Works for me. – Hermineherminia 11/1, 2022 at 17:15

My grep that works after trying out stuff in this thread:

echo "hi how are you " | grep -shoP ".*? "

Just make sure you append a space to each one of your lines

(Mine was a line by line search to spit out words)
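For illustration, a sketch that counts the matches for the lazy and the greedy form of the same pattern. It assumes a grep with -P and drops the -s and -h flags, which only suppress error messages and file names:

```shell
line='hi how are you '   # note the trailing space, as described above

# Lazy: each match is the shortest run of characters ending in a space.
words=$(printf '%s\n' "$line" | grep -oP '.*? ')

# Greedy, for contrast: one match running to the last space.
whole=$(printf '%s\n' "$line" | grep -oP '.* ')

printf '%s\n' "$words" | wc -l   # number of lazy matches
printf '%s\n' "$whole" | wc -l   # number of greedy matches
```

The lazy form yields one match per word and the greedy form yields a single match for the whole line.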

Easterly answered 27/9, 2012 at 19:2 Comment(2)

-shoP nice mnemonic :) – Boardwalk 28/10, 2016 at 10:38

echo "bbbbb" | grep -shoP 'b.*?b' is a little bit of a learning experience. Only thing that worked for me in terms of explicitly lazy as well. – Deliverance 1/11, 2017 at 0:22

Sorry I am 9 years late, but this might work for the viewers in 2020.

So suppose you have a line like "Hello my name is Jello". Now you want to find the words that start with 'H' and end with 'o', with any number of characters in between. We don't want whole lines, just words. For that we can use the expression:

grep "H[^ ]*o" file

This matches each such word. It works because [^ ] allows any character except a space between the H and the o, so a match cannot run across several words on the same line. Note that grep prints whole matching lines by default; add -o to print only the matched words.

Now you can replace the space character with any other character you want. Suppose the initial line was "Hello-my-name-is-Jello", then you can get words using the expression:

grep "H[^-]*o" file
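A quick sketch of both variants with -o added so that only the matched words are printed. The sample lines are made up, with an extra word "Hugo" added to show more than one match:

```shell
# Space-separated words: [^ ] keeps each match inside one word.
words=$(printf 'Hello my name is Jello Hugo\n' | grep -o 'H[^ ]*o')

# Wildcard, for contrast: .* runs from the first H to the last o.
greedy=$(printf 'Hello my name is Jello Hugo\n' | grep -o 'H.*o')

# Hyphen-separated words: swap the excluded character.
hyphen=$(printf 'Hello-my-name-is-Jello\n' | grep -o 'H[^-]*o')

printf '%s\n' "$words" "$greedy" "$hyphen"
```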

Trilinear answered 24/3, 2020 at 13:12 Comment(1)

This is merely a regurgitation of kenorb's answer from 2015. – Develop 11/5, 2023 at 11:27

The short answer is to use the following regular expression:

(?s)<car .*? model=BMW .*?>.*?</car>

(?s) - lets . match newlines too, so a match can span multiple lines
.*? - matches any character, any number of times, lazily (minimal match)

A (little) more complicated answer is:

(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>

This makes it possible to match both car1 and car2 in the following text:

<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>

(..) represents a capturing group
\1 in this context matches the same text as most recently matched by capturing group number 1
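One possible way to run the first expression from the shell, sketched under the assumption of GNU grep with -P. Since grep reads line by line, -z is used here to treat the whole input as a single record, and tr turns the NUL that ends each match into a newline. The sample data is made up:

```shell
# Hypothetical input with one BMW entry and one other entry.
xml='<car id=1 model=BMW year=2010>
wheels
</car>
<car id=2 model=Audi year=2011>
doors
</car>'

# -z: input is one NUL-terminated record, so (?s) can let . cross newlines.
# -o: print only the matched text, each match followed by a NUL.
match=$(printf '%s\n' "$xml" |
  grep -Pzo '(?s)<car .*? model=BMW .*?>.*?</car>' |
  tr '\0' '\n')

printf '%s\n' "$match"
```

Because .*? can still cross a > when it has to, a non-BMW car placed before a BMW one would be swallowed into the match. Using [^>]* inside the opening tag is a tighter choice in that case.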

Expiratory answered 13/9, 2013 at 19:17 Comment(1)

You are using multiple Perl-style regex extensions which are not available in plain grep – Develop 11/5, 2023 at 11:25

-2

I know that it's a bit of a dead post, but I just noticed that this works. It removed both clean-up and cleanup from my output.

> grep -v -e 'clean\-\?up'
> grep --version
grep (GNU grep) 2.20
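For context, \? in GNU basic syntax makes the preceding character optional. It is not a lazy quantifier. A minimal sketch of the same filter in extended syntax, where the operator is a plain ?, using made-up input:

```shell
# -v inverts the match: lines containing clean-up or cleanup are dropped.
kept=$(printf 'clean-up\ncleanup\nkeep me\n' | grep -v -E 'clean-?up')

printf '%s\n' "$kept"
```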

Deflected answered 9/3, 2020 at 8:35 Comment(1)

This seems like an answer to another, trivial, question. – Develop 11/5, 2023 at 11:26
