I want to grep the shortest match and the pattern should be something like:
<car ... model=BMW ...>
...
...
...
</car>
Here ... stands for any characters, and the input spans multiple lines.
You're looking for a non-greedy (or lazy) match. To get a non-greedy match in regular expressions you need to use the modifier ? after the quantifier. For example you can change .* to .*?.
By default grep doesn't support non-greedy modifiers, but you can use grep -P to use the Perl syntax.
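To see the difference in practice, here is a minimal sketch that runs one input through a greedy and a lazy pattern. It assumes GNU grep built with PCRE support (the -P option); the sample string and the variable names are made up for illustration.

```shell
# Sample input, chosen only for illustration.
input='start one end two end'

# Greedy: .* takes as much as it can, so the match runs to the last "end".
greedy=$(printf '%s\n' "$input" | grep -oP 'start.*end')

# Lazy: .*? takes as little as it can, so the match stops at the first "end".
lazy=$(printf '%s\n' "$input" | grep -oP 'start.*?end')

printf 'greedy: %s\n' "$greedy"   # greedy: start one end two end
printf 'lazy:   %s\n' "$lazy"     # lazy:   start one end
```

The -o option is what makes the difference visible: without it grep prints the whole matching line either way.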
The mode that allows . to match newlines is called DOTALL or single-line mode; Ruby is the only one that calls it multiline. In the other flavors, multiline is the mode that allows the anchors (^ and $) to match at line boundaries. Ruby has no equivalent mode because in Ruby they always work that way. – Tricho
-P was a completely new one on me. I've been happily grepping away for years, and only using -E ... so many wasted years! Note to self: re-read man pages as an (even more!) regular thing; you never digest enough switches and options. – Remanent
grep does not support -P, but if you use egrep you can use the .*? pattern to achieve the same result: egrep -o 'start.*?end' text.html – Deuteronomy
[...] -P but -E would call egrep, hence the suggested .*? works just fine. – Liquidate
[...] .* or .*?, until I use the -o option. – Porcine
man grep says: "This is highly experimental and grep -P may warn of unimplemented features." Why not use Perl itself? Something like perl -ne 'print if /match/' – Munich
[...] grep and ggrep (ggrep installed with Homebrew). By running these 4 commands, you can see why I prefer using ggrep -P vs grep -E:
echo "part_1:part_2:" | grep -E '^.*?:' --color=always
echo "part_1:part_2:" | grep -E -o '^.*?:' --color=always
echo "part_1:part_2:" | ggrep -P '^.*?:' --color=always
echo "part_1:part_2:" | ggrep -P -o '^.*?:' --color=always
– Fanya
Actually, .*? only works in Perl. I am not sure what the equivalent grep extended regexp syntax would be. Fortunately you can use Perl syntax with grep, so grep -P would work, but grep -E, which is the same as egrep, would not work (it would be greedy).
See also: http://blog.vinceliu.com/2008/02/non-greedy-regular-expression-matching.html
grep -P does not work in GNU grep 2.9. I just tried it (it doesn't error, it just silently doesn't apply the ?). Interestingly, neither does the negated class, e.g. env|grep '[^\=]*\=' – Gooding
[...] grep -P option or pgrep command in Darwin/OS X 10.8 Mountain Lion, but egrep works great. – Front
[...] pgrep command on my OS X 10.9 box, but it's a completely different program whose purpose is to "find or signal processes by name". – Tatyanatau
grep -E worked, made it so it uses the Perl regex format. – Reign
For a non-greedy match in grep you could use a negated character class. In other words, try to avoid wildcards.
For example, to fetch all links to jpeg files from the page content, you'd use:
grep -o '"[^" ]\+\.jpg"'
To deal with multiple lines, pipe the input through xargs first. For performance, use ripgrep.
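A small sketch of the negated-class idea on a line holding two quoted links. The HTML snippet and variable names are invented for the example, and I use -E with + here on the assumption that it is more portable than \+ in basic syntax.

```shell
# Two quoted links on one line; the markup is made up for this example.
html='<img src="a.jpg" alt="x"> <img src="b.jpg">'

# [^" ]+ cannot cross a quote or a space, so each match stays inside one
# attribute value even though + is greedy. The dot is escaped so that it
# only matches a literal ".".
links=$(printf '%s\n' "$html" | grep -oE '"[^" ]+\.jpg"')

printf '%s\n' "$links"
```

Each link comes out on its own line, quotes included, without needing a lazy quantifier.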
My grep that works after trying out stuff in this thread:
echo "hi how are you " | grep -shoP ".*? "
Just make sure you append a space to each one of your lines
(Mine was a line by line search to spit out words)
-shoP nice mnemonic :) – Boardwalk
echo "bbbbb" | grep -shoP 'b.*?b' is a little bit of a learning experience. Only thing that worked for me in terms of explicitly lazy as well. – Deliverance
Sorry I am 9 years late, but this might work for the viewers in 2020.
So suppose you have a line like "Hello my name is Jello".
Now you want to find the words that start with 'H' and end with 'o', with any number of characters in between. We don't want whole lines, just the words. For that we can use the expression:
grep -o "H[^ ]*o" file
This returns the matching words (-o prints only the matched text rather than the whole line). It works because [^ ] allows any character except a space in between, so a match cannot span multiple words on the same line.
You can replace the space with any other delimiter you want.
Suppose the initial line was "Hello-my-name-is-Jello"; then you can get the words using the expression:
grep -o "H[^-]*o" file
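For comparison, a quick sketch of what the wildcard version does on the same lines. It feeds the sample strings through a pipe instead of a file, and the variable names are only for illustration.

```shell
# Wildcard: .* is greedy, so the match runs from H to the last o on the line.
greedy=$(printf '%s\n' 'Hello my name is Jello' | grep -o 'H.*o')

# Negated class: the match cannot cross the delimiter, so it ends with the word.
spaced=$(printf '%s\n' 'Hello my name is Jello' | grep -o 'H[^ ]*o')
dashed=$(printf '%s\n' 'Hello-my-name-is-Jello' | grep -o 'H[^-]*o')

printf '%s\n' "$greedy"   # Hello my name is Jello
printf '%s\n' "$spaced"   # Hello
printf '%s\n' "$dashed"   # Hello
```

This uses only basic grep syntax, so it should not depend on -P being available.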
The short answer is to use the following regular expression:
(?s)<car .*? model=BMW .*?>.*?</car>
A (slightly) more complicated answer is:
(?s)<([a-z\-_0-9]+?) .*? model=BMW .*?>.*?</\1>
This makes it possible to match car1 and car2 in the following text:
<car1 ... model=BMW ...>
...
...
...
</car1>
<car2 ... model=BMW ...>
...
...
...
</car2>
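One way to try a pattern like this from the shell is sketched below. It assumes GNU grep with PCRE support: -z makes grep treat the whole input as a single record, so a match can span lines. The sample data and variable names are made up. Note that I swapped .*? inside the opening tag for [^>]*, which is a change from the pattern above: as far as I can tell, with (?s) a lazy .*? can still run past the > of a non-BMW tag, so the match could start at an earlier car.

```shell
# Made-up sample: one non-BMW block followed by two BMW blocks.
cars=$(cat <<'EOF'
<car id=1 model=Audi color=red>
a4
</car>
<car id=2 model=BMW color=blue>
x5
</car>
<car id=3 model=BMW color=black>
m3
</car>
EOF
)

# -z: read the input as one record; -o: print only the matches.
# [^>]* keeps the attribute search inside a single opening tag,
# and .*? stops at the first </car> after it.
# With -z each match ends in a NUL byte, so tr turns those into newlines.
matches=$(printf '%s\n' "$cars" \
  | grep -Pzo '(?s)<car\b[^>]*\bmodel=BMW\b[^>]*>.*?</car>' \
  | tr '\0' '\n')

printf '%s\n' "$matches"
```

On this sample it should print only the two BMW blocks and skip the Audi one.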
[...] grep – Develop
I know that it's a bit of a dead post, but I just noticed that this works. It removed both clean-up and cleanup from my output.
> grep -v -e 'clean\-\?up'
> grep --version
grep (GNU grep) 2.20