Unix grep regex containing 'x' but not containing 'y'
P

7

71

I need a single-pass regex for unix grep that contains, say alpha, but does not contain beta.

grep 'alpha' <> | grep -v 'beta'
Politics answered 19/5, 2011 at 18:33 Comment(7)
Please post a sample input and expected output. How do you expect the "not 'y'" part not to match every line except those with 'x'? Which is another way of saying you may want a one-pass grep, but you probably need a two-pass grep, or an awk or perl script for a single pass. Incidentally, that is not my downvote. Maybe someone will explain why this is a bad question?! Good luck.Holladay
I think this is definitely a reasonable question to ask (so +1 from me) especially as I have seen it asked before, and have even asked it myself.Nobile
@shellter: I knew various ways using awk, sed and perl to do it. Even the grep command can do it with a pipe (added a sample line in the question). I just wanted to see if it could be done in one pass. It looks like it can be done (Mr47's answer below) and I got to learn look-ahead and look-behind in perl. It's fun learning new tricks in any language. I don't understand why you think this is a bad question. And I up-voted your answer too. :)Politics
Please re-read my comment. 'That is not my downvote'.. In fact after seeing that you had 2 downvotes, I did give you a vote. I agree with you about learning new techniques. Gotta go. good luck!Holladay
I know you didn't down-vote. It would have been ok even if you did. Was just trying to learn something new.Politics
Arg! Ok... Given your original post, there was no way to assume (except that you wanted one regexp) that you knew about awk/perl, AND my real complaint was the lack of sample input and output. ;-) Best wishes! and keep on learning new techniques!Holladay
Agreed. Will be more elaborate next time. Thanks for your time !Politics
V
27

^((?!beta).)*alpha((?!beta).)*$ would do the trick I think.
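
As a rough illustration of the pattern above, here is a minimal sketch run on made-up sample lines. It assumes GNU grep built with PCRE support, since the lookahead syntax needs the -P option; the function name alpha_not_beta is hypothetical.

```shell
# Hypothetical helper: keep lines that contain "alpha" and no "beta" anywhere.
# Assumes GNU grep with PCRE support (-P); POSIX grep has no lookahead.
alpha_not_beta() {
  grep -P '^((?!beta).)*alpha((?!beta).)*$'
}

# Made-up sample input; only the first line should survive.
printf '%s\n' 'alpha only' 'alpha then beta' 'beta then alpha' 'neither' |
  alpha_not_beta
```

Each ((?!beta).) consumes one character only if "beta" does not start at that position, and the ^ and $ anchors force that check across the whole line.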

Valero answered 19/5, 2011 at 18:36 Comment(7)
I'm pretty sure that POSIX grep doesn't support syntax like that!Kohler
I didn't test it, but I'm pretty sure my version of grep supports syntax like this. Could be wrong though.Valero
Could you please explain how the '('s and '?' work here? I am confused why you have 2 '(' in the beginning.Politics
This is a PCRE (Perl-Compatible Regular Expression), so you'll need the -P option for GNU Grep. The (?!...) things are zero-width negative lookahead assertions. I suggest perldoc perlre for an explanation of lookahead assertions.Nobile
Is Perl itself suited for inline things like this, or is there a reason to use a Perl mode in grep over native Perl? Almost everything I find is written in Perl, as in a script or interpreter and not a standalone expression - at that point I could traverse a string upside down and backwards with loops and functions and everything.Conquistador
Verified working with grep -P. Also, surprisingly, the ^ and $ anchors are required.Blip
It also works for me with the grep -P option.Dynamite
N
56

The other answers here show some ways you can contort different varieties of regex to do this, although I think it does turn out that the answer is, in general, “don’t do that”. Such regular expressions are much harder to read and probably slower to execute than just combining two regular expressions using the boolean logic of whatever language you are using. If you’re using the grep command at a unix shell prompt, just pipe the results of one to the other:

grep "alpha" | grep -v "beta"

I use this kind of construct all the time to winnow down excessive results from grep. If you have an idea of which result set will be smaller, put that one first in the pipeline to get the best performance, as the second command only has to process the output from the first, and not the entire input.
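
A small sketch of that pipeline on made-up input, assuming nothing beyond an ordinary grep; the function name alpha_without_beta is hypothetical. The first grep keeps lines containing alpha, and the second drops any of those that also contain beta.

```shell
# Hypothetical helper wrapping the two-stage pipeline; reads standard input.
alpha_without_beta() {
  grep 'alpha' | grep -v 'beta'
}

# Made-up sample input; only "alpha one" should be printed.
printf '%s\n' 'alpha one' 'alpha beta' 'beta two' 'gamma' | alpha_without_beta
```

For a live stream such as the output of tail -f, GNU grep's --line-buffered option on the first grep may help keep matches from sitting in the pipe buffer.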

Nobile answered 19/5, 2011 at 20:20 Comment(6)
Yes, but the reason you'd cram all this into a single grep command is usually for use in the tail -f command or something else that uses a data stream that can only be piped into a single command.Lacylad
This solution only works if you're not interested in the context, i.e. it doesn't work well with the -A, -B and -C options that grep has.Fink
This also doesn't work in the important (to me at least) case where the filename might contain the string beta.Higgler
@Tom, try grep -l "alpha" | grep -v "beta". The first grep returns the file names.Gothenburg
@Gothenburg - not sure how that helps? The issue is with a file called beta.py which contains the string alpha. Such files should be returned in the results but aren't. It could be worked around with `grep "alpha" | grep -v "[*:]+:.*beta"` or similar, I guess.Higgler
This does not preserve the nice colors of the first grep for me.Harbor
H
36

Well as we're all posting answers, here it is in awk ;-)

awk '/x/ && !/y/' infile

IHTH.
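
The same idea with the question's alpha and beta, as a minimal sketch on made-up input; the function name awk_alpha_not_beta is hypothetical. With no action given, awk prints every line for which the condition is true.

```shell
# Hypothetical helper: print lines matching /alpha/ and not matching /beta/.
# Extra arguments are passed through as file names; with none, awk reads stdin.
awk_alpha_not_beta() {
  awk '/alpha/ && !/beta/' "$@"
}

# Made-up sample input; only "alpha one" should be printed.
printf '%s\n' 'alpha one' 'alpha beta' 'beta two' | awk_alpha_not_beta
```

Further conditions can be chained with && or ||, which is what makes this form easy to extend.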

Holladay answered 19/5, 2011 at 18:41 Comment(4)
You are missing a single quote in front of the !Leyes
@KevinDuke: I think I have it right, awk can process &&d reg-exp matches. Thanks for looking at my answer. Good luck to all.Holladay
Hmm, I think you have it right too, coulda swore it wasn't working the other day. Thanks for your input!Leyes
This works like a charm and is so much easier to read than the grep regex (especially if you want to add more things to match or exclude). Thanks!Citizenry
S
4

I'm pretty sure this isn't possible with true regular expressions. The [^y]*x[^y]* example would match yxy, since the * allows zero or more non-y matches.

EDIT:

Actually, this seems to work: ^[^y]*x[^y]*$. It basically means "match any line that starts with zero or more non-y characters, then has an x, then ends with zero or more non-y characters".
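
A quick sketch comparing the unanchored and anchored forms on made-up one-word lines; the function name x_without_y is hypothetical. Note that this bracket-expression approach only covers single characters such as x and y; it does not extend directly to multi-character strings like alpha and beta.

```shell
# Unanchored: [^y]* may match zero characters, so "yxy" still matches on its "x".
# Prints x, axb and yxy.
printf '%s\n' 'x' 'axb' 'yxy' 'y' | grep '[^y]*x[^y]*'

# Anchored: every character on the line must be a non-y, with an x somewhere.
x_without_y() {
  grep '^[^y]*x[^y]*$'
}

# Prints only x and axb.
printf '%s\n' 'x' 'axb' 'yxy' 'y' | x_without_y
```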

Sanguinolent answered 19/5, 2011 at 18:38 Comment(0)
H
0

Try using the excludes operator: [^y]*x[^y]*

Hoecake answered 19/5, 2011 at 18:36 Comment(5)
[^y]* matches the string y because there are zero non-y characters in that string.Goodwife
Yeah, so? My example is [^y]*x[^y]*.Hoecake
Note that I answer the question at the same level of abstraction as the question itself.Hoecake
The questioner wants to match strings that contain alpha but not beta. The string alphabeta does not meet the questioner's criterion (it contains the string beta) yet your regular expression will return true because, before the substring alpha, there are zero or more occurrences of the string beta.Goodwife
That depends upon boundary conditions, which the question didn't ask about.Hoecake
H
-1

Q: How to match x but not y in grep without pipe if y is a directory

A: grep x --exclude-dir='y'
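
A minimal sketch of this on a throwaway directory tree, assuming GNU grep and a recursive search (-r), since --exclude-dir only has an effect when grep is walking directories. The function name grep_skip_dir and the file layout are made up for illustration.

```shell
# Hypothetical helper: recursive search that skips directories named DIR.
# Assumes GNU grep; usage: grep_skip_dir PATTERN DIR ROOT
grep_skip_dir() {
  grep -r --exclude-dir="$2" -e "$1" "$3"
}

# Made-up layout: keep/a.txt and y/b.txt both contain an x.
demo=$(mktemp -d)
mkdir "$demo/keep" "$demo/y"
echo 'x marks the spot' > "$demo/keep/a.txt"
echo 'x in the excluded directory' > "$demo/y/b.txt"

# Lists only the match under keep/; the y directory is never searched.
grep_skip_dir x y "$demo"
rm -rf "$demo"
```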

Harbor answered 9/12, 2022 at 19:51 Comment(0)
T
-3

Simplest solution:

grep "alpha" * | grep -v "beta"

Please take care of gaps and double quotes.

Thousandth answered 10/4, 2020 at 6:56 Comment(0)
