grep regex lookahead or start of string (or lookbehind or end of string)

I want to match a string that is either preceded by a certain type of character or begins at the start of the string (and, likewise, is either followed by such a character or ends at the end of the string).

For a minimal example, consider the text n.b., which I'd like to match either at the beginning of a line and end of a line or between two non-word characters, or some combination. The easiest way to do this would be to use word boundaries (\bn\.b\.\b), but that doesn't match; similar cases happen for other desired matches with non-word characters in them.

I'm currently using (^|[^\w])n\.b\.([^\w]|$), which works satisfactorily, but will also match the non-word characters (such as dashes) which appear immediately before and after the word, if available. I'm doing this in grep, so while I could easily pipe the output into sed, I'm using grep's --color option, which is disabled when piping into another command (for obvious reasons).

EDIT: The \K operator (i.e. (\K^|[^\w])n\.b\.(\K[^\w]|$)) seems to work, but it also discards the color on the match in the output. While I could, again, invoke auxiliary tools, I'd love it if there were a quick and simple solution.

EDIT: I have misunderstood the \K operator; it simply removes all the text from the match preceding its use. No wonder it was failing to color the output.
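To illustrate the \K behavior described above, here is a minimal sketch. It assumes a grep built with PCRE support (-P); the sample line is made up for the example. It uses -o so that only the reported match is printed.

```shell
# Hypothetical sample line with n.b. between two dashes.
line='see -n.b.- here'

# \K after the leading non-word character: everything matched before \K
# is dropped from the reported match.
trimmed=$(printf '%s\n' "$line" | grep -oP '[^\w]\Kn\.b\.')

# Same pattern without \K: the leading dash is part of the match.
untrimmed=$(printf '%s\n' "$line" | grep -oP '[^\w]n\.b\.')

printf 'with \\K:    [%s]\nwithout \\K: [%s]\n' "$trimmed" "$untrimmed"
```

Because \K only trims text on the left, it cannot exclude a trailing character from the match; that side still needs a lookahead.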

Ingaborg answered 28/4, 2015 at 3:8 Comment(0)

If you're using grep, you must be using the -P option, or lookarounds and \K would throw errors. That means you also have negative lookarounds at your disposal. Here's a simpler version of your regex:

(?<!\w)n\.b\.(?!\w)

Also, be aware that (?<=...) and (?<!...) are lookbehinds, and (?=...) and (?!...) are lookaheads. The wording of your title suggests you may have gotten those mixed up, a common beginner's mistake.
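As a rough check of the negative-lookaround pattern, the sketch below runs it over a few made-up lines. It assumes GNU grep with -P available; the sample text is hypothetical.

```shell
# Sample lines: n.b. at line start, between non-word characters, at line end,
# and embedded in a larger word (which should not match).
sample='n.b. first
see -n.b.- here
ends with n.b.
xn.b.y'

# -P enables PCRE so lookarounds are available; -n prefixes line numbers;
# -o prints only the matched text, showing the neighbours are not consumed.
matches=$(printf '%s\n' "$sample" | grep -noP '(?<!\w)n\.b\.(?!\w)')
printf '%s\n' "$matches"
```

The first three lines each yield a bare n.b. with no surrounding dashes or spaces; the fourth line is skipped because word characters sit on both sides.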

Giovannagiovanni answered 28/4, 2015 at 5:3 Comment(1)
I've looked in Linux and OpenBSD implementations of grep and cannot find the -P option. Can you explain it, show it in use, and say which OS you did it on? EDIT: I found it; it was GNU grep 3.0. – Indic

Apparently matching beginning of string is possible inside lookahead/lookbehinds; the obvious solution is then (?<=^|[^\w])n\.b\.(?=[^\w]|$).
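A small comparison of the original consuming pattern with the lookaround version, assuming grep -P is available; the sample line is invented for the example.

```shell
# Hypothetical sample line with n.b. between two dashes.
line='see -n.b.- here'

# Original pattern: the groups consume the neighbouring non-word characters.
consuming=$(printf '%s\n' "$line" | grep -oP '(^|[^\w])n\.b\.([^\w]|$)')

# Lookaround pattern: the neighbours are checked but not part of the match.
lookaround=$(printf '%s\n' "$line" | grep -oP '(?<=^|[^\w])n\.b\.(?=[^\w]|$)')

printf 'consuming:  [%s]\nlookaround: [%s]\n' "$consuming" "$lookaround"
```

With --color instead of -o, the same difference shows up as which characters get highlighted.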

Ingaborg answered 28/4, 2015 at 3:26 Comment(0)

This answer addresses the bit regarding losing the effect of --color when piping output from grep:

I'm using grep's --color option, which is disabled when piping into another command (for obvious reasons).

I ran into this problem while trying to page grep output through less and still see the colored matches.

With --color=always, grep "[s]urround[s] the match...with escape sequences to display them in color...," even when its output is piped.

N.b., when colored output is piped to less, less shows the escape characters literally by default. Use less -r to have them rendered as colors instead.

Example:

grep --color=always pattern [file, ...] | less -r
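
To see the difference without an interactive pager, the sketch below pipes grep into cat -v, which prints the escape character visibly as ^[. It assumes GNU grep with -P; cat -v is only a stand-in for less here, and the input line is made up.

```shell
# With --color=auto, grep emits no escape sequences because stdout is a pipe.
plain=$(printf 'n.b.\n' | grep --color=auto -P '(?<!\w)n\.b\.(?!\w)' | cat -v)

# With --color=always, the escape sequences survive the pipe.
colored=$(printf 'n.b.\n' | grep --color=always -P '(?<!\w)n\.b\.(?!\w)' | cat -v)

printf 'auto:   %s\nalways: %s\n' "$plain" "$colored"
```

The exact escape sequences depend on the GREP_COLORS setting, so only their presence should be relied on.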
Dordrecht answered 14/7, 2023 at 7:10 Comment(0)
