'git grep' and word boundaries on Mac OS X and BSD

I regularly run git grep "\<blah\>" on my Linux development server, but I just discovered that I cannot use \< and \> on my Mac (Mac OS X 10.6.8): the command does not find anything. Is the regular expression syntax different on the Mac?

I tried using git grep -E "\<blah\>" but to no avail! :-(

Lawful asked 30/9, 2011 at 8:35 Comment(4)
Assurgent: It could be because you're not using the same shell on your Mac as on Linux. Maybe the one you use on Mac OS requires you to escape your backslashes (e.g. double them).
Edmundedmunda: I can run git grep ">" under Lion and get lots of matches. Perhaps something is wrong with your setup...
Convalesce: Try apple.stackexchange.com.
Lawful: Doubling the backslashes does not help. Searching for \> just matches a literal closing angle bracket instead of an end-of-word boundary. I will try apple.stackexchange.com. Thanks for the link.

After struggling with this, too, I found this very helpful post on a BSD mailing list. So here's the (albeit rather ugly) solution:

git grep "[[:<:]]blah[[:>:]]"

The -w flag of git grep also works, but sometimes you want to match only the beginning or only the end of a word.

Update: This has changed in OS X 10.9 "Mavericks". Now you can use \<, \>, and \b. [[:<:]] and [[:>:]] are no longer supported.
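If you need something that behaves the same with the old BSD regex library and on Linux, the start-of-word, end-of-word and whole-word anchors can be approximated with plain POSIX ERE features and git grep -E. This is a workaround sketch, not part of Git: the helper names word_start, word_end and whole_word are hypothetical, and it assumes "word character" means [[:alnum:]_]. It runs in a scratch directory with --no-index, so no repository is needed.

```shell
#!/bin/sh
# Portable stand-ins for \< and \> using only POSIX ERE features
# (anchors, bracket expressions, alternation).
# Assumption: word characters are [[:alnum:]_].
# Caveat: WORD is inserted as-is, so regex metacharacters in it stay live.

word_start() {  # roughly \<WORD
  printf '(^|[^[:alnum:]_])%s' "$1"
}

word_end() {    # roughly WORD\>
  printf '%s([^[:alnum:]_]|$)' "$1"
}

whole_word() {  # roughly \<WORD\>
  printf '(^|[^[:alnum:]_])%s([^[:alnum:]_]|$)' "$1"
}

# Demo in a scratch directory (left behind for inspection);
# --no-index lets git grep search plain, untracked files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 'blah' 'foo blah bar' 'blahblah' 'xblah' 'blah_x' > sample.txt

echo '--- start of word'; git grep --no-index -h -E "$(word_start blah)"
echo '--- end of word';   git grep --no-index -h -E "$(word_end blah)"
echo '--- whole word';    git grep --no-index -h -E "$(whole_word blah)"
```

The trade-off: unlike a true zero-width \< or [[:<:]], the bracket expression consumes the neighbouring character, so --color highlighting and -o output include it. For a plain whole-word search, -w is still the simpler option.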

Malamute answered 17/2, 2013 at 21:20 Comment(3)
Unsuspecting: [[:<:]] and [[:>:]] still seem to work with /usr/bin/grep on macOS Sierra. Are you sure you aren't using /usr/local/bin/grep or some such arrangement?
Unsuspecting: Ah. I thought the difference came from the underlying (system) regex library. I guess Git may have used it on the Mac previously and now (since the version in Mavericks) uses its own, which is more "normal" in this regard.
Wrist: How about moving the correct answer from the comments into the actual post?

I guess it's caused by the difference between the BSD and Linux (GNU) regex libraries.

See whether the -w option to git grep (match the pattern only at a word boundary) does it for you:

$ git grep -w blah
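To see exactly what -w accepts, the sketch below runs it on a few hand-made lines in a scratch directory (the file name words.txt is arbitrary). Judging from Git's grep.c (treat this as an assumption, not documented behaviour), a hit needs a non-word character or a line boundary on each side, letters, digits and the underscore count as word characters, and if the first occurrence on a line fails that test, git grep keeps scanning the rest of the line.

```shell
#!/bin/sh
# Which lines does git grep -w accept for "blah"?
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

printf '%s\n' \
  'call(blah);' \
  'blah_count = 1' \
  'blahblah' \
  'blahblah blah' \
  'see blah.' > words.txt

# --no-index: search a plain directory that is not a Git repository.
git grep --no-index -n -w blah
```

It should print lines 1, 4 and 5: blah_count and blahblah are rejected because "_" and "b" are word characters, while line 4 still matches through its second, standalone blah.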
Finance answered 30/9, 2011 at 11:20 Comment(0)

You can compile git with PCRE support and use git grep -P "\bblah\b" for word boundaries.

Here's a guide on how to compile git using OSX Homebrew: http://realultimateprogramming.blogspot.com/2012/01/how-to-enable-git-grep-p-on-os-x-using.html
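Whether -P is available depends on how your git was built, so a small wrapper can try the PCRE form first and fall back to -w when git refuses it. The function name grep_word is hypothetical. It relies on a build without PCRE support aborting with a fatal error (Git's die() exits with status 128), whereas a plain "no match" exits with 1, and it passes WORD into the regex unescaped.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Git): whole-word search that prefers
# git grep -P "\bWORD\b" and falls back to git grep -w when -P fails.
# Usage: grep_word WORD [git-grep options...]
# WORD is used as a regex unescaped; options are placed before the pattern.
grep_word() {
  word=$1
  shift
  status=0
  git grep "$@" -P -e "\\b${word}\\b" 2>/dev/null || status=$?
  # 0 = match, 1 = no match: -P worked, so report that result.
  if [ "$status" -le 1 ]; then
    return "$status"
  fi
  # Anything higher (a fatal error exits with 128), e.g. no PCRE support.
  git grep "$@" -w -e "$word"
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 'foo blah bar' 'blahblah' 'blah_x' > demo.txt
grep_word blah --no-index -h   # prints: foo blah bar
```

Because the extra arguments go before the pattern, only options (not paths) can be passed this way. The wrapper also hides stderr from the -P attempt, so a genuine error there is silently retried with -w.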

Branny answered 26/4, 2013 at 18:29 Comment(1)
Moorings: This seems to be the default on macOS now; at least, I never had to recompile Git in order to use -P.

If you do use -P, make sure you have Git 2.40 (Q1 2023) or later: "grep -P" learned to use Unicode character properties to interpret character classes when processing \b, \w, etc.

See commit acabd20 (08 Jan 2023) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit 557d93a, 27 Jan 2023)

grep: correctly identify utf-8 characters with \{b,w} in -P

Signed-off-by: Carlo Marcelo Arenas Belón
Acked-by: Ævar Arnfjörð Bjarmason

When UTF is enabled for a PCRE match, the corresponding flags are added to the pcre2_compile() call, but PCRE2_UCP wasn't included.

This prevents extending the meaning of the character classes to include those new valid characters and therefore results in failed matches for expressions that rely on that extension, for ex:

$ git grep -P '\bÆvar'

Add PCRE2_UCP so that \w will include Æ and therefore \b could correctly match the beginning of that word.

This has an impact on performance that has been estimated to be between 20% and 40% and that is shown through the added performance test.

That means patterns like these now work with non-ASCII word characters as well:

'\bhow' 
'\bÆvar'
'\d+ \bÆvar'
'\bBelón\b'
'\w{12}\b'
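These Unicode-aware matches need Git 2.40 or later (and a git built with PCRE2). The Git that Apple ships with macOS may lag behind upstream releases, so a quick version check can help. git_at_least is a hypothetical helper: it reads the third word of the git version output and compares only the major and minor numbers.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a "git version ..." line reports at
# least MAJOR.MINOR. Only the first two numeric fields are compared, so
# suffixes such as ".windows.1" or "(Apple Git-NNN)" are ignored.
git_at_least() {
  want_major=$1
  want_minor=$2
  version_line=${3:-$(git version)}     # default: ask the installed git
  ver=$(printf '%s\n' "$version_line" | awk '{print $3}')  # e.g. 2.39.3
  major=${ver%%.*}                      # text before the first dot
  rest=${ver#*.}
  minor=${rest%%.*}                     # text between first and second dot
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if git_at_least 2 40; then
  echo 'Git >= 2.40: -P should treat non-ASCII letters as word characters'
  echo '(assuming PCRE2 support and a UTF-8 locale)'
else
  echo 'Git < 2.40: -P word boundaries may only know ASCII letters'
fi
```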

With Git 2.41 (Q2 2023), a recent-ish change to allow Unicode character classes to be used with "grep -P" triggered a JIT bug in older pcre2 libraries.
The problematic change in Git built with these older libraries has been disabled to work around the bug.

See commit 14b9a04 (23 Mar 2023) by Mathias Krause (mathiaskrause).
(Merged by Junio C Hamano -- gitster -- in commit d35cd54, 30 Mar 2023)

grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34

Reported-by: Stephane Odul
Signed-off-by: Mathias Krause

Stephane is reporting a regression introduced in Git v2.40.0 that leads to 'git grep' segfaulting in his CI pipeline.
It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35.

Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit acabd20 ("grep: correctly identify utf-8 characters with \{b,w} in -P", 2023-01-08, Git v2.40.0-rc0 -- merge listed in batch #11).
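If git grep -P crashes for you on Git 2.40.x, it may be worth checking the PCRE2 version, since the JIT bug described above affects PCRE2 10.34 and older. The helper pcre2_has_jit_bug is hypothetical. pcre2-config is the configuration script installed with PCRE2's development files; it reports the PCRE2 installed on the system, which is not necessarily the copy your git binary links against (ldd on Linux or otool -L on macOS lists the linked libraries).

```shell
#!/bin/sh
# Hypothetical check: does a PCRE2 version fall in the range (<= 10.34)
# that has the JIT bug Git 2.41 works around?
pcre2_has_jit_bug() {
  ver=${1:-$(pcre2-config --version)}   # e.g. 10.34 or 10.42
  major=${ver%%.*}
  minor=${ver#*.}
  minor=${minor%%[!0-9]*}               # drop any suffix such as -RC1
  [ "$major" -lt 10 ] ||
    { [ "$major" -eq 10 ] && [ "$minor" -le 34 ]; }
}

if command -v pcre2-config >/dev/null 2>&1; then
  v=$(pcre2-config --version)
  if pcre2_has_jit_bug "$v"; then
    printf 'PCRE2 %s is affected: use Git 2.41+ or PCRE2 10.35+\n' "$v"
  else
    printf 'PCRE2 %s is not affected by this JIT bug\n' "$v"
  fi
else
  echo 'pcre2-config not found; check the libpcre2 that git links against'
fi
```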

Cricket answered 28/1, 2023 at 0:06 Comment(0)
