Emacs regular expressions: what can \< and \> do that \b cannot?

The Regexp Backslash section of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches at a word boundary. \b behaves just as it does in other, non-Emacs regular expressions, but \< and \> seem to be particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b? For instance, \bword\b would match the same text as \<word\>, and the only difference is that the latter is more readable.
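
To make the \bword\b versus \<word\> comparison concrete, here is a minimal sketch in Python. Python's re module has \b but neither \< nor \>, so the two lookaround patterns below are assumed stand-ins for them, not Emacs syntax; Python's notion of a word character may also differ from an Emacs syntax table. The sample text is made up for illustration.

```python
import re

# Assumed stand-ins, since Python's re has no \< or \>:
WORD_START = r"\b(?=\w)"   # boundary with a word character after it
WORD_END = r"\b(?<=\w)"    # boundary with a word character before it

text = "word sword words word."

# Spans matched by the \b form and by the emulated \<word\> form.
with_b = [m.span() for m in re.finditer(r"\bword\b", text)]
with_angle = [m.span() for m in re.finditer(WORD_START + "word" + WORD_END, text)]

print(with_b)
print(with_angle)
print(with_b == with_angle)
```

Both lists come out as [(0, 4), (17, 21)]: when the text between the operators is itself a word, the first boundary can only be a word start and the second can only be a word end, so the two forms agree.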

Spectroscope answered 30/4, 2011 at 19:31 Comment(2)
They’re also in GNU Grep and in Vim. - Alcides
\< and \> are from the original vi, and remain there to this day. - Dreiser

You can get unexpected results if you assume they behave the same.
What can \< and \> do that \b cannot?
The answer is that \< and \> are explicit: each one matches a particular end of a word, and only that end.
\b is general: either end of a word will match.

Reference: GNU regex manual, GNU Operators, Word Operators

# Single spaces between words
line="cat dog sky"
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"   # \b: either end of a word
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"   # \>: end of a word only
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"   # \<: start of a word only
echo
# Double spaces between words
line="cat  dog  sky"
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
# Double spaces between words, plus trailing spaces
line="cat  dog  sky  "
echo "$line" | sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" | sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo

output

# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|

# |cat  dog  |sky|
# |cat  dog|  sky|
# |cat  dog  |sky|

# |cat  dog  sky|  |
# |cat  dog  sky|  |
# |cat  dog  |sky  |
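
For readers without GNU sed at hand, the third group of output can be reproduced with a rough Python sketch. This is not the sed engine: Python's re has no \< or \>, so the lookaround patterns in OPERATORS are assumed stand-ins, and split_at_last is a hypothetical helper that mimics the greedy \(.*\) backing off to the last position where the operator matches.

```python
import re

line = "cat  dog  sky  "  # same input as the third group of sed commands

# Assumed stand-ins: Python's re has \b but no \< or \>.
OPERATORS = {
    r"\b": r"\b",          # either end of a word
    r"\>": r"\b(?<=\w)",   # end of a word only
    r"\<": r"\b(?=\w)",    # start of a word only
}

def split_at_last(op, text):
    # Greedy (.*) takes as much as it can, then backs off to the
    # last position where the zero-width operator can match.
    m = re.match(r"(.*)" + op + r"(.*)", text)
    return m.groups()

results = {name: split_at_last(op, line) for name, op in OPERATORS.items()}
for name, (left, right) in results.items():
    print(f"{name}  |{left}|{right}|")
```

Here \b and \> both split right after "sky", because the last boundary in the line happens to be a word end, while \< has to go back to the start of "sky". That is the "this end, and only this end" behaviour in action.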
Rubberize answered 30/4, 2011 at 23:21 Comment(0)

It looks to me like \<.*?\> would match only a series of word characters, while \b.*?\b would match either a series of word characters or a series of non-word characters, since it can also accept the end of one word followed by the beginning of the next. If you force the expression between the two to be a word, they do indeed act the same.

Of course, you could replicate the behavior of \< and \> with \b\w and \w\b, although those also consume a word character, whereas \< and \> are zero-width. So I guess the answer is that yes, it's mostly for readability. Then again, isn't that what most escape characters in regular expressions are for?
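
A small Python sketch of both points, under some assumptions: Python's re lacks \< and \>, so \b(?=\w) and \b(?<=\w) are used as zero-width stand-ins, .+? replaces .*? to keep empty matches out of the way, and the sample text is invented.

```python
import re

text = "cat, dog"

# \b on both sides: each end may be a word start or a word end,
# so the run of non-word characters between the words matches too.
loose = re.findall(r"\b.+?\b", text)

# Emulated \<.+?\> : must begin at a word start and stop at a word end.
strict = re.findall(r"\b(?=\w).+?\b(?<=\w)", text)

# Zero-width stand-in for \< versus the consuming \b\w form.
zero_width = re.sub(r"\b(?=\w)", "[", text)
consuming = re.sub(r"\b\w", "[", text)

print(loose)       # ['cat', ', ', 'dog']
print(strict)      # ['cat', 'dog']
print(zero_width)  # [cat, [dog
print(consuming)   # [at, [og
```

The last two lines show why \b\w is only an approximation of \<: it eats the first letter of each word, which matters in a substitution even though it makes no difference when you only test whether something matches.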

Pshaw answered 30/4, 2011 at 20:5 Comment(3)
The escape char `\` is never for readability. It is used to differentiate a regex operator from a literal character of the same glyph. - Rubberize
@fred - What I meant was that the escaped characters such as \w and \d (not \ itself) can usually be replaced with other characters or a character class, like [0-9]. - Pshaw
Daniel: \<.*\> will match any string bounded by word characters. The .* is greedy, so it matches as many arbitrary characters as possible. To match only individual words, you could use a non-greedy variant: \<.*?\> - Closefitting
