The Regexp Backslash section of the GNU Emacs Manual says that \< matches at the beginning of a word, \> matches at the end of a word, and \b matches a word boundary. \b behaves just as it does in other, non-Emacs regular expressions. But it seems that \< and \> are particular to Emacs regular expressions. Are there cases where \< and \> are needed instead of \b? For instance, \bword\b would match the same as \<word\> would, and the only difference is that the latter is more readable.
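As a quick check of that premise, here is a minimal sketch using GNU grep rather than Emacs. It assumes GNU grep, whose basic regular expressions accept \<, \> and \b; the three sample lines are made up for illustration.

```shell
# Assumes GNU grep; the sample lines are hypothetical.
sample='a word here
swordfish
wordy'

# Count the lines that contain "word" as a whole word, once per notation.
with_b=$(printf '%s\n' "$sample" | grep -c '\bword\b' || true)
with_angles=$(printf '%s\n' "$sample" | grep -c '\<word\>' || true)

printf '%s %s\n' 'lines matching \bword\b:' "$with_b"
printf '%s %s\n' 'lines matching \<word\>:' "$with_angles"
```

Both counts come out as 1: only "a word here" qualifies, so for a pattern that is itself a word the two notations agree.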
You can get unexpected results if you assume they behave the same.
What can \< and \> do that \b can't?
The answer is that \< and \> are explicit: this end of a word, and only this end. \b is general: either end of a word will match.
See GNU Operators: Word Operators.
line="cat dog sky"
echo "$line" |sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
line="cat dog sky"
echo "$line" |sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
line="cat dog sky "
echo "$line" |sed -n "s/\(.*\)\b\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\>\(.*\)/# |\1|\2|/p"
echo "$line" |sed -n "s/\(.*\)\<\(.*\)/# |\1|\2|/p"
echo
Output:
# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|
# |cat dog |sky|
# |cat dog| sky|
# |cat dog |sky|
# |cat dog sky| |
# |cat dog sky| |
# |cat dog |sky |
It looks to me like \<.*?\> would match only a series of word characters, while \b.*?\b would match either a series of word characters or a series of non-word characters, since it can also accept the end of one word and then the beginning of the next. If you force the expression between the two to be a word, they do indeed act the same.
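The difference described above can be sketched with GNU grep. This assumes GNU grep, not Emacs, and puts a single space between the anchors to stand for a run of non-word characters.

```shell
# Assumes GNU grep. The pattern between the anchors is a single space,
# i.e. a run of non-word characters.
line='cat dog'

# \b ... \b: end of "cat", a space, start of "dog", so the line matches.
b_count=$(printf '%s\n' "$line" | grep -c '\b \b' || true)

# \< ... \>: \< must be followed by a word character, so a space can never follow it.
angle_count=$(printf '%s\n' "$line" | grep -c '\< \>' || true)

printf '%s %s\n' 'lines matching \b \b:' "$b_count"
printf '%s %s\n' 'lines matching \< \>:' "$angle_count"
```

The \b version reports 1 matching line and the \< \> version reports 0, which is the "either end" versus "only this end" distinction in miniature.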
Of course, you could replicate the behavior of \< and \> with \b\w and \w\b (with the caveat that those consume a word character, while \< and \> are zero-width). So I guess the answer is that yes, it's mostly for readability. Then again, isn't that what most escape characters in regular expressions are for?
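A small sketch of that substitution, assuming GNU sed (for \<, \>, \b and \w). Each pattern includes one word character so that both spellings consume the same text, and the g flag is left off so only the first match is replaced.

```shell
# Assumes GNU sed. No g flag: only the first match on the line is replaced.

# Start of word: put "[" before the first character of the first word.
start_angle=$(printf '%s\n' '  cat dog' | sed 's/\<\w/[&/')
start_b=$(printf '%s\n' '  cat dog' | sed 's/\b\w/[&/')

# End of word: put "]" after the last character of the first word.
end_angle=$(printf '%s\n' 'cat dog  ' | sed 's/\w\>/&]/')
end_b=$(printf '%s\n' 'cat dog  ' | sed 's/\w\b/&]/')

printf '%s\n' "$start_angle" "$start_b" "$end_angle" "$end_b"
```

Both start-of-word commands give "  [cat dog" and both end-of-word commands give "cat] dog  ". Once a word character sits next to \b, it can only match the same end of the word that \< or \> would.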
\w and \d (not \ itself) can usually be replaced with other characters of a character class, like [0-9]. – Pshaw
\<.*\> will match any string bounded by word characters. The .* is greedy, so it matches as many arbitrary characters as possible. To match only individual words, you could use a non-greedy variant: \<.*?\> – Closefitting
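The greedy behavior from the comment above can be sketched with GNU sed. One assumption to flag: sed's regular expressions have no non-greedy quantifier, so \w\+ is used here as a stand-in for \<.*?\> and is not the Emacs syntax itself.

```shell
# Assumes GNU sed. sed has no lazy quantifier, so \w\+ stands in
# for the non-greedy \<.*?\> mentioned in the comment.
line='cat dog sky.'

# Greedy: from the first word start to the last word end on the line.
greedy=$(printf '%s\n' "$line" | sed 's/\<.*\>/[&]/')

# Restricted to word characters: a single word only.
one_word=$(printf '%s\n' "$line" | sed 's/\<\w\+\>/[&]/')

printf '%s\n' "$greedy" "$one_word"
```

The greedy pattern brackets "cat dog sky" as one match, giving "[cat dog sky].", while the restricted one brackets only the first word, giving "[cat] dog sky.".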
\< and \> are from the original vi, and remain there to this day. – Dreiser