Regex anchors inside character class

3

8

Is it possible to use anchors inside a character class? This doesn't work:

analyze-string('abcd', '[\s^]abcd[\s$]') 

It looks like ^ and $ are treated as literals inside a character class; however, escaping them (\^, \$) doesn't work either.

I'm trying to use this expression to create word boundaries (\b is not available in XSLT/XQuery), but I would prefer not to use groups such as (^|\s). Since non-capturing groups aren't available, in some scenarios I may end up with a large number of unneeded capture groups, which creates a new task: finding the "real" capture groups among the unneeded ones.

Greatcoat answered 29/5, 2013 at 22:59 Comment(2)
Whoever voted to close this as duplicate, can you provide a link to the alleged duplicate question? - Invoke
@Invoke It wasn't me, but this is the alleged dupe: #9623369. Although there are a few commonalities, I think it's a substantially different problem. - Greatcoat
I
6

I believe the answer is no, you can't include ^ and $ as anchors in a [], only as literal characters. (I've wished you could do that before too.)
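To make the "literal only" behaviour concrete, here is a minimal sketch, assuming a processor that follows the XPath regular expression rules (the sample strings are made up for illustration). Going by the spec, the three calls should evaluate to false, true, true: a bare 'abcd' has nothing on either side for the classes to consume, while a literal ^ and $, or whitespace, satisfies them.

```xquery
xquery version "3.0";
(: Inside [...] the characters ^ and $ are ordinary characters;
   ^ only negates the class when it is the first character after "[". :)
(
  matches('abcd',   '[\s^]abcd[\s$]'),  (: nothing surrounds abcd, so no match :)
  matches('^abcd$', '[\s^]abcd[\s$]'),  (: literal ^ and $ satisfy the classes :)
  matches(' abcd ', '[\s^]abcd[\s$]')   (: whitespace on both sides :)
)
```

A particular implementation (MarkLogic, for instance) may differ in details, so it is worth running this before relying on it.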

However, you could concatenate a space onto the front and back of the string, then just look for \s as the word boundary and ignore the anchors. E.g.

analyze-string(concat(' ', 'abcd xyz abcd', ' '), '\sabcd\s')

You may also want + after each \s, but that's a separate issue.
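A slightly fuller sketch of the padding idea, assuming analyze-string is available as a function and returns the standard fn:analyze-string-result element; the variable names and sample text are hypothetical. It counts the fn:match children, which should give 2 for this input.

```xquery
xquery version "3.0";
let $input  := 'abcd xyz abcd'           (: sample text, hypothetical :)
let $padded := concat(' ', $input, ' ')  (: a space now stands in for ^ and $ :)
return count(analyze-string($padded, '\sabcd\s')/fn:match)
```

Two caveats with this approach: each match includes the surrounding whitespace, so it may need trimming (e.g. with normalize-space), and because every match consumes its trailing space, two adjacent occurrences such as 'abcd abcd' share one separator and only the first is matched.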

Invoke answered 30/5, 2013 at 1:24 Comment(1)
It's a hack, but I'll take it! - Greatcoat
P
3

If you're using analyze-string as a function, then presumably you're using a 3.0 implementation of either XSLT or XQuery.

In that case, why do you say "non-capturing groups aren't available"? The XPath Functions and Operators 3.0 spec is explicit that "Non-capturing groups are also recognized. These are indicated by the syntax (?:xxxx)."
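Where the (?:xxxx) syntax is supported, the boundary groups can be made non-capturing so that the word itself is the only numbered group. The following is a sketch under that assumption (it is not guaranteed to work on processors that implement only part of 3.0); the sample text is hypothetical.

```xquery
xquery version "3.0";
let $input := 'abcd xyz abcd'   (: sample text, hypothetical :)
return
  (: (?:^|\s) and (?:\s|$) act as boundaries but create no numbered group,
     so the word itself is group 1 :)
  analyze-string($input, '(?:^|\s)(abcd)(?:\s|$)')
    /fn:match/fn:group[@nr = 1]/string()
```

For this input the expression should return the two strings 'abcd' and 'abcd'. On a processor without non-capturing groups, the same pattern written with plain parentheses would put the word in group 2 instead.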

Pooh answered 30/5, 2013 at 17:34 Comment(1)
I'm using MarkLogic, which only has a subset of 3.0 implemented. - Greatcoat
Z
0

A caret placed immediately after the opening square bracket negates the character class. That essentially gives you the opposite of what you're looking for: the class will match any character that is not listed in it. Negated character classes also match (invisible) line break characters.

You could possibly try a negative look-ahead instead:

(?!\s)
Zarf answered 29/5, 2013 at 23:29 Comment(1)
Unfortunately, look-ahead/behind aren't included in the regex syntax of the XQuery and XSLT specs. I updated the regex to make it clearer: I actually intended to include the anchor in the character class. The goal is to require matching "space OR begin/end-anchor (without capturing)". - Greatcoat
