How do I make a regex with a lookbehind assertion that still works at the start of a string
Asked Answered
E

1

12

I need to emulate the behavior of \b at the start of a string, where I'm adding additional characters to the set that count as a word boundary. Right now I'm using something like:

"(?<=\\W|\\p{InCJKUnifiedIdeographs})foo"

This works as I would like, unless I'm at the start of the string being matched: in which case the assertion fails and I don't get a hit. What I want is the equivalent of match if I'm at the start of the string or foo is preceded by a non-word character or an ideograph. But I can't get the right incantation to support that.

Any thoughts? Or is this impossible?

Thanks in advance.

Extant answered 11/1, 2011 at 17:42 Comment(3)
What do you mean by match if i am at the start of the string? That would capture all strings because all strings have a 'start of string'Martita
It doesn't: if I use the aforementioned regex against the string "foo foobar baz" it will not find 'foo' because the look behind fails.Extant
In most cases, you can get what you want by reversing the condition: (?<![\w\P{InCJKUnifiedIdeographs}]). I'd add it as an answer, but I don't have time to test it.Katharina
F
25
"(?<=^|\\W|\\p{InCJKUnifiedIdeographs})foo"

Just add the start-of-string anchor to the lookbehind conditions.

Fisticuffs answered 11/1, 2011 at 18:3 Comment(5)
Thanks Robert, that works like a charm. Somehow in the various combinations I experimented with I didn't try the most obvious.Extant
Adding a carot leads to error in my case ((?<=^| )is(?= |$) regex101.com/r/vD5iH9/21Shute
@СашкоЛихенко That's a limitation of Python's regex engine. It allows only "fixed width" look-behinds, and the length of ^ (zero/null?/NaN?) is obviously different than ` ` (one).Fisticuffs
@СашкоЛихенко See if using the word-boundary match will work for you, e.g. \bis\b or something like that.Fisticuffs
Thanks, this was not very easy to find in documents... I tried variants like (?<=\\^|\\W... or (?<=(\\^|\\W... etc.Angy

© 2022 - 2024 — McMap. All rights reserved.