I need to emulate the behavior of \b at the start of a string, while adding extra characters to the set that counts as a word boundary. Right now I'm using something like:
"(?<=\\W|\\p{InCJKUnifiedIdeographs})foo"
This works as I would like, unless foo is at the very start of the string being matched, in which case the lookbehind fails and I don't get a hit. What I want is a match if I'm at the start of the string, or if foo is preceded by a non-word character or an ideograph. But I can't find the right incantation to support that.
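To make the failure concrete, here is a minimal sketch in Java (which the doubled backslashes suggest). The class name `LookbehindRepro` and the helper `hit` are made up for illustration; only the pattern itself comes from my code.

```java
import java.util.regex.Pattern;

class LookbehindRepro {
    // The pattern from the question, unchanged.
    static final Pattern P =
            Pattern.compile("(?<=\\W|\\p{InCJKUnifiedIdeographs})foo");

    // Hypothetical helper: true if the pattern is found anywhere in s.
    static boolean hit(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hit("bar foo"));    // true: preceded by a space
        System.out.println(hit("\u4e2dfoo"));  // true: preceded by an ideograph
        System.out.println(hit("barfoo"));     // false: preceded by a word character
        System.out.println(hit("foo"));        // false: nothing precedes foo, so the lookbehind cannot match
    }
}
```

The last case is the one I need to turn into a hit.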
Any thoughts? Or is this impossible?
Thanks in advance.
Try (?<![\w&&[^\p{InCJKUnifiedIdeographs}]]). I'd add it as an answer, but I don't have time to test it. – Katharina
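For anyone who wants to try the negative-lookbehind idea, here is a rough Java sketch rather than a vetted answer. It assumes the class should be an intersection (`&&`): a word character that is not an ideograph. A plain union of `\w` and `\P{...}` would reject every non-ideograph, spaces included. The class name `StartBoundaryDemo`, the helper `hit`, and the second pattern with an explicit `^` alternative are illustrative additions.

```java
import java.util.regex.Pattern;

class StartBoundaryDemo {
    // Negative lookbehind: fail only when the previous character is a word
    // character that is NOT a CJK unified ideograph. At the start of input
    // there is no previous character, so the assertion succeeds.
    static final Pattern NEGATIVE = Pattern.compile(
            "(?<![\\w&&[^\\p{InCJKUnifiedIdeographs}]])foo");

    // Alternative: keep the original positive lookbehind and add an
    // explicit start-of-input branch next to it.
    static final Pattern POSITIVE = Pattern.compile(
            "(?:^|(?<=\\W|\\p{InCJKUnifiedIdeographs}))foo");

    // Hypothetical helper: true if p is found anywhere in s.
    static boolean hit(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"foo", "bar foo", "\u4e2dfoo", "barfoo"};
        for (String s : inputs) {
            // Both patterns hit the first three inputs and miss "barfoo".
            System.out.println(s + " -> " + hit(NEGATIVE, s) + " / " + hit(POSITIVE, s));
        }
    }
}
```

With Java's default `\w` (ASCII letters, digits, underscore) ideographs are already non-word characters, so the intersection mostly matters if the pattern is compiled with `Pattern.UNICODE_CHARACTER_CLASS`, where `\w` should also cover ideographs.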