Regex matching on word boundary OR non-digit

2

6

I'm trying to use a Regex pattern (in Java) to find a sequence of 3 digits and only 3 digits in a row. 4 digits doesn't match, 2 digits doesn't match.

The obvious pattern to me was:

"\b(\d{3})\b"

That matches against many source string cases, such as:

">123<"
" 123-"
"123"

But it won't match against a source string of "abc123def" because the c/1 boundary and the 3/d boundary don't count as a "word boundary" match that the \b class is expecting.
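To make the failure concrete, here is a minimal Java sketch of the pattern above run against the sample strings. The class and method names (`WordBoundaryDemo`, `firstMatch`) are made up for illustration and are not part of my real code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The pattern from the question; in Java source each backslash is doubled.
    static final Pattern THREE_DIGITS = Pattern.compile("\\b(\\d{3})\\b");

    // Returns the first captured three-digit run, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher m = THREE_DIGITS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {">123<", " 123-", "123", "abc123def", "1234"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + firstMatch(s));
        }
    }
}
```

The first three inputs yield 123. "abc123def" yields null because letters and digits are both word characters, so there is no boundary between c and 1 or between 3 and d.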

I would have expected the solution to be adding a character class that includes both non-Digit (\D) and the word boundary (\b). But that appears to be illegal syntax.

"[\b\D](\d{3})[\b\D]"

Does anybody know what I could use as an expression that would extract "123" for a source string situation like:

"abc123def"

I'd appreciate any help. And yes, I realize that in Java one must double-escape codes like \b to \\b, but that's not my issue and I didn't want to limit this to Java folks.

Dolor answered 10/4, 2014 at 16:17 Comment(1)
For more information, check out "\b: word boundaries" (listed under "Anchors") and the whole section on "Lookarounds" in the Stack Overflow Regular Expressions FAQ. – Betweentimes
12

You should use lookarounds for those cases:

(?<!\d)(\d{3})(?!\d)

This means: match 3 digits that are neither preceded nor followed by another digit.
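As a rough sketch of how the lookaround pattern could be used from Java (the class name `ThreeDigitLookaround` and the helper `findAll` are hypothetical, chosen only for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitLookaround {
    // (?<!\d) = not preceded by a digit, (?!\d) = not followed by a digit.
    // Both assertions are zero-width, so only the three digits are consumed.
    static final Pattern THREE_DIGITS = Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    // Collects every run of exactly three digits found in the input.
    static List<String> findAll(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = THREE_DIGITS.matcher(input);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def")); // a single run: 123
        System.out.println(findAll("1234"));      // no runs of exactly three digits
    }
}
```

Because the lookarounds consume no characters, two runs separated by a single character (for example "123a456") are both found.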


Deyo answered 10/4, 2014 at 16:21 Comment(0)
4

Lookarounds can solve this problem, but I personally try to avoid them because not all regex engines fully support them. Additionally, I wouldn't say this issue is complicated enough to merit the use of lookarounds in the first place.

You could match this: (?:\b|\D)(\d{3})(?:\b|\D)

Then return: \1

Or if you're performing a replacement and need to match the entire string: (?:\b|\D)+(\d{3})(?:\b|\D)+

Then replace with: \1
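For the extraction case, a minimal Java sketch of the first pattern might look like the following. The names `ThreeDigitAlternation` and `findAll` are placeholders for illustration, and only the first (non-replacement) pattern is exercised here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitAlternation {
    // Either a zero-width word boundary or one consumed non-digit on each side.
    static final Pattern THREE_DIGITS =
            Pattern.compile("(?:\\b|\\D)(\\d{3})(?:\\b|\\D)");

    // Collects capture group 1 (the three digits) from every match.
    static List<String> findAll(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = THREE_DIGITS.matcher(input);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def")); // 123 is captured in group 1
        System.out.println(findAll("1234"));      // nothing is captured
    }
}
```

One trade-off to be aware of: when the \D branch is taken, the neighbouring character becomes part of the match. With an input like "123a456" the "a" is consumed by the first match, so this sketch reports only 123 and misses 456. Also note that in Java's replaceAll the replacement string refers to the group as $1 rather than \1.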

As a side note, \b wasn't working as part of a character class because within brackets it no longer means a word boundary: in many regex flavors, [\b] refers to a backspace character instead.


Corfu answered 10/4, 2014 at 17:8 Comment(3)
That's a good answer. The odd thing is, I tried that and came up empty-handed in my unit tests. I must have munged it up somehow. I'll give it another shot. Thanks! – Dolor
Sorry, had to make a slight edit. The second pattern needs to use + instead of *, or else it will match the first three digits of a string of four digits or more. – Corfu
Yeah, I came across that problem since I was actually doing a replacement in my app. Thanks for the update. – Dolor
