Regex: how to match a word that doesn't end with a specific character

I would like to match the whole "word"—one that starts with a number character and that may include special characters but does not end with a '%'.

Match these:

  • 112 (whole numbers)
  • 10-12 (ranges)
  • 11/2 (fractions)
  • 11.2 (decimal numbers)
  • 1,200 (thousand separator)

but not

  • 12% (percentages)
  • A38 (words starting with an alphabetic character)

I've tried these regular expressions:

(\b\p{N}\S*)

but that returns '12%' in '12%'

(\b\p{N}(?:(?!%)\S)*)

but that returns '12' in '12%'

Can I make an exception to the \S term that disregards %? Or will I have to do something else?

I'll be using it in PHP, but just write as you would like and I'll convert it to PHP.

Undersheriff answered 18/11, 2011 at 11:4 Comment(1)
Do these numbers appear in some context? Spaces surrounding etc.? Specifically why did you use the trailing \S (which means non-space)?Lely
D
7

This matches your specification:

\b\p{N}\S*+(?<!%)

Explanation:

\b       # Start of number
\p{N}    # One Digit
\S*+     # Any number of non-space characters, match possessively
(?<!%)   # Last character must not be a %

The possessive quantifier \S*+ makes sure that the regex engine will not backtrack into a string of non-space characters it has already matched. Therefore, it will not "give back" a % to match 12 within 12%.

Of course, that will also match 1!abc, so you might want to be more specific than \S which matches anything that's not a whitespace character.
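
To make the difference between the greedy and the possessive quantifier concrete, here is a minimal Python sketch. Two assumptions that are mine, not part of the answer: Python's `re` module has no `\p{N}`, so `\d` stands in for it, and because `\S*+` is only accepted from Python 3.11 on, the possessive match is emulated with the lookahead-plus-backreference idiom `(?=(\S*))\1`. The helper name `find_all` is made up for illustration.

```python
import re

# Greedy version: \S* may give characters back when the lookbehind fails.
GREEDY = re.compile(r"\b\d\S*(?<!%)")

# Emulated possessive version: the lookahead captures the whole run of
# non-space characters and \1 consumes it in one piece, so the engine
# cannot give back a trailing %.
POSSESSIVE = re.compile(r"\b\d(?=(\S*))\1(?<!%)")


def find_all(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


sample = "112 10-12 11/2 11.2 1,200 12% A38"
print(find_all(GREEDY, sample))      # the greedy pattern still finds 12 inside 12%
print(find_all(POSSESSIVE, sample))  # 12% and A38 are skipped entirely
```

In PHP the original pattern should work as written, since PCRE supports both `\p{N}` and possessive quantifiers; the emulation is only needed where they are missing.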

Demob answered 18/11, 2011 at 11:13 Comment(2)
+1, but if you make the \S* possessive - \S*+ - you can get rid of the (?=\s|$).Biradial
Thanks. This works perfectly. I've extended it a bit to fit some other cases that I discovered. (\b\p{N}\S*+(?<!%|%\p{P}|%\p{P}\p{P})) So it disregards '12%.', '11%-12%', '( 12%)', and '( 12%).' In my data, % is followed by at most two punctuation marks, but I'm sure the pattern above can be converted to something more general if needed.Undersheriff
O
1

Can I make an exception to the \S term that disregards %?

Yes you can:

[^%\s]

See this expression \b\d[^%\s]* here on Regexr
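
As a rough illustration of what the negated class does, here is a small Python sketch. It assumes `\d` as a stand-in for `\p{N}` (which Python's `re` does not support), and the token list is made up from the examples in this thread. Note that the class excludes % everywhere in the token, so the match simply stops in front of the first %.

```python
import re

# [^%\s] is "any character that is neither % nor whitespace".
# \d stands in for \p{N}, which Python's re does not support.
NO_PERCENT = re.compile(r"\b\d[^%\s]*")

for token in ["11.2", "1,200", "12%", "11%-12%"]:
    m = NO_PERCENT.search(token)
    # For "12%" this still reports the leading "12", because only the
    # % itself is excluded, not the token that contains it.
    print(token, "->", m.group(0) if m else None)
```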

Ocker answered 18/11, 2011 at 11:12 Comment(4)
But he said that only the last character must not be a %.Demob
@TimPietzcker that's true, then I showed only the way to exclude a specific character from a predefined class, because the correct answer is already there from you (+1)Ocker
Yeah, he asked for that (exclusion of a single character), too. So we don't really know what he actually needs :) +1!Demob
Thanks. I will definitely be using this a lot more.Undersheriff
S
1

KISS (restrictive):

/[0-9][0-9.,-/]*\s/
Skycap answered 18/11, 2011 at 11:18 Comment(2)
That will match 12 in 12%.Demob
I assume the OP is looking for "numbers" so "7ISNOTANUMBER" shouldn't match, ergo there should be some kind of separator (that I assumed to be whitespace).Skycap
V
1
\d+([-/\.,]\d+)?(?!%)

Explanation:

\d+        one or more digits
(
   [-/\.,]     one "-", "/", "." or ","
   \d+         one or more digits
)?         the group above zero or one times
(?!%)      not followed by a "%" (negative lookahead)
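
A quick way to check this pattern against the question's examples is the Python sketch below. `POSTED` is the pattern exactly as given above. `STRICT` is a hypothetical variant of my own, not part of the answer: it assumes tokens are delimited by whitespace, anchors the start with `\b`, and ends with `(?!\S)` so that the engine cannot shorten the match to get past the lookahead. The helper name `matches` is also made up.

```python
import re

# The pattern as posted: backtracking lets \d+ shrink until (?!%) passes.
POSTED = re.compile(r"\d+([-/\.,]\d+)?(?!%)")

# Hypothetical stricter variant: start at a word boundary and require the
# token to end at whitespace or end of input, so nothing can be given back.
STRICT = re.compile(r"\b\d+(?:[-/.,]\d+)?(?!\S)")


def matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


print(matches(POSTED, "12%"))  # a shortened match slips through
print(matches(STRICT, "112 10-12 11/2 11.2 1,200 12% A38"))
```

In an engine with possessive quantifiers, such as PCRE, making the quantifiers possessive is another way to stop the backtracking.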
Vallombrosa answered 18/11, 2011 at 11:23 Comment(3)
+1. Judging by the examples provided, I suspect this is the answer to the question the OP should have asked. ;)Biradial
@AlanMoore: Careful, this will match 1 in 12%. The quantifiers should also be made possessive. Then it will work.Demob
@Tim: Can I recover from this gaffe by saying I assumed there would be word boundaries or anchors around the regex? :PBiradial
E
0

Try this one:

preg_match("/^[0-9].*[^%]$/", $string);
Enfield answered 18/11, 2011 at 11:11 Comment(1)
That fails to match single-digit numbers.Demob
A
0

Try this PCRE regex:

/^(\d[^%]+)$/

It should give you what you need.

Antiperistalsis answered 18/11, 2011 at 11:15 Comment(1)
Only the last character cannot be a percent.Orthogenesis
L
0

I would suggest just:

(\b[\p{N},.-]++(?!%))

That's not very exact regarding decimal delimiters or ranges, for example. But the ++ possessive quantifier will eat up as many digits and separators as it can, so you really just need to check the following character with a simple assertion. It worked for your examples.

Lely answered 18/11, 2011 at 11:16 Comment(2)
OP did say the first character had to be a number, and you're not enforcing that. The leading \b doesn't do the trick; it will match (e.g.) the comma in foo,bar.Biradial
Like I said, not very exact. Correct range and fractions treatment has been answered by etuardu.Lely
