The target structure looks like the following:
検索結果:100,000件
I am using the following regex pattern:
((?<!検索結果:)(?<!次の)(((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京+|[0-90-9]))(,|,|、)?).+((〇|一|二|三|四|五|六|七|八|九|十|百|千|万|億|兆|京|[0-90-9]).+)件)(?!表示)
As you can see, I want this pattern to skip any number that is preceded by "検索結果:" or "次の", where the number is written in either Arabic numerals or Japanese kanji (Chinese character) numerals. However, the exclusion somehow works for numbers of up to 4 digits but not for longer ones (for example, 6 digits).
In other words,
次の1000件
works (meaning it doesn't match anything), but
次の5,0000件
gives a partial match ("0000件").
I want to know why the exclusion only works up to 4 digits, and ultimately I want to find a way to NOT match anything in these cases using this regex. I know this regex is a bit messy. Thanks in advance for your feedback!
\p{N}+ ? Or the opposite, \P{N}+ ? – Yclept
"0000" is preceded by "5,", so it's a match. – Fabrianne
.+ terms? Which mean "match 1 or more of anything"? – Fabrianne
[0-9] is not forming a word char. Use regex101.com/r/f1SybY/4. Or a bit shorter. Or, for PCRE, even shorter. – Irritated
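The behaviour described in the question can be reproduced with a small Python sketch. A lookbehind is checked only at the position where a match attempt starts, so after the attempt right behind "次の" is rejected, the engine simply retries one character further on. The pattern needs at least four characters before "件" (a numeral, `.+`, a numeral, `.+`), which is why "1000" leaves too little to match once its first digit is skipped, while "5,0000" still leaves "0000". The names `NUM`, `SEP`, `ORIGINAL_LIKE`, `FIXED` and `first_match` are illustrative only, the numeral class is a simplified stand-in for the alternations in the question, and the `FIXED` pattern is one possible approach rather than the solution linked in the comments.

```python
import re

# Simplified stand-ins for the question's alternations (an assumption made
# for readability; full-width variants are left out).
NUM = r"[0-9〇一二三四五六七八九十百千万億兆京]"
SEP = r"[,、]"

# Same shape as the question's pattern: the lookbehinds are checked only at
# the position where a match attempt starts.
ORIGINAL_LIKE = re.compile(
    rf"(?<!検索結果:)(?<!次の){NUM}{SEP}?.+{NUM}.+件(?!表示)"
)

# Hypothetical fix: additionally refuse to start in the middle of a number,
# i.e. right after a numeral, or after a numeral followed by a separator.
FIXED = re.compile(
    rf"(?<!検索結果:)(?<!次の)(?<!{NUM})(?<!{NUM}{SEP})"
    rf"{NUM}(?:{SEP}?{NUM})*件(?!表示)"
)


def first_match(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


samples = ["次の1000件", "次の5,0000件", "検索結果:100,000件", "全部で100,000件"]
for sample in samples:
    # Print the input, then what each pattern finds in it.
    print(sample, first_match(ORIGINAL_LIKE, sample), first_match(FIXED, sample))
```

With the extra lookbehinds, a match can only begin at the first character of a number, so the "検索結果:" and "次の" guards apply to the whole number instead of just its first digit. Replacing `.+` with an explicit numeral-and-separator repetition also keeps the match from running across unrelated text.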