In C# code, I am trying to validate a string that contains Chinese characters: "中文ABC123".
When I use a general alphanumeric pattern, "^[a-zA-Z0-9\s]+$", it does not match "中文ABC123" and the regex validation fails.
What other expressions do I need to add for C#?
To match any letter character from any language use:
\p{L}
If you also want to match numbers:
[\p{L}\p{Nd}]+
\p{L} ... matches a character of the Unicode category "letter". It is the short form for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}].
\p{Ll} ... matches lowercase letters. (abc)
\p{Lu} ... matches uppercase letters. (ABC)
\p{Lt} ... matches titlecase letters.
\p{Lm} ... matches modifier letters.
\p{Lo} ... matches letters without case. (中文)
\p{Nd} ... matches a character of the Unicode category "decimal digit".
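The category list above can be checked directly from C#. The following is a minimal sketch, not part of the original answer: the class name `UnicodeCategoryDemo` and the helper `InCategory` are made up for illustration, and the sample characters are my own picks for each category.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeCategoryDemo
{
    // Hypothetical helper: true when the single character in `ch`
    // belongs to the named Unicode category (for example "L" or "Lo").
    public static bool InCategory(string ch, string category) =>
        Regex.IsMatch(ch, @"^\p{" + category + @"}$");

    public static void Main()
    {
        // One sample character per category; \u escapes avoid
        // depending on the encoding of the source file.
        var samples = new (string Char, string Category)[]
        {
            ("a", "Ll"),       // lowercase letter
            ("A", "Lu"),       // uppercase letter
            ("\u01C5", "Lt"),  // titlecase letter (U+01C5)
            ("\u02B0", "Lm"),  // modifier letter (U+02B0)
            ("\u4E2D", "Lo"),  // letter without case (中)
            ("5", "Nd"),       // decimal digit
        };

        foreach (var (ch, category) in samples)
        {
            Console.WriteLine(
                $"U+{(int)ch[0]:X4} in {category}: {InCategory(ch, category)}, " +
                $"in L: {InCategory(ch, "L")}");
        }
    }
}
```

Every letter sample should report true for both its own category and for L, while the digit should only report true for Nd.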
Just replace ^[a-zA-Z0-9\s]+$ with ^[\p{L}0-9\s]+$
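To see the difference between the two patterns on the string from the question, here is a small sketch. The class name `AlphanumericValidator` and its method names are assumptions made for this example; only the two patterns come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphanumericValidator
{
    // Original pattern from the question: ASCII letters, digits, whitespace.
    private static readonly Regex AsciiOnly = new Regex(@"^[a-zA-Z0-9\s]+$");

    // Suggested replacement: any Unicode letter, ASCII digits, whitespace.
    private static readonly Regex AnyLetter = new Regex(@"^[\p{L}0-9\s]+$");

    public static bool IsAsciiAlphanumeric(string input) => AsciiOnly.IsMatch(input);

    public static bool IsUnicodeAlphanumeric(string input) => AnyLetter.IsMatch(input);

    public static void Main()
    {
        string input = "\u4E2D\u6587ABC123"; // 中文ABC123

        Console.WriteLine(IsAsciiAlphanumeric(input));   // False
        Console.WriteLine(IsUnicodeAlphanumeric(input)); // True
    }
}
```

Punctuation is still rejected by the second pattern, so it stays a letters, digits, and whitespace check rather than an "anything goes" check.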
[...] \p, and treats \w as "latin word character", so it's trickier there: https://mcmap.net/q/24959/-regular-expression-to-match-non-ascii-characters – Treasonable
[...] \p{Lo} might capture? – Akins
[...] \w in .NET: https://mcmap.net/q/233968/-net-regex-what-is-the-word-character-w (note that \w does not work for all languages if using ECMAScript-compliant behavior) – Roz
Thanks to @Andie2302 for pointing to the right way to do it.
In addition, many languages have combining characters that need a base character to attach to. For example, take the Thai word 'เก็บ': if you match only \p{L}, you get 'เกบ', and the mark above the second letter is lost.
That is why \p{L} alone will not work for every language.
To support almost all languages, use the following:
\p{L}\p{M}
NOTE:
L stands for 'Letter' (all letters from all languages, not including marks)
M stands for 'Mark' (a mark cannot be displayed on its own; it needs a letter to attach to)
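The Thai example can be reproduced with a short sketch. It assumes that \p{L} and \p{M} are meant to go inside one character class, which the answer does not spell out; the class name `MarkDemo` and its method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkDemo
{
    // Removes everything that is not a letter. Combining marks are dropped.
    public static string KeepLetters(string s) =>
        Regex.Replace(s, @"[^\p{L}]", "");

    // Removes everything that is neither a letter nor a mark.
    public static string KeepLettersAndMarks(string s) =>
        Regex.Replace(s, @"[^\p{L}\p{M}]", "");

    // Validates that the whole string consists of letters and marks.
    public static bool IsLettersAndMarks(string s) =>
        Regex.IsMatch(s, @"^[\p{L}\p{M}]+$");

    public static void Main()
    {
        // เก็บ = U+0E40 U+0E01 U+0E47 (combining mark) U+0E1A
        string thai = "\u0E40\u0E01\u0E47\u0E1A";

        Console.WriteLine(KeepLetters(thai).Length);         // 3, the mark is gone
        Console.WriteLine(KeepLettersAndMarks(thai).Length); // 4, the word is intact
        Console.WriteLine(IsLettersAndMarks(thai));          // True
    }
}
```

A validation pattern built only on \p{L} would reject this word outright, because U+0E47 is a mark and not a letter.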
If you also need numbers, use the following:
\p{N}
NOTE:
N stands for 'Number'
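As a rough illustration of how \p{N} differs from the narrower \p{Nd} mentioned in the earlier answer, here is a sketch. The class name `NumberDemo`, its methods, and the combined pattern in `IsText` are assumptions for this example, not something stated in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberDemo
{
    // Decimal digits only (category Nd), in any script.
    public static bool IsDecimalDigits(string s) => Regex.IsMatch(s, @"^\p{Nd}+$");

    // Any numeric character (Nd plus letter-like and other numbers).
    public static bool IsAnyNumber(string s) => Regex.IsMatch(s, @"^\p{N}+$");

    // Assumed combination: letters, marks, numbers, and whitespace.
    public static bool IsText(string s) => Regex.IsMatch(s, @"^[\p{L}\p{M}\p{N}\s]+$");

    public static void Main()
    {
        string arabicThree = "\u0663"; // Arabic-Indic digit three (Nd)
        string oneHalf = "\u00BD";     // vulgar fraction one half (No)
        string romanTwelve = "\u216B"; // Roman numeral twelve (Nl)

        Console.WriteLine(IsDecimalDigits(arabicThree)); // True
        Console.WriteLine(IsDecimalDigits(oneHalf));     // False
        Console.WriteLine(IsAnyNumber(oneHalf));         // True
        Console.WriteLine(IsAnyNumber(romanTwelve));     // True
        Console.WriteLine(IsText("\u4E2D\u6587ABC123")); // True
    }
}
```

Whether fractions and Roman numerals should pass is a design choice: \p{N} is the permissive option, \p{Nd} or 0-9 the stricter ones.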
Thanks to this website for very useful information
\w (word character) can be used instead of [\p{L}0-9]. – Treasonable
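The comments about \w can be tried out with a short sketch. The class name `WordCharDemo` and its method names are made up here. Note that in .NET \w is broader than [\p{L}0-9] (it also accepts the underscore, for instance), and that RegexOptions.ECMAScript restricts it to ASCII.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behavior: \w is Unicode-aware.
    public static bool IsWordDefault(string s) =>
        Regex.IsMatch(s, @"^\w+$");

    // ECMAScript-compliant behavior: \w is limited to [a-zA-Z_0-9].
    public static bool IsWordEcma(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);

    // The explicit class from the earlier answer, without whitespace.
    public static bool IsLettersDigits(string s) =>
        Regex.IsMatch(s, @"^[\p{L}0-9]+$");

    public static void Main()
    {
        string input = "\u4E2D\u6587ABC123"; // 中文ABC123

        Console.WriteLine(IsWordDefault(input));   // True
        Console.WriteLine(IsWordEcma(input));      // False
        Console.WriteLine(IsLettersDigits(input)); // True

        // The two are not interchangeable: \w accepts the underscore.
        Console.WriteLine(IsWordDefault("a_b"));   // True
        Console.WriteLine(IsLettersDigits("a_b")); // False
    }
}
```

If underscores must be rejected, the explicit character class is the safer choice.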