How to validate both Chinese (Unicode) and English names?
B

5

10

I have a multilingual website (Chinese and English).

I would like to validate a text field (a name field) in JavaScript. I have the following code so far.

var chkName = /^[characters]{1,20}$/;

if( chkName.test("[name value goes here]") ){
  alert("validated");
}

The problem is that /^[characters]{1,20}$/ only matches English characters. Is it possible to match ANY characters, including Unicode ones? I used to use the following regex, but it also accepts spaces, and I don't want to allow spaces between characters.

/^(.+){1,20}$/
Breechloader answered 16/6, 2011 at 19:25 Comment(4)
What do you intend to do if a Korean, Japanese, Vietnamese, or Klingon name is provided? - Marlonmarlow
What rules do you have? 1-20 characters, no spaces. Anything else? - Pasco
@Russell Borogove // That is my concern as well. I want to validate all Unicode characters as well as English. - Breechloader
@Pasco // For now, I only want to allow characters without spaces. - Breechloader
G
29

You might check out Javascript + Unicode regexes and do some research to find exactly which ranges of characters you want to allow:

See What's the complete range for Chinese characters in Unicode?

After reading those two and doing a little extra research, you should be able to find appropriate values to complete something like: /^[-'a-z\u4e00-\u9eff]{1,20}$/i
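As a rough illustration of the range-based approach, here is a minimal sketch. The names NAME_PATTERN and isValidName are hypothetical, and the upper bound is an assumption on my part: it uses \u9fff, the end of the CJK Unified Ideographs block (U+4E00 to U+9FFF), rather than the \u9eff shown above. Characters outside that block, such as rare ideographs in the extension blocks, are not covered.

```javascript
// Sketch only: NAME_PATTERN and isValidName are hypothetical names.
// Allowed: hyphen, apostrophe, ASCII letters (case-insensitive via the i flag),
// and the CJK Unified Ideographs block U+4E00..U+9FFF. Spaces are not allowed.
const NAME_PATTERN = /^[-'a-z\u4e00-\u9fff]{1,20}$/i;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidName("张伟"));       // true
console.log(isValidName("O'Neil"));     // true
console.log(isValidName("John Smith")); // false: contains a space
console.log(isValidName(""));           // false: at least one character is required
```

The ^ and $ anchors matter here: without them the pattern would accept any string that merely contains an allowed character somewhere.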

Gio answered 16/6, 2011 at 19:31 Comment(1)
If other characters should be supported, e.g. German äüöß, French é..., or Spanish ñ..., the regex would need to be extended. - Pasco
T
3

Take a look at Regex Unicode blocks.

You can use this to take care of CJK names.

Taunyataupe answered 16/6, 2011 at 19:39 Comment(0)
M
2

As of 2018, there is new syntax in JavaScript to match Chinese or any other non-ASCII scripts:

const REGEX = /(\p{Script=Hani})+/gu; // note the 'u'
'你好'.match(REGEX);
// ["你好"]

The trick is to use \p with the right script name. Hani stands for the Han script (Chinese). The full list of scripts is here: http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt
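To make the naming concrete, the sketch below shows three spellings that should behave identically in an engine with ES2018 Unicode property escapes: the four-letter alias Hani, the long name Han, and the short property name sc in place of Script. The variable names are purely illustrative.

```javascript
// Sketch: three spellings of the same Unicode property escape.
// All of them require the u flag.
const byShortAlias = /^\p{Script=Hani}+$/u;
const byLongName = /^\p{Script=Han}+$/u;
const byShortProperty = /^\p{sc=Han}+$/u;

const sample = "你好";
console.log(byShortAlias.test(sample));    // true
console.log(byLongName.test(sample));      // true
console.log(byShortProperty.test(sample)); // true
console.log(byLongName.test("hello"));     // false: Latin letters are not Han
```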

To match both Chinese and English you just expand it a bit, for example:

const REGEX = /([A-Za-z]|\p{Script=Hani})+/gu;
// does not match accented letters though
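For validating a whole field rather than finding matches inside a string, one possible variation is sketched below. It is an assumption on my part, not part of the answer above: it swaps the explicit alternatives for \p{L} (any Unicode letter), which also covers accented letters, and adds anchors plus the 1 to 20 length limit from the question. LETTERS_ONLY and isLetterOnlyName are hypothetical names.

```javascript
// Sketch only: LETTERS_ONLY and isLetterOnlyName are hypothetical names.
// \p{L} matches any Unicode letter (Han, accented Latin, Hangul, and so on),
// so spaces, digits, and punctuation are rejected.
// With the u flag, {1,20} counts code points rather than UTF-16 code units.
// The g flag is left out on purpose: test() on a global regex keeps state in
// lastIndex between calls, which can make repeated validation unreliable.
const LETTERS_ONLY = /^\p{L}{1,20}$/u;

function isLetterOnlyName(name) {
  return LETTERS_ONLY.test(name);
}

console.log(isLetterOnlyName("你好"));       // true
console.log(isLetterOnlyName("Jos\u00e9"));  // true: precomposed é is a letter
console.log(isLetterOnlyName("John Smith")); // false: contains a space
```

Note that \p{L} does not match hyphens, apostrophes, or combining marks, so names such as O'Neil, or names typed with decomposed accents, would need those characters added to the class.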
Misty answered 21/8, 2018 at 3:50 Comment(3)
It is Han, not Hani. - Fig
Looks like both work; I tried it in Chrome. Sorry about saying it's wrong without verifying. The only difference is that "Hani" is the "code name" and "Han" is the real name of the script, like "Grek" vs "Greek". I'm Chinese, and apparently my brain told me it should be "Han", not "Hani". They forced all the code names into 4 chars. Shrug. - Fig
It works! By far the simplest solution I found, thanks a lot! - Galaxy
J
0

I have done some work on validating Chinese names using XRegExp. The core code is XRegExp("^((?![\\p{InKangxi_Radicals}\\p{InCJK_Radicals_Supplement}\\p{InCJK_Symbols_and_Punctuation}])\\p{Han}){2,4}$","u")

See jsfiddle.net/coas/4djhso1y

Joniejonina answered 14/3, 2018 at 16:58 Comment(0)
P
-1
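// Matches a single whitespace character anywhere in the string.
var chkName = /\s/;

// Writes the name to the page, followed by "okay" or "invalid".
function check(name) {

    document.write("<br />" + name + " is ");

    // The name is acceptable only if it contains no whitespace at all.
    if (!chkName.test(name)) {
        document.write("okay");
    } else {
        document.write("invalid");
    }

}

check("namevaluegoeshere");    // okay

check("name value goes here"); // invalid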

This way you just check if there's any white space in the name.

demo @ http://jsfiddle.net/roberkules/U3q5W/
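The whitespace test on its own does not enforce the 1 to 20 character limit mentioned in the question. A minimal sketch that combines the two checks might look like the following; isAcceptableName is a hypothetical helper, and counting by code point is an assumption about what "character" should mean here.

```javascript
// Sketch only: isAcceptableName is a hypothetical helper.
function isAcceptableName(name) {
  // Reject any whitespace, as in the check above.
  if (/\s/.test(name)) {
    return false;
  }
  // Spreading a string iterates by code point, so a character outside the
  // Basic Multilingual Plane counts once instead of twice.
  const length = [...name].length;
  return length >= 1 && length <= 20;
}

console.log(isAcceptableName("namevaluegoeshere"));    // true
console.log(isAcceptableName("name value goes here")); // false
console.log(isAcceptableName("王小明"));                // true
console.log(isAcceptableName(""));                     // false
```

This accepts digits and punctuation as well, so it is only suitable if "no spaces, 1 to 20 characters" really is the whole rule.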

Pasco answered 16/6, 2011 at 19:37 Comment(0)
