How to validate both Chinese (Unicode) and English names?

Asked 16/6, 2011 at 19:25 Answered 21/8, 2018 at 3:50

Solved javascript regex unicode character-properties

I have a multilingual website (Chinese and English).

I would like to validate a text field (name field) in JavaScript. I have the following code so far.

var chkName = /^[characters]{1,20}$/;

if( chkName.test("[name value goes here]") ){
  alert("validated");
}

The problem is that /^[characters]{1,20}$/ only matches English characters. Is it possible to match ANY characters (including Unicode)? I used to use the following regex, but it also accepts spaces, and I don't want to allow spaces between characters.

/^(.+){1,20}$/

Breechloader answered 16/6, 2011 at 19:25 Comment(4)

What do you intend to do if a Korean, Japanese, Vietnamese, or Klingon name is provided? – Marlonmarlow 16/6, 2011 at 19:27

What rules do you have? 1-20 characters, no spaces. Anything else? – Pasco 16/6, 2011 at 19:29

@Russell Borogove // That is my concern as well. I want to validate all Unicode characters as well as English. – Breechloader 16/6, 2011 at 19:31

@Pasco // for now, I want to allow only characters without spaces. – Breechloader 16/6, 2011 at 19:32

You might check out Javascript + Unicode regexes and do some research to find exactly which ranges of characters you want to allow:

See What's the complete range for Chinese characters in Unicode?

After reading those two and a little extra research you should be able to find appropriate values to complete something like: /^[-'a-z\u4e00-\u9eff]{1,20}$/i
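As a minimal sketch of the range-based approach, assuming only ASCII letters, hyphen, apostrophe, and the CJK Unified Ideographs block should be accepted: the block runs from U+4E00 to U+9FFF, so the sketch uses \u9fff as the upper bound, slightly wider than the pattern above. The function name isValidNameByRanges is hypothetical, and the CJK extension blocks are not covered.

```javascript
// Sketch only: which ranges to accept is an assumption, adjust to your needs.
// U+4E00-U+9FFF is the CJK Unified Ideographs block (extensions not included).
var nameByRanges = /^[-'a-z\u4e00-\u9fff]{1,20}$/i;

// Hypothetical helper: true when the whole string is 1-20 allowed characters.
function isValidNameByRanges(name) {
  return nameByRanges.test(name);
}

console.log(isValidNameByRanges("O'Neil"));      // true
console.log(isValidNameByRanges("王小明"));       // true
console.log(isValidNameByRanges("John Smith"));  // false, contains a space
console.log(isValidNameByRanges("Jos\u00e9"));   // false, é is outside the ranges
```

Because the pattern is anchored with ^ and $, a single disallowed character anywhere in the string makes the whole test fail.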

Gio answered 16/6, 2011 at 19:31 Comment(1)

If, e.g., German äüöß, French é..., or Spanish ñ... should be supported, the regex would need to be extended. – Pasco 5/2, 2019 at 11:31

Take a look at Regex Unicode blocks.

You can use this to take care of CJK names.

Taunyataupe answered 16/6, 2011 at 19:39 Comment(0)

As of 2018, there is new syntax in JavaScript to match Chinese or any other non-ASCII scripts:

const REGEX = /(\p{Script=Hani})+/gu; // note the 'u'
'你好'.match(REGEX);
// ["你好"]

The trick is to use \p with the right script name; Hani is the short alias for the Han script (Chinese characters). The full list of script names is here: http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt

To match both Chinese and English you just expand it a bit, for example:

const REGEX = /([A-Za-z]|\p{Script=Hani})+/gu;
// does not match accented letters though
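For validating a whole field rather than finding matches, a possible variation is to anchor the pattern and add the 1 to 20 length limit from the question. This is a sketch, not the answer's own code: the helper names are hypothetical, and the second pattern assumes that "any letter in any script" (\p{L}) is an acceptable rule for names. The g flag is left out on purpose, because test() on a global regex keeps state in lastIndex between calls.

```javascript
// Sketch: anchored, whole-string validation with Unicode property escapes.
// With the 'u' flag, {1,20} counts code points rather than UTF-16 units.
const NAME_HAN_OR_ASCII = /^(?:[A-Za-z]|\p{Script=Han}){1,20}$/u;

// Assumption: accept any Unicode letter (General_Category = Letter).
const NAME_ANY_LETTER = /^\p{L}{1,20}$/u;

// Hypothetical helpers wrapping the two patterns.
function isHanOrAsciiName(name) {
  return NAME_HAN_OR_ASCII.test(name);
}

function isAnyLetterName(name) {
  return NAME_ANY_LETTER.test(name);
}

console.log(isHanOrAsciiName("你好"));         // true
console.log(isHanOrAsciiName("John Smith"));  // false, contains a space
console.log(isHanOrAsciiName("Jos\u00e9"));   // false, accented letter
console.log(isAnyLetterName("Jos\u00e9"));    // true
console.log(isAnyLetterName("김철수"));        // true
```

Note that \p{L} does not match combining marks, so names typed with decomposed accents would need \p{M} added to the character set.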

Misty answered 21/8, 2018 at 3:50 Comment(3)

It is Han, not Hani – Fig 4/2, 2019 at 3:28

Looks like both work, tried in Chrome. Sorry about saying it's wrong without verifying. The only difference is, "Hani" is the "code name", "Han" is the real name of the language. Like "Grek" vs "Greek". I'm Chinese, apparently my brain told me it should be "Han" not "Hani", they forced all the code names into 4 chars. Shrug. – Fig 13/2, 2019 at 7:26

It works! By far, the simplest solution I found, thanks a lot! – Galaxy 3/12, 2022 at 11:38

I have done some work on validating Chinese names using XRegExp. The core code is:

XRegExp("^((?![\\p{InKangxi_Radicals}\\p{InCJK_Radicals_Supplement}\\p{InCJK_Symbols_and_Punctuation}])\\p{Han}){2,4}$","u")

See jsfiddle.net/coas/4djhso1y

Joniejonina answered 14/3, 2018 at 16:58 Comment(0)


var chkName = /\s/;

function check(name) {
    document.write("<br />" + name + " is ");
    if (!chkName.test(name)) {
        document.write("okay");
    } else {
        document.write("invalid");
    }
}

check("namevaluegoeshere");
check("name value goes here");

This way you just check if there's any white space in the name.

demo @ http://jsfiddle.net/roberkules/U3q5W/
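The whitespace check on its own accepts an empty string and names of any length, so it may be worth pairing it with the 1 to 20 character limit mentioned in the comments. A rough sketch, with a hypothetical function name and without document.write so it also runs outside a browser:

```javascript
// Sketch: reject whitespace and enforce a 1-20 character length.
// Hypothetical helper, not part of the answer's demo.
function isNameWithoutSpaces(name) {
  // Array.from splits a string by code point, so characters outside
  // the Basic Multilingual Plane are counted once instead of twice.
  var length = Array.from(name).length;
  return length >= 1 && length <= 20 && !/\s/.test(name);
}

console.log(isNameWithoutSpaces("namevaluegoeshere"));     // true
console.log(isNameWithoutSpaces("name value goes here"));  // false
console.log(isNameWithoutSpaces("王小明"));                 // true
console.log(isNameWithoutSpaces(""));                      // false
```

This still lets digits and punctuation through; it only enforces the "no spaces" and length rules.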

Pasco answered 16/6, 2011 at 19:37 Comment(0)
