Matching a Unicode "name" with a JavaScript Regular Expression

2

3

In JavaScript we can match individual Unicode codepoints or codepoint ranges by using the Unicode escape sequences, e.g.:

"A".match(/\u0041/) // => ["A"]
"B".match(/[\u0041-\u007A]/) // => ["B"]

But how can we write a JavaScript regular expression that matches a proper name, which may contain any Unicode "letter"? Is there a range of letters, or a special regex sequence or character class in JavaScript?

Say my website must validate names that could be in Latin-based languages as well as Hebrew, Cyrillic, or Japanese (Katakana, Hiragana, etc.). Is this feasible in JavaScript, or is the only sane choice to delegate to a backend language with better Unicode support?

Epsomite answered 6/4, 2011 at 18:18 Comment(3)
You may also want to read #4323886 and #4718766 - Mercurochrome
And kalzumeus.com/2010/06/17/… and blog.jgc.org/2010/06/your-last-name-contains-invalid.html - Mercurochrome
I really think you should carefully consider your last choice: delegating the work to a backend language that actually supports The Unicode Standard. - Cartierbresson
F
5

Here's a JS plugin that adds Unicode support to regular expressions:

http://xregexp.com/plugins/
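
For comparison, engines that implement ES2018 Unicode property escapes (`\p{...}` with the `u` flag) can express a similar check natively, without the plugin above. The following is only a minimal sketch and is not XRegExp's API. The helper name `isPlausibleName` and the choice of allowed separators (space, apostrophe, hyphen) are assumptions made for illustration; real-world names may need looser rules.

```javascript
// Hypothetical helper (not from the original answer): accepts names made of
// Unicode letters and combining marks, with single spaces, apostrophes, or
// hyphens allowed between letter runs. Requires ES2018+ for \p{...} and `u`.
const NAME_PATTERN = /^[\p{L}\p{M}]+(?:[ '\-][\p{L}\p{M}]+)*$/u;

function isPlausibleName(input) {
  // Normalize to NFC so precomposed and decomposed forms are treated alike.
  return NAME_PATTERN.test(input.normalize("NFC"));
}

console.log(isPlausibleName("José"));     // true
console.log(isPlausibleName("Владимир")); // true
console.log(isPlausibleName("דוד"));      // true
console.log(isPlausibleName("タナカ"));    // true
console.log(isPlausibleName("R2-D2"));    // false (digits are not letters)
```

`\p{L}` covers every code point in the Unicode "Letter" general category, so no hand-maintained ranges are needed; `\p{M}` is included so that letters written with combining marks still match.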

Fijian answered 6/4, 2011 at 18:29 Comment(0)
G
0

I use this site to look up the Unicode code points of symbols: http://www.fileformat.info.

Unicode Blocks (Basic Latin, Cyrillic, Arabic, and others): http://www.fileformat.info/info/unicode/block/index.htm

Unicode Character Categories (not supported by JavaScript regular expressions before ES2018 added `\p{...}` property escapes): http://www.fileformat.info/info/unicode/category/index.htm

Letters (A-я): http://www.fileformat.info/info/unicode/char/a.htm

Fonts (which chars are supported in each font): http://www.fileformat.info/info/unicode/font/index.htm

Index for all of the above: http://www.fileformat.info/info/unicode/index.htm

Gorham answered 7/4, 2011 at 12:6 Comment(1)
You mustn’t use Unicode blocks as a proxy for Unicode scripts, which is what you really want. The Unicode Standard speaks to this matter specifically. - Cartierbresson
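
The block-versus-script distinction raised in this comment can be shown with a small sketch, assuming an engine with ES2018 `Script` property escapes. The helper names `inCyrillicBlock` and `inCyrillicScript` are hypothetical. The block range covers only U+0400 to U+04FF, whereas the Cyrillic script also includes letters assigned to other blocks, such as Cyrillic Supplement.

```javascript
// Block-based test: only the "Cyrillic" block, U+0400 to U+04FF.
const inCyrillicBlock = (ch) => /^[\u0400-\u04FF]$/.test(ch);

// Script-based test: any code point whose Script property is Cyrillic.
const inCyrillicScript = (ch) => /^\p{Script=Cyrillic}$/u.test(ch);

// CYRILLIC SMALL LETTER KOMI DE lives in the Cyrillic Supplement block.
const komiDe = "\u0501";

console.log(inCyrillicBlock("Я"), inCyrillicScript("Я"));       // true true
console.log(inCyrillicBlock(komiDe), inCyrillicScript(komiDe)); // false true
```

A validator built on the block range would reject the second character even though it is a Cyrillic letter, which is why matching on the script property is the safer choice.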
