Detect Russian / Cyrillic in a JavaScript string?

I'm trying to detect whether a string contains Russian (Cyrillic) characters. I'm using this code:

term.match(/[\wа-я]+/ig);

but it doesn't work – or in fact it just returns the string back as it is.

Can somebody help with the right code?

Thanks!

Echidna answered 10/11, 2014 at 15:3 Comment(1)

You include \w in the regular expression, so it matches words with Latin characters as well. – Azoic 10/11, 2014 at 15:6

Perhaps you meant to use the RegExp test method instead?

/[а-яА-ЯЁё]/.test(term)

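Note that JavaScript regexes without the u flag have only limited Unicode support. The i flag does apply simple case folding beyond ASCII, so it is not strictly required to list both cases, but spelling out the lower- and upper-case ranges separately keeps the intent explicit. Ё and ё have to be listed on their own either way, because they fall outside the а-я and А-Я ranges.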
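As a minimal sketch of the difference between the two approaches: match returns the matched substrings (and the \w in the question's pattern also accepts Latin letters), while test returns a boolean. The helper name containsRussian is hypothetical and only used for illustration.

```javascript
// Character class from the answer above: Russian letters only.
const russianLetter = /[а-яА-ЯЁё]/;

// Hypothetical helper: true if the string has at least one Russian letter.
function containsRussian(term) {
  return russianLetter.test(term);
}

// Pattern from the question: \w also matches Latin letters, digits and "_",
// so match() simply hands a Latin string back.
const questionPattern = /[\wа-я]+/ig;

console.log(containsRussian('Привет'));      // true
console.log(containsRussian('Hello'));       // false
console.log('Hello'.match(questionPattern)); // [ 'Hello' ]
```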

Camilla answered 10/11, 2014 at 15:6 Comment(5)

You might want to add Ёё since they are also used in Russian. – Exorable 10/11, 2014 at 15:35

The Cyrillic Unicode range doesn't work, but the other method works great – Echidna 10/11, 2014 at 16:17

This answer means you have to store your .js files as Unicode. Hmm. – Butte 23/4, 2019 at 19:1

@cymro, or use Unicode escape within the regex. But storing and transmitting text files as UTF-8 should really be the default nowadays. We're not in the 70s anymore. – Camilla 23/4, 2019 at 22:7

Joey, thanks for your comment. Storing js files as UTF-8 often adds an unwanted BOM at the beginning. – Butte 8/5, 2019 at 19:18
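For reference, the Unicode-escape variant mentioned in the comments might look like the sketch below, which keeps the source file pure ASCII. The variable name is hypothetical; the code points are those of the Russian alphabet (А-Я and а-я are contiguous from U+0410 to U+044F, with Ё and ё at U+0401 and U+0451).

```javascript
// ASCII-only equivalent of the character class [а-яА-ЯЁё]:
//   U+0410-U+044F  upper-case and lower-case Russian letters (contiguous)
//   U+0401, U+0451 upper-case and lower-case "yo", which sit outside that range
const russianLetterEscaped = /[\u0410-\u044F\u0401\u0451]/;

// "Privet" written in Cyrillic, spelled with escapes to stay ASCII-only.
console.log(russianLetterEscaped.test('\u041F\u0440\u0438\u0432\u0435\u0442')); // true
console.log(russianLetterEscaped.test('Hello'));                                // false
```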

Use the pattern /[\u0400-\u04FF]/ to cover more Cyrillic characters:

// http://jrgraphix.net/r/Unicode/0400-04FF
const cyrillicPattern = /^[\u0400-\u04FF]+$/;

console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));

UPDATE:

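In newer browsers (engines that support ES2018), you can use Unicode property escapes.

The Cyrillic script property covers the range described above (U+0400..U+04FF, apart from a few combining marks) and also Cyrillic letters in other blocks, such as Cyrillic Supplement (U+0500..U+052F).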

const cyrillicPattern = /^\p{Script=Cyrillic}+$/u;

console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));
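If the code also has to run on engines without property escapes, one possible approach is to build the regex at runtime and fall back to the block range when construction fails. This is only a sketch: the function names are hypothetical, and the pattern is unanchored, so it detects Cyrillic anywhere in the string rather than requiring the whole string to be Cyrillic.

```javascript
// Build a detector, preferring the script property and falling back to the
// U+0400-U+04FF block when the engine rejects \p{...} (older engines throw
// a SyntaxError when the RegExp is constructed).
function buildCyrillicDetector() {
  try {
    return new RegExp('\\p{Script=Cyrillic}', 'u');
  } catch (e) {
    return /[\u0400-\u04FF]/;
  }
}

const cyrillicAnywhere = buildCyrillicDetector();

// Hypothetical helper: true if at least one Cyrillic character is present.
function containsCyrillic(text) {
  return cyrillicAnywhere.test(text);
}

console.log(containsCyrillic('hello привіт')); // true
console.log(containsCyrillic('hello'));        // false
```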

Baden answered 9/11, 2016 at 9:30 Comment(6)

Perfect answer! More character ranges can be found in this format here: kourge.net/projects/regexp-unicode-block – Sperrylite 11/3, 2019 at 19:2

@Sperrylite Link is not available anymore – Twomey 14/5, 2021 at 18:6

@Twomey I cannot update my comment but here is the link from Archive.org: web.archive.org/web/20200118100606/http://kourge.net/projects/… – Sperrylite 15/5, 2021 at 19:34

No spaces or punctuation are working – Gyimah 13/3, 2023 at 23:30

@NairiAregHatspanyan for spaces and punctuation, extend the pattern with spaces and punctuation. Example: /^[\p{Script=Cyrillic}\s.!]+$/u – Baden 15/8 at 15:11

@NairiAregHatspanyan and if you just need to detect Cyrillic rather than match the whole string, then: /\p{Script=Cyrillic}/u.test('hello привіт') // true; /\p{Script=Cyrillic}/u.test('hello "№%:') // false – Baden 15/8 at 15:13
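As a rough sketch of the whole-string case with spaces and punctuation: instead of listing punctuation marks one by one, the general-category escape \p{P} could be added to the class. This is an assumption about what "punctuation" should mean here; symbols such as № and digits are not covered by \p{P}. The pattern name is hypothetical.

```javascript
// Whole string must consist of Cyrillic characters, whitespace, or
// punctuation (Unicode general category P). Requires an engine with
// Unicode property escape support.
const cyrillicTextPattern = /^[\p{Script=Cyrillic}\s\p{P}]+$/u;

console.log(cyrillicTextPattern.test('Привіт, світе!')); // true
console.log(cyrillicTextPattern.test('Hello, світе!'));  // false
```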
