Detect Russian / Cyrillic in a JavaScript string?
E

2

17

I'm trying to detect whether a string contains Russian (Cyrillic) characters. I'm using this code:

term.match(/[\wа-я]+/ig);

but it doesn't work: it just returns the string as it is.

Can somebody help with the right code?

Thanks!

Echidna answered 10/11, 2014 at 15:3 Comment(1)
You include \w in the regular expression, so it matches words with Latin characters as well. – Azoic
C
22

Perhaps you meant to use the RegExp test method instead?

/[а-яА-ЯЁё]/.test(term)

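Note that without the u flag, JavaScript regexes are only partly Unicode-aware. The i flag does apply simple case folding to Cyrillic letters, so /[а-яё]/i would also work, but spelling out the lower- and upper-case ranges separately keeps the pattern explicit. Ё and ё have to be listed on their own either way, because they fall outside the а-я and А-Я ranges.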
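If the source file cannot be guaranteed to be stored as UTF-8, the same check can be written with Unicode escapes so that the file stays pure ASCII. This is a minimal sketch, not the answerer's code; the names `russianLetter` and `containsRussian` are hypothetical.

```javascript
// Hypothetical helper: true if `text` contains at least one Russian letter.
// \u0410-\u044F spans А-Я followed by а-я; Ё (\u0401) and ё (\u0451)
// sit outside that run and are listed explicitly.
const russianLetter = /[\u0410-\u044F\u0401\u0451]/;

function containsRussian(text) {
  return russianLetter.test(text);
}

console.log(containsRussian('Привет')); // true
console.log(containsRussian('Hello'));  // false
console.log(containsRussian('ёж'));     // true
```

The pattern has no g flag, so repeated calls to test do not carry state between strings.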

Camilla answered 10/11, 2014 at 15:6 Comment(5)
You might want to add Ёё since they are also used in Russian. – Exorable
The Cyrillic Unicode range doesn't work, but the other method works great. – Echidna
This answer means you have to store your .js files as Unicode. Hmm. – Butte
@cymro, or use Unicode escapes within the regex. But storing and transmitting text files as UTF-8 should really be the default nowadays. We're not in the 70s anymore. – Camilla
Joey, thanks for your comment. Storing JS files as UTF-8 often adds an unwanted BOM at the beginning. – Butte
B
41

Use the pattern /[\u0400-\u04FF]/ to cover more Cyrillic characters:

// http://jrgraphix.net/r/Unicode/0400-04FF
const cyrillicPattern = /^[\u0400-\u04FF]+$/;

console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));
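The pattern above is anchored with ^ and $, so it checks that the whole string is Cyrillic. To detect whether a string merely contains Cyrillic characters, as the question asks, the anchors can be dropped. A short sketch of the difference, using hypothetical names and the same U+0400-U+04FF block:

```javascript
// Hypothetical names; both patterns use the U+0400-U+04FF block from above.
const anyCyrillic = /[\u0400-\u04FF]/;     // at least one Cyrillic character
const onlyCyrillic = /^[\u0400-\u04FF]+$/; // every character is Cyrillic

const sample = 'hello привіт';
console.log(anyCyrillic.test(sample));  // true
console.log(onlyCyrillic.test(sample)); // false
```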

UPDATE:

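In modern browsers, you can use Unicode property escapes (the u flag is required).

The Cyrillic script includes the U+0400..U+04FF block described above, plus the Cyrillic Supplement and Extended blocks, so it matches slightly more characters: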

const cyrillicPattern = /^\p{Script=Cyrillic}+$/u;

console.log('Привіт:', cyrillicPattern.test('Привіт'));
console.log('Hello:', cyrillicPattern.test('Hello'));
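Because Unicode property escapes are not available in every engine, one option is to feature-detect them and fall back to the block range. The sketch below assumes that an unsupported engine throws when the pattern is constructed; `buildCyrillicDetector` and `cyrillicDetector` are hypothetical names.

```javascript
// Hypothetical helper: prefer \p{Script=Cyrillic}, fall back to the block range.
function buildCyrillicDetector() {
  try {
    // Built from a string so that engines without property escapes throw
    // here at run time instead of failing to parse the whole script.
    return new RegExp('\\p{Script=Cyrillic}', 'u');
  } catch (err) {
    return /[\u0400-\u04FF]/;
  }
}

const cyrillicDetector = buildCyrillicDetector();
console.log(cyrillicDetector.test('hello привіт')); // true
console.log(cyrillicDetector.test('hello'));        // false
```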
Baden answered 9/11, 2016 at 9:30 Comment(6)
Perfect answer! More character ranges can be found in this format here: kourge.net/projects/regexp-unicode-block – Sperrylite
@Sperrylite The link is no longer available. – Twomey
@Twomey I cannot update my comment, but here is the link from Archive.org: web.archive.org/web/20200118100606/http://kourge.net/projects/… – Sperrylite
Spaces and punctuation are not matched. – Gyimah
@NairiAregHatspanyan For spaces and punctuation, add them to the character class. Example: /^[\p{Script=Cyrillic}\s.!]+$/u (with the u flag, \! is an invalid escape, so the punctuation is left unescaped). – Baden
@NairiAregHatspanyan And if you only need to detect rather than match the whole string: /\p{Script=Cyrillic}/u.test('hello привіт') returns true, while /\p{Script=Cyrillic}/u.test('hello "№%:') returns false. – Baden