How to match Cyrillic characters with a regular expression

11

91

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to match alphabetic characters, not numbers or special characters. Right now I have:

[A-Za-z]

Darvon answered 11/11, 2009 at 17:1 Comment(3)
Look in this question: Regex and unicodeTunnell
Here it is: [А-Яа-я]Huberty
@AlexErygin For Russian-only characters it is: [ЁёА-я] (where А is the Cyrillic letter). The Unicode code point for Cyrillic а comes right after Я, so you don't need two ranges. The code points for Ё and ё are not between А and я, so you need to specify them separately.Stefaniastefanie
I
47

It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches a letter character (in any character set).

Imogene answered 11/11, 2009 at 19:57 Comment(4)
How about doing this in Java?Witte
This will match any Cyrillic characters including those not present in the Russian alphabet (Greg was asking about Russian Cyrillic)Stefaniastefanie
In Javascript, you need to also add the flag 'u'. See javascript.info/regexp-unicode.Swearingen
Note: \p{L} in JavaScript doesn't work in Safari at the moment.Heall
F
67

If your regex flavor supports Unicode blocks ([\p{IsCyrillic}]), you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}]

Otherwise try using:

[\u0400-\u04FF]

For PHP use:

[\x{0400}-\x{04FF}]

Explanation:

[\p{IsCyrillic}]

Match a character from the Unicode block "Cyrillic" (U+0400–U+04FF) «[\p{IsCyrillic}]»

Note:

See the Unicode character list and numeric HTML entities for the range U+0400–U+04FF.
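
Python's built-in `re` module does not support `\p{...}` property classes, so the explicit code point range is the fallback there. A minimal sketch, assuming Python 3; the sample text and the names `CYRILLIC_BLOCK`, `text`, and `words` are illustrative and not from the answer:

```python
import re

# Explicit range for the Unicode "Cyrillic" block (U+0400 to U+04FF).
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")

# Illustrative sample: Russian, English, digits, and punctuation mixed together.
text = "Привет, world! Тест 123"

# findall returns each maximal run of characters from the block.
words = CYRILLIC_BLOCK.findall(text)
print(words)
```

Latin letters, digits, spaces, and punctuation all fall outside the range, so only the two Cyrillic words are returned.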

Frederico answered 14/6, 2011 at 10:50 Comment(5)
This thread explains that #7927014Weiss
@black Which programming language are you using?Frederico
I am using PHP.Phenomena
For php try using [\x{0400}-\x{04FF}] instead. regex101.com/r/zcRenT/1Frederico
PHP supports \p{Cyrillic}, you just need to make sure to add a u flag onto the regexApply
S
32

To match only Russian Cyrillic characters use:

[\u0401\u0451\u0410-\u044f]

which is the equivalent of:

[ЁёА-я]

where А is Cyrillic, not Latin. (Despite looking the same they have different codes)

\p{IsCyrillic}, \p{Cyrillic}, [\u0400-\u04FF] which others suggested will match all variants of Cyrillic, not only Russian
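
The difference between the Russian-only class and the whole Cyrillic block can be checked with a short sketch. It assumes Python 3 and uses the Ukrainian letter і (U+0456) as an example of a Cyrillic letter that is not in the Russian alphabet; the names `RUSSIAN`, `CYRILLIC_BLOCK`, and `samples` are illustrative:

```python
import re

# Russian alphabet only: Ё (U+0401), ё (U+0451), and the range А..я (U+0410..U+044F).
RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044F]")

# Whole Unicode "Cyrillic" block, which also covers non-Russian letters.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]")

samples = {
    "russian_zhe": "\u0436",   # ж
    "russian_yo": "\u0451",    # ё, outside the А..я range
    "ukrainian_i": "\u0456",   # і, Cyrillic but not Russian
    "latin_a": "a",
}

# For each sample: (matches Russian-only class, matches whole block)
results = {
    name: (RUSSIAN.fullmatch(ch) is not None, CYRILLIC_BLOCK.fullmatch(ch) is not None)
    for name, ch in samples.items()
}
print(results)
```

Only the Ukrainian letter separates the two classes: the block range accepts it and the Russian-only class rejects it.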

Stefaniastefanie answered 10/9, 2018 at 11:48 Comment(0)
P
11

If you use a modern PHP version, just use the following, where $string is the text to test:

preg_match('/^\p{L}+$/u', $string);

Don't forget the u flag for Unicode support!

Pokey answered 29/7, 2014 at 13:31 Comment(2)
Can you explain your regex please? I tried it with Бори́с but it does not match, so your regex does not work.Phenomena
It's easy, please look at: php.net/manual/en/regexp.reference.unicode.php "L" means any letter. So the "и́" symbol should be in some other group! Try to find it.Mecke
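
The "other group" can be found by inspecting Unicode general categories. The sketch below assumes the stress mark in the name is encoded as a separate combining acute accent (U+0301) after "и", which is the usual encoding but is not confirmed in the thread. It uses Python's standard `unicodedata` module; the variable names are illustrative:

```python
import unicodedata

# Assumption: the accent is "и" followed by U+0301 COMBINING ACUTE ACCENT.
name = "Бори\u0301с"

# General category of every code point in the name.
categories = [unicodedata.category(ch) for ch in name]

# Letters only (categories starting with "L"), analogous to \p{L}.
only_letters = all(cat.startswith("L") for cat in categories)

# Letters or combining marks (categories starting with "L" or "M").
letters_or_marks = all(cat[0] in "LM" for cat in categories)

print(categories, only_letters, letters_or_marks)
```

Under that assumption the accent has category Mn (a nonspacing mark), so a letters-only pattern rejects the name. In a PCRE-style flavor, allowing marks as well, for example `^[\p{L}\p{M}]+$` with the u flag, should accept it.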
T
7

Regex to match Cyrillic letters together with English (Latin) letters:

^[A-Za-z.!@?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$

It matches special characters, Cyrillic letters, and English letters.

Tobitobiah answered 30/1, 2017 at 9:53 Comment(1)
Non-English alphabets are not normal ??? Not to mention there is only 1 English alphabetStefaniastefanie
M
5

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. (You may need to put that in a character class, e.g. [[:alpha:]].)

Monadelphous answered 11/11, 2009 at 17:22 Comment(1)
This works in PostgreSQL too, but matches all national characters (so not only current locale). And you can also use [[:lower:]] and [[:upper:]] for matching specific case. E.g. replace lower case characters: regexp_replace(firstname, '[[:lower:]]', 'a', 'g').Kitchenmaid
C
5

This worked for me:

[a-z\u0400-\u04FF]
Cards answered 25/5, 2018 at 7:58 Comment(1)
to match ONLY Cyrillic characters use [\u0400-\u04FF]Pettifogger
T
2

If you use Elixir:

String.match?(string, ~r/^\p{Cyrillic}*$/u)

You need to add the u flag for unicode support.

Tryparsamide answered 12/1, 2019 at 12:48 Comment(1)
Note that the regex above returns true for an empty string: String.match?("", ~r/^\p{Cyrillic}*$/u) => true. Change the * quantifier to + to fix that.Slipshod
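
The same empty-string behavior shows up in any flavor, since `*` allows zero repetitions. A minimal sketch in Python 3, using the explicit block range because the standard `re` module has no `\p{Cyrillic}`; the names `STAR` and `PLUS` are illustrative:

```python
import re

# Zero or more Cyrillic characters: the empty string satisfies this.
STAR = re.compile(r"[\u0400-\u04FF]*")

# One or more Cyrillic characters: the empty string is rejected.
PLUS = re.compile(r"[\u0400-\u04FF]+")

empty_with_star = STAR.fullmatch("") is not None
empty_with_plus = PLUS.fullmatch("") is not None
word_with_plus = PLUS.fullmatch("тест") is not None

print(empty_with_star, empty_with_plus, word_with_plus)
```

`fullmatch` plays the role of the `^...$` anchors here, so the only difference between the two patterns is the quantifier.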
P
2

You can use a range from the first to the last letter of the alphabet. For example, in Bulgarian:

[А-я]+
Pothunter answered 7/2, 2023 at 21:51 Comment(0)
B
0

For modern PHP (source):

$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));
Bobbery answered 8/5, 2022 at 19:11 Comment(0)
L
-2

In Java, to match Cyrillic letters and whitespace, use the following pattern:

^[\p{InCyrillic}\s]+$
Lake answered 7/8, 2019 at 10:0 Comment(0)
