How to match Cyrillic characters with a regular expression

D

11

91

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to do the alpha characters, no numbers or special characters. Right now I have

[A-Za-z]

Darvon answered 11/11, 2009 at 17:1 Comment(3)

Look in this question: Regex and unicode – Tunnell 11/11, 2009 at 17:3

Here it is: [А-Яа-я] – Huberty 30/6, 2018 at 17:44

@AlexErygin For Russian-only characters it is: [ЁёА-я] (where А is the Russian letter, not the Latin one). The Unicode code point for Russian а comes right after Я, so you don't need two ranges. The code points for Ё and ё are not in the range А-я, so you need to specify Ёё separately. – Stefaniastefanie 11/9, 2018 at 10:58

I

47

It depends on your regex flavor. If it supports Unicode character classes (like .NET, for instance), \p{L} matches a letter character (in any character set).
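For flavors without \p{L}, here is a rough sketch in Python. Python's built-in re module does not support \p{...} classes, so this uses the common stand-in [^\W\d_] ("a word character that is not a digit or underscore"). It only approximates \p{L}, since it can also let through a few numeric symbols that are not decimal digits. The names LETTERS_ONLY and is_letters_only are made up for the example.

```python
import re

# Assumption: the built-in `re` has no \p{L}, so [^\W\d_] is used as an
# approximation: word characters minus decimal digits and underscore.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")


def is_letters_only(text: str) -> bool:
    """Return True if text is non-empty and made up only of letter-like characters."""
    return LETTERS_ONLY.fullmatch(text) is not None


print(is_letters_only("Привет"))    # Cyrillic letters
print(is_letters_only("Français"))  # Latin letters with a cedilla
print(is_letters_only("abc123"))    # contains digits
```

If an exact \p{L} is needed in Python, the third-party regex module supports Unicode property classes.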

Imogene answered 11/11, 2009 at 19:57 Comment(4)

How about doing this in Java? – Witte 16/12, 2015 at 19:18

This will match any Cyrillic characters including those not present in the Russian alphabet (Greg was asking about Russian Cyrillic) – Stefaniastefanie 10/9, 2018 at 11:6

In Javascript, you need to also add the flag 'u'. See javascript.info/regexp-unicode. – Swearingen 30/4, 2020 at 10:15

Note: \p{L} in JavaScript doesn't work in Safari at the moment. – Heall 8/3, 2021 at 7:30

F

67

If your regex flavor supports Unicode blocks ([\p{IsCyrillic}]), you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}]

Otherwise try using:

[\u0400-\u04FF]

For PHP use:

[\x{0400}-\x{04FF}]

Explanation:

[\p{IsCyrillic}]

Match a character from the Unicode block "Cyrillic" (U+0400–U+04FF) «[\p{IsCyrillic}]»

Note:

See the Unicode character list and numeric HTML entities for the range U+0400–U+04FF.
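As an illustration of the explicit-range fallback, here is a minimal Python sketch that treats the Cyrillic block as the range U+0400 to U+04FF. The names CYRILLIC_BLOCK and is_cyrillic_block are hypothetical, chosen only for this example.

```python
import re

# The Cyrillic block written as an explicit code point range.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")


def is_cyrillic_block(text: str) -> bool:
    """Return True if every character of a non-empty text lies in U+0400..U+04FF."""
    return CYRILLIC_BLOCK.fullmatch(text) is not None


print(is_cyrillic_block("Привет"))  # Russian word
print(is_cyrillic_block("їжак"))    # Ukrainian word, still inside the block
print(is_cyrillic_block("Hello"))   # Latin letters, outside the block
```

Note that the block also contains a few non-letter characters (for example combining marks), so this checks "is in the block", not "is a letter".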

Frederico answered 14/6, 2011 at 10:50 Comment(5)

This thread explains that #7927014 – Weiss 16/1, 2013 at 10:12

@black Which programming language are you using? – Frederico 2/9, 2019 at 23:0

I am using PHP. – Phenomena 3/9, 2019 at 6:15

For php try using [\x{0400}-\x{04FF}] instead. regex101.com/r/zcRenT/1 – Frederico 3/9, 2019 at 6:54

PHP supports \p{Cyrillic}, you just need to make sure to add a u flag onto the regex – Apply 1/9, 2023 at 5:54

S

32

To match only Russian Cyrillic characters use:

[\u0401\u0451\u0410-\u044f]

which is the equivalent of:

[ЁёА-я]

where А is Cyrillic, not Latin. (Despite looking the same, they have different code points.)

\p{IsCyrillic}, \p{Cyrillic} and [\u0400-\u04FF], which others suggested, will match all variants of Cyrillic, not only Russian.
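A small Python sketch of the Russian-only class, to show how it differs from the whole-block range. The names RUSSIAN and is_russian are made up for the example.

```python
import re

# Ё (U+0401) and ё (U+0451) sit outside the contiguous run А..я
# (U+0410..U+044F), so they are listed separately.
RUSSIAN = re.compile(r"[\u0401\u0451\u0410-\u044F]+")


def is_russian(text: str) -> bool:
    """Return True if a non-empty text uses only letters of the Russian alphabet."""
    return RUSSIAN.fullmatch(text) is not None


print(is_russian("Ёжик"))  # starts with Ё, which the bare range А-я would miss
print(is_russian("їжак"))  # Ukrainian ї (U+0457) is Cyrillic but not Russian
print(is_russian("Abc"))   # Latin letters
```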

Stefaniastefanie answered 10/9, 2018 at 11:48 Comment(0)

P

11

If you use a modern PHP version, just use:

preg_match('/^\p{L}+$/u', $string); // $string is the text to test

Don't forget the u flag for Unicode support!

Pokey answered 29/7, 2014 at 13:31 Comment(2)

Can you explain your regex please? I tried it with Бори́с but it does not match, so your regex does not work. – Phenomena 2/9, 2019 at 8:19

It's easy, please look at: php.net/manual/en/regexp.reference.unicode.php "L" means any letter. So the "и́" symbol should be in some other group! Try to find it. – Mecke 3/9, 2019 at 14:16
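To make the point about Бори́с concrete, here is a Python sketch. It assumes the accent in that test word is U+0301 COMBINING ACUTE ACCENT, which is a mark rather than a letter. The helper strip_marks is a hypothetical name for this example.

```python
import unicodedata

# Assumption: the stressed "и" is the letter followed by U+0301.
word = "Бори\u0301с"

# Print each code point with its Unicode general category.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))


def strip_marks(text: str) -> str:
    """Drop every character whose general category is a mark (Mn, Mc, Me)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


print(word.isalpha())               # False: the combining accent is not a letter
print(strip_marks(word).isalpha())  # True once the accent is removed
```

In PCRE-style flavors the matching property class is \p{M}, so a pattern along the lines of /^[\p{L}\p{M}]+$/u would be one way to accept accented words like this. Treat that as something to verify against your own flavor.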

T

7

Regex to match Cyrillic letters together with Latin (English) letters:

^[A-Za-z.!@?#"$%&:;() *\+,\/;\-=[\\\]\^_{|}<>\u0400-\u04FF]*$

It matches special characters, Cyrillic letters, and English letters.

Tobitobiah answered 30/1, 2017 at 9:53 Comment(1)

Non-English alphabets are not normal ??? Not to mention there is only 1 English alphabet – Stefaniastefanie 1/6, 2021 at 11:10

M

5

Various regex dialects use [:alpha:] for any alphabetic character in the current locale. (You may need to put that in a character class, e.g. [[:alpha:]].)

Monadelphous answered 11/11, 2009 at 17:22 Comment(1)

This works in PostgreSQL too, but matches all national characters (so not only current locale). And you can also use [[:lower:]] and [[:upper:]] for matching specific case. E.g. replace lower case characters: regexp_replace(firstname, '[[:lower:]]', 'a', 'g'). – Kitchenmaid 10/3, 2021 at 15:2

C

5

This worked for me:

[a-z\u0400-\u04FF]

Cards answered 25/5, 2018 at 7:58 Comment(1)

to match ONLY Cyrillic characters use [\u0400-\u04FF] – Pettifogger 30/5, 2018 at 11:14

T

2

If you use Elixir:

String.match?(string, ~r/^\p{Cyrillic}*$/u)

You need to add the u flag for unicode support.

Tryparsamide answered 12/1, 2019 at 12:48 Comment(1)

Attention: the above regex returns true for an empty string: String.match?("", ~r/^\p{Cyrillic}*$/u) => true. You should change the * quantifier to + to fix that. – Slipshod 28/2, 2019 at 15:17
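The same * versus + behavior can be checked in Python. This sketch uses the block range [\u0400-\u04FF] as a stand-in, because Python's built-in re has no \p{Cyrillic}. STAR and PLUS are names made up for the example.

```python
import re

# Stand-in for \p{Cyrillic}: the explicit block range U+0400..U+04FF.
STAR = re.compile(r"[\u0400-\u04FF]*")  # zero or more characters
PLUS = re.compile(r"[\u0400-\u04FF]+")  # one or more characters

# With *, the empty string is accepted; with +, it is rejected.
print(STAR.fullmatch("") is not None)
print(PLUS.fullmatch("") is not None)

# Non-empty Cyrillic input is accepted by both.
print(PLUS.fullmatch("Привет") is not None)
```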

P

2

You can use the first and the last letter. For example in Bulgarian:

[А-я]+

Pothunter answered 7/2, 2023 at 21:51 Comment(0)

B

0

For modern PHP (source):

$string = 'тест тест Тест Обязателльно Stackoverflow >!<';
var_dump(preg_replace('/[\x{0410}-\x{042F}]+.*[\x{0410}-\x{042F}]+/iu', '', $string));

Bobbery answered 8/5, 2022 at 19:11 Comment(0)

L

-2

In Java, to match Cyrillic letters and whitespace, use the following pattern:

^[\p{InCyrillic}\s]+$

Lake answered 7/8, 2019 at 10:0 Comment(0)
