Check if a character is Russian

I would like to know if a string contains Russian/Cyrillic characters.

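For Latin characters, I do something like this:

// containsLatinLower reports whether text contains an ASCII lowercase letter.
func containsLatinLower(text string) bool {
    for _, r := range text { // ranging over a string yields runes
        if r >= 'a' && r <= 'z' {
            return true
        }
    }
    return false
}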

What is the corresponding way to do it for the Russian/Cyrillic alphabet?

Sanitize asked 27/6, 2017 at 20:30 Comment(1)
Did you try just using the Unicode charts (I assume your input is Unicode), for example this one (link)? Just iterate over whatever values you are interested in. - Unheardof

This seems to work:

unicode.Is(unicode.Cyrillic, r) // r is a rune
Sanitize answered 27/6, 2017 at 20:54 Comment(1)
This is the way to go. It catches the full range of Cyrillic characters, including oddballs like ᴫ (U+1D2B) and the large ranges at U+A640 and U+0460. - Cherilyncherilynn
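To answer the original question about whole strings, the one-liner above can be wrapped in a loop. This is only a minimal sketch: the helper name containsCyrillic is made up for illustration, and it assumes "contains" means "at least one rune belongs to the Cyrillic script".

```go
package main

import (
	"fmt"
	"unicode"
)

// containsCyrillic reports whether s holds at least one rune from the
// Unicode Cyrillic script. The name is illustrative, not a standard API.
func containsCyrillic(s string) bool {
	for _, r := range s { // ranging over a string decodes UTF-8 into runes
		if unicode.Is(unicode.Cyrillic, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsCyrillic("test"))       // Latin letters only
	fmt.Println(containsCyrillic("привет"))     // Cyrillic letters only
	fmt.Println(containsCyrillic("c\u043Ede"))  // Latin text with a Cyrillic о (U+043E)
	fmt.Println(containsCyrillic("\u1D2B"))     // ᴫ, outside the basic Cyrillic block
}
```

The third call is the look-alike case: Cyrillic о (U+043E) and Latin o (U+006F) render almost identically but are different code points, so the script check tells them apart without any special handling.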

I went ahead and wrote this example implementation that checks whether a string consists only of Russian uppercase characters, based on this Unicode chart:

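// isRussianUpper reports whether every rune in text lies in the basic
// Russian uppercase range А..Я (U+0410..U+042F).
// Note: Ё (U+0401) is outside this range and is therefore rejected.
func isRussianUpper(text string) bool {
    for _, r := range text { // ranging over a string yields runes
        if r < '\u0410' || r > '\u042F' {
            return false
        }
    }
    return true
}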

You can check any set of characters this way. Just change the code points to the ones you are interested in.

Unheardof answered 27/6, 2017 at 20:45 Comment(2)
Thanks. Some letters in the Russian alphabet look like Latin letters (like the o or A), so I thought I would have to do something more complicated. - Sanitize
Note that there are many Cyrillic characters outside of that range. There are more at U+0460 (that range alternates upper and lower case), one at U+1D2B, and more at U+A640. And more can be added. Avoid hard-coding ranges; use unicode.Is(unicode.Cyrillic, r) instead. To distinguish between upper and lower case, use unicode.IsUpper and unicode.IsLower, keeping in mind that there are some characters which are both and some which are neither. - Cherilyncherilynn
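A rough sketch of what that suggestion could look like, combining the script check with unicode.IsUpper instead of a hard-coded range. The name isCyrillicUpper and the choice to reject the empty string are assumptions made for this example, not something taken from the answer above.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCyrillicUpper reports whether s is non-empty and every rune in it is an
// uppercase letter of the Cyrillic script. Both the name and the non-empty
// rule are illustrative assumptions.
func isCyrillicUpper(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		// Script membership and case are independent checks.
		if !unicode.Is(unicode.Cyrillic, r) || !unicode.IsUpper(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isCyrillicUpper("ПРИВЕТ")) // all uppercase Cyrillic
	fmt.Println(isCyrillicUpper("ЁЛКА"))   // starts with Ё (U+0401), outside U+0410..U+042F
	fmt.Println(isCyrillicUpper("Привет")) // contains lowercase letters
	fmt.Println(isCyrillicUpper("HELLO"))  // uppercase, but Latin
}
```

Unlike the fixed U+0410..U+042F range, this version accepts Ё and any other uppercase Cyrillic letter known to Go's Unicode tables, at the cost of also accepting non-Russian Cyrillic letters.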
