How can I check if a string contains Chinese in Swift?

How can I check whether a string contains Chinese characters in Swift?

For example, I want to check if there's Chinese inside:

var myString = "Hi! 大家好!It's contains Chinese!"

Thanks!

Rede answered 6/7, 2015 at 11:11 Comment(0)
E
15

This answer to How to determine if a character is a Chinese character can also easily be translated from Ruby to Swift (now updated for Swift 3):

import Foundation

extension String {
    var containsChineseCharacters: Bool {
        return self.range(of: "\\p{Han}", options: .regularExpression) != nil
    }
}

if myString.containsChineseCharacters {
    print("Contains Chinese")
}

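In a regular expression, "\p{Han}" matches all characters with the "Han" Unicode script property, which, as I understand it, are the ideographs shared by the CJK languages (Chinese hanzi, Japanese kanji, Korean hanja). On its own it therefore cannot tell Chinese text apart from Japanese text written in kanji.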
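
If the Han characters themselves are needed rather than a yes/no answer, the same pattern can be used to collect every matching run. This is only a sketch: `hanRuns` is a hypothetical name, and it assumes that "Chinese words" can be approximated by maximal runs of `\p{Han}` characters.

```swift
import Foundation

extension String {
    /// Hypothetical helper: every maximal run of Han-script characters, in order.
    var hanRuns: [String] {
        // "\\p{Han}+" matches one or more consecutive Han characters.
        guard let regex = try? NSRegularExpression(pattern: "\\p{Han}+") else {
            return []
        }
        let fullRange = NSRange(startIndex..<endIndex, in: self)
        return regex.matches(in: self, range: fullRange).compactMap { match in
            // Convert the NSRange back to a Swift range before slicing.
            Range(match.range, in: self).map { String(self[$0]) }
        }
    }
}

print("Hi! 大家好!It's contains Chinese!".hanRuns)
```

Because the runs are split at every non-Han character, a string such as "你 好" yields two separate entries.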

Endurable answered 6/7, 2015 at 12:4 Comment(3)
Is there any way I can get just the Chinese words? Thanks! - Swinney
@Arefly: Unfortunately, I am not an expert in the Chinese language; I have "blindly" translated the Ruby code :) There are also "Katakana" and "Hiragana" properties, but I don't know if they are of any use. - Endurable
@Arefly: For more fine-grained control, Airspeed Velocity's answer might be better suited because you can adjust the table with the Unicode ranges according to your needs, such as "only Chinese characters". - Endurable
O
5

Looking at questions on how to do this in other languages (such as this accepted answer for Ruby), it looks like the common technique is to check whether any character in the string falls in one of the CJK ranges. The Ruby answer could be adapted to Swift as a String extension with the following code:

extension String {
    var containsChineseCharacters: Bool {
        return self.unicodeScalars.contains { scalar in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

// true:
"Hi! 大家好!It's contains Chinese!".containsChineseCharacters
// false:
"Hello, world!".containsChineseCharacters

These ranges may already be defined somewhere in Foundation, which would avoid hardcoding them manually.

The above is for Swift 2.0. For earlier versions, you will have to use the free contains function (twice) rather than the protocol extension method:

extension String {
    var containsChineseCharacters: Bool {
        return contains(self.unicodeScalars) {
          // older version of compiler seems to need extra help with type inference 
          (scalar: UnicodeScalar)->Bool in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return contains(cjkRanges) { $0.contains(scalar.value) }
        }
    }
}
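
For Swift 3 and later, `ClosedInterval` no longer exists and `ClosedRange` takes its place. A minimal sketch of the same range check under that assumption follows; the property name `containsCJKIdeographs` is hypothetical, and the table is the same four blocks as above, not a complete list of CJK ranges.

```swift
extension String {
    /// Hypothetical name: true if any Unicode scalar falls in one of the listed CJK blocks.
    var containsCJKIdeographs: Bool {
        let cjkRanges: [ClosedRange<UInt32>] = [
            0x4E00...0x9FFF,   // main block
            0x3400...0x4DBF,   // extended block A
            0x20000...0x2A6DF, // extended block B
            0x2A700...0x2B73F, // extended block C
        ]
        return unicodeScalars.contains { scalar in
            cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}

print("Hi! 大家好!It's contains Chinese!".containsCJKIdeographs)
print("Hello, world!".containsCJKIdeographs)
```

This version needs only the standard library, so it does not depend on Foundation's regular expression support.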
Operculum answered 6/7, 2015 at 11:41 Comment(2)
Error: String.UnicodeScalarView does not have a member named contains? - Swinney
Ah sorry, this is Swift 2.0. Let me put in a version for 1.2. - Operculum
S
3

The accepted answer only checks whether the string contains a Chinese character. I created a version that suits my own case:

import Foundation

enum ChineseRange {
    case notFound, contain, all
}

extension String {
    var findChineseCharacters: ChineseRange {
        // Find the first run of one or more consecutive Han characters.
        guard let match = self.range(of: "\\p{Han}+", options: .regularExpression) else {
            return .notFound
        }
        // If that run spans the whole string, every character is Han.
        return match == self.startIndex..<self.endIndex ? .all : .contain
    }
}

if "你好".findChineseCharacters == .all {
    print("All Chinese")
}

if "Chinese".findChineseCharacters == .notFound {
    print("Not found Chinese")
}

if "Chinese你好".findChineseCharacters == .contain {
    print("Contains Chinese")
}

gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a
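
Note that this check reports `.contain` for a string such as "你 好", because the space splits the Han run. A possible variant that ignores whitespace is sketched below. The names `HanCoverage` and `hanCoverageIgnoringWhitespace` are hypothetical, and treating whitespace as neutral is an assumption about the desired behavior.

```swift
import Foundation

// Hypothetical enum, named differently to avoid clashing with ChineseRange.
enum HanCoverage {
    case notFound, contain, all
}

extension String {
    /// Hypothetical helper: classifies the string while treating whitespace as neutral.
    var hanCoverageIgnoringWhitespace: HanCoverage {
        // No Han character anywhere means there is nothing to classify.
        guard range(of: "\\p{Han}", options: .regularExpression) != nil else {
            return .notFound
        }
        // Look for any character that is neither Han nor whitespace.
        let hasOther = range(of: "[^\\p{Han}\\s]", options: .regularExpression) != nil
        return hasOther ? .contain : .all
    }
}

print("你 好".hanCoverageIgnoringWhitespace)
```

Punctuation still counts as non-Han here, so "你好!" is classified as `.contain`.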

Spavin answered 29/5, 2017 at 3:43 Comment(2)
Exactly what I was looking for. Cheers! - Avocet
"你 好".findChineseCharacters should return .all - Spavin
S
2

Try this in Swift 2:

var myString = "Hi! 大家好!It's contains Chinese!"

var a = false

for c in myString.characters {
    let cs = String(c)
    a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
}
print("\(myString) contains Chinese characters = \(a)")
Sackville answered 6/7, 2015 at 11:54 Comment(0)
F
0

I have created a Swift 3 String extension for checking how many Chinese characters a String contains. It is similar to the code by Airspeed Velocity but more comprehensive: it checks various Unicode ranges to see whether a character is Chinese. See the Chinese character ranges listed in the tables under section 18.1 of the Unicode standard specification: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf

The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift

Usage example:

let myString = "Hi! 大家好!It contains Chinese!"
let chinesePercentage = myString.chinesePercentage()
let chineseCharacterCount = myString.chineseCharactersCount()
print("String contains \(chinesePercentage) percent Chinese. That's \(chineseCharacterCount) characters.")
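
For readers who only want the idea, counting by Unicode range might look roughly like the sketch below. It is not the code of the linked library: `hanScalarCount` and `hanFraction` are hypothetical names, the range table is a small assumed subset, and the fraction is computed over Unicode scalars rather than as a percentage.

```swift
extension String {
    /// Hypothetical helper: number of Unicode scalars inside the listed CJK blocks.
    var hanScalarCount: Int {
        // Assumed subset of ranges; the linked library covers more.
        let cjkRanges: [ClosedRange<UInt32>] = [
            0x4E00...0x9FFF,   // main block
            0x3400...0x4DBF,   // extended block A
            0x20000...0x2A6DF, // extended block B
            0x2A700...0x2B73F, // extended block C
        ]
        return unicodeScalars.filter { scalar in
            cjkRanges.contains { $0.contains(scalar.value) }
        }.count
    }

    /// Hypothetical helper: share of scalars that are in the listed blocks (0 for an empty string).
    var hanFraction: Double {
        let total = unicodeScalars.count
        return total == 0 ? 0 : Double(hanScalarCount) / Double(total)
    }
}

let sample = "Hi! 大家好!It contains Chinese!"
print("Han scalars: \(sample.hanScalarCount), fraction: \(sample.hanFraction)")
```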
Flotow answered 8/1, 2017 at 13:7 Comment(0)