Swift - Replacing emojis in a string with whitespace

I have a method that detects URLs in a string and returns both the URLs and the ranges where they can be found. Everything works perfectly until there are emojis in the string. For example:

"I'm gonna do this callenge as soon as I can swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"

Because of the emojis, the URL extracted from the text is http://youtu.be/SW_d3fGz1 instead of http://youtu.be/SW_d3fGz1hk. I figured that the easiest solution was to just replace the emojis in the string with whitespace characters (because I need the ranges to be correct for some text styling). Problem is, this is extremely hard to accomplish with Swift (most likely my knowledge of the Swift String API is lacking).

I've been trying to do it like this but it seems that I cannot create a string from an array of unicode points:

var emojilessStringWithSubstitution: String {
    let emojiRanges = [0x1F601...0x1F64F, 0x2702...0x27B0]
    let emojiSet = Set(emojiRanges.flatten())
    let codePoints: [UnicodeScalar] = self.unicodeScalars.map {
        if emojiSet.contains(Int($0.value)) {
            return UnicodeScalar(32)
        }
        return $0
    }
    return String(codePoints)
}

Am I approaching this problem the wrong way? Is replacing emojis the best solution here? If so, how can I do it?
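If the URL detection is built on an NSRange-based API such as NSDataDetector or NSRegularExpression (an assumption; the method isn't shown), the truncation may come from mixing offset units rather than from the emoji themselves: NSRange counts UTF-16 code units, and each 😂 is two UTF-16 units but only one Character. A minimal sketch under that assumption uses a deliberately simple regex as a stand-in for a real URL detector (`urlMatches(in:)`, `URLMatch` and the pattern are hypothetical) and converts each match back with `Range(_:in:)`, so no emoji replacement is needed:

```swift
import Foundation

// Hypothetical result type and helper; not from the question.
typealias URLMatch = (url: String, range: Range<String.Index>)

// Finds URLs with a deliberately simple pattern (an assumption, not a full
// URL grammar) and returns each one with a String.Index-based range.
func urlMatches(in text: String) -> [URLMatch] {
    guard let regex = try? NSRegularExpression(pattern: "https?://\\S+") else {
        return []
    }
    // NSRange offsets count UTF-16 code units, so derive the search range
    // from the String itself rather than from text.count (Characters).
    let searchRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, range: searchRange).compactMap { match -> URLMatch? in
        // Range(_:in:) maps the UTF-16 offsets back to String.Index values,
        // so the match stays intact even when emoji precede it.
        guard let range = Range(match.range, in: text) else { return nil }
        return (url: String(text[range]), range: range)
    }
}

let sample = "I'm gonna do this callenge as soon as I can swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"
for match in urlMatches(in: sample) {
    print(match.url)  // http://youtu.be/SW_d3fGz1hk
}
```

NSDataDetector is a subclass of NSRegularExpression and reports its matches as NSRange values too, so the conversion step would be the same with a real detector.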

Bichromate answered 28/4, 2016 at 15:27

You can use pattern matching (for emoji patterns) to filter out emoji characters from your String.

extension String {

    var emojilessStringWithSubstitution: String {
        // In newer Swift versions UnicodeScalar(_:) with an integer is failable, hence the `!`
        let emojiPatterns = [UnicodeScalar(0x1F601)!...UnicodeScalar(0x1F64F)!,
                             UnicodeScalar(0x2702)!...UnicodeScalar(0x27B0)!]
        return self.unicodeScalars
            .filter { ucScalar in !(emojiPatterns.contains { $0 ~= ucScalar }) }
            .reduce("") { $0 + String($1) }
    }
}

/* example usage */
let str = "I'm gonna do this callenge as soon as I can swing again πŸ˜‚πŸ˜‚πŸ˜‚\n http://youtu.be/SW_d3fGz1hk"
print(str.emojilessStringWithSubstitution)

/* I'm gonna do this callenge as soon as I can swing again
   http://youtu.be/SW_d3fGz1hk */

Note that the above only uses the emoji intervals from your question and is in no way representative of all emojis, but the method is general and can easily be extended by adding further emoji intervals to the emojiPatterns array.


Rereading your question, I realize that you'd prefer substituting emojis with whitespace characters rather than removing them (which the filtering solution above does). We can achieve this by replacing the .filter operation above with a .map that conditionally returns a space, much like in your question:

extension String {

    var emojilessStringWithSubstitution: String {
        // Scalar literals with an explicit element type avoid the failable
        // integer initializer UnicodeScalar(_:).
        let emojiPatterns: [ClosedRange<Unicode.Scalar>] = [
            "\u{1F600}"..."\u{1F64F}",  // Emoticons
            "\u{1F300}"..."\u{1F5FF}",  // Miscellaneous Symbols and Pictographs
            "\u{1F680}"..."\u{1F6FF}",  // Transport and Map Symbols
            "\u{2600}"..."\u{26FF}",    // Miscellaneous Symbols
            "\u{2700}"..."\u{27BF}",    // Dingbats
            "\u{FE00}"..."\u{FE0F}"     // Variation Selectors
        ]

        return self.unicodeScalars
            .map { ucScalar -> Unicode.Scalar in
                emojiPatterns.contains(where: { $0 ~= ucScalar }) ? " " : ucScalar }
            .reduce("") { $0 + String($1) }
    }
}

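In the above, the emoji intervals have been extended as per your comment on this post (listing these intervals). The check now covers considerably more emojis, but it is still not exhaustive; for example, the Supplemental Symbols and Pictographs block (U+1F900–U+1F9FF, which includes 🧰) is not covered.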
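One caveat for the range use case in the question: substituting a single space for an emoji does not keep NSRange (UTF-16) offsets stable, because 😂 (U+1F602) occupies two UTF-16 code units and a space occupies one. A hypothetical variant (`padEmoji(in:ranges:)` is not part of this answer) pads each matched scalar with one space per UTF-16 code unit instead:

```swift
// Hypothetical variant, not from the answer above: replaces each emoji scalar
// with as many spaces as it occupies in UTF-16, so NSRange offsets do not shift.
func padEmoji(in text: String, ranges: [ClosedRange<Unicode.Scalar>]) -> String {
    var result = String.UnicodeScalarView()
    for scalar in text.unicodeScalars {
        if ranges.contains(where: { $0.contains(scalar) }) {
            // Scalars above U+FFFF (like 😂) are a surrogate pair: two units.
            let width = String(scalar).utf16.count
            result.append(contentsOf: repeatElement(" " as Unicode.Scalar, count: width))
        } else {
            result.append(scalar)
        }
    }
    return String(result)
}

let emojiRanges: [ClosedRange<Unicode.Scalar>] = ["\u{1F600}"..."\u{1F64F}"]
let text = "swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"
let padded = padEmoji(in: text, ranges: emojiRanges)
print(text.utf16.count == padded.utf16.count)  // true
```

The padded string has more Characters than the original, so this only helps when the ranges are NSRange values; String.Index ranges would still need to be recomputed.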

Junie answered 28/4, 2016 at 16:0
Almost! The string "Svsbsvs shsbsv 😒😲👹👹😒😒👹\nhttp://m.youtube.com/watch?v=H6_zvQQ5P8w" results in: "Svsbsvs shsbsv 👹👹 👹\nhttp://m.youtube.com/watch?v=H6_zvQQ5P8w" – Bichromate
Figured it out. I was missing some ranges: let emojiPatterns = [UnicodeScalar(0x1F600)...UnicodeScalar(0x1F64F), UnicodeScalar(0x1F300)...UnicodeScalar(0x1F5FF), UnicodeScalar(0x1F680)...UnicodeScalar(0x1F6FF), UnicodeScalar(0x2600)...UnicodeScalar(0x26FF), UnicodeScalar(0x2700)...UnicodeScalar(0x27BF), UnicodeScalar(0xFE00)...UnicodeScalar(0xFE0F)] – Bichromate
Ah yes, I should've mentioned in my answer that I simply applied the emoji ranges from your question to explain the technique, without knowing whether those ranges were exhaustive! Thanks for your edit! – Junie
@dfri Swift 4.1 let emojiPatterns = [UnicodeScalar(0x1F601)!...UnicodeScalar(0x1F64F)!, UnicodeScalar(0x2702)!...UnicodeScalar(0x27B0)!] – Krystenkrystin
@LeoDabus thanks! I tried a quick edit to the solution above but ran into "expression too complex" for the emojiPatterns in the second block (6 patterns). Seems silly and almost like a regression, but will try to look into re-factoring into something that the compiler can handle if I have some time over. – Junie

Swift 5

Don't detect emojis with hard-coded ranges like this. In Swift 5 you can check the Unicode scalar properties directly:

let inputText = "Some 🖐string 😂😂😂 with 👹👹 👹 emoji 🖐"

let textWithoutEmoji = inputText.unicodeScalars
    .filter { !$0.properties.isEmojiPresentation }
    .reduce("") { $0 + String($1) }

print(textWithoutEmoji) // Some string  with   emoji 
Wanhsien answered 14/8, 2020 at 15:49
This won't be useful if your app supports iOS 10.0 – Appreciative
Which apps still support iOS 10.0 in the iOS 14 era? – Wanhsien
I don't think we need the reduce(). It seems to work if I just pass the results of filter() into a String constructor. – Trimetallic
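As the last comment suggests, the reduce step isn't needed: on String.UnicodeScalarView, filter returns another UnicodeScalarView, and String has an initializer that takes one directly. A minimal sketch of the same isEmojiPresentation filter (the method name `removingEmojiPresentation()` is hypothetical):

```swift
extension String {
    // Hypothetical name; same filter as the answer above, without reduce.
    func removingEmojiPresentation() -> String {
        // filter on String.UnicodeScalarView returns another UnicodeScalarView,
        // which String can be initialized from without rebuilding it piecewise.
        return String(unicodeScalars.filter { !$0.properties.isEmojiPresentation })
    }
}

let input = "Some string 😂😂😂 with 👹👹 👹 emoji"
print(input.removingEmojiPresentation())  // prints: Some string  with   emoji
```

Besides being shorter, this avoids creating an intermediate String for every scalar, which the reduce version does.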

Swift 4:

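extension String {
  func stringByRemovingEmoji() -> String {
    return String(self.filter { !$0.isEmoji() })
  }
}

extension Character {
  // Treats two code point spans as "emoji":
  // U+1D000–U+1F77F (Byzantine Musical Symbols through Alchemical Symbols) and
  // U+2100–U+26FF (Letterlike Symbols through Miscellaneous Symbols).
  // This also matches non-emoji symbols such as ™ (U+2122) and arrows,
  // and misses the U+1F900–U+1F9FF block (e.g. 🧰).
  fileprivate func isEmoji() -> Bool {
    return Character(UnicodeScalar(UInt32(0x1d000))!) <= self && self <= Character(UnicodeScalar(UInt32(0x1f77f))!)
      || Character(UnicodeScalar(UInt32(0x2100))!) <= self && self <= Character(UnicodeScalar(UInt32(0x26ff))!)
  }
}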
Yemen answered 17/4, 2018 at 9:33

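Unicode classifies most emojis as symbols (general category So, "Other Symbol"), and Foundation's CharacterSet.symbols contains the Unicode symbol characters. Character sets are typically used in searching operations, so we can split the string on that set and join the pieces back together: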

var emojiString =  "Hey there 🖐, welcome"
emojiString = emojiString.components(separatedBy: CharacterSet.symbols).joined()       
print(emojiString)

Output is

Hey there , welcome

Note that only the emoji itself is removed; any spaces around it remain. When an emoji sits between two spaces, this leaves a double space, which we can collapse like this:

emojiString = emojiString.replacingOccurrences(of: "  ", with: " ")

This replaces each occurrence of two consecutive spaces with a single space; a run of three or more spaces needs another pass.
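Two things are worth knowing about this approach, sketched here with a hypothetical helper `strippingSymbols(from:)`: CharacterSet.symbols covers all Unicode symbol categories (Sm, Sc, Sk and So), so math and currency signs such as `+` and `$` disappear as well; and a regular expression collapses runs of spaces of any length in one call. Variation selectors such as U+FE0F are not symbols (they are nonspacing marks), so they survive the split.

```swift
import Foundation

// Hypothetical helper: removes every Unicode symbol character, then collapses
// runs of spaces with a regular expression instead of a single "  " -> " " pass.
func strippingSymbols(from text: String) -> String {
    // CharacterSet.symbols covers general categories Sm, Sc, Sk and So,
    // so "+" (Sm) and "$" (Sc) are removed along with the emoji (So).
    let withoutSymbols = text.components(separatedBy: CharacterSet.symbols).joined()
    return withoutSymbols.replacingOccurrences(of: " {2,}", with: " ", options: .regularExpression)
}

print(strippingSymbols(from: "Hey 😂 there, welcome 😂😂 back"))  // Hey there, welcome back
print(strippingSymbols(from: "Total: $5 + tax"))                   // Total: 5 tax
```

If currency or math signs must be kept, the split set could be narrowed, but CharacterSet has no predefined set for general category So alone.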

Bondon answered 21/9, 2018 at 6:38

Getting all emoji is more complicated than you would think. For more info on how to figure out which characters are emoji, check out this stackoverflow post or this article.

Building on that information, I'd propose an extension on Character that makes it easier to tell which characters are emoji, plus a String extension that replaces any emoji found with another character.

extension Character {
   var isSimpleEmoji: Bool {
      guard let firstProperties = unicodeScalars.first?.properties else {
        return false
      }
      return unicodeScalars.count == 1 &&
          (firstProperties.isEmojiPresentation ||
             firstProperties.generalCategory == .otherSymbol)
   }
   var isCombinedIntoEmoji: Bool {
      return unicodeScalars.count > 1 &&
             unicodeScalars.contains {
                $0.properties.isJoinControl ||
                $0.properties.isVariationSelector
             }
   }
   var isEmoji: Bool {
      return isSimpleEmoji || isCombinedIntoEmoji
   }
}

extension String {
    func replaceEmoji(with character: Character) -> String {
        return String(map { $0.isEmoji ? character : $0 })
    }
}

Using it would simply become:

"Some string πŸ˜‚πŸ˜‚πŸ˜‚ with emoji".replaceEmoji(with: " ")
Slave answered 20/11, 2019 at 7:34
Doesn't work with flags, it appears 🇺🇦 – Abutilon
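The flag failure follows from the code: a flag is two regional indicator scalars with no joiner or variation selector, so it is neither a single-scalar emoji nor "combined" under the rules above (the same applies to skin-tone sequences such as 👍🏽). A hypothetical alternative heuristic (`looksLikeEmoji` and `replacingEmoji(with:)` are illustrative names, and this is still not exhaustive) checks the first scalar's emoji presentation instead of the scalar count:

```swift
extension Character {
    // Hypothetical heuristic, not from the answer above.
    var looksLikeEmoji: Bool {
        guard let first = unicodeScalars.first else { return false }
        // Default emoji presentation, regardless of what follows: 😂, 👍🏽
        // (emoji + skin-tone modifier) and flags like 🇺🇦 (two regional
        // indicator symbols) all start with such a scalar.
        if first.properties.isEmojiPresentation { return true }
        // Otherwise require a multi-scalar cluster that starts with an
        // emoji-capable scalar, e.g. ❤️ (U+2764 U+FE0F) or a keycap like 1️⃣.
        // A lone "1", "#" or "©" is left alone.
        return unicodeScalars.count > 1 && first.properties.isEmoji
    }
}

extension String {
    func replacingEmoji(with character: Character) -> String {
        return String(map { $0.looksLikeEmoji ? character : $0 })
    }
}

print("Flag 🇺🇦, thumbs 👍🏽, heart ❤️, price 1€".replacingEmoji(with: " "))
```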

I found that the solutions given above did not work for certain characters, such as 🏋️🏻‍♂️ and 🧰.

To find the emoji ranges, I used a regex to convert the full list of emoji characters into a file containing just the hex values. Then I converted them to decimal and sorted them. Finally, I wrote a script to find the ranges.

Here is the final Swift extension for isEmoji().

extension Character {

    func isEmoji() -> Bool {
        let emojiRanges = [
            (8205, 11093),
            (12336, 12953),
            (65039, 65039),
            (126980, 129685)
        ]
        let codePoint = self.unicodeScalars[self.unicodeScalars.startIndex].value
        for emojiRange in emojiRanges {
            if codePoint >= emojiRange.0 && codePoint <= emojiRange.1 {
                return true
            }
        }
        return false
    }

}

For reference, here are the python scripts I wrote to parse the hex strings to integers and then find the ranges.

convert-hex-to-decimal.py

decimals = []
with open('hex.txt') as hexfile:
    for line in hexfile:
        num = int(line, 16)
        if num < 256:
            continue
        decimals.append(num)

decimals = list(set(decimals))
decimals.sort()

with open('decimal.txt', 'w') as decimalfile:
    for decimal in decimals:
        decimalfile.write(str(decimal) + "\n")

make-ranges.py

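first_line = True
range_start = 0
prev = 0
with open('decimal.txt') as decimalfile:
    for line in decimalfile:
        if first_line:
            prev = int(line)
            range_start = prev
            first_line = False
            continue

        curr = int(line)
        if prev + 1000 < curr:  # a gap of 1000 is arbitrary, chosen to reduce the number of ranges
            print("(" + str(range_start) + ", " + str(prev) + ")")
            range_start = curr
        prev = curr

# the loop only prints a range when it finds a gap, so print the last one here
if not first_line:
    print("(" + str(range_start) + ", " + str(prev) + ")")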
Liaotung answered 17/5, 2019 at 7:13
Looks promising, but there are a few false positives: ’ (U+2019) and the em dash (U+2014) – Abutilon

This Swift 5 snippet works well for me, though I haven't checked it against all emoji and symbols. It also keeps Chinese, Korean, etc. text intact, which my previous solution struggled with.

extension String {
    // keeps text in non-Latin scripts such as Korean intact
    func removeEmojis() -> String {
        return .init(
            unicodeScalars.filter {
                $0.properties.isEmojiPresentation == false &&
                $0.properties.isEmoji == false
            }
        )
    }
}
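One pitfall with filtering on isEmoji: Unicode gives the ASCII digits, `#` and `*` the Emoji property (they can start keycap sequences such as 1️⃣), so removeEmojis() above also strips numbers. Meanwhile the invisible pieces of emoji sequences, such as U+200D (zero width joiner) and U+FE0F (variation selector-16), are not Emoji=Yes and are left behind. A hypothetical variant (`removingEmojiKeepingASCII()` is an illustrative name) keeps ASCII and drops those components; note that it still removes non-ASCII characters like © and ®, which are also Emoji=Yes:

```swift
extension String {
    // Hypothetical variant of removeEmojis() above.
    func removingEmojiKeepingASCII() -> String {
        // Invisible emoji components that isEmoji does not cover:
        // ZERO WIDTH JOINER, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
        let components: Set<Unicode.Scalar> = ["\u{200D}", "\u{FE0F}", "\u{20E3}"]
        let kept: String.UnicodeScalarView = unicodeScalars.filter { scalar in
            // Digits, "#" and "*" have Emoji=Yes, so keep ASCII explicitly.
            if scalar.isASCII { return true }
            // isEmoji is true for every scalar with isEmojiPresentation.
            if scalar.properties.isEmoji { return false }
            return !components.contains(scalar)
        }
        return String(kept)
    }
}

print("Call 911 #1 😂".removingEmojiKeepingASCII())
```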
Gustafson answered 25/4, 2023 at 9:47

Don't hard-code the range of emojis, use this instead.

func 去除表情符号(字符串: String) -> String {  // "remove emoji"
    let 转换为Unicode = 字符串.unicodeScalars  // "converted to Unicode"; see https://developer.apple.com/documentation/swift/string
    
    let 去除表情后的结果 = 转换为Unicode.filter { (item) -> Bool in  // "result after removing emoji"
        let 判断是否表情 = item.properties.isEmoji  // "is this an emoji?"
        return !判断是否表情  // emoji are not kept
    }
    
    return String(去除表情后的结果)
}
Slurp answered 4/2, 2022 at 3:32

© 2022 - 2024 McMap. All rights reserved.