NSAttributedString and emojis: issue with positions and lengths

I'm coloring some parts of a text coming from an API (think "@mention" as on Twitter) using NSAttributedString.

The API gives me the text and an array of entities representing the parts of the text that are mentions (or links, tags, etc) which should be colored.

But sometimes, the coloration is offset because of emojis.


For example, with this text:

"@ericd Some text. @apero"

the API gives:

[ { "text" : "ericd", "len" : 6, "pos" : 0 }, { "text" : "apero", "len" : 6, "pos" : 18 } ]

which I successfully translate to an NSAttributedString using NSRange:

for m in entities.mentions {
    let r = NSMakeRange(m.pos, m.len)
    myAttributedString.addAttribute(NSForegroundColorAttributeName, value: someValue, range: r)
}

We see that "pos": 18 is correct, this is where "@apero" starts. The colored parts are "@ericd" and "@apero", as expected.

But when some specific combinations of emojis are used in the text, the API's positions do not translate well to NSAttributedString, and the coloration is offset:

"@ericd Some text. 😺✌🏻 @apero"

gives:

[ { "text" : "ericd", "len" : 6, "pos" : 0 }, { "text" : "apero", "len" : 6, "pos" : 22 } ]

"pos": 22: the API author states that this is correct, and I understand their point of view.

Unfortunately, NSAttributedString does not agree, and my coloration is off.

The last characters of the second mention are not colored (because the "pos" is too small because of the emojis?).

As you might have already guessed, I cannot change the way the API behaves in any way; I have to adapt on the client side.

Except that... I have no idea what to do. Should I try to detect what kind of emojis are in the text and manually amend the position of mentions when there's a problematic emoji? But what would be the criteria to detect which emoji shifts the position and which doesn't? And how to decide how much offset I need? Maybe the problem is caused by NSAttributedString?

I understand that this is related to the emojis length once composed compared to their length as discrete characters, but... well... I'm lost (sigh).


Note that I've tried to implement a solution similar to this stuff because my API is compatible with this one, but it only worked partially: some emojis were still breaking the indexes.

Heall answered 8/2, 2017 at 13:47 Comment(0)

A Swift String provides different "views" on its contents. A good overview is given in "Strings in Swift 2" in the Swift Blog:

  • characters is a collection of Character values, or extended grapheme clusters.
  • unicodeScalars is a collection of Unicode scalar values.
  • utf8 is a collection of UTF–8 code units.
  • utf16 is a collection of UTF–16 code units.
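To make the difference between these views concrete, here is a small sketch that measures the sample text from the question through each of them. It assumes Swift 4 or later, and it writes the emojis as explicit scalar escapes (U+1F63A, U+270C, U+1F3FB) on the assumption that this is exactly what the original text contains; the variable names are illustrative only.

```swift
// Sketch: one string, four different lengths.
// Emojis are spelled out as scalar escapes so the content is unambiguous.
let sample = "@ericd Some text. \u{1F63A}\u{270C}\u{1F3FB} @apero"

let scalarCount = sample.unicodeScalars.count  // 28 Unicode scalars
let utf16Count = sample.utf16.count            // 30 UTF-16 code units
let utf8Count = sample.utf8.count              // 36 UTF-8 code units

// Number of Character values (extended grapheme clusters). The exact value
// depends on the grapheme-breaking rules of the Swift runtime in use.
let characterCount = sample.count

print(scalarCount, utf16Count, utf8Count, characterCount)
```

The scalar and UTF-16 counts differ by two because each of the two scalars outside the Basic Multilingual Plane takes two UTF-16 code units.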

As it turned out in the discussion, pos and len from your API are indices into the Unicode scalars view.

On the other hand, the addAttribute() method of NSMutableAttributedString takes an NSRange, i.e. a range expressed in UTF-16 code units of an NSString.
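As a rough illustration of why the coloring ends up two positions short, the sketch below counts how many UTF-16 code units the first 22 Unicode scalars of the sample text occupy. It assumes the emojis are exactly the scalars U+1F63A, U+270C and U+1F3FB, and the variable names are made up for this example.

```swift
// Sketch: where does scalar offset 22 land in UTF-16 code units?
let sample = "@ericd Some text. \u{1F63A}\u{270C}\u{1F3FB} @apero"
let pos = 22  // "pos" as delivered by the API (an offset in Unicode scalars)

// The scalar at that offset is indeed the "@" of "@apero".
let scalarAtPos = Array(sample.unicodeScalars)[pos]

// Sum the UTF-16 width (1 or 2 code units) of every scalar before the mention.
let utf16Offset = sample.unicodeScalars
    .prefix(pos)
    .reduce(0) { $0 + UTF16.width($1) }

print(scalarAtPos, utf16Offset)  // @ 24
```

Passing 22 directly as the NSRange location therefore starts two code units too early, so with a length of 6 the last two characters of the mention stay uncolored.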

String provides methods to "translate" between indices of the different views (compare NSRange to Range<String.Index>). In Swift 3:

let text = "@ericd Some text. 😺✌🏻 @apero"
let pos = 22
let len = 6

// Compute String.UnicodeScalarView indices for first and last position:
let from32 = text.unicodeScalars.index(text.unicodeScalars.startIndex, offsetBy: pos)
let to32 = text.unicodeScalars.index(from32, offsetBy: len)

// Convert to String.UTF16View indices:
let from16 = from32.samePosition(in: text.utf16)
let to16 = to32.samePosition(in: text.utf16)

// Convert to NSRange by computing the integer distances:
let nsRange = NSRange(location: text.utf16.distance(from: text.utf16.startIndex, to: from16),
                      length: text.utf16.distance(from: from16, to: to16))

This NSRange is what you need for the attributed string:

let attrString = NSMutableAttributedString(string: text)
attrString.addAttribute(NSForegroundColorAttributeName,
                        value: UIColor.red,
                        range: nsRange)

Update for Swift 4 (Xcode 9): In Swift 4, the standard library provides methods to convert between Swift String ranges and NSString ranges, so the calculation simplifies to:

let text = "@ericd Some text. 😺✌🏻 @apero"
let pos = 22
let len = 6

// Compute String.UnicodeScalarView indices for first and last position:
let fromIdx = text.unicodeScalars.index(text.unicodeScalars.startIndex, offsetBy: pos)
let toIdx = text.unicodeScalars.index(fromIdx, offsetBy: len)

// Compute corresponding NSRange:
let nsRange = NSRange(fromIdx..<toIdx, in: text)
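Since the API data comes from outside, it may be worth guarding against offsets that fall outside the text. The following is a minimal sketch of the same Swift 4 conversion wrapped in a function that returns nil instead of trapping; the function name and its parameter labels are hypothetical, not part of any library.

```swift
import Foundation

/// Hypothetical helper: converts an offset and length counted in Unicode
/// scalars into an NSRange counted in UTF-16 code units.
/// Returns nil if the offsets do not fit inside `text`.
func nsRange(forScalarOffset pos: Int, length len: Int, in text: String) -> NSRange? {
    let scalars = text.unicodeScalars
    guard pos >= 0, len >= 0,
          let from = scalars.index(scalars.startIndex, offsetBy: pos,
                                   limitedBy: scalars.endIndex),
          let to = scalars.index(from, offsetBy: len,
                                 limitedBy: scalars.endIndex)
    else { return nil }
    return NSRange(from..<to, in: text)
}

// Emojis written as scalar escapes (assumed to match the original text).
let sample = "@ericd Some text. \u{1F63A}\u{270C}\u{1F3FB} @apero"
let first = nsRange(forScalarOffset: 0, length: 6, in: sample)
let second = nsRange(forScalarOffset: 22, length: 6, in: sample)
let invalid = nsRange(forScalarOffset: 40, length: 6, in: sample)
```

Each non-nil result can then be passed as the range argument of addAttribute(), once per entity returned by the API.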
Questa answered 8/2, 2017 at 14:20 Comment(6)
@EricAya: But the "@" from "@apero" is at position 21, not 22. Or how do you count the characters? – Questa
@EricAya: Then I misunderstood the problem, and pos, len count grapheme clusters, not composed character sequences. Try let from = text.index(text.startIndex, offsetBy: pos) and let to = text.index(text.startIndex, offsetBy: pos+len) instead of the text.enumerateSubstrings stuff. – Questa
Let us continue this discussion in chat. – Questa
@Moritz: I have updated the answer for Swift 4, thanks for the notice! – Questa
Hi, I need the reverse of the function above. Can you help me? Thanks – Towers
@m.alqadi: See https://mcmap.net/q/23816/-nsrange-to-range-lt-string-index-gt for conversions between String/NSString ranges in both directions. – Questa
