What kind of normalization is used by Swift string comparisons?

1

7

I've read elsewhere that Swift's string comparisons use NFD normalization.

However, running in the iSwift playground I've found that

print("\u{0071}\u{0307}\u{0323}" == "\u{0071}\u{0323}\u{0307}");

gives false, even though this pair is an example of "Canonical Equivalence" taken straight from the Unicode standard, which Swift's documentation claims to follow.

So, what kind of canonicalization is performed by Swift, and is this a bug?

Lancelancelet asked 31/1, 2016 at 15:44
Good question! Also, as I understand the documentation, "\u{0071}\u{0307}\u{0323}".precomposedStringWithCanonicalMapping should return "\u{0071}\u{0323}\u{0307}", i.e. the NFC form with the combining marks in a defined order. But it doesn't, as one can verify with print(Array(string.unicodeScalars)). – Endo
Does the source code give a clue? "The strings which are equivalent according to their NFD form are considered equal. ..." – As I understand it, your strings have the same NFC form, but different NFD form. – Endo
@MartinR It's not NFD vs. NFC, since NFC is just NFD followed by "Canonical Composition", which happens after the reordering (which is deduced from "The fully decomposed and canonically ordered string is processed by another subpart of the Unicode Normalization Algorithm known as the Canonical Composition Algorithm.") I've checked the behaviour against Python's unicodedata.normalize, and Python seems to agree that NFD should reorder. – Lancelancelet
I assume an answer can be found by digging deeper into the Swift source code... Ultimately, if I see it correctly, the ICU library is used for string comparisons. – You could also ask at lists.swift.org/mailman/listinfo/swift-users. – Endo
@Lancelancelet You can use bugreport.apple.com with a free Apple ID; you don't have to be a member of a paid developer program. And anyone can report Swift issues at bugs.swift.org. – Mcvey
@Mcvey bugreport.apple.com is the site that was giving me trouble. Thanks for the other link; I'm not sure how I missed that. I've submitted a bug report. – Lancelancelet
5

It seems that this was a bug in Swift that has since been fixed. With Swift 3 and Xcode 8.0,

print("\u{0071}\u{0307}\u{0323}" == "\u{0071}\u{0323}\u{0307}")

now prints true.
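
To see that the two strings are canonically equivalent without being identical, it can help to compare them both as strings and as raw scalar sequences. The following is only a minimal sketch using the standard library; the names `a`, `b` and `hexScalars` are made up for illustration, and the `==` result assumes a Swift version in which the bug is fixed (Swift 3 or later, per the above).

```swift
// The same character written two ways: "q" followed by U+0307 (dot above)
// and U+0323 (dot below), in either order.
let a = "\u{0071}\u{0307}\u{0323}"
let b = "\u{0071}\u{0323}\u{0307}"

// Hypothetical helper: the scalars of a string as uppercase hex values.
func hexScalars(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

// String equality is based on canonical equivalence, so the order of the
// combining marks should not matter here.
print(a == b)  // true, assuming the fixed behaviour

// The stored scalar sequences are still different, which shows that the
// comparison is not a plain scalar-by-scalar check.
print(a.unicodeScalars.elementsEqual(b.unicodeScalars))  // false

print(hexScalars(a))  // ["71", "307", "323"]
print(hexScalars(b))  // ["71", "323", "307"]
```

If `a == b` prints false on your toolchain, you are most likely still seeing the old behaviour described in the question.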

Huysmans answered 19/9, 2016 at 13:39
