Which is the better Unicode Normalization Form?
Dreamweaver offers four Unicode normalization options: C, D, KC and KD. Which one should I choose, and why?

Shrapnel answered 22/3, 2011 at 10:43 Comment(0)

For what? For saving a file, use NFC, since that is what the W3C character model for the web uses (strictly, W3C normalisation requires both that the stream be in NFC and that the text still be in NFC after entities in HTML or XML are converted to the characters they represent). The odds that it'll ever make a practical difference are slim, though it could stop a few rather obscure issues from upsetting someone down the line.

Normalisation makes certain equivalent sequences result in identical streams. For example, U+0065 (e) followed by U+0301 (a combining acute accent) is equivalent to U+00E9 (é) on its own.
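To make that concrete, here is a small Python sketch using the standard library's `unicodedata` module (the variable names are just for illustration). It shows that the two sequences differ as raw strings but become identical once both are normalised to the same form.

```python
import unicodedata

# Two canonically equivalent spellings of the same text.
decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute)
precomposed = "\u00e9"   # U+00E9 on its own

# As raw code point sequences they are different strings.
raw_equal = decomposed == precomposed

# After normalising both to the same form, they compare equal.
nfc_equal = (unicodedata.normalize("NFC", decomposed)
             == unicodedata.normalize("NFC", precomposed))
nfd_equal = (unicodedata.normalize("NFD", decomposed)
             == unicodedata.normalize("NFD", precomposed))

print(raw_equal, nfc_equal, nfd_equal)
```

Which form you pick matters less than applying the same one to both sides of a comparison.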

NFD splits all such characters up into their component parts (e.g. turning U+00E9 into U+0065 followed by U+0301). If there are two or more combining characters in a row, they are re-ordered according to rules that guarantee consistency (ḉ could have the cedilla followed by the acute or the acute followed by the cedilla, and we need a consistent ordering so that the same string is produced either way). Mostly NFD is useful for internal processing as part of another task, such as stripping accents, or producing NFC.
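A rough Python sketch of both points, again with `unicodedata`. The helper name `strip_accents` is made up for this example, and dropping every combining mark is only one simple way to strip accents; it is an assumption here, not something the standard prescribes.

```python
import unicodedata

# The same letter typed with its combining marks in two different orders.
acute_first = "c\u0301\u0327"    # c + acute + cedilla
cedilla_first = "c\u0327\u0301"  # c + cedilla + acute

# NFD puts the marks into one canonical order, so both inputs converge.
nfd_a = unicodedata.normalize("NFD", acute_first)
nfd_b = unicodedata.normalize("NFD", cedilla_first)


def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose with NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for non-combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(nfd_a == nfd_b)
print(strip_accents("caf\u00e9"))
```

Note that this removes all combining marks, which may be too aggressive for scripts where the marks carry meaning.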

NFC starts with NFD and then combines the characters together again where possible, barring a few exceptions to ensure that what was a normalised string with one version of Unicode remains so with another.
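As a sketch of what that looks like in practice (Python, `unicodedata`): the first two checks show recomposition, and the last shows one of the exceptions. U+0958 is used here as an example of a character that, as far as I know, is on Unicode's composition exclusion list, so NFC leaves it decomposed.

```python
import unicodedata

# Decomposed input is recombined into the precomposed character.
composed_e = unicodedata.normalize("NFC", "e\u0301")

# Marks in a non-canonical order are reordered first, then combined.
composed_c = unicodedata.normalize("NFC", "c\u0301\u0327")

# An excluded character: DEVANAGARI LETTER QA (U+0958) is decomposed
# and NOT recombined, so its NFC form is two code points.
excluded = unicodedata.normalize("NFC", "\u0958")

print(composed_e == "\u00e9")
print(composed_c == "\u1e09")
print([hex(ord(ch)) for ch in excluded])
```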

NFKD goes further than NFD by also replacing certain "compatibility" characters with their plainer equivalents. ⁵, for example, is replaced with 5. This "damages" the text (a user may reasonably choose ⁵ over 5 for a good reason) but is useful for searching (search for "fiſh" on Google and it returns results for "fish" because it treats the long s the same as a short s) and as a restriction in certain cases to avoid security issues with similar but different characters. NFKC first does NFKD and then combines in the same manner as NFC.
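A short Python sketch of the difference, using `unicodedata`. The `search_key` function is a hypothetical helper showing one way a search index might fold text; it is not how Google or any particular engine actually does it.

```python
import unicodedata

superscript = "x\u2075"   # x followed by SUPERSCRIPT FIVE
long_s = "fi\u017fh"      # "fish" spelled with LATIN SMALL LETTER LONG S

# The canonical forms leave compatibility characters alone...
nfc_super = unicodedata.normalize("NFC", superscript)

# ...while the K forms replace them, losing the distinction.
nfkd_super = unicodedata.normalize("NFKD", superscript)
nfkc_fish = unicodedata.normalize("NFKC", long_s)


def search_key(text: str) -> str:
    """Hypothetical helper: fold text for matching, never for storage."""
    return unicodedata.normalize("NFKC", text).casefold()


print(nfc_super == superscript)
print(nfkd_super, nfkc_fish)
print(search_key("Fi\u017fh") == search_key("fish"))
```

Keeping the original text and storing the folded key alongside it avoids the "damage" mentioned above.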

See http://unicode.org/reports/tr15/ for the full skinny. To repeat the short answer: use NFC, but don't worry about it.

Globular answered 22/3, 2011 at 11:13 Comment(0)
