Text run is not in Unicode Normalization Form C

2

18

While trying to validate my site, I got the following error:

Text run is not in Unicode Normalization Form C

A: What does it mean?

B: Can I fix it with notepad++ and how?

C: If the answer to B is no, how can I fix this with free tools (not Dreamweaver)?

Caprice answered 28/3, 2011 at 21:15 Comment(2)
The error message has now been turned to a warning, because HTML specifications and drafts do not require that NFC be used – it’s just something that W3C generally favors. See discussion in the validator mailing list.Purnell
The address mentioned in the question does not work any more (it gets redirected to a domain hosting site).Purnell
15

A. It means what it says (see dan04’s explanation for a brief answer and the Unicode Standard for a long one), but it simply indicates that the authors of the validator wanted to issue the warning. HTML5 rules do not require Normalization Form C (NFC); it is rather something generally favored by the W3C.

B. There is no need to fix anything, unless you decide that using NFC would actually be better. If you do, then there are various tools for automatic conversion to NFC, such as the free BabelPad editor. If you only need to deal with one character not in NFC, you can use character information repositories such as Fileformat.info character search to find out the canonical decomposition of the character and use it.
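If you would rather find the offending characters yourself before deciding, a short script can list the lines that are not in NFC. This is only a sketch: the function names find_non_nfc_lines and describe, and the sample text, are made up for illustration.

```python
import unicodedata


def find_non_nfc_lines(text):
    """Return (line_number, line) pairs for lines that change under NFC."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if unicodedata.normalize("NFC", line) != line
    ]


def describe(line):
    """List the non-ASCII code points in a line with their Unicode names."""
    return ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
        for ch in line
        if ord(ch) > 127
    )


# Hypothetical sample: line 2 has a decomposed accent, line 3 an OHM SIGN.
sample = "plain line\nvila\u0301g\n10 k\u2126\n"

for number, line in find_non_nfc_lines(sample):
    print(f"line {number}: {describe(line)}")
```

The report lists every non-ASCII code point on a flagged line, so it narrows the search without pinpointing the exact culprit.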

Whether you use NFC or not depends on many considerations and on the characters involved. As a rule, NFC works better, but in some cases, an alternative, non-NFC presentation produces more suitable rendering or works better in some specific processing.

For example, in a duplicate question, a character reference for “Ω” was reported as triggering the message. (The validator checks characters entered as such references too, rather than performing only a plain-text NFC check.) The reference stands for U+2126 OHM SIGN “Ω”, which is defined to be canonically equivalent to U+03A9 GREEK CAPITAL LETTER OMEGA “Ω”. The Unicode Standard explicitly says that the latter is the preferred character. It is also better covered in fonts. But if you have a special reason to use OHM SIGN, you can do so without violating current HTML5 rules, and you can ignore the validator warning.
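The OHM SIGN case is easy to check with Python's standard unicodedata module. The variable names below are chosen only for this illustration.

```python
import unicodedata

ohm = "\u2126"    # OHM SIGN
omega = "\u03a9"  # GREEK CAPITAL LETTER OMEGA

# The two are distinct code points, so a plain comparison says "different".
different_code_points = ohm != omega

# NFC replaces OHM SIGN with its canonical equivalent.
normalized = unicodedata.normalize("NFC", ohm)

print(unicodedata.name(ohm), "->", unicodedata.name(normalized))
```

So text containing U+2126 is never in NFC, which is why the validator flags it.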

Purnell answered 14/4, 2013 at 18:34 Comment(0)
23

What does it mean?

From W3C:

In Unicode it is possible to produce the same text with different sequences of characters. For example, take the Hungarian word világ. The fourth letter could be stored in memory as a precomposed U+00E1 LATIN SMALL LETTER A WITH ACUTE (a single character) or as a decomposed sequence of U+0061 LATIN SMALL LETTER A followed by U+0301 COMBINING ACUTE ACCENT (two characters).

világ = világ

The Unicode Standard allows either of these alternatives, but requires that both be treated as identical. To improve efficiency, an application will usually normalize text before performing searches or comparisons. Normalization, in this case, means converting the text to use all precomposed or all decomposed characters.
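To see why normalizing before comparison matters, here is a small Python sketch using the világ example. The escapes spell out the two spellings explicitly, and the variable names are only illustrative.

```python
import unicodedata

precomposed = "vil\u00e1g"  # á stored as U+00E1
decomposed = "vila\u0301g"  # a followed by U+0301 COMBINING ACUTE ACCENT

# A raw comparison treats the two spellings as different strings.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form, they compare equal.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(len(precomposed), len(decomposed))
print(raw_equal, nfc_equal)
```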

There are four normalization forms specified by the Unicode Standard: NFC, NFD, NFKC and NFKD. The C stands for (pre-)composed, and the D for decomposed. The K stands for compatibility. To improve interoperability, the W3C recommends the use of NFC normalized text on the Web.
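As a rough illustration of how the four forms differ, the following Python sketch applies each of them to a two-character sample: a ligature, which has only a compatibility decomposition, followed by a precomposed é. The sample and the helper name code_points are assumptions made for this example.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI followed by U+00E9 (precomposed é).
sample = "\ufb01\u00e9"


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, text in forms.items():
    print(form, code_points(text))
```

The canonical forms (NFC, NFD) leave the ligature alone and only compose or decompose the é; the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters f and i.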

Besides "to improve interoperability", precomposed text usually looks better than decomposed text.

How can I fix this with free tools

By using the function equivalent to Python's text = unicodedata.normalize('NFC', text) in your favorite programming language.

(Or, if you weren't planning to write a program, your question should be moved to Super User or Webmasters.)
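For a whole file, that one-liner can be wrapped in a small script. A minimal sketch, assuming the file is UTF-8 encoded; the function name normalize_file_to_nfc is made up here, and you should keep a backup before overwriting anything.

```python
import unicodedata


def normalize_file_to_nfc(path, encoding="utf-8"):
    """Rewrite the file at `path` in NFC; return True if it was changed."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as source:
        original = source.read()

    normalized = unicodedata.normalize("NFC", original)
    if normalized == original:
        return False  # already in NFC, nothing to write

    with open(path, "w", encoding=encoding, newline="") as target:
        target.write(normalized)
    return True
```

Calling normalize_file_to_nfc("index.html") (the file name is only an example) rewrites the file in place and reports whether anything changed.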

Okay answered 29/3, 2011 at 0:31 Comment(2)
Normalization is more than coupling certain characters together. It’s also about ordering them within their combining class. For example, these 10 versions of hack each have subtly different orderings of marks: ĥ̲̗̖a̲ᷜ̃̂ç̲̌︣̕k̲̈͆, ĥ̲̗̖a̲ᷜ̃̂ç̲︣̌̕k̲͆̈, ĥ̲̖̗ẫ̲ᷜç̲︣̌̕k̲̈͆, ĥ̲̗̖ẫ̲ᷜç̲︣̌̕k̲͆̈, ĥ̲̗̖ã̲ᷜ̂ç̲̌︣̕k̲̈͆, ĥ̲̗̖ã̲̂ᷜç̲̌︣̕k̲͆̈, ĥ̗̖̲a̲ᷜ̂̃ç̲︣̌̕k̲͆̈, ĥ̖̗̲â̲ᷜ̃ç̲︣̌̕k̲̈͆, ĥ̗̖̲ã̲ᷜ̂ç̲̌︣̕k̲͆̈, ĥ̗̖̲ã̲̂ᷜç̲︣̌̕k̲̈͆. Some of those marks will get combined and reordered in NFC, but some will not. The ten look the same in NFC and NFD, or disordered as they are. They are UCA-sorted.Terry
In JavaScript, that would be: 'your_text'.normalize('NFC')Saga