What's the Unicode glyph used to indicate combining characters?

My application needs to display "orphaned" combining characters. I would like to use the same format as the "official" Unicode charts, which use a dotted circle as the placeholder. See, for example:

After a quick scan through the charts, I came up with U+25CC "DOTTED CIRCLE". That looks good, but the note on this character reads:

note that the reference glyph for this character is intentionally larger than the dotted circle glyph used to indicate combining characters in this standard; see, for example, 0300

Which says (I think) that U+25CC is not the correct character. (Or, if it is, perhaps just a poorly worded note.)

So: if the dotted circle used on the "Combining Diacritical Marks" chart is not U+25CC, what is the correct code for that little booger?

I have tried:

  • Copying the text from the PDF and inspecting it, but copying is disabled in the PDF.
  • Emailing it to myself in Gmail and then viewing the attachment as HTML, but there it gets converted to U+0024 ("DOLLAR SIGN"). That means either the conversion failed or they are just playing some font rendering games in the PDF.

[Clarification] I realize that U+25CC looks OK (assuming one's font supports it), but it sounds like the spec says that this is the wrong character. Many Unicode characters have similar glyphs but are different characters, semantically speaking. "Latin Capital Letter A" (U+0041) and "Greek Capital Letter Alpha" (U+0391) will look identical in most fonts, but they have different semantic meanings and are not interchangeable.
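The semantic difference between look-alike characters can be read straight from the Unicode character properties. Below is a minimal sketch using Python's standard `unicodedata` module. Python is chosen here only for illustration (the question does not name a language), and `describe` is a hypothetical helper name.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, combining class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

# Two look-alike letters, the dotted circle, and one combining mark.
for ch in ("\u0041", "\u0391", "\u25CC", "\u0300"):
    print(describe(ch))
```

U+0041 and U+0391 come back with different names even though their glyphs usually match. U+25CC is reported as an ordinary symbol (category `So`), while U+0300 is a nonspacing mark (category `Mn`) with a non-zero combining class.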

Bracteate answered 8/2, 2010 at 20:50 Comment(1)
Most fonts do actually include a dotted circle glyph if one is needed for the alphabets etc. they cover. But glyphs do not have to map to code points! Many fonts map such internal glyphs to the private use section of Unicode, so the code points probably differ widely from font to font. You might be able to get at the glyph using low-level font access, but different fonts may implement it differently, so there might not be any way to retrieve the glyph that works for every font that has one. U+25CC is probably the way to go. - Jackqueline

I don't think there is an official placeholder character. The way I read that note, they chose U+25CC arbitrarily, purely for display purposes. Then, in the chart where the "real" dotted circle is listed, they made it a little larger to emphasize that it's not being used as a placeholder there. (Or maybe they shrank it in the other charts; as you said, the note's poorly worded.)

Whatever the case, I don't see any reason not to use U+25CC as your placeholder.
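If you go that route, inserting the placeholder could look something like the sketch below. It is in Python purely for illustration. The function names and the definition of "orphaned" are my assumptions, not anything from the spec: a mark counts as orphaned if it sits at the start of the text or directly after whitespace.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def with_placeholders(text, placeholder=DOTTED_CIRCLE):
    """Insert a placeholder base before each run of orphaned combining marks.

    Assumption: a mark is "orphaned" when it is at the start of the text
    or immediately follows whitespace.
    """
    out = []
    has_base = False  # True while the preceding output can carry a mark
    for ch in text:
        if is_mark(ch):
            if not has_base:
                out.append(placeholder)
                has_base = True  # later marks in the same run share this base
        else:
            has_base = not ch.isspace()
        out.append(ch)
    return "".join(out)

print(with_placeholders("\u0300 A\u0301 \u0302\u0303"))
```

Marks that already follow a base such as "A" are left alone; only the ones with nothing to sit on get a dotted circle.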

Turino answered 9/2, 2010 at 7:33 Comment(0)

Just tried this: create a blank .html file, paste in the text below, and load it in Firefox. It displays as expected (although I really didn't expect space + combining character to display correctly):

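<!DOCTYPE html>
<html>
<head><meta charset="utf-8"></head>
<body>
<!-- Large text so the marks are easy to see -->
<p style="font-size: 24pt">
<!-- Base: U+25CC DOTTED CIRCLE -->
&#x25CC;&#x0300;
&#x25CC;&#x0301;
&#x25CC;&#x0302;
&#x25CC;&#x0303;
<br/>
<!-- Base: U+0041 LATIN CAPITAL LETTER A -->
&#x0041;&#x0300;
&#x0041;&#x0301;
&#x0041;&#x0302;
&#x0041;&#x0303;
<br/>
<!-- Base: U+0020 SPACE -->
&#x0020;&#x0300;
&#x0020;&#x0301;
&#x0020;&#x0302;
&#x0020;&#x0303;
</p>
</body>
</html>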
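To see what those three rows amount to at the code point level, here is a small sketch in Python (chosen for illustration; `nfc_code_points` is a hypothetical helper name). It applies NFC normalization to one sequence from each row and lists the resulting code points.

```python
import unicodedata

def nfc_code_points(s):
    """List the code points of s after NFC normalization."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]

print(nfc_code_points("\u25CC\u0300"))  # dotted circle + combining grave
print(nfc_code_points("A\u0300"))       # letter A + combining grave
print(nfc_code_points(" \u0300"))       # space + combining grave
```

"A" plus U+0300 composes to the single precomposed letter U+00C0, while the dotted circle and the space each remain a separate base followed by the mark, so how those two rows look is left to the renderer and the font.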
Intermarriage answered 8/2, 2010 at 21:14 Comment(1)
I added a clarification to my original question. I realize that U+25CC looks correct, but it sounds like it is not the correct character, semantically speaking, according to the spec. - Bracteate
