Are all Unicode Emoji ZWJ Sequences valid?

When creating an emoji font, is any sequence of ZERO WIDTH JOINER valid?

For instance: can I use 🏳‍★‍🟩 (Waving White Flag + ZWJ + Black Star + ZWJ + Green Square) to represent a white flag with a green star on it? And then render it, let's say, like the Esperanto flag?

Gold answered 1/5, 2020 at 8:0 Comment(0)

There are restrictions on what can be part of ZWJ sequences and what cannot. Unicode Technical Standard #51 lays out these rules.

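According to definitions ED-15a and ED-16, a well-formed emoji ZWJ sequence consists of two or more elements joined by U+200D ZERO WIDTH JOINER, and each element can only be one of: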

  • Emoji characters (a character with the property Emoji=True)
  • Emoji presentation sequences (an emoji character followed by U+FE0F VARIATION SELECTOR-16, all valid combinations of which are listed in this data file)
  • Emoji modifier sequences (a character with the property Emoji_Modifier_Base=True followed by a character with the property Emoji_Modifier=True)

All relevant properties can be found in this data file.

U+2605 BLACK STAR is not an emoji character (and it is obviously not one of those types of sequences either), so it would not be valid for use in ZWJ sequences as of the time of writing, but you could substitute U+2B50 WHITE MEDIUM STAR (which is an emoji) instead. Other than that, 🏳️ and 🟩 are fair game.

Side note on U+1F3F3 WAVING WHITE FLAG: This character is an emoji, but it has the property Emoji_Presentation=False, which means it is intended to display as text-style (monochrome rather than colourful) by default. To force emoji-style display, U+FE0F VARIATION SELECTOR-16 has to be appended to it. It is recommended that these variation selectors always be included for characters where Emoji_Presentation=False.

U+2B50 WHITE MEDIUM STAR is also a valid base for such emoji presentation sequences, but it has Emoji_Presentation=True by default and the variation selector is thus entirely optional. U+1F7E9 LARGE GREEN SQUARE meanwhile is not a valid base for emoji presentation sequences and therefore must never be followed by VARIATION SELECTOR-16. I know, it’s convoluted.
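
The rules above can be sketched as a small check. This is only a minimal Python sketch, not an implementation of UTS #51: the property sets are a hand-copied subset covering just the characters discussed in this answer (a real tool would presumably load them from the Unicode emoji data files), emoji modifier sequences are left out, and the names `EMOJI`, `VS16_BASES`, `is_zwj_element` and `is_wellformed_zwj_sequence` are made up for illustration.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
VS16 = "\uFE0F"  # VARIATION SELECTOR-16

# Hand-copied subset of the properties discussed above (assumption: a real
# tool would read these from the Unicode emoji data files instead).
EMOJI = {0x1F3F3, 0x2B50, 0x1F7E9}  # Emoji=True; U+2605 BLACK STAR is absent
VS16_BASES = {0x1F3F3, 0x2B50}      # valid bases for emoji presentation sequences


def is_zwj_element(element: str) -> bool:
    """True for an emoji character or an emoji presentation sequence."""
    if len(element) == 1:
        return ord(element) in EMOJI
    if len(element) == 2 and element[1] == VS16:
        return ord(element[0]) in VS16_BASES
    return False  # emoji modifier sequences are not handled in this sketch


def is_wellformed_zwj_sequence(text: str) -> bool:
    """True if text is two or more valid elements joined by ZWJ."""
    elements = text.split(ZWJ)
    return len(elements) >= 2 and all(is_zwj_element(e) for e in elements)


# Flag + VS16, ZWJ, BLACK STAR, ZWJ, green square: the star is not an emoji.
print(is_wellformed_zwj_sequence("\U0001F3F3\uFE0F\u200D\u2605\u200D\U0001F7E9"))  # False
# Same sequence with WHITE MEDIUM STAR instead.
print(is_wellformed_zwj_sequence("\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"))  # True
```

Passing this check only means the sequence is well-formed; it says nothing about whether any font actually has a glyph for it.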

What all this means is that you have two choices for the precise sequence of codepoints you want to use, both of which are equally valid. Either:

🏳️‍⭐‍🟩 <U+1F3F3, U+FE0F, U+200D, U+2B50, U+200D, U+1F7E9>

Or:

🏳️‍⭐️‍🟩 <U+1F3F3, U+FE0F, U+200D, U+2B50, U+FE0F, U+200D, U+1F7E9>
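
Because the joiners and variation selectors are invisible, it is easy to lose track of which of the two variants a string actually contains. As a rough illustration, the following Python sketch builds both strings from their code points and prints them back in the notation used above; the helper names `from_codepoints` and `describe` are hypothetical.

```python
def from_codepoints(*codepoints: int) -> str:
    """Build a string from a list of Unicode code points."""
    return "".join(chr(cp) for cp in codepoints)


def describe(text: str) -> str:
    """Render a string as <U+XXXX, U+XXXX, ...> for inspection."""
    return "<" + ", ".join(f"U+{ord(ch):04X}" for ch in text) + ">"


# Variant 1: the star is left without a variation selector.
without_vs16 = from_codepoints(0x1F3F3, 0xFE0F, 0x200D, 0x2B50, 0x200D, 0x1F7E9)
# Variant 2: the star is followed by VARIATION SELECTOR-16.
with_vs16 = from_codepoints(0x1F3F3, 0xFE0F, 0x200D, 0x2B50, 0xFE0F, 0x200D, 0x1F7E9)

print(describe(without_vs16))  # <U+1F3F3, U+FE0F, U+200D, U+2B50, U+200D, U+1F7E9>
print(describe(with_vs16))     # <U+1F3F3, U+FE0F, U+200D, U+2B50, U+FE0F, U+200D, U+1F7E9>
```

The two strings are different code point sequences, so a font that wants to support both would need to map both of them to the same glyph.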

Meletius answered 1/5, 2020 at 11:46 Comment(4)
this is a very insightful answer. Thank you very much! - Gold
ZWJ predates emojis. Wrt emojis, the above is true. However, it is valid in many other codepoint sequences, so don’t make assumptions that ZWJ without emojis is invalid. The TR above, btw, is specifically about emojis. See, for instance: unicode-explorer.com/c/200D - Simonesimoneau
@Simonesimoneau The term “ZWJ sequence” is exclusively used in the context of emoji, so there is no ambiguity in this regard. Obviously any Unicode characters can be used in any sequence and none of these combinations are “invalid” even if they aren’t necessarily meaningful, but when it comes to emoji it’s generally best to follow UTS #51. - Meletius
I think you mean "emoji ZWJ sequence"? That document is specific to emojis. In fact, it explains the use of ZWJ in different codepoint sequences as well. Using ZWJ is necessary in Arabic and many Indic scripts. But anyway, this question appears to be specifically about "Emoji ZWJ sequence", and in that context it is all correct. I clarified the title to that effect. - Simonesimoneau

I notice your interest in creating the Esperanto flag, but I think font rendering is more complex than just lining up codepoints.

Your brute force approach does not work "as is".

<div>
    &#x1F3F3;&#xFE0F;&#x200D;&#x2605;&#x200D;&#x1F7E9;
</div>

The Unicode standard says in Recommended Emoji ZWJ Sequences, v13.0:

The following are the recommended emoji zwj sequences, which use a U+200D ZERO WIDTH JOINER (ZWJ) to join the characters into a single glyph if available. When not available, the ZWJ characters are ignored and a fallback sequence of separate emoji is displayed. Thus an emoji zwj sequence should only be supported where the fallback sequence would also make sense to a viewer.
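
To make the fallback behaviour described in that quote concrete, here is a minimal Python sketch of what a viewer would see when no joined glyph is available: the ZWJ characters are dropped and the remaining elements are shown side by side. It only mimics the visible result; the function name `fallback_elements` is hypothetical and this is not how any particular text shaping engine is implemented.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def fallback_elements(sequence: str) -> list:
    """Split an emoji ZWJ sequence into the separate emoji shown as fallback."""
    return [part for part in sequence.split(ZWJ) if part]


# White flag + VS16, ZWJ, WHITE MEDIUM STAR, ZWJ, LARGE GREEN SQUARE
flag = "\U0001F3F3\uFE0F\u200D\u2B50\u200D\U0001F7E9"

for element in fallback_elements(flag):
    # Print each fallback element together with its code points.
    print(element, " ".join(f"U+{ord(ch):04X}" for ch in element))
```

For the sequence from the question, the fallback would be a white flag, a star and a green square next to each other, which is the picture a viewer has to make sense of when the font has no ligature for it.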

I was wondering which part of the font rendering mechanism would be responsible for checking the "availability" (i.e. the rendering engine supporting a certain Unicode version, or the application, or the font), and guessed "the font".

So I came across this article on Emoji fonts, and indeed, font files can contain data on Ligature Substitution, see OpenType for example. Microsoft provides a tool called VOLT which allows the definition of ligatures.

I have no idea about font design, but I would try to create a colored flag glyph with a font editor (sketched here), and define the ligature substitution. (no implied warranty ;) )

Lakitalaks answered 1/5, 2020 at 9:11 Comment(4)
I am aware of the fact that existing emoji implementations might support that ligature. My question is, would it be valid if they did? - Gold
But there is an interesting quote in that document you linked to: "Thus an emoji zwj sequence should only be supported where the fallback sequence would also make sense to a viewer." - Gold
My understanding is that it is the font definition that defines whether it's valid. - Lakitalaks
Since this issue is not exactly a programming question, maybe it can be answered on another SE site: graphicdesign.stackexchange.com/?tags=fonts - Lakitalaks
