What is the difference between \p{Emoji_Presentation} and \p{Extended_Pictographic}?

2

15

The top answer to the question "How to detect emoji using javascript" uses Extended_Pictographic.

Emoji_Presentation is mentioned in this blog post by David Walsh.

Chyou answered 18/12, 2021 at 6:57 Comment(1)
Excellent question, didn't know about Emoji_Presentation. I added your question in my answer you linked – Estrange
F
23

One key difference is that Extended_Pictographic will match a bunch of "pictographic" characters that aren't technically emojis, or aren't shown as colored emojis by default:

"1😂💯♡⌨︎".match(/\p{Emoji_Presentation}/gu)    // ['😂', '💯']
"1😂💯♡⌨︎".match(/\p{Extended_Pictographic}/gu) // ['😂', '💯', '♡', '⌨']

\p{Emoji_Presentation} only matches emojis that are, by default, shown in their colored emoji form.

There is also \p{Emoji}, but it's likely best avoided in most real-world circumstances:

"1😂💯♡⌨︎".match(/\p{Emoji}/gu) // ['1', '😂', '💯', '⌨']

IIUC, 1 has an emoji/colored representation (1️⃣), and so is matched by \p{Emoji}, but 1 isn't shown in its emoji/colored form by default, so it isn't matched by \p{Emoji_Presentation}. Same with ⌨︎. And I guess ♡ isn't classed as an emoji at all, but does fall within the "pictographic" class.
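To check this kind of reasoning for any single character, something like the sketch below may help. The helper name emojiProperties is made up for illustration, and the results depend on the Unicode version your JavaScript engine ships with.

```javascript
// Hypothetical helper: report which emoji-related Unicode properties
// a single code point has, using ES2018 Unicode property escapes.
function emojiProperties(ch) {
  return {
    emoji: /^\p{Emoji}$/u.test(ch),
    emojiPresentation: /^\p{Emoji_Presentation}$/u.test(ch),
    extendedPictographic: /^\p{Extended_Pictographic}$/u.test(ch),
  };
}

// Escapes are used so the samples do not depend on how this page renders them.
const samples = {
  digitOne: "1",
  faceWithTearsOfJoy: "\u{1F602}",
  whiteHeartSuit: "\u2661",
  keyboard: "\u2328",
};

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, emojiProperties(ch));
}
```

With current Unicode data, the digit should come out as Emoji only, the white heart suit as Extended_Pictographic only, and the keyboard as both Emoji and Extended_Pictographic but not Emoji_Presentation.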

Note that the above explanation implies that an emoji can be colored, and yet not matched by \p{Emoji_Presentation}. For example, ❄️ is an "old" emoji that's displayed in black and white by default. You see it colored because it is followed by the special "variation selector 16" (\uFE0F), which makes it render in colored form. If you'd like to match all colored emojis in a string, regardless of their "default" presentation, then I think this should work:

"1πŸ˜‚πŸ’―β™‘βŒ¨οΈŽβ„οΈ".match(/(\p{Emoji}\uFE0F|\p{Emoji_Presentation})/gu) // ['πŸ˜‚', 'πŸ’―', '❄️']

The \p{Emoji}\uFE0F part is what causes the above regex to match the snowflake. Note that in the above code block the snowflake is rendered in black and white, but that's just a CSS-related thing due to it being in a code block.
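Since the variation selector is invisible, it can help to dump the code points and see what the regex is actually looking at. A rough sketch (codePoints is a hypothetical helper, not a built-in):

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const snowflakeText = "\u2744";        // bare snowflake, text presentation by default
const snowflakeEmoji = "\u2744\uFE0F"; // snowflake followed by variation selector 16

// Same pattern as above, without the capturing group.
const coloredEmoji = /\p{Emoji}\uFE0F|\p{Emoji_Presentation}/gu;

console.log(codePoints(snowflakeEmoji));                // ['U+2744', 'U+FE0F']
console.log(snowflakeText.match(coloredEmoji));         // null
console.log(snowflakeEmoji.match(coloredEmoji).length); // 1
```

The bare snowflake is not matched because it is neither followed by \uFE0F nor Emoji_Presentation on its own.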

Folkways answered 23/6, 2022 at 9:35 Comment(0)
B
6

The Extended_Pictographic property is defined in Unicode Technical Standard #51, Unicode Emoji, and is primarily intended for use in line breaking and other segmentation algorithms.

The Emoji_Presentation property is also defined in UTS #51. It was created to help applications pick a default presentation for certain Unicode characters: monochrome text or full-colour emoji. The need for this goes back to a decision made when emoji were first encoded: wherever an emoji used in Japan matched a character that already existed in Unicode, the two were unified instead of being encoded separately. (In hindsight, that might not have been the best decision because of this issue.)
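As a rough illustration of how an application might use the property, here is a sketch that combines the default from Emoji_Presentation with the two variation selectors (U+FE0E requests text presentation, U+FE0F requests emoji presentation). The function name requestedPresentation is hypothetical, the sketch only looks at a base character plus one optional selector, and what actually gets rendered still depends on fonts and the platform.

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

// Hypothetical helper: given a base character optionally followed by a
// variation selector, report which presentation is being requested.
function requestedPresentation(seq) {
  const [base, next] = [...seq];
  if (!/^\p{Emoji}$/u.test(base)) return "not-emoji";
  if (next === VS16) return "emoji"; // explicit request for colour
  if (next === VS15) return "text";  // explicit request for monochrome
  // No selector: fall back to the character's default presentation.
  return /^\p{Emoji_Presentation}$/u.test(base) ? "emoji" : "text";
}

console.log(requestedPresentation("\u2744"));       // "text"
console.log(requestedPresentation("\u2744\uFE0F")); // "emoji"
console.log(requestedPresentation("\u{1F602}"));    // "emoji"
console.log(requestedPresentation("\u2661"));       // "not-emoji"
```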

Note that Extended_Pictographic is a property of Unicode code points, and that it also covers code points that are not (yet) assigned to any character. You can see a listing of all Extended_Pictographic code points grouped by their Emoji_Presentation (character) property values at https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AExtended_Pictographic%3A%5D&g=Emoji_Presentation&i=. (This shows them sub-grouped by blocks.)
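If you want to see the unassigned code points from JavaScript, a sketch along these lines should work. The function name countReservedPictographic is made up, and the exact count depends on the Unicode version your engine ships with, so no specific number is claimed here.

```javascript
// Count code points in [start, end] that are Extended_Pictographic but
// still unassigned (General_Category = Cn) in this runtime's Unicode data.
function countReservedPictographic(start, end) {
  // The lookahead checks one property, then \p{Cn} consumes the same code point.
  const reserved = /^(?=\p{Extended_Pictographic})\p{Cn}$/u;
  let count = 0;
  for (let cp = start; cp <= end; cp++) {
    if (reserved.test(String.fromCodePoint(cp))) count++;
  }
  return count;
}

// Scan the U+1F000..U+1FFFF range, where most emoji blocks live.
console.log(countReservedPictographic(0x1f000, 0x1ffff));
```

A non-zero result means the regex \p{Extended_Pictographic} can match code points that have no character assigned yet.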

Botulism answered 19/12, 2021 at 16:36 Comment(0)
