The top answer to the question "How to detect emoji using javascript" uses Extended_Pictographic. Emoji_Presentation is mentioned in this blog post by David Walsh.
One key difference is that Extended_Pictographic will match a number of "pictographic" characters that aren't technically emoji, or aren't shown as colored emoji by default:
"1😂💯♡⌨️".match(/\p{Emoji_Presentation}/gu) // ['😂', '💯']
"1😂💯♡⌨️".match(/\p{Extended_Pictographic}/gu) // ['😂', '💯', '♡', '⌨'] (the last match is U+2328 alone, without the trailing U+FE0F)
\p{Emoji_Presentation} only matches emoji that are, by default, shown in their colored emoji form.
There is also \p{Emoji}, but it's likely best avoided in most real-world circumstances:
"1😂💯♡⌨️".match(/\p{Emoji}/gu) // ['1', '😂', '💯', '⌨']
If I understand correctly, 1 has an emoji/colored representation (1️⃣), and so is matched by \p{Emoji}, but 1 isn't shown in its emoji/colored form by default, so it isn't matched by \p{Emoji_Presentation}. The same goes for ⌨ (U+2328). And ♡ (U+2661) isn't classed as an emoji at all, but does fall within the "pictographic" class.
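To check the reasoning above one code point at a time, here is a minimal sketch that lists which of the three properties each sample character has. The helper name emojiProperties and the sample labels are made up for illustration, and the characters are written as escapes so the result doesn't depend on how the page renders them.

```javascript
// The three Unicode properties compared in this answer.
const properties = ["Emoji", "Emoji_Presentation", "Extended_Pictographic"];

// Hypothetical helper: names of the properties that a single code point has.
function emojiProperties(char) {
  return properties.filter((name) =>
    new RegExp(`^\\p{${name}}$`, "u").test(char)
  );
}

// Sample code points, all written without a variation selector.
const samples = {
  digitOne: "1",             // ['Emoji']
  hundredPoints: "\u{1F4AF}", // all three properties
  whiteHeartSuit: "\u2661",  // ['Extended_Pictographic']
  keyboard: "\u2328",        // ['Emoji', 'Extended_Pictographic']
  snowflake: "\u2744",       // ['Emoji', 'Extended_Pictographic']
};

for (const [label, char] of Object.entries(samples)) {
  console.log(label, emojiProperties(char));
}
```

The anchors (^ and $) make each test apply to exactly one code point, which is why the samples leave out U+FE0F.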
Note that the above explanation implies that an emoji can be colored, and yet not matched by \p{Emoji_Presentation}. For example, ❄️ is an "old" emoji that's displayed in black and white by default. The reason you see it colored is that it has the special "variation selector 16" (\uFE0F) after it, which makes it render in colored form. If you'd like to match all colored emoji in a string, regardless of their "default" presentation, then I think this should work:
"1😂💯♡⌨️❄️".match(/(\p{Emoji}\uFE0F|\p{Emoji_Presentation})/gu) // ['😂', '💯', '⌨️', '❄️'] (the keyboard matches too, because it is followed by U+FE0F)
The \p{Emoji}\uFE0F part is what causes the above regex to match the snowflake. Note that in the code above the snowflake may be rendered in black and white, but that's just a CSS side effect of it being in a code block.
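Here is a small sketch of the same pattern wrapped in a function, with the variation selector written as an explicit escape so its effect is visible. The name matchColoredEmoji is just for illustration, and I've dropped the capture group since match() with the g flag doesn't use it.

```javascript
// Same alternation as above: a code point with the Emoji property followed
// by VS16 (U+FE0F), or a code point with emoji presentation by default.
const coloredEmoji = /\p{Emoji}\uFE0F|\p{Emoji_Presentation}/gu;

// Hypothetical helper: returns an empty array instead of null on no match.
function matchColoredEmoji(text) {
  return text.match(coloredEmoji) ?? [];
}

const bareSnowflake = "\u2744";        // text presentation by default
const emojiSnowflake = "\u2744\uFE0F"; // VS16 requests emoji presentation

console.log(matchColoredEmoji(bareSnowflake).length);       // 0
console.log(matchColoredEmoji(emojiSnowflake).length);      // 1
console.log(matchColoredEmoji("1\u{1F4AF}\u2661").length);  // 1 (only U+1F4AF)
```

One caveat: the pattern works per code point, so for a keycap sequence such as 1 + U+FE0F + U+20E3 it would match only the first two code points and leave the combining keycap behind.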
The Extended_Pictographic property is defined in Unicode Technical Standard #51, Unicode Emoji, and is primarily intended for use in line breaking and other segmentation algorithms.
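As a rough illustration of the segmentation use case: the grapheme cluster rules in UAX #29 use Extended_Pictographic to keep emoji joined by a zero-width joiner (U+200D) together. The sketch below assumes a runtime that provides Intl.Segmenter (recent Node.js and browsers do); the helper name graphemes is invented for this example.

```javascript
// Intl.Segmenter with granularity "grapheme" splits text into
// user-perceived characters (grapheme clusters).
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Hypothetical helper: the grapheme clusters of a string, as an array.
function graphemes(text) {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// MAN + ZWJ + WOMAN + ZWJ + GIRL: three pictographic code points
// joined by U+200D.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log([...family].length);       // 5 code points
console.log(graphemes(family).length); // 1 grapheme cluster
```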
The Emoji_Presentation property is also defined in UTS #51. It was created to help applications pick a default presentation when encountering certain Unicode characters: as monochrome text or as full-colour emoji. The need for this arose from a decision, made when emoji were first encoded, to unify emoji used in Japan with existing Unicode characters wherever a matching character already existed. (In hindsight, that might not have been the best decision because of this issue.)
Note that Extended_Pictographic is a property of Unicode code points, and that there are code points that are not (yet) assigned to any character. You can see a listing of all Extended_Pictographic code points grouped by their Emoji_Presentation (character) property values at https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AExtended_Pictographic%3A%5D&g=Emoji_Presentation&i=. (This shows them sub-grouped by blocks.)