How to obtain a full list of Unicode emojis from the Unicode website

I'm building an application that requires the use of emojis, specifically generating large sequences of random emojis. This requires having a large list to pull from. Rather than taking the approach detailed here of looping over hardcoded hex ranges, I decided to download and parse data from the Unicode website. From there, I do some code generation and write all the unique emojis to disk, which I can then pick up inside my application. All this happens either as a manual step or as a build step for my app.

However, the Unicode specification is complicated and I'm unsure which data I should be pulling from to build up a definitive list. There are three files under the latest version of Unicode (14.0):

  • emoji-test.txt
  • emoji-sequences.txt
  • emoji-zwj-sequences.txt

There are also two files in the Unicode Character Database (UCD):

  • emoji-variation-sequences.txt
  • emoji-data.txt

There are definitely duplicates amongst all these lists such as 😀 and while I could download and parse all five files and reduce the list down to unique instances in my script, I'd like to keep my script as simple as I can without doing unnecessary work.

From what I understand:

  • emoji-test.txt is a grouping of emoji characters as you might see in a keyboard, grouped by category
  • emoji-sequences.txt is a list of emoji ranges, single emojis, and multi-character emojis such as 🇦🇨 (1F1E6 1F1E8) or emojis combined with a variation selector like FE0F
  • emoji-zwj-sequences.txt is a list of emojis joined by the zero width joiner character
  • emoji-variation-sequences.txt is a list of emojis that can be presented either in textual form or as emojis
  • emoji-data.txt seems to be a very comprehensive list of not just emojis but also emoji modifiers and the like

All this has left me rather perplexed as to which list or combination of lists would give me the most comprehensive list of emojis. emoji-data.txt seems to have the most wide-ranging list, but I don't want things like emoji modifiers or emoji components; I'm only looking for emojis that a user can select with the keyboard (for example, you can't select a skin tone modifier by itself).

Which lists or combination of lists would yield the most comprehensive, wide-ranging list of emojis that I could use in my app?

Tumular answered 9/3, 2022 at 3:46 Comment(3)
Another comprehensive resource: emojipedia.org/emoji-sequence – Pintsize
@Pintsize that's not really useful though because, as far as I can tell, there's no easy way to download and parse emojipedia, whereas Unicode's text files are meant to be parsed. It's great that it's a good resource, but I'm specifically asking about the Unicode website. – Tumular
This project may be of use to you. – Goines

Use the union of emoji-sequences.txt and emoji-zwj-sequences.txt. That set comprises the emoji recommended for general interchange (RGI). See https://www.unicode.org/reports/tr51/tr51-19.html#def_rgi_set.
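As a rough illustration of taking that union, here is a minimal Python sketch. It assumes each data line has the form `code points ; type_field ; description # comment`, where the first field is either a single code point, a space-separated sequence, or a `start..end` range. The function name `parse_emoji_sequences` and the `SAMPLE` lines are made up for the example; in practice you would feed it the lines of both downloaded files.

```python
def parse_emoji_sequences(lines):
    """Yield emoji strings from lines in the emoji-sequences.txt style.

    Hypothetical helper: assumes semicolon-separated fields with the
    code points in the first field and '#' starting a comment.
    """
    for line in lines:
        # Drop the trailing comment and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        # The first semicolon-separated field holds the code points.
        code_points = data.split(";", 1)[0].strip()
        if ".." in code_points:
            # A range such as "231A..231B" covers several single emoji.
            start, end = (int(cp, 16) for cp in code_points.split(".."))
            for cp in range(start, end + 1):
                yield chr(cp)
        else:
            # A single code point or a space-separated sequence.
            yield "".join(chr(int(cp, 16)) for cp in code_points.split())


# Illustrative sample lines only, not the real file contents.
SAMPLE = """\
# comment line
231A..231B    ; Basic_Emoji              ; watch..hourglass done
1F1E6 1F1E8   ; RGI_Emoji_Flag_Sequence  ; flag: Ascension Island
1F468 200D 1F469 200D 1F467 ; RGI_Emoji_ZWJ_Sequence ; family: man, woman, girl
""".splitlines()

# A set removes any duplicates across the two files.
emojis = sorted(set(parse_emoji_sequences(SAMPLE)))
```

Expanding the ranges matters when counting: a single range row contributes one emoji per code point it covers, not one emoji per row.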

Brochure answered 12/3, 2022 at 16:54 Comment(3)
I don't believe that's enough though. The Unicode website says there are 3633 available in version 14 and those two files together don't add up to that count AFAIK. – Tumular
@Tumular The emoji counts page shows 3633. (Note that the last row, 12, are emoji components that are not used as standalone emoji. So, the more interesting count is 3621.) In fact, if you look in those two files, emoji-zwj-sequences.txt has 1349, and emoji-sequences.txt has 2284, which adds up to 3633. Note that some rows in the Basic_Emoji section of emoji-sequences.txt give ranges that cover multiple emoji characters. – Brochure
Ah my mistake. I actually ended up going with the emoji-test file because that has emojis organized by group and subgroup. – Tumular
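For anyone taking the emoji-test.txt route instead, a minimal Python sketch of grouping by group and subgroup might look like the following. It assumes the file marks sections with `# group:` and `# subgroup:` comment lines and that each data line has the form `code points ; status # comment`. Keeping only `fully-qualified` entries is a choice made here to leave out standalone components such as skin tone modifiers; the name `parse_emoji_test` and the `SAMPLE` lines are invented for the example.

```python
from collections import defaultdict


def parse_emoji_test(lines):
    """Return {group: {subgroup: [emoji, ...]}} for fully-qualified entries.

    Hypothetical helper: assumes '# group:' / '# subgroup:' header
    comments and 'code points ; status # comment' data lines.
    """
    groups = defaultdict(lambda: defaultdict(list))
    group = subgroup = None
    for line in lines:
        line = line.strip()
        if line.startswith("# group:"):
            group = line.split(":", 1)[1].strip()
        elif line.startswith("# subgroup:"):
            subgroup = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            code_points, rest = line.split(";", 1)
            # The status sits between the semicolon and the comment.
            status = rest.split("#", 1)[0].strip()
            if status == "fully-qualified":
                emoji = "".join(chr(int(cp, 16)) for cp in code_points.split())
                groups[group][subgroup].append(emoji)
    return groups


# Illustrative sample lines only, not the real file contents.
SAMPLE = """\
# group: Smileys & Emotion
# subgroup: face-smiling
1F600      ; fully-qualified     # E1.0 grinning face
263A FE0F  ; fully-qualified     # E0.6 smiling face
263A       ; unqualified         # E0.6 smiling face
# group: Component
# subgroup: skin-tone
1F3FB      ; component           # E1.0 light skin tone
""".splitlines()

by_group = parse_emoji_test(SAMPLE)
```

Flattening `by_group` gives one list to sample from, while the nested structure keeps the keyboard-style categories available if the app needs them.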
