I'm building an application that requires emojis, specifically for generating long sequences of random emojis. This requires a large list to pull from. Rather than taking the approach detailed here of looping over hardcoded hex ranges, I decided to download and parse data from the Unicode website instead. From there, I do some code generation and write all the unique emojis to disk, where my application can pick them up. All of this happens either as a manual step or as a build step for my app.
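A minimal sketch of that build/runtime split, assuming the build step writes the de-duplicated list as a JSON array. The file format and the names `write_emoji_list`, `load_emoji_list` and `random_emoji_sequence` are hypothetical, chosen only for illustration:

```python
import json
import random
from pathlib import Path
from typing import Optional


def write_emoji_list(emojis: list[str], path: Path) -> None:
    """Build step: persist the de-duplicated emoji list as a JSON array."""
    # ensure_ascii=False stores the emojis as UTF-8 text rather than escapes.
    path.write_text(json.dumps(emojis, ensure_ascii=False), encoding="utf-8")


def load_emoji_list(path: Path) -> list[str]:
    """Runtime: read back the list written by the build step."""
    return json.loads(path.read_text(encoding="utf-8"))


def random_emoji_sequence(
    emojis: list[str], length: int, rng: Optional[random.Random] = None
) -> str:
    """Pick `length` emojis uniformly at random (with replacement) and join them."""
    if rng is None:
        rng = random.Random()
    # Each list entry may be several code points (flags, ZWJ sequences),
    # so choose whole entries rather than individual characters.
    return "".join(rng.choices(emojis, k=length))
```

Because some entries span several code points, `len()` of the returned string is not the number of emojis; keep the list from `rng.choices` if you need to count them.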
However, the Unicode specification is complicated and I'm unsure which data I should pull from to build up a definitive list. There are three files under the latest version of Unicode (14.0):
- emoji-sequences.txt
- emoji-test.txt
- emoji-zwj-sequences.txt
There are also two files in the Unicode Character Database (UCD):
- emoji-data.txt
- emoji-variation-sequences.txt
These lists definitely overlap (the same emoji often appears in more than one file), and while I could download and parse all five files and reduce the combined list to unique entries in my script, I'd like to keep the script as simple as possible and avoid unnecessary work.
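For comparison, the brute-force "parse everything and de-duplicate" step is fairly short. This sketch assumes the standard unicode.org layout for 14.0 (emoji files under `Public/emoji/14.0/`, UCD emoji files under `Public/14.0.0/ucd/emoji/`) and reads only the first `;`-separated field of each data line, which in these files is a single code point, a space-separated sequence, or a `start..end` range. The helper names are hypothetical:

```python
from urllib.request import urlopen

# Assumed unicode.org layout for Unicode 14.0; adjust the versions as needed.
EMOJI_BASE = "https://unicode.org/Public/emoji/14.0/"
UCD_EMOJI_BASE = "https://unicode.org/Public/14.0.0/ucd/emoji/"
SOURCE_URLS = [
    EMOJI_BASE + "emoji-sequences.txt",
    EMOJI_BASE + "emoji-test.txt",
    EMOJI_BASE + "emoji-zwj-sequences.txt",
    UCD_EMOJI_BASE + "emoji-data.txt",
    UCD_EMOJI_BASE + "emoji-variation-sequences.txt",
]


def fetch(url: str) -> str:
    """Download one data file as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_code_point_field(field: str) -> list[str]:
    """Turn '1F600', '1F1E6 1F1E8' or '231A..231B' into a list of strings."""
    field = field.strip()
    if ".." in field:
        # A range expands to one single-code-point emoji per value.
        start, end = (int(cp, 16) for cp in field.split(".."))
        return [chr(cp) for cp in range(start, end + 1)]
    # Otherwise the space-separated code points form a single emoji.
    return ["".join(chr(int(cp, 16)) for cp in field.split())]


def emojis_in_file(text: str) -> list[str]:
    """Collect the first field of every data line, ignoring comments."""
    found: list[str] = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if data:
            found.extend(parse_code_point_field(data.split(";", 1)[0]))
    return found


def unique_emojis(*texts: str) -> list[str]:
    """Merge several files, keeping the first occurrence of each emoji."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(e for text in texts for e in emojis_in_file(text)))
```

Calling `unique_emojis(*(fetch(url) for url in SOURCE_URLS))` would merge all five files. It deliberately does no filtering, so it also picks up components and text-style (`FE0E`) variation sequences, and plain string comparison treats `263A` and `263A FE0F` as different entries.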
From what I understand:
- emoji-test.txt is a list of emoji characters grouped by category, as you might see them on a keyboard
- emoji-sequences.txt is a list of emoji ranges, single emojis, and multi-character emojis such as 🇦🇨 (`1F1E6 1F1E8`) or emojis combined with a variation selector like `FE0F`
- emoji-zwj-sequences.txt is a list of emojis joined by the zero width joiner character
- emoji-variation-sequences.txt is a list of emojis that can be presented either as text or as emoji
- emoji-data.txt seems to be a very comprehensive list of not just emojis but also emoji modifiers and the like
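To make the emoji-test.txt description concrete: each data line has the form `code points ; status # emoji name`, under `# group:` and `# subgroup:` comment headers, and the status is one of `component`, `fully-qualified`, `minimally-qualified` or `unqualified`. A minimal sketch (function name hypothetical) that keeps only `fully-qualified` entries per group, which leaves out the standalone skin-tone and hair-style entries listed under the `Component` group:

```python
from collections import defaultdict


def parse_emoji_test(
    text: str, keep: frozenset = frozenset({"fully-qualified"})
) -> dict[str, list[str]]:
    """Map each '# group:' name to the emojis whose status is in `keep`."""
    groups: dict[str, list[str]] = defaultdict(list)
    group = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# group:"):
            # Remember the current group so later data lines are filed under it.
            group = line[len("# group:"):].strip()
            continue
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # comment, subgroup header or blank line
        code_points, status = (part.strip() for part in data.split(";", 1))
        if status in keep:
            groups[group].append("".join(chr(int(cp, 16)) for cp in code_points.split()))
    return dict(groups)
```

Note that fully-qualified entries still include skin-toned variants such as `1F44B 1F3FB` (waving hand: light skin tone), since those are complete emojis rather than bare modifiers.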
All this has left me rather perplexed as to which list, or combination of lists, would give me the most comprehensive set of emojis. emoji-data.txt seems to have the most wide-ranging list, but I don't want things like emoji modifiers or emoji components; I'm only looking for emojis that a user can select on a keyboard (for example, you can't select a skin tone modifier by itself).
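If you did start from emoji-data.txt, its property lines (`code point or range ; property # comment`) make it straightforward to subtract components. A sketch (names hypothetical) that takes every code point with the `Emoji` property and removes those that also have `Emoji_Component`, which covers the skin-tone modifiers, regional indicators, hair components, and the digits, `#` and `*` that act as emoji only inside keycap sequences:

```python
from collections import defaultdict


def parse_emoji_data(text: str) -> dict[str, set[int]]:
    """Map each property name to the set of code points that have it."""
    props: dict[str, set[int]] = defaultdict(set)
    for raw in text.splitlines():
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue  # comment or blank line
        field, prop = (part.strip() for part in data.split(";", 1))
        # '1F3FB..1F3FF' is an inclusive range; '0023' is a single code point.
        start, _, end = field.partition("..")
        props[prop].update(range(int(start, 16), int(end or start, 16) + 1))
    return dict(props)


def standalone_emoji_code_points(text: str) -> list[int]:
    """Code points with the Emoji property that are not Emoji_Component."""
    props = parse_emoji_data(text)
    wanted = props.get("Emoji", set()) - props.get("Emoji_Component", set())
    return sorted(wanted)
```

This file lists only single code points, so flags, keycaps and ZWJ sequences never appear in the result; those live in emoji-sequences.txt and emoji-zwj-sequences.txt. The `Emoji` property also covers characters such as © that default to text presentation.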
Which lists or combination of lists would yield the most comprehensive, wide-ranging list of emojis that I could use in my app?