How to identify same emoji of different colors/skin tone in python?

Asked 26/9, 2020 at 11:35 Answered 26/9, 2020 at 12:38

How can I identify the same emoji in different colours?

Example: 👍🏻 👍🏽 👍 these should be considered as being the same

Edit: Currently I am using the emoji package:

import regex
import emoji
exm = "poli kariku fans adi like 👍🏻 👍🏽 👍 sub tharamo"
characters = regex.findall(r"\X", exm)  # \X matches one extended grapheme cluster
for char in characters:
    if char in emoji.UNICODE_EMOJI:
        pass  # do something

Cupcake answered 26/9, 2020 at 11:35 Comment(1)

please provide the python package/sample code you are using for reading these emojis – Memorialist 26/9, 2020 at 11:40

I don't think you need regex for your use case; you can just use the length of the emoji:

import emoji

NUM_COLUMNS = 4
TABLE_COLUMN_WIDTH = 18

def is_emoji(s):
    return s in emoji.UNICODE_EMOJI

def is_default_emoji(s):
    return len(s) == 1 and s in emoji.UNICODE_EMOJI

def get_default_emoji(s):
    return s[0] if s in emoji.UNICODE_EMOJI else None

def pretty_print_line(line):
    print(''.join(word.ljust(TABLE_COLUMN_WIDTH) for word in line.split()))

text = "poli kariku fans adi like 👍🏻 👍🏼 👍🏽 👍🏾 👍🏿 👍 sub tharamo"

pretty_print_line("string is_emoji is_default_emoji get_default_emoji")
print("=" * NUM_COLUMNS * TABLE_COLUMN_WIDTH)
for s in text.split():
    pretty_print_line(f'{s} {is_emoji(s)} {is_default_emoji(s)} {get_default_emoji(s)}')

Output:

string            is_emoji          is_default_emoji  get_default_emoji 
========================================================================
poli              False             False             None              
kariku            False             False             None              
fans              False             False             None              
adi               False             False             None              
like              False             False             None              
👍🏻                True              False             👍                 
👍🏼                True              False             👍                 
👍🏽                True              False             👍                 
👍🏾                True              False             👍                 
👍🏿                True              False             👍                 
👍                True              True              👍                 
sub               False             False             None              
tharamo           False             False             None

You could use logic similar to get_default_emoji for your use case, since it returns the same emoji whether or not a skin tone modifier is present.

Unfrock answered 26/9, 2020 at 12:38 Comment(1)

I had thought the same thing in our code base, but unfortunately it is too naive. Unicode characters, including emoji, can be more than two code points, and the second code point can be other modifiers besides skin tone. is_default_emoji and get_default_emoji will fail for emojis like "man: beard" or "man: skin tone light, beard" which have 4 and 5 code points – Phlebitis 28/3 at 2:0
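To illustrate the point in the comment above, here is a minimal sketch that does not rely on the length of the string. It removes the skin tone modifier code points (U+1F3FB to U+1F3FF) wherever they appear, so longer sequences are handled too. The names emoji_key and same_emoji are hypothetical helpers, not part of the emoji package, and the result is meant only as a comparison key, not for display. It also drops VARIATION SELECTOR-16 (U+FE0F), on the assumption that you want something like U+261D U+FE0F and U+261D U+1F3FB to compare equal.

```python
# Skin tone (Fitzpatrick) modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}
# VARIATION SELECTOR-16 requests emoji presentation; it is dropped from the
# key (an assumption of this sketch) so it does not affect comparisons.
VS16 = "\uFE0F"

def emoji_key(s):
    """Hypothetical helper: s without skin tone modifiers and VS16."""
    return "".join(
        ch for ch in s if ch not in SKIN_TONE_MODIFIERS and ch != VS16
    )

def same_emoji(a, b):
    """Hypothetical helper: True if a and b differ only by skin tone."""
    return emoji_key(a) == emoji_key(b)

# Thumbs up with light skin tone, medium skin tone, and no modifier.
thumbs = ["\U0001F44D\U0001F3FB", "\U0001F44D\U0001F3FD", "\U0001F44D"]
print(all(same_emoji(t, thumbs[0]) for t in thumbs))
```

Note that emoji_key does not check whether its argument is an emoji at all, so you would still combine it with a check such as is_emoji.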

The emoji package has no method for treating the same symbol in different colours as equal. However, we can achieve this by checking whether the emoji's name contains the common string (here, thumbs_up).

Try the code below.

import regex
import emoji
exm = "poli kariku fans adi like 👍🏻 👍🏽 👍 sub tharamo"
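characters = regex.findall(r"\X", exm)  # \X matches one extended grapheme cluster
for char in characters:
    if char in emoji.UNICODE_EMOJI:
        # demojize returns the emoji's name, including any skin tone suffix
        if "thumbs_up" in emoji.demojize(char):
            print("It is thumbs_up")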

For the list of supported emojis and their names, refer to the source code here.

Memorialist answered 26/9, 2020 at 12:19 Comment(1)

correct character splitting is important. Good job – Briny 13/3, 2021 at 19:28

Each thumbs up emoji starts with code point U+1F44D ('THUMBS UP SIGN'), optionally followed by a skin tone modifier code point, so you can filter them like this:

thumbs_up = chr(0x1F44D)  # or '\N{THUMBS UP SIGN}' or just '👍'
# characters is the list of grapheme clusters from the question
thumbs = filter(lambda c: c.startswith(thumbs_up), characters)
for thumb in thumbs:
    print(thumb)

Output

👍🏻
👍🏽
👍
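The same first-code-point idea can be used to group or count emoji regardless of skin tone. The following is only a sketch using the standard library: base_name is a hypothetical helper, and it assumes the base emoji is the first code point of each grapheme cluster, which holds for thumbs up but not for every emoji sequence (flags, for example).

```python
import unicodedata
from collections import Counter

def base_name(cluster):
    """Hypothetical helper: Unicode name of the cluster's first code point."""
    return unicodedata.name(cluster[0], None)

# Grapheme clusters as they might come out of regex.findall(r"\X", text):
# three thumbs up (two with skin tones) and one thumbs down.
clusters = [
    "\U0001F44D\U0001F3FB",
    "\U0001F44D\U0001F3FD",
    "\U0001F44D",
    "\U0001F44E",
]

# Count clusters by base name, so all skin tones fall into one bucket.
counts = Counter(base_name(c) for c in clusters)
for name, n in counts.items():
    print(name, n)
```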

Pence answered 26/9, 2020 at 12:36 Comment(0)

