How to identify same emoji of different colors/skin tone in python?
C

3

5

How can I identify the same emoji when it appears in different colours (skin tones)?

Example: 👍🏻 👍🏽 👍 should all be considered the same emoji.

Edit: I am currently using the emoji package:

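import regex
import emoji

exm = "poli kariku fans adi like 👍🏻 👍🏽 👍 sub tharamo"
# \X matches one extended grapheme cluster, so an emoji stays together with its modifier
characters = regex.findall(r"\X", exm)
for char in characters:
    # emoji.UNICODE_EMOJI exists only in older releases of the emoji package (removed in 2.0)
    if char in emoji.UNICODE_EMOJI:
        pass  # do something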
Cupcake answered 26/9, 2020 at 11:35 Comment(1)
Please provide the Python package and sample code you are using to read these emojis. – Memorialist
U
3

I don't think you need regex for your use case. As long as the emoji are separated by whitespace, you can just check the length of each emoji string:

import emoji

NUM_COLUMNS = 4
TABLE_COLUMN_WIDTH = 18

def is_emoji(s):
    return s in emoji.UNICODE_EMOJI

def is_default_emoji(s):
    return len(s) == 1 and s in emoji.UNICODE_EMOJI

def get_default_emoji(s):
    return s[0] if s in emoji.UNICODE_EMOJI else None

def pretty_print_line(line):
    print(''.join(word.ljust(TABLE_COLUMN_WIDTH) for word in line.split()))

text = "poli kariku fans adi like 👍🏻 👍🏼 👍🏽 👍🏾 👍🏿 👍 sub tharamo"

pretty_print_line("string is_emoji is_default_emoji get_default_emoji")
print("=" * NUM_COLUMNS * TABLE_COLUMN_WIDTH)
for s in text.split():
    pretty_print_line(f'{s} {is_emoji(s)} {is_default_emoji(s)} {get_default_emoji(s)}')

Output:

string            is_emoji          is_default_emoji  get_default_emoji 
========================================================================
poli              False             False             None              
kariku            False             False             None              
fans              False             False             None              
adi               False             False             None              
like              False             False             None              
👍🏻                True              False             👍
👍🏼                True              False             👍
👍🏽                True              False             👍
👍🏾                True              False             👍
👍🏿                True              False             👍
👍                 True              True              👍
sub               False             False             None              
tharamo           False             False             None  

You could use logic similar to get_default_emoji for your use case, since it returns the same base emoji whether or not a skin tone modifier is present.

Unfrock answered 26/9, 2020 at 12:38 Comment(1)
I had thought the same thing in our code base, but unfortunately it is too naive. Unicode characters, including emoji, can be more than two code points, and the second code point can be other modifiers besides skin tone. is_default_emoji and get_default_emoji will fail for emojis like "man: beard" or "man: skin tone light, beard" which have 4 and 5 code points – Phlebitis
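One possible way around the limitation described in the comment above is to remove the skin tone modifiers wherever they occur instead of keeping only the first code point. The following is a minimal sketch using only the standard library; it is not part of the emoji package. The names skin_tone_key and same_emoji are hypothetical helpers introduced here, and dropping the variation selector U+FE0F is an assumption made so that pairs such as ☝️ and ☝🏻 compare equal.

```python
# Fitzpatrick skin tone modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}
# Assumption: also ignore the emoji variation selector (U+FE0F), which some
# emoji carry in their default form but drop once a modifier is attached.
IGNORED = SKIN_TONE_MODIFIERS | {'\uFE0F'}

def skin_tone_key(s):
    """Hypothetical helper: return s without skin tone modifiers and U+FE0F."""
    return ''.join(ch for ch in s if ch not in IGNORED)

def same_emoji(a, b):
    """Hypothetical helper: compare two emoji strings, ignoring skin tone."""
    return skin_tone_key(a) == skin_tone_key(b)

thumbs_light = '\U0001F44D\U0001F3FB'                      # thumbs up, light skin tone
thumbs_plain = '\U0001F44D'                                # thumbs up, default
beard_light = '\U0001F9D4\U0001F3FB\u200D\u2642\uFE0F'     # man: light skin tone, beard
beard_plain = '\U0001F9D4\u200D\u2642\uFE0F'               # man: beard

print(same_emoji(thumbs_light, thumbs_plain))
print(same_emoji(beard_light, beard_plain))
```

Because the modifier is removed at any position, sequences longer than two code points are handled as well. The key is meant for comparison only; it is not guaranteed to be a string that renders as a valid emoji.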
M
6

The emoji package has no method for treating the same symbol in different colours as equal. However, you can achieve this by checking whether the emoji's name contains a common string (here, thumbs_up).

Try the code below.

import regex
import emoji
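exm = "poli kariku fans adi like 👍🏻 👍🏽 👍 sub tharamo"
# \X matches one extended grapheme cluster (an emoji plus any modifier)
characters = regex.findall(r"\X", exm)
for char in characters:
    if char in emoji.UNICODE_EMOJI:
        # demojize returns the emoji's name, which includes thumbs_up for every skin tone
        if "thumbs_up" in emoji.demojize(char):
            print("It is thumbs_up")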

For the list of emojis supported and their names refer to the source code here.

Memorialist answered 26/9, 2020 at 12:19 Comment(1)
correct character splitting is important. Good job – Briny
P
6

Each thumbs up emoji starts with code point 1F44D ('THUMBS UP SIGN'), optionally followed by a skin tone modifier code point, so you can filter them like this:

thumbs_up = chr(0x1F44D)  # or '\N{THUMBS UP SIGN}' or just '👍'
# characters is the list of grapheme clusters from the question's code
thumbs = filter(lambda c: c.startswith(thumbs_up), characters)
for thumb in thumbs:
    print(thumb)

Output

👍🏻
👍🏽
👍
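
The same idea can be generalised beyond thumbs up by grouping every cluster under the Unicode name of its first code point. This is only a sketch: base_name is a hypothetical helper, the clusters list is assumed to have been produced already (for example by regex.findall(r"\X", text)), and it relies solely on the standard library.

```python
import unicodedata
from collections import Counter

def base_name(cluster):
    """Hypothetical helper: Unicode name of the first code point of a cluster."""
    return unicodedata.name(cluster[0], 'UNKNOWN')

# Assumed input: grapheme clusters that were already split out of the text.
clusters = [
    '\U0001F44D\U0001F3FB',  # thumbs up, light skin tone
    '\U0001F44D\U0001F3FD',  # thumbs up, medium skin tone
    '\U0001F44D',            # thumbs up, default
    '\U0001F44E',            # thumbs down, default
]

# Count occurrences per base emoji, regardless of skin tone.
counts = Counter(base_name(c) for c in clusters)
for name, n in counts.items():
    print(name, n)
```

Keying on the first code point is deliberately coarse: multi-person and other ZWJ sequences that begin with the same code point would be merged into one group, so this fits best when the input consists of single emoji with an optional skin tone modifier.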
Pence answered 26/9, 2020 at 12:36 Comment(0)
