Removing all Emojis from Text

4

5

This question has been asked before (Python: How to remove all emojis) without a solution. I have a step towards the solution, but need help finishing it off.

I went and got all the emoji hex code points from the emoji site: https://www.unicode.org/emoji/charts/emoji-ordering.txt

I then read in the file like so:

file = open('emoji-ordering.txt')
temp = file.readline()

final_list = []

while temp != '':
    #print(temp)
    if temp[0] != '#':
        utf_8_values = ((temp.split(';')[0]).rstrip()).split(' ')
        values = ["u\\"+(word[0]+((8 - len(word[2:]))*'0' + word[2:]).rstrip()) for word in utf_8_values]
        #print(values[0])
        final_list = final_list + values
    temp = file.readline()

print(final_list)

I hoped this would give me unicode literals, but it does not. My goal is to get unicode literals so that I can use part of the solution from the linked question to exclude all emojis. Any ideas on what is needed to get a solution?
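For reference, a minimal sketch of the conversion step: building a string like "u\\0001F600" only produces text that looks like an escape, whereas chr(int(hex_digits, 16)) produces the actual character. The helper names and the sample lines below are hypothetical; the samples only mimic the "U+XXXX ... ; version # name" layout that emoji-ordering.txt appears to use.

```python
def parse_emoji_ordering(lines):
    """Turn emoji-ordering.txt style lines into a list of emoji strings."""
    emojis = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # The code points come before the first ';', e.g. "U+1F44B U+1F3FB".
        code_points = line.split(";")[0].split()
        # Drop the "U+" prefix, parse the hex digits, and build the characters.
        emojis.append("".join(chr(int(cp[2:], 16)) for cp in code_points))
    return emojis


def remove_listed_emojis(text, emojis):
    """Remove every listed emoji from text (hypothetical helper)."""
    # Replace longer sequences first so multi-code-point emoji are removed whole.
    for e in sorted(emojis, key=len, reverse=True):
        text = text.replace(e, "")
    return text


# Assumed sample lines standing in for the real file contents.
sample = [
    "# comment line",
    "U+1F600 ; 6.1 # grinning face",
    "U+1F44B U+1F3FB ; 8.0 # waving hand: light skin tone",
]
emojis = parse_emoji_ordering(sample)
print(remove_listed_emojis("hi \U0001F600!", emojis))
```

With the real file, the lines could be passed in via open('emoji-ordering.txt', encoding='utf-8').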

Radium answered 6/7, 2018 at 21:36 Comment(0)
11

First install emoji:

pip install emoji

or

pip3 install emoji

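Then do this:

import emoji

def give_emoji_free_text(text):
    # Collect the emoji characters that occur in the text.
    # Note: emoji.UNICODE_EMOJI is not available after emoji 1.7.0 (see UPDATE).
    emoji_list = [c for c in text if c in emoji.UNICODE_EMOJI]
    # Keep only the whitespace-separated words that contain none of them.
    clean_text = ' '.join([word for word in text.split() if not any(e in word for e in emoji_list)])

    return clean_text

text = give_emoji_free_text(text)

This works for me!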

Or you can try:

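import re

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U0001F1F2-\U0001F1F4"  # Macau flag
        u"\U0001F1E6-\U0001F1FF"  # flags
        u"\U0001F600-\U0001F64F"  # emoticons (repeated)
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # very wide range: also matches non-emoji characters such as CJK text
        u"\U0001f926-\U0001f937"  # face palm through shrug
        u"\U0001F1F2"             # regional indicator M
        u"\U0001F1F4"             # regional indicator O
        u"\U0001F620"             # angry face
        u"\u200d"                 # zero-width joiner
        u"\u2640-\u2642"          # female and male signs
        "]+", flags=re.UNICODE)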

text = emoji_pattern.sub(r'', text)
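One caveat with this character class: the range \U000024C2-\U0001F251 spans far more than emoji, so it also strips CJK ideographs and many other scripts. A minimal sketch with narrower ranges follows. The ranges are my own assumption and are not a complete emoji list, so treat it as a starting point rather than a full solution.

```python
import re

# Assumed, deliberately narrow ranges; this is not a complete emoji list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2702-\u27B0"          # dingbats
    "\u200d"                 # zero-width joiner
    "\ufe0f"                 # variation selector-16
    "]+"
)

# The wide range from the pattern above, kept only to show its side effect.
WIDE_PATTERN = re.compile("[\U000024C2-\U0001F251]+")


def remove_emoji(text):
    """Strip characters matched by the narrow pattern."""
    return EMOJI_PATTERN.sub("", text)


print(remove_emoji("\u4f60\u597d \U0001F603 ok"))   # CJK text is kept
print(repr(WIDE_PATTERN.sub("", "\u4f60\u597d")))   # CJK text is removed
```

A maintained list, such as the one shipped with the emoji package, will track new Unicode releases better than hand-written ranges.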

UPDATE:

Version emoji==1.7.0 is the last version that has UNICODE_EMOJI.

You can also try to use EMOJI_DATA as a replacement for UNICODE_EMOJI. If you explain how you use UNICODE_EMOJI or show your code, I can give more specific help.

Or you can try one of the following two solutions:

text = re.sub(emoji.get_emoji_regexp(), r"", text)
text = emoji.replace_emoji(text)
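Since the available functions differ between emoji releases, a version-tolerant wrapper can be sketched as below. This is only an illustration: remove_emojis and _FALLBACK are hypothetical names, and the fallback ranges are an assumed, partial list used only when the emoji package is not installed or offers neither function.

```python
import re

try:
    import emoji  # third-party: pip install emoji
except ImportError:
    emoji = None

# Fallback pattern with assumed, partial ranges (not a complete emoji list).
_FALLBACK = re.compile(
    "[\U0001F300-\U0001F6FF\U0001F900-\U0001F9FF\u2600-\u27BF\u200d\ufe0f]+"
)


def remove_emojis(text):
    """Remove emoji using whichever approach the installed package offers."""
    # Prefer replace_emoji when the installed release provides it.
    if emoji is not None and hasattr(emoji, "replace_emoji"):
        return emoji.replace_emoji(text, replace="")
    # Otherwise use get_emoji_regexp if the release still has it.
    if emoji is not None and hasattr(emoji, "get_emoji_regexp"):
        return emoji.get_emoji_regexp().sub("", text)
    # Last resort: the hand-written fallback pattern.
    return _FALLBACK.sub("", text)


print(repr(remove_emojis("I like having fun \U0001F603")))
```

Only the emoji itself is removed, so whitespace around it is left in place.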
Bedpost answered 29/9, 2018 at 18:56 Comment(2)
Hmm I have the module but get error: module 'emoji' has no attribute 'UNICODE_EMOJI' – Extroversion
@Extroversion Version emoji==1.7.0 is the last version that has UNICODE_EMOJI. You can also try to use EMOJI_DATA as a replacement for UNICODE_EMOJI. If you explain how you use UNICODE_EMOJI or show your code, I can give more specific help. – Progress
5

Here's a Python script that uses the emoji library's get_emoji_regexp().

It reads text from a file and writes the emoji-free text to another file.

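import emoji
import re

def strip_emoji(text):
    # Report how many emoji were found before removing them.
    print(emoji.emoji_count(text))
    # get_emoji_regexp() exists only in older emoji releases (removed in 2.0.0).
    new_text = re.sub(emoji.get_emoji_regexp(), r"", text)
    return new_text


# Read explicitly as UTF-8 so emoji decode the same way on every platform.
with open("my_file.md", "r", encoding="utf-8") as file:
    old_text = file.read()

no_emoji_text = strip_emoji(old_text)

with open("file.md", "w+", encoding="utf-8") as new_file:
    new_file.write(no_emoji_text)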
Packhorse answered 26/8, 2019 at 19:17 Comment(0)
2

Install emoji

pip install emoji

And then simply run

emoji.replace_emoji(text)
Abstergent answered 1/6, 2022 at 14:54 Comment(0)
1

In the latest version of the emoji package (v2.2.0), one can simply use the replace_emoji function.

!pip install emoji
import emoji

The replace argument specifies what each emoji is replaced with; an empty string removes it:

text = 'I like having fun 😃'
text = emoji.replace_emoji(text, replace='')
text
>>> 'I like having fun '

To substitute the emoji with its string equivalent instead, use emoji.demojize:

text = 'I like having fun 👍'
text = emoji.demojize(text)
text
>>> 'I like having fun :thumbs_up:'
Hydrozoan answered 21/2, 2023 at 13:50 Comment(0)
