Python - replace unicode emojis with ASCII characters

I have an issue with one of my current weekend projects. I am writing a Python script that fetches some data from different sources and then sends everything to an ESC/POS printer. As you might imagine, POS printers don't exactly like emojis...

So text like this:

ๅฏ็ˆฑ!!!!!!!!๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜๐Ÿ˜

gives me this character string:

'\u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
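The escaped string is the same text written in Python's `\u`/`\U` escape notation, one escape per code point. A quick check, assuming Python 3, where the built-in `ascii()` produces exactly this form:

```python
# The question's text rebuilt from its code points.
text = "\u53ef\u7231!!!!!!!!" + "\U0001f60d" * 7 + "\U0001f61d"

# ascii() works like repr() but escapes every non-ASCII character,
# which yields the \u / \U form shown above.
print(ascii(text))
print(len(text))  # 18: each escape is a single code point
```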

Of course, what comes out of the printer is quite different from what I would like, so I need to replace these non-ASCII characters with something else. I don't really care about the first characters, but I do care about the emojis. Using something like unidecode(str(text)) will at least strip them out, but I would rather convert them into something more useful: either classic smileys like [:-D] or names like [SMILING FACE WITH HEART-SHAPED EYES].

My problem is: how would one go about doing this? Manually creating a lookup table for the most common emojis seems a bit tedious, so I am wondering whether there is another way.
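One middle ground between a full lookup table and names only: keep a small table for the handful of emojis you want as classic smileys and fall back to the Unicode name for everything else. This is a minimal sketch; the table contents, the smiley choices, and the function name `replace_emoji` are hypothetical, and non-emoji non-ASCII text (such as the Chinese characters) also ends up as a bracketed name here.

```python
import unicodedata

# Hypothetical, deliberately small table; the smiley choices are arbitrary.
CLASSIC_SMILIES = {
    "\U0001f600": ":-D",  # GRINNING FACE
    "\U0001f60d": "<3",   # SMILING FACE WITH HEART-SHAPED EYES
    "\U0001f61d": ":-P",  # FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES
    "\U0001f622": ":'(",  # CRYING FACE
}


def replace_emoji(text: str) -> str:
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif ch in CLASSIC_SMILIES:
            out.append("[" + CLASSIC_SMILIES[ch] + "]")  # preferred smiley
        else:
            # Everything else: its Unicode name, or [x] if it has none.
            out.append("[" + unicodedata.name(ch, "x") + "]")
    return "".join(out)


print(replace_emoji("hi \U0001f60d\U0001f60e"))
```

Each code point is handled on its own, so an emoji built from several code points (for example a heart followed by VARIATION SELECTOR-16, or a ZWJ sequence) comes out as several bracketed names.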

Schrecklichkeit asked 5/5, 2017 at 5:56
You can hope for a library or do the tedious work. There is this that might help. - Aborticide
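Before reaching for a library, note that Python 3.5+ ships a `namereplace` codec error handler that turns unencodable characters into `\N{...}` escapes, and `codecs.register_error` lets you define your own replacement format. The sketch below registers a handler under a made-up name, `bracketname`, that produces the `[NAME]` style from the question and `[x]` for characters without a name. Unlike unidecode it does not transliterate, so CJK characters become names such as `[CJK UNIFIED IDEOGRAPH-53EF]`.

```python
import codecs
import unicodedata


def bracket_name(err):
    """Encode-error handler: replace each unencodable character with [NAME]."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    # unicodedata.name() returns the default "x" for unnamed characters.
    replacement = "".join("[" + unicodedata.name(ch, "x") + "]" for ch in bad)
    # A handler returns the replacement text and the position to resume from.
    return replacement, err.end


codecs.register_error("bracketname", bracket_name)  # handler name is our own choice

text = "\u53ef\u7231!!\U0001f60d"
print(text.encode("ascii", "namereplace"))
print(text.encode("ascii", "bracketname").decode("ascii"))
```

Since `encode()` returns bytes, the encoded result may be directly usable if your printer library accepts raw bytes; that depends on the library.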

With the tip about unicodedata.name and some further research, I managed to put this together:

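import unicodedata
from unidecode import unidecode

def deEmojify(inputString):
    returnString = ""

    for character in inputString:
        try:
            # Characters that are already ASCII are kept as-is.
            character.encode("ascii")
            returnString += character
        except UnicodeEncodeError:
            # 1) Best ASCII transliteration from unidecode (e.g. "š" -> "s").
            replaced = unidecode(character)
            if replaced != '':
                returnString += replaced
            else:
                # 2) Fall back to the Unicode character name...
                try:
                    returnString += "[" + unicodedata.name(character) + "]"
                # 3) ...or to a simple marker if the character has no name.
                except ValueError:
                    returnString += "[x]"

    return returnString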

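Basically, any character that is already ASCII is kept. For everything else, it first tries to find the most appropriate ASCII representation with unidecode; if that yields nothing, it uses the character's Unicode name, and if even that fails (some characters have no name), it substitutes a simple [x] marker.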

For example, take this string:

abcdšeđfčgžhÅiØjÆk 可爱!!!!!!!!😍😍😍😍😍😍😍😝

Then run the function on it:

string = u'abcdšeđfčgžhÅiØjÆk\uf8ff \u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
print(deEmojify(string))

This produces the following result:

abcdsedfcgzhAiOjAEk[x] Ke Ai !!!!!!!![SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES]
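If you would rather not depend on unidecode, a similar fallback chain can be approximated with only the standard library by swapping unidecode for NFKD decomposition. This is a different and weaker technique: it only strips diacritics, so letters like đ or Ø and CJK ideographs, which have no decomposition, fall through to their Unicode names instead of being transliterated. A minimal sketch; `to_ascii` is a hypothetical name:

```python
import unicodedata


def to_ascii(text: str, unknown: str = "[x]") -> str:
    """Stdlib-only approximation of deEmojify (NFKD instead of unidecode)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII: keep it
            continue
        # NFKD splits e.g. "š" into "s" + COMBINING CARON; encoding to ASCII
        # with errors="ignore" then drops the combining mark.
        folded = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        if folded:
            out.append(folded)
            continue
        # No ASCII base character: use the Unicode name, or the marker.
        name = unicodedata.name(ch, None)
        out.append("[" + name + "]" if name else unknown)
    return "".join(out)


print(to_ascii("abcd\u0161e\u0111f\u010dg \U0001f61d\uf8ff"))
```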

Schrecklichkeit answered 5/5, 2017 at 21:5

Try this:

import unicodedata
print(unicodedata.name(u'\U0001f60d'))

The result is:

SMILING FACE WITH HEART-SHAPED EYES
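A few related `unicodedata` details that are useful here: `lookup()` is the inverse of `name()`, `name()` accepts a default so unnamed characters need no try/except, and the available names depend on the Unicode database bundled with your Python build. A short illustration:

```python
import unicodedata

ch = "\U0001f60d"
name = unicodedata.name(ch)
print(name)  # SMILING FACE WITH HEART-SHAPED EYES

# lookup() goes the other way, from a name back to the character.
assert unicodedata.lookup(name) == ch

# Characters without a name, such as the private-use U+F8FF, make name()
# raise ValueError unless a default is passed as the second argument.
print(unicodedata.name("\uf8ff", "<no name>"))  # <no name>

# Names come from the Unicode database bundled with this Python,
# so emoji newer than this version have no name here.
print(unicodedata.unidata_version)
```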
Excrescence answered 5/5, 2017 at 6:39

I faced the same problem and solved it with the following function:

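import demoji

def deEmojify(inputString):
    # The UTF-16 round trip merges any surrogate pairs into single code points;
    # replace_with_desc() then swaps each emoji for a text description.
    return demoji.replace_with_desc(inputString.encode('utf-16', 'surrogatepass').decode('utf-16'))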
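The answer does not explain the `encode('utf-16', 'surrogatepass').decode('utf-16')` round trip. One plausible reading (an assumption) is that it repairs strings where an emoji is stored as two separate UTF-16 surrogate code points instead of one character, which some data sources produce. A stdlib-only illustration with a hypothetical helper name, `merge_surrogates`:

```python
def merge_surrogates(text: str) -> str:
    """Hypothetical helper: join UTF-16 surrogate halves into real code points."""
    # "surrogatepass" lets lone surrogates be written out as raw UTF-16 code
    # units; decoding those bytes as UTF-16 pairs them back into one character.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


broken = "\ud83d\ude0d"  # heart-eyes emoji stored as two surrogate halves
fixed = merge_surrogates(broken)
print(len(broken), len(fixed))  # 2 1
print(fixed == "\U0001f60d")    # True
```

Normal text passes through unchanged, but an unpaired surrogate would still make the strict UTF-16 decode raise UnicodeDecodeError.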
Phraseologist answered 13/10, 2020 at 17:16
