Printing out all Unicode emojis to a file

It's possible to print an emoji from its hex code with the u'\uXXXX' escape in Python, e.g.

>>> print(u'\u231B')
⌛

However, if I have a list of hex codes like 231B, just concatenating the strings won't work:

>>> print(u'\u' + ' 231B')
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

The chr() fails too:

>>> chr('231B')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)

The first part of my question: given a hex code, e.g. 231A, how do I get the emoji as a str?

My goal is to get the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt by reading the hex codes in the first column.

Some entries are ranges such as 231A..231B. The second part of my question: given a hex code range, how do I iterate through it to get each emoji str? For 2648..2653 it seems possible to do range(2648, 2653+1), but if the hex code contains a letter, e.g. 1F232..1F236, using range() that way is not possible.


Thanks @amadan for the solutions!!

TL;DR

To write the list of emojis from https://unicode.org/Public/emoji/13.0/emoji-sequences.txt to a file:

import requests

response = requests.get('https://unicode.org/Public/emoji/13.0/emoji-sequences.txt')

# Write as UTF-8 explicitly; the platform default encoding may not cover emoji.
with open('emoji.txt', 'w', encoding='utf-8') as fout:
    for line in response.content.decode('utf8').split('\n'):
        # Skip blank lines and comment lines.
        if line.strip() and not line.startswith('#'):
            # The first ';'-separated column holds the code point(s).
            hexa = line.split(';')[0].strip().split('..')
            if len(hexa) == 1:
                # A single code point or a space-separated sequence, e.g. '00A9 FE0F'.
                ch = ''.join(chr(int(h, 16)) for h in hexa[0].split())
                print(ch, file=fout)
            else:
                # An inclusive range such as '231A..231B'.
                start, end = hexa
                for code in range(int(start, 16), int(end, 16) + 1):
                    print(chr(code), file=fout)

Presswork answered 9/3, 2020 at 5:37

Convert the hex string to an integer, then use chr:

chr(int('231B', 16))
# => '⌛'

or directly use a hex literal:

chr(0x231B)
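
As a quick sanity check, the conversion can be reversed with ord(). The sketch below is only illustrative: hex_to_char and char_to_hex are hypothetical helper names made up for this example, not functions from any library.

```python
def hex_to_char(hex_code):
    """Turn a hex code point string such as '231B' into a one-character str."""
    return chr(int(hex_code, 16))


def char_to_hex(ch):
    """Inverse of hex_to_char: uppercase hex, padded to at least 4 digits."""
    return format(ord(ch), '04X')


print(hex_to_char('231B') == '\u231B')     # True
print(char_to_hex(hex_to_char('231B')))    # 231B
print(char_to_hex(hex_to_char('1F232')))   # 1F232
```

Round-tripping like this is a cheap way to confirm that a code read from a text file was parsed as hex rather than as a decimal number.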

To use a range, again, you need an int, either converted from a string or using a hex literal:

''.join(chr(c) for c in range(0x2648, 0x2654))
# => '♈♉♊♋♌♍♎♏♐♑♒♓'

or

''.join(chr(c) for c in range(int('2648', 16), int('2654', 16)))

(NOTE: range(2648, 2654) would give you something very different, because those are decimal numbers, not hex!)
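
If the range arrives as text in the START..END form used in the question, one possible way to expand it is sketched below. emoji_range is a hypothetical helper name, and the sketch assumes both ends are inclusive, as in 2648..2653.

```python
def emoji_range(spec):
    """Expand an inclusive 'START..END' hex range into a list of characters."""
    start, end = spec.split('..')
    # END is meant to be included, but range() stops before its stop value,
    # hence the + 1.
    return [chr(c) for c in range(int(start, 16), int(end, 16) + 1)]


print(len(emoji_range('2648..2653')))      # 12
print(len(emoji_range('1F232..1F236')))    # 5
```

Because both ends go through int(..., 16), hex codes containing letters such as 1F232 are handled the same way as all-digit ones.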

Behlau answered 9/3, 2020 at 5:40 Comment(7)
Does int('2654', 16) include 2654? - Presswork
It is an integer; it doesn't have a concept of inclusion. int('2654', 16) is equal to 0x2654 and 9812, representing the code point 'WHITE CHESS KING'. If you are asking whether range(0x2648, 0x2654) includes 0x2654, then no, it does not, since range never includes its endpoint; see "Why does range(start, end) not include end?" - Behlau
Ah, but the ranges in the Unicode emoji file are inclusive, so int('2654', 16) + 1 =) - Presswork
No, you said 2648..2653; I already added the needed 1. If you want, you can write int('2653', 16) + 1. - Behlau
Ah yes, my fault, I didn't read the range properly. You're right, it's already +1 =) - Presswork
How about emojis like 00A9 FE0F? - Presswork
Those are just two characters next to each other: chr(0xa9) + chr(0xfe0f). - Behlau
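
For fields that hold several space-separated code points, such as 00A9 FE0F, a minimal sketch could look like this. sequence_to_str is a hypothetical helper name used only for illustration.

```python
def sequence_to_str(field):
    """Join space-separated hex code points into a single str."""
    return ''.join(chr(int(h, 16)) for h in field.split())


copyright_emoji = sequence_to_str('00A9 FE0F')
print(len(copyright_emoji))                          # 2
print(copyright_emoji == chr(0xa9) + chr(0xfe0f))    # True
```

The result is one str made of two code points, so len() reports 2 even though it is typically displayed as a single emoji.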
