'UCS-2' codec can't encode characters in position 1050-1050

When I run my Python code, I get the following errors:

  File "E:\python343\crawler.py", line 31, in <module>
    print (x1)
  File "E:\python343\lib\idlelib\PyShell.py", line 1347, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1050-1050: Non-BMP character not supported in Tk

Here is my code:

x = g.request('search', {'q' : 'TaylorSwift', 'type' : 'page', 'limit' : 100})['data'][0]['id']

# GET ALL STATUS POST ON PARTICULAR PAGE(X=PAGE ID)
for x1 in g.get_connections(x, 'feed')['data']:
    print (x1)
    for x2 in x1:
        print (x2)
        if(x2[1]=='status'):
            x2['message']

How can I fix this?

Cryptonym answered 7/9, 2015 at 16:10 Comment(1)
If you need to see non-BMP Unicode characters, you could run Python interactively in an environment that can display them, e.g. a ConEmu console or a web browser. Try IPython Notebook. Ziska

Your data contains characters outside of the Basic Multilingual Plane (BMP). Emoji, for example, are outside the BMP, and Tk, the window system used by IDLE, cannot handle such characters.
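To confirm which characters are the problem, a quick check like the following may help. This is only a sketch: the helper name find_non_bmp and the sample string are made up for illustration and are not part of the original code.

```python
def find_non_bmp(text):
    """Return (index, character) pairs for code points above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]

# Hypothetical sample text containing one emoji (U+1F44D THUMBS UP SIGN)
sample = 'Thumbs up \U0001F44D and plain text'

# Print positions and code points in hex, so the output itself stays ASCII
print([(i, hex(ord(ch))) for i, ch in find_non_bmp(sample)])
```

The position reported this way should correspond to the "position 1050-1050" style offsets in the error message, assuming the same string is being inspected.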

You could use a translation table to map everything outside of the BMP to the replacement character:

import sys
non_bmp_map = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xfffd)
print(x.translate(non_bmp_map))

The non_bmp_map maps all codepoints outside the BMP (any codepoint higher than 0xFFFF, all the way up to the highest Unicode codepoint your Python version can handle) to U+FFFD REPLACEMENT CHARACTER:

>>> print('This works outside IDLE! \U0001F44D')
This works outside IDLE! 👍
>>> print('This works in IDLE too! \U0001F44D'.translate(non_bmp_map))
This works in IDLE too! �
Caracole answered 7/9, 2015 at 16:15 Comment(2)
Thank you, but after adding these, it shows a new error: print (x1.translate(non_bmp_map)) AttributeError: 'dict' object has no attribute 'translate'. How do I fix that? Cryptonym
@Andi: x1 is not a string, then, but a dictionary. You could do str(x1).translate(non_bmp_map) in that case. Caracole
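Putting the answer and the comment about dictionaries together, one possible helper is sketched below. The name bmp_safe and the sample post dictionary are hypothetical; the real feed items returned by the API will have different keys.

```python
import sys

# Map every code point above U+FFFF to U+FFFD REPLACEMENT CHARACTER
NON_BMP_MAP = dict.fromkeys(range(0x10000, sys.maxunicode + 1), 0xFFFD)

def bmp_safe(obj):
    """Return the string form of obj with non-BMP characters replaced."""
    # str() makes this work for dicts, lists and other non-string objects
    return str(obj).translate(NON_BMP_MAP)

# Hypothetical sample standing in for one item of the feed data
post = {'id': '1', 'message': 'Great show \U0001F44D'}
print(bmp_safe(post))
```

Because the object is converted with str() first, the result is only suitable for display; the original dictionary is left unchanged and can still be processed normally.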

None of these worked for me, but the following does. This assumes that public_tweets was pulled from tweepy's api.search:

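for tweet in public_tweets:
    # Escape non-ASCII characters first (emoji become \UXXXXXXXX sequences),
    # then print the escaped text, which IDLE can display
    u = tweet.text.encode('unicode-escape').decode('ascii')
    print(u)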
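To see what this round trip does, here is a small standalone sketch with a made-up string (no tweepy required). Note that unicode-escape escapes every non-ASCII character, not only those outside the BMP, so accented letters are affected too.

```python
# Hypothetical sample: one emoji (non-BMP) and one accented letter (BMP)
text = 'Nice \U0001F44D caf\u00e9'

# encode() produces ASCII-only bytes with backslash escapes;
# decode('ascii') turns those bytes back into a plain str
escaped = text.encode('unicode-escape').decode('ascii')
print(escaped)
```

The output is pure ASCII, so it is safe to print in IDLE, at the cost of readability for any non-English text.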
Organotherapy answered 2/12, 2018 at 7:41 Comment(0)

This Unicode issue occurs in Python 3.6 and older versions. To resolve it, upgrade to Python 3.8 and run your code again; the error will no longer occur.

Mcguinness answered 19/10, 2019 at 12:36 Comment(1)
I'm glad to upvote your answer because 1) I find it very useful, 2) I am the very first to make you an upvote to welcome you on stackoverflow.Crosspollination
