How to decode Unicode escape sequence emojis encoded as \uXXXX?

2

6

I am trying to use Python to sort through my downloaded Instagram data. The data is a JSON file, but emoji and other non-ASCII characters are encoded in a way I do not understand. For example:

The JSON file will contain: \u00e2\u009c\u008c\u00f0\u009f\u0096\u00a4\u00f0\u009f\u008d\u0095\u00f0\u009f\u008e\u00b6\u00f0\u009f\u00a4\u00af, which the Instagram app displays as: ✌🖤🍕🎶🤯

Or JSON: \u00e2\u0080\u0099, which Instagram displays as: ’ (apostrophe)

I have tried to use u"string" and have found similar questions here, here and here, but none of them are in Python or provide any details that are useful to me.

Pessimist answered 2/3, 2021 at 15:34 Comment(0)
9

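Try encoding the string as Latin-1 and then decoding the resulting bytes as UTF-8 (text here stands for the string read from the JSON):

text.encode('latin-1').decode('utf-8')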
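Why this works: each \u00XX escape in the export appears to hold one byte of the character's UTF-8 encoding rather than a real code point. Encoding to Latin-1 maps every code point below 256 back to the byte with the same value, and decoding those bytes as UTF-8 recovers the intended characters. A minimal sketch, using part of the sample from the question; the helper name fix_mojibake is hypothetical, not part of any library:

```python
import json

# Raw JSON text as it would appear in the export (first two emoji of the sample).
raw = r'"\u00e2\u009c\u008c\u00f0\u009f\u0096\u00a4"'


def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were stored as Latin-1 code points."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string was not damaged this way (or is already correct); leave it alone.
        return text


broken = json.loads(raw)      # 7 code points, one per original UTF-8 byte
fixed = fix_mojibake(broken)  # 2 code points: U+270C and U+1F5A4
```

The try/except is a design choice for mixed data: strings that are already correct, or that contain characters outside Latin-1, are returned unchanged instead of raising.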
Silvery answered 2/3, 2021 at 16:51 Comment(4)
Hello, I am trying to write that text to a file, but I get the error: UnicodeEncodeError: 'charmap' codec can't encode characters in position 12-14: character maps to <undefined>. How can I fix this? – Momism
@HaykPetrosyan Does this answer your question? #27093333 – Silvery
Kinda. I found out later that Instagram and Twitter texts are stored as if they were latin-1 or something similar, so we must first encode to latin-1, then decode as utf-8, and then do whatever we need. – Momism
@Silvery +1 for this. Can you share the source or how you found it? – Persimmon
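On the 'charmap' error raised in the comments: it most likely means the file was opened with the platform's default legacy encoding (often cp1252 on Windows), which has no mapping for emoji. Passing encoding="utf-8" to open() is the usual fix. A small sketch, assuming a hypothetical output file name and a temporary directory so that it is self-contained:

```python
import os
import tempfile

# Sample from the question, repaired with the encode/decode step from the answer.
text = "\u00e2\u009c\u008c\u00f0\u009f\u0096\u00a4".encode("latin-1").decode("utf-8")

# Hypothetical output path; a temp directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "captions.txt")

# An explicit encoding avoids the platform default, which may not be able
# to represent emoji and would then raise UnicodeEncodeError.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(out_path, encoding="utf-8") as f:
    round_tripped = f.read()
```

The same encoding="utf-8" argument should be used when reading the original JSON export, so that the result does not depend on the machine's locale.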
-3

If you are on Windows, press Win + . to open the emoji picker, then do print("👍"). Output: 👍

Ripen answered 2/3, 2021 at 17:8 Comment(0)
