Encoding issue : decode Quoted-Printable string in Python

3

11

In Python, I have a string in Quoted-Printable encoding:

mystring="=AC=E9"

This string should be printed as

é

So I guess I want to decode it and then re-encode it as UTF-8. I understand that part of this can be done with

import quopri
quopri.decodestring('=A3=E9')

But after that, I'm completely lost. How would you decode/encode this string so that it prints properly?

Scleroderma answered 6/5, 2017 at 19:37 Comment(0)
16
import quopri

Encoding:

You can encode the character 'é' to Quoted-Printable using quopri.encodestring(). It takes a bytes object and returns the QP encoded bytes object.

encoded = quopri.encodestring('é'.encode('utf-8'))
print(encoded)

This prints b'=C3=A9' (not "=AC=E9" or "=A3=E9" as given in the question).

Decoding:

mystring = '=C3=A9'
decoded_string = quopri.decodestring(mystring)
print(decoded_string.decode('utf-8'))

quopri.decodestring() returns a bytes object. Here those bytes are the UTF-8 encoding of 'é' (which may already be what you want). To print the character 'é' itself, call .decode('utf-8') on the bytes object to turn it into a str.
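To make the two steps (undo the Quoted-Printable escaping, then interpret the bytes in some charset) explicit, here is a minimal sketch. The helper names qp_to_text and text_to_qp are hypothetical, not part of quopri, and the charset is something you have to know or guess from wherever the string came from.

```python
import quopri

def qp_to_text(qp, charset="utf-8"):
    """Undo Quoted-Printable escaping, then decode the bytes as `charset`."""
    if isinstance(qp, str):
        # Quoted-Printable data is plain ASCII, so this conversion is safe.
        qp = qp.encode("ascii")
    return quopri.decodestring(qp).decode(charset)

def text_to_qp(text, charset="utf-8"):
    """Encode text as `charset` bytes, then apply Quoted-Printable escaping."""
    return quopri.encodestring(text.encode(charset))

print(text_to_qp("é"))                # UTF-8: two bytes, C3 A9
print(text_to_qp("é", "latin-1"))     # Latin-1: one byte, E9
print(qp_to_text("=C3=A9"))           # decoded as UTF-8
print(qp_to_text("=E9", "latin-1"))   # decoded as Latin-1
```

The same character has a different Quoted-Printable form depending on the charset, which is why a string containing =E9 will not decode as UTF-8.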

Petrick answered 7/9, 2018 at 5:21 Comment(0)
7

OK, I don't know exactly why, but this function seems to work:

from email.parser import Parser

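def decode_email(msg_str):
    """Return the concatenated text/plain parts of a raw email string."""
    message = Parser().parsestr(msg_str)
    decoded_message = ''
    for part in message.walk():
        if part.get_content_type() == 'text/plain':
            # The charset can be missing from the headers (None);
            # falling back to UTF-8 here is an assumption.
            charset = part.get_content_charset() or 'utf-8'
            # decode=True undoes the Content-Transfer-Encoding
            # (e.g. quoted-printable) and returns bytes.
            part_bytes = part.get_payload(decode=True)
            decoded_message += part_bytes.decode(charset)
    return decoded_message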
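As for why the function above works: the email headers carry both the transfer encoding and the charset, so the parser can undo the Quoted-Printable escaping and tell you how to interpret the resulting bytes. A small sketch of the individual steps, using a made-up one-part message (the headers and body are invented for illustration):

```python
from email.parser import Parser

# Hypothetical sample message, not taken from the question.
raw_message = (
    "Content-Type: text/plain; charset=iso-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "caf=E9\n"
)

message = Parser().parsestr(raw_message)

# The charset comes from the Content-Type header.
charset = message.get_content_charset()

# decode=True reverses the quoted-printable transfer encoding and
# returns raw bytes, still in the message's charset.
payload = message.get_payload(decode=True)

# Only this last step turns the bytes into text.
text = payload.decode(charset)

print(charset)
print(payload)
print(text)
```

So with a full message you do not have to guess the charset, whereas with a bare string like the one in the question you do.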
Scleroderma answered 7/5, 2017 at 15:20 Comment(0)
1

Try this.

import quopri
mystring="=AC=E9"
decoded_string=quopri.decodestring(mystring)
print(decoded_string.decode('windows-1251'))
Aldric answered 6/5, 2017 at 20:17 Comment(1)
Unfortunately, I tried this before, but it looks like windows-1251 is designed to encode Russian. When I run your block of code, I get ¬й printed. That is not what it is supposed to look like; it should be an 'é'. - Scleroderma
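To see what the comment describes, it may help to decode the same two bytes with a few candidate codecs. This is only a sketch: the list of codecs is my own pick, and nothing in the question says which charset the string was originally written in.

```python
import quopri

raw = quopri.decodestring(b"=AC=E9")  # two bytes: 0xAC 0xE9

results = {}
for codec in ("windows-1251", "latin-1", "cp1252", "utf-8"):
    try:
        results[codec] = raw.decode(codec)
    except UnicodeDecodeError:
        # These bytes are not a valid sequence in this codec.
        results[codec] = None
    print(codec, "->", results[codec])
```

In windows-1251 the byte 0xE9 is the Cyrillic й, while in latin-1 and cp1252 it is é. The bytes are not valid UTF-8 at all. Under the Western codecs the leading 0xAC still shows up as ¬, so the string in the question is probably a fragment of something longer.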
