How do I receive an HTML email as regular text?

Here is the code I have thus far:

import email, imaplib

user = 'some username'
pwd = 'some password'

m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user, pwd)

m.select("[Gmail]/All Mail")

resp, data = m.fetch(1, "(RFC822)")

email_body = data[0][1]

mail = email.message_from_string(email_body)

print mail

I currently receive the email with a bunch of weird formatting. I would like to receive the email body as a plain text string.

Nagoya answered 10/8, 2010 at 13:6 Comment(1)
Just parse the HTML with a regex... oh wait, don't do that. – Ify

(I've just tried this with my Gmail account.) The problem isn't the HTML mail; it's that your messages are MIME multipart and you're printing the whole message as one string. Email is fundamentally a plain-text format (as was mentioned above); when people wanted to send rich content in emails, they came up with MIME, a way to do this without modifying the email standard. When you print mail, you are printing the full MIME message, encoded so that it can be sent as an email. You want to extract the payload.

But -- you've already done all the hard work! Just get the payload of the parsed email.message.Message instance:

mail.get_payload()[0].get_payload()

(Note: I had to do this twice for the first message in my Gmail inbox because it was encoded as a MIMEMultipart, but with only one leaf. YMMV.)
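To make the multipart point concrete, here is a minimal sketch that builds a two-part message locally instead of fetching one over IMAP. The message contents and the names outer, parts and first_text are made up for illustration; only the get_payload() calls mirror the line above.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a small multipart message locally (a stand-in for data[0][1] from IMAP).
outer = MIMEMultipart("alternative")
outer["Subject"] = "demo"
outer.attach(MIMEText("Hello in plain text", "plain"))
outer.attach(MIMEText("<p>Hello in <b>HTML</b></p>", "html"))

mail = email.message_from_string(outer.as_string())

# Printing `mail` would show headers, boundaries and every part.
# For a multipart message, get_payload() returns a list of sub-messages.
parts = mail.get_payload()

# The first sub-message here is the text/plain one; its payload is the text.
first_text = parts[0].get_payload()
```

Which index holds the plain-text part depends on how the sender built the message, so parts[0] is an assumption that happens to hold for this demo.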

Crane answered 10/8, 2010 at 13:25 Comment(2)
I can only use message_from_bytes, not message_from_string. Why is that? – Sofia
@JohnAndrews, it sounds like the response type of your message is bytes instead of a string. – Talishatalisman

In Python 3.x you can do this quite easily with the imaplib and email packages. This is an older post, but maybe my answer can help newcomers who land here.

status, data = self.imap.fetch(num, '(RFC822)')
email_msg = email.message_from_bytes(data[0][1])  # data[0][1] is bytes in Python 3

# If the message is multipart we only want the text version of the body;
# this walks the message and picks out the text/plain part.
body = None  # stays None if no text/plain part is found
if email_msg.is_multipart():
    for part in email_msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes the transfer encoding (e.g. Base64, quoted-printable)
            body = part.get_payload(decode=True)
            # use the charset declared by the part, falling back to UTF-8
            body = body.decode(part.get_content_charset() or "utf-8", errors="replace")
        elif part.get_content_type() == "text/html":
            continue

Now you can print the body variable and it will be plain text :) If it is good enough for you, it would be nice to select it as the accepted answer.
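For anyone who wants to try the walk-and-decode idea without an IMAP connection, here is a self-contained sketch. The helper name plain_text_body and the demo message are hypothetical; the message is built locally as a stand-in for data[0][1].

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def plain_text_body(raw_bytes):
    """Return the first text/plain part of a raw message as str, or None."""
    msg = email.message_from_bytes(raw_bytes)
    # walk() also yields the message itself, so non-multipart mail is covered.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


# Locally built message standing in for the bytes returned by fetch().
demo = MIMEMultipart("alternative")
demo.attach(MIMEText("Grüße aus Köln", "plain", "utf-8"))
demo.attach(MIMEText("<p>Grüße aus <i>Köln</i></p>", "html", "utf-8"))

body = plain_text_body(demo.as_bytes())
```

Returning at the first text/plain part is a design choice of this sketch; a message with several text parts (for example a text attachment) may need a stricter filter.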

Dominic answered 2/2, 2014 at 0:16 Comment(0)

Getting the right HTML or text part isn't always easy or straightforward, as an email can have HTML as an attachment, or even multiple HTML parts. Python 3 gives you a simple method for this:

from email import policy  # needed for policy.default
mail = email.message_from_string(email_body, policy=policy.default)
mail.get_body().get_payload(decode=True)
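
One thing worth knowing: with no arguments, get_body() prefers the HTML part over the plain one when both exist. A minimal sketch of asking for plain text explicitly follows; the demo message is built locally and its contents are made up for illustration.

```python
import email
from email import policy
from email.message import EmailMessage

# Locally built message standing in for the fetched bytes.
demo = EmailMessage()
demo["Subject"] = "demo"
demo.set_content("Plain version")
demo.add_alternative("<p>HTML version</p>", subtype="html")

mail = email.message_from_bytes(demo.as_bytes(), policy=policy.default)

# Ask for the text/plain part only; get_body() returns None if there is none.
plain_part = mail.get_body(preferencelist=("plain",))

# get_content() decodes the transfer encoding and charset and returns a str.
text = plain_part.get_content() if plain_part is not None else None
```
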
Vue answered 28/7, 2017 at 14:38 Comment(0)

Email is a text-only format; it doesn't know about formatting. So if you get an HTML mail, either that formatting has been "smuggled" into the pure-text body, or you have a multipart mail where the first part is pure text and the second part is the HTML-formatted version.

So check whether you have a multipart mail (see the docs) and filter accordingly. If it's not a multipart mail, use an HTML library like Beautiful Soup to get the text.
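A rough sketch of that check-then-filter flow. Note the swap: to keep it dependency-free it uses the standard library's html.parser in place of Beautiful Soup, which is much cruder (for instance, it would also keep the contents of script and style tags). The names _TextOnly, html_to_text and body_as_text are hypothetical, and the payload is assumed not to be Base64 or quoted-printable encoded.

```python
import email
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join("".join(parser.chunks).split())


def body_as_text(msg):
    if msg.is_multipart():
        # Multipart: filter for the text/plain part.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload()
        return None
    payload = msg.get_payload()
    if msg.get_content_type() == "text/html":
        # Not multipart and HTML only: strip the markup.
        return html_to_text(payload)
    return payload


html_only = email.message_from_string(
    "Content-Type: text/html\n\n<p>Hello <b>world</b></p>"
)
result = body_as_text(html_only)
```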

Briquet answered 10/8, 2010 at 13:13 Comment(0)

I would like to receive the email body as an ascii string.

It is quite possible that you won't be able to do this: what if it isn't ASCII? Even if everything is in English, it might contain characters outside the ASCII range. imaplib's documentation doesn't go into encoding questions, so I suppose the data arrives as byte strings, and I'd expect them to be UTF-8 encoded. Other libraries might provide you with Unicode strings, which you can encode in any encoding you fancy (but UTF-8 is a good choice).

However, I suppose you don't really mean "ASCII" but what you think of as "plain text" -- a text, not HTML, version of the email. It might be a multipart mail, in which case you can extract the "text/plain" part from the string you receive. If it isn't, and the mail body was HTML only, you can use a third-party library. There are a number of options, discussed in this SO thread.
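On the encoding question, a small sketch of reading the charset a message part declares for itself rather than guessing. The sample text and the ISO-8859-1 choice are made up for illustration; the UTF-8 fallback is an assumption for parts that declare nothing.

```python
import email
from email.mime.text import MIMEText

# A text part outside the ASCII range, declared as ISO-8859-1 by the sender.
raw = MIMEText("café, naïve", "plain", "iso-8859-1").as_bytes()

msg = email.message_from_bytes(raw)

# decode=True undoes the transfer encoding and returns bytes.
payload = msg.get_payload(decode=True)

# The charset comes from the Content-Type header; None if not declared.
charset = msg.get_content_charset()

# Fall back to UTF-8 (an assumption) when the part declares no charset.
text = payload.decode(charset or "utf-8", errors="replace")
```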

Fransen answered 10/8, 2010 at 13:22 Comment(1)
Thanks, you are correct about plain text. Edited original question. – Nagoya
