What are the "parts" in a multipart email?

A bit of context...

Some time ago, I wrote a Python program that deals with email messages. One thing that always comes up is whether an email is "multipart" or not.

After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc., but I didn't really understand it.

My usage of it was limited to two cases:

1. When I had to save an attachment from the raw email

I just found this on the internet (probably on here; sorry for not crediting the person who wrote it, but I can't seem to find them again :/) and pasted it into my code:

import os

def downloadAttachments(emailMsg, pathToSaveFile):
    """
    Save Attachments to pathToSaveFile (Example: pathToSaveFile = "C:\\Program Files\\")
    Returns the list of attachment paths (newly written or already present).
    """
    att_path_list = []
    for part in emailMsg.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = part.get_filename()
        if not filename:
            # no filename to save under (get_filename() returned None); skip it
            continue

        # keep only the base name so a sender-supplied path can't escape pathToSaveFile
        att_path = os.path.join(pathToSaveFile, os.path.basename(filename))

        # Check if it's already there
        if not os.path.isfile(att_path):
            # finally write the decoded payload (base64 etc. already undone)
            with open(att_path, 'wb') as fp:
                fp.write(part.get_payload(decode=True))
        att_path_list.append(att_path)
    return att_path_list

2. When I had to get the text from the raw email

Also pasted from someone on the internet without really understanding how it works.

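def get_text(emailMsg):
    """
    Output: body of the email (decoded bytes of the first leaf part)
    """
    if emailMsg.is_multipart():
        # descend into the first subpart until we reach a non-multipart leaf
        return get_text(emailMsg.get_payload(0))
    else:
        # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable)
        return emailMsg.get_payload(decode=True)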

What I do understand...

Is that if the email message is multipart, the parts can be iterated over.

My question is

What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?

Distinguished answered 1/2, 2018 at 12:35 Comment(5)
Multipart is a way of encoding (possibly) multiple data elements inside a single body. It may mean that there are attachments or other items along with the text body, but that is not necessary. You can also just encode a single message body in a multipart message. – Insnare
if part.get('Content-Disposition') is None: is incorrect. This simply tells you that this part doesn't have an explicit disposition, so you have to infer an implicit disposition, which depends on the part's type. text/* are commonly implicitly inline, while most other types are implicitly attachment. – Trass
The get_text() is similarly naive. If you want to decide what to display as "the message", you want to avoid parts which are explicitly Content-Disposition: attachment or embedded in e.g. a bounce message. If there is a multipart/alternative (which might in practice be labelled multipart/mixed or multipart/related just as well) there are probably multiple renderings of the message body, and you pick the one which is suitable for your use case. – Trass
The question is based on an inaccurate mental model. The designers of this had a different model: a message (or a part) may have several parts, which may be alternatives ('multipart/alternative'), serial, etc. Each of the 0-n parts must have a type, such as HTML, JPEG, text, WordStar document, audio/mp3, or many others. Each part may be specified as being inline or an attachment, and if it's not specified the receiver gets to guess. This model gives senders a lot of flexibility... which is IMO good because real-world senders aren't very disciplined anyway and will break whatever rules are made. – Fillagree
@Fillagree "Gets to guess" is incorrect; the default disposition is defined for each type. – Trass

An email message consists of a single MIME part, or a multipart structure with multiple MIME parts.

If there is no multipart structure, the message is compatible with pre-MIME RFC822 messages, and the Content-type: etc. headers are optional. If you don't spell out a content type and encoding, Content-type: text/plain; charset="us-ascii" and Content-transfer-encoding: 7bit are implied, but they are still good to spell out for human readers. Before MIME, inferring the type and encoding of content was more of a wild-west, best-guess situation.
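A small illustration of the implied default, as a sketch using Python's standard email package (the sample message is made up): a message with no Content-Type header is still reported as text/plain. Note that the library only fills in the type; it does not invent a charset parameter, so the RFC default has to be supplied by the caller as a fallback.

```python
from email import message_from_string

# A pre-MIME style message: no Content-Type or Content-Transfer-Encoding headers.
RAW = "From: alice@example.com\nSubject: Old-school\n\nJust some text.\n"

msg = message_from_string(RAW)

print(msg.is_multipart())                   # False: a single part, no container
print(msg.get_content_type())               # text/plain: the default when no header exists
print(msg.get("Content-Type"))              # None: nothing was actually spelled out
print(msg.get_content_charset())            # None: no charset parameter is present
print(msg.get_content_charset("us-ascii"))  # us-ascii: our own fallback, not the library's
```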

There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations, I believe, was to be able to embed pictures in text, but the ability to attach binaries to a text message, and more generally to create structured messages whose payloads are related in arbitrary ways, has simply been there for applications to use in whatever way they see fit.

A common misunderstanding is postulating a hierarchy into a "main part" and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in a message pane, but this is by no means dictated by the standard, or possible to enforce by the sending party.
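To see what "the parts" actually are in one common arrangement, here is a sketch that builds a message with Python's email.message.EmailMessage (plain text, an HTML alternative, and one attachment) and prints its MIME tree. The helpers build_sample and mime_tree are hypothetical names made up for this illustration; the layout shown is simply what this particular construction produces, not something the standard mandates.

```python
from email.message import EmailMessage

def build_sample():
    """Hypothetical sample: a body in two renderings plus one binary attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Sample"
    msg.set_content("Hello in plain text.\n")                            # text/plain
    msg.add_alternative("<p>Hello in <b>HTML</b>.</p>\n", subtype="html")  # -> multipart/alternative
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="data.bin")      # -> multipart/mixed
    return msg

def mime_tree(part, depth=0):
    """Return one indented line per part: content type plus explicit disposition, if any."""
    disposition = part.get_content_disposition() or "no disposition"
    lines = ["  " * depth + f"{part.get_content_type()} ({disposition})"]
    if part.is_multipart():
        # containers hold subparts; leaves hold the actual payloads
        for child in part.iter_parts():
            lines.extend(mime_tree(child, depth + 1))
    return lines

print("\n".join(mime_tree(build_sample())))
```

Only the leaves carry content; the multipart/* nodes are containers, which is why the question's code skips them during walk().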

Each MIME part has a set of headers which tell you the type, encoding, and disposition; for parts of type text/* the default disposition is "inline" (so it is often not explicitly spelled out) whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards for a strict definition, but probably take it with a grain of salt, because many real-world applications are not particularly RFC-conformant.
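A minimal sketch of that rule of thumb in Python, assuming only the simplified defaults described in this answer (text/* implicitly inline, everything else implicitly attachment). The helper effective_disposition is a hypothetical name, the image bytes are a placeholder rather than a real PNG, and the actual RFC rules have more detail than this.

```python
from email.message import EmailMessage

def effective_disposition(part):
    """Hypothetical helper: an explicit Content-Disposition wins; otherwise fall
    back to the simplified defaults (text/* -> inline, anything else ->
    attachment). Meant for leaf parts, not multipart containers."""
    explicit = part.get_content_disposition()  # lower-cased value, or None if absent
    if explicit is not None:
        return explicit
    return "inline" if part.get_content_maintype() == "text" else "attachment"


body = EmailMessage()
body.set_content("Hello.\n")  # text/plain, no Content-Disposition header

image = EmailMessage()
image.set_content(b"\x89PNG placeholder", maintype="image", subtype="png")  # no header either

notes = EmailMessage()
notes.set_content("Meeting notes.\n", disposition="attachment", filename="notes.txt")

for p in (body, image, notes):
    print(p.get_content_type(), effective_disposition(p))
```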

For your concrete question, find the topmost leaf parts which are (implicitly or explicitly) inline, and display one which supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications defer this to the user to decide, and some users will definitely -- because of technical necessity, physical disabilities, or personal taste -- prefer plain-text when it's available.
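A rough sketch of that selection logic, assuming Python's email.message.EmailMessage. The helpers inline_leaves and pick_main_part are hypothetical; they simplify "topmost" to "first in document order", treat any non-text part without a disposition as an attachment, and do not descend into embedded message/rfc822 parts (such as the original message inside a bounce). The caller's preferred tuple stands in for the user's choice of plain text or HTML.

```python
from email.message import EmailMessage

def inline_leaves(part):
    """Collect leaf parts that are explicitly or implicitly inline, in document order."""
    if part.get_content_type() == "message/rfc822":
        return []  # an embedded message (e.g. in a bounce) is not our body
    if part.is_multipart():
        leaves = []
        for child in part.get_payload():
            leaves.extend(inline_leaves(child))
        return leaves
    disposition = part.get_content_disposition()
    if disposition == "attachment":
        return []
    if disposition is None and part.get_content_maintype() != "text":
        return []  # non-text without a disposition defaults to attachment
    return [part]

def pick_main_part(msg, preferred=("text/plain", "text/html")):
    """Return the first inline leaf of the earliest type in `preferred`, else None."""
    leaves = inline_leaves(msg)
    for ctype in preferred:
        for part in leaves:
            if part.get_content_type() == ctype:
                return part
    return None


msg = EmailMessage()
msg["Subject"] = "Sample"
msg.set_content("Hello in plain text.\n")
msg.add_alternative("<p>Hello in <b>HTML</b>.</p>\n", subtype="html")
msg.add_attachment("Not the body.\n", filename="notes.txt")  # text/plain, but an attachment

print(pick_main_part(msg).get_content_type())
print(pick_main_part(msg, preferred=("text/html", "text/plain")).get_content_type())
```

The attached notes.txt is also text/plain, but its explicit attachment disposition keeps it out of the candidates.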

Unfortunately, common practice by message producers recently has been to create a multipart/alternative container with text/plain and text/html members, but then provide a completely useless text/plain part and have all the actual content in a text/html part. The correct arrangement in this situation would be to simply not supply a text/plain part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).

Trass answered 1/2, 2018 at 12:54 Comment(2)
Python 3.6+ has a revamped email library with a method get_body which attempts to guess the "main body part" for you. – Trass
Perhaps see also now https://mcmap.net/q/37893/-backend-protocol-for-loading-embedded-images-in-an-email for a similar exposition around how images in particular are linked in MIME email. – Trass
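For reference, a minimal sketch of get_body on a raw message (the message below is invented for illustration). get_body() and iter_attachments() are methods of email.message.EmailMessage, which is what the parser returns when you pass policy=email.policy.default; the legacy default Message class does not have them.

```python
from email import message_from_bytes, policy

# A made-up raw message: multipart/mixed holding a multipart/alternative
# (plain + HTML renderings of the body) and one PDF attachment.
RAW = b"""\
From: alice@example.com
To: bob@example.com
Subject: Sample
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="OUTER"

--OUTER
Content-Type: multipart/alternative; boundary="INNER"

--INNER
Content-Type: text/plain; charset="utf-8"

Hello in plain text.
--INNER
Content-Type: text/html; charset="utf-8"

<p>Hello in <b>HTML</b>.</p>
--INNER--
--OUTER
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--OUTER--
"""

# policy.default makes the parser build EmailMessage objects.
msg = message_from_bytes(RAW, policy=policy.default)

# preferencelist expresses the reader's preference among body renderings.
plain = msg.get_body(preferencelist=("plain", "html"))
html = msg.get_body(preferencelist=("html", "plain"))

# Immediate subparts that are not candidate bodies.
attachments = list(msg.iter_attachments())

print(plain.get_content_type(), html.get_content_type())
for att in attachments:
    print(att.get_content_type(), att.get_filename(), att.get_content())
```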

The answers that you're looking for are all in the MIME standard, especially:

These standards together transformed e-mail from a plaintext, English-only state to its current status, where we have interesting ways of sending Unicode poo, proprietary bitmaps with cute kittens, and also dozens of ways for non-conformant software and middleboxes along the path to corrupt the message in subtle and non-subtle ways. More details for these features are in:

For the IMAP-specific part of your question, i.e., how best to access the MIME tree of these parts via IMAP, see RFC 3501, especially the sections about the BODY and BODYSTRUCTURE constructs.

If you would like to marvel at the beauty of MIME in action, take a look at the "MIME torture test". It is a bit tricky to find, because this random item on GitHub is definitely not what I meant. Here's the original from Mark Crispin, the engineer who created IMAP:

Yes, that's a lot of reading. Unfortunately, you really need to understand all of the above to handle MIME properly and safely. Please do not skip these resources and standards unless you want to create abominations such as a random bulk-mailer which consistently splits non-ASCII codepoints in UTF-8 across several adjacent MIME-encoded chunks, etc. Thank you.

Centrosymmetric answered 1/2, 2018 at 12:54 Comment(1)
The link is broken, but searching for torture-test.mbox uncovered mirrorservice.org/sites/ftp.cac.washington.edu/imap/… as an apparent alternate location. – Trass

© 2022 - 2024 — McMap. All rights reserved.