A bit of context...
Some time ago, I wrote a Python program that deals with email messages. One thing that keeps coming up is needing to know whether an email is "multipart" or not.
After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc., but I didn't really understand it.
My usage of it was limited to 2 instances:
1. When I had to save the attachment from the raw email
I just found this on the internet (probably on here - Sorry for not crediting the person who wrote it but I can't seem to find him again :/) and pasted it in my code
    import os

    def downloadAttachments(emailMsg, pathToSaveFile):
        """
        Save Attachments to pathToSaveFile (Example: pathToSaveFile = "C:\\Program Files\\")
        """
        att_path_list = []
        for part in emailMsg.walk():
            # multipart are just containers, so we skip them
            if part.get_content_maintype() == 'multipart':
                continue
            # is this part an attachment ?
            if part.get('Content-Disposition') is None:
                continue
            filename = part.get_filename()
            att_path = os.path.join(pathToSaveFile, filename)
            # Check if it's already there
            if not os.path.isfile(att_path):
                # finally write the stuff
                with open(att_path, 'wb') as fp:
                    fp.write(part.get_payload(decode=True))
            att_path_list.append(att_path)
        return att_path_list
2. When I had to get the text from the raw email
Also pasted from someone on the internet without really understanding how it works.
    def get_text(emailMsg):
        """
        Output: body of the email (text content)
        """
        if emailMsg.is_multipart():
            return get_text(emailMsg.get_payload(0))
        else:
            return emailMsg.get_payload(None, True)
What I do understand...
Is that if the email message is multipart, the parts can be iterated over.
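To illustrate that iteration, here is a minimal sketch using only the standard library. The sample message and the helper name `describe_parts` are made up for this example and are not part of my program; it simply lists the content type and filename of everything `walk()` yields.

```python
from email.message import EmailMessage

# Hypothetical sample message: a text body, an HTML alternative, one attachment.
msg = EmailMessage()
msg["Subject"] = "demo"
msg.set_content("Hello in plain text")
msg.add_alternative("<p>Hello in HTML</p>", subtype="html")
msg.add_attachment(b"\x00\x01", maintype="application",
                   subtype="octet-stream", filename="blob.bin")


def describe_parts(message):
    """Return (content type, filename) for every part that walk() visits."""
    return [(part.get_content_type(), part.get_filename())
            for part in message.walk()]


for content_type, filename in describe_parts(msg):
    print(content_type, filename)
```

Note that `walk()` also yields the `multipart/*` containers themselves, which is why the attachment code above skips them.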
My question is
What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?
`if part.get('Content-Disposition') is None:` is incorrect. This simply tells you that this part doesn't have an explicit disposition; so you have to infer an implicit disposition, which depends on the part's type. `text/*` are commonly implicitly inline, while most other types are implicitly attachment. – Trass
`get_text()` is similarly naive. If you want to decide what to display as "the message", you want to avoid parts which are explicitly `Content-Disposition: attachment` or embedded in e.g. a bounce message. If there is a `multipart/alternative` (which might in practice be labelled `multipart/mixed` or `multipart/related` just as well) there are probably multiple renderings of the message body, and you pick the one which is suitable for your use case. – Trass
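
One possible reading of the two comments above, as a rough sketch rather than a definitive implementation: the helper names `effective_disposition`, `leaf_parts` and `pick_body` are hypothetical, the "`text/*` is inline, everything else is an attachment" default is the heuristic from the first comment, and skipping `message/*` parts is an assumption about how to avoid embedded (e.g. bounced) messages. `get_content_disposition()` requires Python 3.5 or later.

```python
from email.message import EmailMessage


def effective_disposition(part):
    """Return the explicit disposition, or infer one when the header is missing."""
    explicit = part.get_content_disposition()  # None when there is no header
    if explicit is not None:
        return explicit
    # Assumed default: text/* is shown inline, anything else is an attachment.
    return "inline" if part.get_content_maintype() == "text" else "attachment"


def leaf_parts(msg):
    """Yield non-container parts without descending into embedded messages."""
    if msg.is_multipart():
        for sub in msg.get_payload():
            if sub.get_content_maintype() == "message":
                continue  # e.g. the original mail quoted inside a bounce
            yield from leaf_parts(sub)
    else:
        yield msg


def pick_body(msg, preferred=("text/plain", "text/html")):
    """Return the first inline part matching the preference order, or None."""
    candidates = {}
    for part in leaf_parts(msg):
        if effective_disposition(part) == "inline":
            # Keep only the first part seen for each content type.
            candidates.setdefault(part.get_content_type(), part)
    for content_type in preferred:
        if content_type in candidates:
            return candidates[content_type]
    return None


# Hypothetical demo message: two renderings of the body plus a text attachment.
demo = EmailMessage()
demo.set_content("plain body")
demo.add_alternative("<p>html body</p>", subtype="html")
demo.add_attachment("some notes", filename="notes.txt")
body = pick_body(demo)
```

Passing `preferred=("text/html",)` would select the HTML rendering instead. The `notes.txt` part is also `text/plain`, but it is never considered as the body because its disposition is explicitly `attachment`.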