Python - eml file edit

I can download an email as an .eml file via its MIME content. I need to edit this file and delete its attachments. I can already look up the attachment names. If I understand correctly, the file starts with the email headers, followed by the body and then the attachments. I need advice on how to delete the attachments from the email.

import email
from email import policy
from email.parser import BytesParser
with open('messag.eml', 'rb') as fp:  # select a specific email file
    msg = BytesParser(policy=policy.default).parse(fp)
    text = msg.get_body(preferencelist=('plain')).get_content()
    print(text)  # print the email content
    for attachment in attachments:
        fnam=attachment.get_filename()
        print(fnam) #print attachment name
Briton answered 5/11, 2021 at 9:3 Comment(2)
#1626903 is basically the same question for Python 2, but as the email API has changed considerably since then, I'm posting a new answer here, and leaving a pointer at the old question.Waksman
Regarding understanding email message structures, probably refer to #48563435Waksman

The term "eml" is not strictly well-defined but it looks like you want to process standard RFC5322 (née 822) messages.

The Python email library went through an overhaul in Python 3.6; you'll want to make sure you use the modern API, like you already do (the one which uses a policy argument). One way to zap an attachment is to call its clear() method, which empties the part by dropping its headers and payload, though your code doesn't correctly fetch the attachments in the first place. Try this:

from email import policy
from email.parser import BytesParser

with open('messag.eml', 'rb') as fp:  # select a specific email file
    msg = BytesParser(policy=policy.default).parse(fp)

# preferencelist expects a sequence, hence the trailing comma
body = msg.get_body(preferencelist=('plain',))
if body is not None:  # get_body() returns None if there is no text/plain part
    print(body.get_content())

# Notice the iter_attachments() method
for attachment in msg.iter_attachments():
    fnam = attachment.get_filename()
    print(fnam)
    # Empty this attachment (drops its headers and payload)
    attachment.clear()

with open('updated.eml', 'wb') as wp:
    wp.write(msg.as_bytes())

The updated message in updated.eml might have some headers rewritten, as Python doesn't preserve precisely the same spacing etc in all headers.
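
One caveat: clear() strips a part's headers and payload, but the now-empty part stays inside its multipart container, so some clients may still display an empty attachment. If you would rather drop the parts entirely, a minimal sketch along the following lines might work. The helper names strip_attachments and strip_file are made up for this illustration, and the sketch assumes that "attachment" means any part whose Content-Disposition is attachment (which is what is_attachment() checks), so inline images and the like are left alone.

```python
from email import policy
from email.parser import BytesParser


def strip_attachments(part):
    """Drop attachment parts from a parsed message, in place.

    Returns the filenames of the removed parts (None where a part had none).
    Hypothetical helper, not part of the email library.
    """
    removed = []
    if not part.is_multipart():
        return removed  # a leaf part has no children to filter
    kept = []
    for child in part.get_payload():
        if child.is_attachment():
            removed.append(child.get_filename())
        else:
            # Recurse so attachments inside nested multiparts are dropped too
            removed.extend(strip_attachments(child))
            kept.append(child)
    # Replace the list of sub-parts with the ones we decided to keep
    part.set_payload(kept)
    return removed


def strip_file(src, dst):
    """Parse src, remove its attachments, and write the result to dst."""
    with open(src, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    removed = strip_attachments(msg)
    with open(dst, 'wb') as wp:
        wp.write(msg.as_bytes())
    return removed
```

Calling strip_file('messag.eml', 'updated.eml') would then write a copy without the attachment parts and return their names. The outer multipart container is kept even if only the body remains inside it; that is still a valid message, just slightly more verbose than necessary.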

Waksman answered 5/11, 2021 at 10:19 Comment(2)
That works. The only problem is that empty txt files now appear in place of the attachments, but what I mainly care about is the size of the email.Disendow
Not sure what you mean by that. If you have messages which do not have the prescribed structure, you probably want to put a condition in place to not modify them.Waksman
