Save email attachment (python3, pop3_ssl, gmail)

I'm trying to save an email attachment from a Google Mail account.

AFAIK, it can be done by 'walking' the message and getting each part's payload:

for part in message.walk():
    # getting payload, saving attach etc.
    ...

but it does not work. See the whole example below:

def test_save_attach(self):
    self.connection = poplib.POP3_SSL('pop.gmail.com', 995)
    self.connection.set_debuglevel(1)
    self.connection.user(USERNAME)
    self.connection.pass_(PASS)

    emails, total_bytes = self.connection.stat()
    print("{0} emails in the inbox, {1} bytes total".format(emails, total_bytes))
    # return in format: (response, ['mesg_num octets', ...], octets)
    msg_list = self.connection.list()
    print(msg_list)

    # messages processing
    for i in range(emails):
        response = self.connection.retr(i+1)
        # return in format: (response, ['line', ...], octets)
        lines = response[1]
        str_message = email.message_from_bytes(b''.join(lines))
        print(str_message)

        # save attach
        for part in str_message.walk():
            print(part.get_content_type())

            if part.get_content_maintype() == 'multipart':
                continue

            if part.get('Content-Disposition') is None:
                continue

            filename = part.get_filename()
            if not filename:
                continue

            # 'with' closes the file; a bare 'fp.close' (no parentheses) never calls close()
            with open(os.path.join(self.savedir, filename), 'wb') as fp:
                fp.write(part.get_payload(decode=True))

    self.connection.quit()

Script output is:

*cmd* 'USER **********'
*cmd* 'PASS **********'
*cmd* 'STAT'
*stat* [b'+OK', b'1', b'5301']
1 emails in the inbox, 5301 bytes total
*cmd* 'LIST'
(b'+OK 1 messages (5301 bytes)', [b'1 5301'], 8)
*cmd* 'RETR 1'
[<message headers and body>]
text/plain
*cmd* 'QUIT'

As you can see, the only part of the message is 'text/plain' and it contains no attachment information, although the message body definitely contains the attachment, as the debug output shows.

Smithereens asked 29/11, 2011 at 8:30
response = self.connection.retr(i+1)
raw_message = response[1]

raw_message is not a string: retr returns the message as a list of individual lines (bytes objects in Python 3). Converting that list with str(raw_message) does not work, because str() only returns the list's representation, not the joined message text.
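To see what str() actually produces, here is a small sketch with a made-up, retr()-style list of byte lines (the sample headers are invented for illustration). The parser receives the list's repr on a single line, so none of the real headers are found:

```python
import email

# Made-up retr()-style result: bytes lines with the line endings stripped
lines = [b"From: alice@example.com", b"Subject: hello", b"", b"Body text"]

text = str(lines)
print(text)  # [b'From: alice@example.com', b'Subject: hello', b'', b'Body text']

broken = email.message_from_string(text)
print(broken["Subject"])  # None: the real headers were never parsed
```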

Instead, join the lines together, e.g. replace

str_message = email.message_from_string(str(raw_message))

with:

Python 2:

str_message = email.message_from_string("\n".join(raw_message))

Python 3:

str_message = email.message_from_bytes(b'\n'.join(raw_message))
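On Python 3.6 or later, a possible alternative to walk() plus manual filtering is to parse with policy=email.policy.default, which returns an EmailMessage offering iter_attachments() and get_content(). A minimal sketch, assuming a made-up message in place of the real retr() result; joining with b'\r\n' (POP3's own line ending) works as well as b'\n':

```python
import email
import email.policy
from email.message import EmailMessage

# Made-up message standing in for a real Gmail download
src = EmailMessage()
src["Subject"] = "report"
src.set_content("See attached.")
src.add_attachment(b"%PDF-1.4 fake", maintype="application",
                   subtype="pdf", filename="report.pdf")
lines = src.as_bytes().splitlines()  # like retr(): bytes, no line endings

# Re-join the lines, and ask for the modern EmailMessage API
msg = email.message_from_bytes(b"\r\n".join(lines), policy=email.policy.default)

# iter_attachments() skips the plain-text body and yields the PDF part
for att in msg.iter_attachments():
    print(att.get_filename(), len(att.get_content()))  # report.pdf 13
```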

Edit: adding my full working source and output to help debug the problem:

import poplib
import email
import os

class GmailTest(object):
    def __init__(self):
        self.savedir="/tmp"

    def test_save_attach(self):
        self.connection = poplib.POP3_SSL('pop.gmail.com', 995)
        self.connection.set_debuglevel(1)
        self.connection.user("<munged>")
        self.connection.pass_("<munged>")

        emails, total_bytes = self.connection.stat()
        print("{0} emails in the inbox, {1} bytes total".format(emails, total_bytes))
        # return in format: (response, ['mesg_num octets', ...], octets)
        msg_list = self.connection.list()
        print(msg_list)

        # messages processing
        for i in range(emails):

            # return in format: (response, ['line', ...], octets)
            response = self.connection.retr(i+1)
            raw_message = response[1]

            str_message = email.message_from_bytes(b'\n'.join(raw_message))

            # save attach
            for part in str_message.walk():
                print(part.get_content_type())

                if part.get_content_maintype() == 'multipart':
                    continue

                if part.get('Content-Disposition') is None:
                    print("no content dispo")
                    continue

                filename = part.get_filename()
                if not(filename): filename = "test.txt"
                print(filename)

                # 'with' closes the file; a bare 'fp.close' (no parentheses) never calls close()
                with open(os.path.join(self.savedir, filename), 'wb') as fp:
                    fp.write(part.get_payload(decode=True))

        # I exit here instead of calling poplib's quit() to make sure the messages don't get removed in Gmail
        import sys
        sys.exit(0)

d=GmailTest()
d.test_save_attach()

output:

python3 thetest.py
*cmd* 'USER <munged>'
*cmd* 'PASS <munged>'
*cmd* 'STAT'
*stat* [b'+OK', b'2', b'152928']
2 emails in the inbox, 152928 bytes total
*cmd* 'LIST'
(b'+OK 2 messages (152928 bytes)', [b'1 76469', b'2 76459'], 18)
*cmd* 'RETR 1'
multipart/mixed
text/plain
test.txt
application/pdf
ADDFILE_0.pdf
*cmd* 'RETR 2'
multipart/mixed
text/plain
test.txt
application/pdf
ADDFILE_0.pdf
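The attachment-saving loop above could be factored into a small helper. This is a sketch, not the answerer's code: save_attachments is a hypothetical name, and it keys on get_filename() (which reads the filename parameter of Content-Disposition, falling back to the name parameter of Content-Type) instead of testing Content-Disposition separately. It also strips directory components with os.path.basename as a precaution and uses with so the file really gets closed:

```python
import email
import os
import tempfile
from email.message import EmailMessage, Message


def save_attachments(msg: Message, savedir: str) -> list:
    """Hypothetical helper: save every non-multipart part that has a filename."""
    saved = []
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue  # containers carry no payload of their own
        filename = part.get_filename()
        if not filename:
            continue  # e.g. the plain-text body
        # Drop any directory components a sender may have put in the name
        path = os.path.join(savedir, os.path.basename(filename))
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        with open(path, "wb") as fp:  # closed automatically, unlike a bare fp.close
            fp.write(payload)
        saved.append(path)
    return saved


# Usage with a made-up message instead of a live POP3 connection
demo = EmailMessage()
demo.set_content("See attached.")
demo.add_attachment(b"%PDF-1.4 fake", maintype="application",
                    subtype="pdf", filename="report.pdf")

with tempfile.TemporaryDirectory() as tmp:
    paths = save_attachments(email.message_from_bytes(demo.as_bytes()), tmp)
    saved_names = [os.path.basename(p) for p in paths]
    with open(paths[0], "rb") as fp:
        saved_bytes = fp.read()

print(saved_names)  # ['report.pdf']
```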
Rodenhouse answered 29/11, 2011 at 9:28
Smithereens: I see, but these lines seem to be of type bytes, so str.join() does not work. If I do something like lines = response[1]; message_text = ""; for line in lines: message_text += str(line), I get the same effect.
Rodenhouse: Ah, you seem to be using Python 3; I'm not familiar with that (yet), sorry! You probably need to concatenate the bytes and then use email.message_from_bytes (which is new in Python 3, AFAIK), but that's only a guess.
Smithereens: Using email.message_from_bytes() gives the same result.
Rodenhouse: OK, I dived into Python 3 ;-) This seems to work: str_message = email.message_from_bytes(b'\n'.join(raw_message))
Smithereens: I have done almost the same. The problem seems to be elsewhere.
Smithereens: By 'the same result' I mean what I described in my first post: the message is created successfully, but message.walk() makes only one iteration, with content type 'text/plain'.
Rodenhouse: I tested the code here and it works fine; it creates two files (the text part and the attached test PDF). Are you 100% sure your message source even has a Content-Disposition header on the attachment?
Smithereens: I'm sure: 'Content-Disposition: attachment; filename="sqlite.py"'
Rodenhouse: I added my full working source and output; maybe you can spot a difference.
Smithereens: Found the mistake. It has to be: str_message = email.message_from_bytes(b'\n'.join(lines))
Smithereens: I forgot the '\n' to separate the header lines. 5 hours of debugging :) Thanks!
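The root cause found in the last comments can be reproduced offline. A sketch with a made-up message in place of the Gmail download: when the retr() lines are joined with b'', every header runs together on one line and the blank line that ends the headers disappears, so the parser never sees the multipart Content-Type and falls back to the default text/plain:

```python
import email
from email.message import EmailMessage

# Made-up multipart message with one attachment
msg = EmailMessage()
msg["Subject"] = "report"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                   subtype="pdf", filename="report.pdf")
lines = msg.as_bytes().splitlines()  # retr()-style lines, endings stripped

# Bug: no separator, so the whole message becomes one long header line
bad = email.message_from_bytes(b"".join(lines))
print([p.get_content_type() for p in bad.walk()])  # ['text/plain']

# Fix: put a line break back between the lines
good = email.message_from_bytes(b"\n".join(lines))
print([p.get_content_type() for p in good.walk()])
# ['multipart/mixed', 'text/plain', 'application/pdf']
```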
