Parsing IMAP Email BODYSTRUCTURE for Attachment Names

Asked 10/7, 2015 at 7:56 Answered 27/8, 2015 at 12:41

I wrote a Python script to access, manage and filter my emails via IMAP (using Python's imaplib).

To get the list of attachments for an email without downloading the entire message, I fetched its BODYSTRUCTURE by UID:

imap4.uid('FETCH', emailUID, '(BODYSTRUCTURE)')

and then extracted the attachment names from the response.

Normally, the "portion" containing the attachment name would look like:

("attachment" ("filename" "This is the first attachment.zip"))
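For the quoted form, a regular expression is enough. Below is a minimal sketch. It assumes the BODYSTRUCTURE response has already been decoded to `str`, and `quoted_filenames` is a hypothetical helper, not part of imaplib. It handles the `\"` and `\\` escapes that IMAP quoted strings allow. It deliberately finds nothing when the filename arrives as a `{n}` literal, which is exactly the case this question is about.

```python
import re

# ("filename" "<quoted string>") inside a disposition parameter list.
# IMAP quoted strings may contain \" and \\, so backslash pairs are allowed.
_QUOTED_FILENAME = re.compile(r'"filename" "((?:[^"\\]|\\.)*)"', re.IGNORECASE)


def quoted_filenames(bodystructure):
    """Return filenames sent as quoted strings; literals are not matched."""
    return [
        re.sub(r'\\(.)', r'\1', m.group(1))  # undo the \" and \\ escapes
        for m in _QUOTED_FILENAME.finditer(bodystructure)
    ]


print(quoted_filenames('("attachment" ("filename" "This is the first attachment.zip"))'))
```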

But on a couple of occasions, I encountered something like:

("attachment" ("filename" {34}', 'This is the second attachment.docx'))

I read somewhere that, instead of a string wrapped in double quotes, IMAP sometimes sends the length of the string in curly brackets, followed by the string itself (without quotes).

e.g.

{16}This is a string

But the string above doesn't strictly follow that pattern: there's a single quote, a comma, and a space after the closing curly bracket, and the string itself is wrapped in single quotes.
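One plausible explanation, not confirmed anywhere in this thread, involves how imaplib reports a literal. imaplib documents that each item of a FETCH result is either bytes or a tuple: the first part is the response header and the second part is the literal value. It also drops the CRLF that follows `{n}`. Printing or `str()`-ing that list therefore puts a quote, a comma and a space exactly where the literal began. The reply below is hypothetical. Only the `{34}`, the `application/docx` type and the filename come from the question; the UID, size and other fields are invented.

```python
# Hypothetical FETCH result in the shape imaplib uses for a literal:
# a (header, literal) tuple, then the remainder of the line as bytes.
data = [
    (b'1 (UID 42 BODYSTRUCTURE ("application" "docx" NIL NIL NIL "base64" 1234'
     b' NIL ("attachment" ("filename" {34}',
     b'This is the second attachment.docx'),
    b')) NIL))',
]

header, literal = data[0]
print(len(literal))  # 34, the byte count announced by {34}
print(str(data))     # repr() separates header and literal with ', b'
```

Under Python 3 the printed text contains `{34}', b'This is the second attachment.docx'`. Python 2's repr has no `b` prefixes, which would give the `{34}', '...'` shape shown above.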

When I downloaded the entire email, the header for the message part containing that attachment seemed normal:

Content-Type: application/docx
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="This is the second attachment.docx"

How can I interpret (erm... parse) that "abnormal" body structure and make sense of the extra single quotes, comma, etc.?

And is that "standard"?

Naldo asked 10/7, 2015 at 7:56

What you're looking at is a mangled literal, perhaps damaged by cut and paste. A literal looks like:

{5}
Hello

That is, the length in curly brackets, then a CRLF, then exactly that many bytes (not characters). This cow emoji, for example, is one character but four bytes in UTF-8:

{4}
🐮
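Below is a minimal sketch of reading one literal from raw response bytes, following the definition above. `read_literal` is a hypothetical helper. It requires the CRLF after `{n}` and slices by byte count, so the four-byte cow decodes to a single character.

```python
import re

# "{n}" followed by CRLF; n is the number of octets that follow.
_LITERAL_HEAD = re.compile(rb'\{(\d+)\}\r\n')


def read_literal(buf, pos=0):
    """Parse an IMAP literal at buf[pos]; return (value_bytes, next_pos)."""
    m = _LITERAL_HEAD.match(buf, pos)
    if m is None:
        raise ValueError(f'no literal (or missing CRLF) at offset {pos}')
    start = m.end()
    end = start + int(m.group(1))
    if end > len(buf):
        raise ValueError('literal is truncated')
    return buf[start:end], end


value, _ = read_literal('{4}\r\n\U0001F42E'.encode('utf-8'))
print(len(value), len(value.decode('utf-8')))  # 4 bytes, 1 character
```

The `{16}This is a string` form without a CRLF is rejected with `ValueError`, because it is not a well-formed literal.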

Pooch answered 10/7, 2015 at 9:23

I'm with @arnt. OP: If you want a more detailed or alternative explanation (as I guess from your offering a bounty), I suggest you paste (a small snippet of) the raw output into your question. I guess there is probably a way with Python's imaplib to get a full debug transcript of the dialog between your client and the server. – Yelp 27/8, 2015 at 11:06
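imaplib does have such a switch. `IMAP4.debug` takes its initial value from the module-level `imaplib.Debug`, and the documentation says values greater than three trace each command. Below is a minimal sketch. It assumes `conn` is a logged-in `IMAP4`/`IMAP4_SSL` with a mailbox selected, and `fetch_bodystructure_traced` is a hypothetical helper. The trace goes to stderr and is compiled out under `python -O`. It logs each protocol line and each literal's size, though not the literal bytes themselves. Setting `imaplib.Debug = 4` before connecting would also trace the LOGIN command, password included.

```python
import imaplib


def fetch_bodystructure_traced(conn, uid):
    """UID FETCH BODYSTRUCTURE with imaplib's protocol trace switched on.

    conn is assumed to be a logged-in IMAP4/IMAP4_SSL with a mailbox
    selected. While debug > 3, imaplib writes each line it sends ('>')
    and receives ('<') to stderr, plus the size of each literal it reads.
    """
    previous = conn.debug
    conn.debug = 4
    try:
        typ, data = conn.uid('FETCH', uid, '(BODYSTRUCTURE)')
    finally:
        conn.debug = previous  # restore the caller's debug level
    if typ != 'OK':
        raise imaplib.IMAP4.error(f'UID FETCH failed: {data!r}')
    return data
```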

@Yelp Unfortunately, I have since deleted the email that caused the problem, and I haven't encountered the problem again. I didn't think of getting a full debug transcript, but I will look into it if I encounter it again. The reason I offered the bounty is that I really don't think cut-and-paste caused the problem (I wrote the variable directly to a file to retrieve its value). Nevertheless, given the limited information, arnt's is the only possible answer, so I'll award the bounty. – Naldo 28/8, 2015 at 8:12

Looks like IMAP-Tools, a GitHub project, includes a bodystructure parser.

Chamfron answered 27/8, 2015 at 12:41

Thanks, this library does look useful. But since the output I got from imaplib was already mangled, the input to the parse_bodystructure function would be that same mangled string. – Naldo 28/8, 2015 at 8:18
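If the mangling comes from imaplib's return format rather than from the server, rebuilding the wire format before parsing may be enough. imaplib documents that each data item is either bytes or a (header, literal) tuple. Below is a minimal sketch using the hypothetical name `rebuild_responses`. It assumes, as imaplib's reader does, that the item after a tuple continues the same response. It restores the CRLF after each `{n}` and returns one bytes object per untagged response. Whether a particular bodystructure parser accepts this wire-format input is an assumption to check against that parser.

```python
def rebuild_responses(data):
    """Rejoin an imaplib FETCH result into raw IMAP responses (bytes).

    imaplib returns a literal as a (header, literal) tuple and drops the CRLF
    after "{n}"; the item after a tuple is the rest of the same response.
    """
    responses = []
    continuing = False  # True when the previous item ended with a literal
    for item in data:
        if item is None:  # imaplib uses [None] when nothing came back
            continue
        if isinstance(item, tuple):
            header, literal = item
            chunk = header + b'\r\n' + literal  # put the CRLF back
        else:
            chunk = item
        if continuing:
            responses[-1] += chunk
        else:
            responses.append(chunk)
        continuing = isinstance(item, tuple)
    return responses
```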
