Reading .eml files with Python 3.6 using emaildata 0.3.4 - McMap

Asked 14/8, 2017 at 16:52 Answered 22/8, 2017 at 16:36

Solved python parsing nlp eml

I am using Python 3.6.1 and I want to read in email files (.eml) for processing. I am using the emaildata 0.3.4 package, but whenever I try to import the Text class as shown in the documentation, I get a module error:

import email
from emaildata.text import Text
>>> ModuleNotFoundError: No module named 'cStringIO'

When I tried to correct this using this update, I got the next error, relating to mimetools:

>>> ModuleNotFoundError: No module named 'mimetools'

Is it possible to use emaildata 0.3.4 with Python 3.6 to parse .eml files? Or are there any other packages I can use to parse .eml files? Thanks.

Amphora asked 14/8, 2017 at 16:52 Comment(2)

The emaildata module has not been updated for over 2 years and is not compatible with Python 3. Consider using the email package from the standard library. – Kara 14/8, 2017 at 17:19

Thanks @Dmitri, I'll include an answer here using the email package for completeness. – Amphora 22/8, 2017 at 16:23

Using the email package from the standard library, we can read in the .eml files. First, parse the file with the BytesParser class. Then call the get_body() method with a preference for plain text, and call get_content() on the result to get the text of the email.

import glob
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # returns a list of matching file names
with open(file_list[2], 'rb') as fp:  # select a specific email file from the list
    msg = BytesParser(policy=policy.default).parse(fp)
# preferencelist should be a tuple, hence the trailing comma
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
# Output:
# Hi,
# This is an email
# Regards,
# Mister. E

Granted, this is a simplified example - it does not handle HTML or attachments. But it does essentially what the question asks and what I wanted to do.
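One gap in the simplified version is that get_body() returns None when a message has no matching part, so chaining get_content() onto it raises AttributeError. Below is a minimal sketch of a more defensive variant, assuming that falling back to the HTML body is acceptable. The helper name extract_text and the sample message are made up for illustration and are not part of the original answer.

```python
from email import policy
from email.parser import BytesParser


def extract_text(raw_bytes):
    """Return the plain-text body if present, else the HTML body, else ''."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # Try text/plain first, then fall back to text/html.
    body = msg.get_body(preferencelist=('plain', 'html'))
    if body is None:  # e.g. a message that consists only of an attachment
        return ''
    return body.get_content()


# Hypothetical sample message, used only to exercise the helper.
sample = (
    b"From: sender@example.com\r\n"
    b"To: recipient@example.com\r\n"
    b"Subject: Test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hi,\r\nThis is an email\r\n"
)
print(extract_text(sample))
```

When the fallback is used, the returned text is raw HTML, so it would still need tag stripping before any text processing.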

Here is how you would iterate over several emails and save each as a plain text file:

import os  # glob, policy and BytesParser are imported as in the example above

file_list = glob.glob('*.eml')  # returns a list of matching file names
for file in file_list:
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    fnm = os.path.splitext(file)[0] + '.txt'  # same base name, .txt extension
    txt = msg.get_body(preferencelist=('plain',)).get_content()
    with open(fnm, 'w') as f:
        print(txt, file=f)  # write the plain-text body to the .txt file

Amphora answered 22/8, 2017 at 16:36 Comment(3)

Added an edit with a loop over file names, which I guess you had intended to add (feel free to roll back the edit). – Hitandrun 1/11, 2018 at 16:0

Is there a way to extract only the sender address? – Reid 2/5, 2019 at 9:53

How can I find metadata (from, sender, cc, subject, etc.) and check whether any attachment is present? – Dehisce 9/4, 2020 at 5:27
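On the two follow-up questions above: a message parsed with policy.default exposes headers through dictionary-style access, and iter_attachments() yields the attachment parts. The following is a rough sketch, not a definitive answer. The function name summarize and the sample message are hypothetical, and it assumes an ordinary multipart/mixed message.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from email.utils import parseaddr


def summarize(raw_bytes):
    """Collect common headers and attachment file names from a raw message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    # parseaddr splits 'Name <addr>' into (name, addr); keep only the address.
    _, sender_address = parseaddr(str(msg['From'] or ''))
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        'from': sender_address,
        'to': msg['To'],        # None if the header is missing
        'cc': msg['Cc'],
        'subject': msg['Subject'],
        'attachments': attachments,  # empty list means no attachments
    }


# Hypothetical sample message built in memory, used only for demonstration.
sample = EmailMessage()
sample['From'] = 'Mister E <mister.e@example.com>'
sample['To'] = 'recipient@example.com'
sample['Subject'] = 'Report'
sample.set_content('Hi,\nThis is an email\n')
sample.add_attachment(b'col1,col2\n', maintype='application',
                      subtype='octet-stream', filename='data.csv')

info = summarize(sample.as_bytes())
print(info['from'], info['subject'], info['attachments'])
```

For a real .eml file, the bytes could be read with open(path, 'rb').read() and passed to the same function.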
