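Using the email package, we can read in the .eml files. First, open each file in binary mode and parse it with the BytesParser class, passing policy=policy.default so the result is an EmailMessage object. Then call the get_body() method with preferencelist=('plain',) to select the plain-text part, and call get_content() on that part to get the decoded text of the email.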
import glob
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # list of .eml files in the current directory (order is arbitrary)
with open(file_list[2], 'rb') as fp:  # pick one file from the list (here the third)
    msg = BytesParser(policy=policy.default).parse(fp)
# ('plain',) with the trailing comma is a one-element tuple; ('plain') is just a string
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
>>> "Hi,
>>> This is an email
>>> Regards,
>>> Mister. E"
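Granted, the approach above is simplified: it ignores HTML and attachments, and get_body() returns None (so .get_content() raises AttributeError) for a message that has no text/plain part. But it does essentially what the question asks, and what I wanted to do.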
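If you do need to cope with HTML-only messages and attachments, a minimal sketch could widen the preference list to ('plain', 'html') and list attachment filenames with iter_attachments(). The helper name extract and the sample message are hypothetical; note that the HTML fallback returns the raw HTML markup, not converted text.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract(raw: bytes):
    """Hypothetical helper: return (body_text, attachment_filenames).

    Prefers text/plain, falls back to text/html (returned as markup).
    body_text is None when the message has neither.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else None
    # iter_attachments() yields the parts that are not the message body
    names = [part.get_filename() for part in msg.iter_attachments()]
    return text, names


# Hypothetical HTML-only message with one binary attachment.
m = EmailMessage()
m["Subject"] = "Report"
m.set_content("<p>See attached.</p>", subtype="html")
m.add_attachment(b"\x00\x01", maintype="application",
                 subtype="octet-stream", filename="data.bin")

text, names = extract(m.as_bytes())
print(text)
print(names)
```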
Here is how you would iterate over several emails and save each as a plain text file:
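import glob
import os
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # list of .eml files in the current directory
for path in file_list:
    with open(path, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    fnm = os.path.splitext(path)[0] + '.txt'  # same name, .txt extension
    txt = msg.get_body(preferencelist=('plain',)).get_content()
    with open(fnm, 'w', encoding='utf-8') as f:  # explicit UTF-8 avoids platform-default encoding errors
        print('Filename:', path, file=f)  # header line naming the source .eml
        print(txt, file=f)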
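A slightly more defensive version of the loop, sketched with pathlib, skips messages that have no text/plain part instead of crashing on None. The function name convert_dir and its return value are assumptions for illustration, and unlike the loop above it writes only the body text, without a filename header.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def convert_dir(directory):
    """Hypothetical helper: write a .txt next to each .eml that has a text/plain body.

    Returns (written, skipped) lists of Paths; messages without a
    text/plain part are skipped rather than raising AttributeError.
    """
    written, skipped = [], []
    for eml in sorted(Path(directory).glob("*.eml")):
        with eml.open("rb") as fp:
            msg = BytesParser(policy=policy.default).parse(fp)
        body = msg.get_body(preferencelist=("plain",))
        if body is None:  # e.g. an HTML-only message
            skipped.append(eml)
            continue
        out = eml.with_suffix(".txt")
        out.write_text(body.get_content(), encoding="utf-8")
        written.append(out)
    return written, skipped
```

Usage: `written, skipped = convert_dir(".")` converts every .eml in the current directory and lets you report the ones that were skipped.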