Documentation on Apple Mail's .emlx data structure(s) (for conversion purposes)?
A

5

8

This appears to be a rare gem: where to find documentation on the structure of Apple Mail's .emlx files (and their partial variants, and the meaning of the directory structures). The docs do not appear to exist on Apple's site, nor can I find any reasonable mention of it via Google.

The point of this is to create a bash/ruby/python/insert-script-language-here script to convert a mess of these files into something usable/pliable, like Maildir or Mbox. The ultimate goal is to migrate a snapshot of a user's /Library/Mail store into an existing Dovecot setup, which uses a form of Maildir.

Yes, I am aware of this program, but it does not provide the solution I am after. Converting 20 mailboxes by hand and manually inserting them into an existing installation would take more hours than writing a script that digests the messages into another format and automatically stores them where they belong. Never mind that there are potentially a half-dozen more users who will require this procedure. So it's worth my time to script it up.

Please vote to close the duplicate of this question while it is pending deletion, instead of voting for this question to close. For some reason, there are occasional posting glitches when using Chrome as a browser.

FOLLOW-UP: It appears that the format really is undocumented, and that most sources have reverse-engineered it. If I have time I will attempt to do so myself, and if I'm successful, I will post a second follow-up with the details of my findings.

Abebi answered 19/5, 2009 at 18:56 Comment(1)
gist.github.com/karlcow/5276813 seems to contain some Python code of unknown provenance which implements roughly what's in the answers below. – Carefree
E
3

Here is an emlx2mbox converter in ruby: Mailbox Converter.

I don't think it was written from any documentation of the spec, but it has undergone multiple updates, so it has hopefully evolved to handle at least some of the quirks of the format. The source code is about 250 lines long, and it looks readable and well-commented.

Ensample answered 19/5, 2009 at 19:12 Comment(4)
While it is not documentation on the data structures, it is at least functional source code, +1 for a step in the right direction. :) – Abebi
The code does not produce any usable output? Argh! My eyes! The goggles! They do nothing!!! – Abebi
I suspect that there is no public documentation of emlx, since every emlx tool I've seen has indicated that they reverse engineered the format. If this rb doesn't work, I guess you get the honor of making a better version. :) – Ensample
Looks like that's where I'm at. But thank you for at least posting a tool that takes a crack at this. From what I can tell, there's a numeric indicator (size?), the message, and then some XML tagging at the end. I suspect that trimming the XML, ditching the numeric, and saving the remainder might do the trick. – Abebi
V
4

Some more information documenting the emlx format.

An emlx file is composed of:

  • a byte count for the message on the first line
  • a MIME dump of the message
  • an XML plist
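The three parts listed above can be split apart with the Python standard library. This is only a minimal sketch, assuming the first line holds the byte count of the MIME part and that everything after those bytes is the plist; `parse_emlx` and the hand-made sample are hypothetical names and data for illustration, not part of any Apple API.

```python
from __future__ import annotations

import plistlib
from email import message_from_bytes
from email.message import Message


def parse_emlx(data: bytes) -> tuple[Message, dict]:
    """Split raw .emlx bytes into (email message, plist metadata)."""
    # First line: the byte count of the message, possibly padded with spaces.
    first_line, _, rest = data.partition(b"\n")
    size = int(first_line.strip())
    # The next `size` bytes are the MIME dump of the message.
    message = message_from_bytes(rest[:size])
    # Whatever follows is assumed to be the XML plist; tolerate its absence.
    trailer = rest[size:].strip()
    metadata = plistlib.loads(trailer) if trailer else {}
    return message, metadata


# Tiny hand-made sample (not a real Mail export).
body = b"Subject: very simple\nFrom: a@example.com\n\nmessage Foo\n"
plist = plistlib.dumps({"flags": 1, "subject": "very simple"})
sample = str(len(body)).encode() + b"\n" + body + plist

msg, meta = parse_emlx(sample)
print(msg["Subject"], meta["flags"])
```

Working on bytes rather than decoded text keeps the byte count meaningful for messages that contain non-ASCII content.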

The XML plist contains keys such as:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>date-sent</key>
        <real>1362211252</real>
        <key>flags</key>
        <integer>8590195713</integer>
        <key>original-mailbox</key>
        <string>imap://****@127.0.0.1:143/mail/2013/03</string>
        <key>remote-id</key>
        <string>252</string>
        <key>subject</key>
        <string>Re: Foobar</string>
</dict>
</plist>

The flags have been described by jwz. The value is an integer used as a bit field (the table covers bits 0 to 31):

0      read                      1 << 0
1      deleted                   1 << 1
2      answered                  1 << 2
3      encrypted                 1 << 3
4      flagged                   1 << 4
5      recent                    1 << 5
6      draft                     1 << 6
7      initial (no longer used)  1 << 7
8      forwarded                 1 << 8
9      redirected                1 << 9
10-15  attachment count          3F << 10 (6 bits)
16-22  priority level            7F << 16 (7 bits)
23     signed                    1 << 23
24     is junk                   1 << 24
25     is not junk               1 << 25
26-28  font size delta           7 << 26 (3 bits)
29     junk mail level recorded  1 << 29
30     highlight text in toc     1 << 30
31     (unused)
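The table above translates fairly directly into shifts and masks. The following is a rough sketch rather than a reference decoder: the dictionary keys and the `decode_flags` name are made up for illustration, and the `unknown_high_bits` entry is my own addition to surface anything above bit 31, which the table does not describe.

```python
# Single-bit flags from the table: name -> bit position.
SINGLE_BITS = {
    "read": 0, "deleted": 1, "answered": 2, "encrypted": 3, "flagged": 4,
    "recent": 5, "draft": 6, "initial": 7, "forwarded": 8, "redirected": 9,
    "signed": 23, "is_junk": 24, "is_not_junk": 25,
    "junk_level_recorded": 29, "highlight_text_in_toc": 30,
}
# Multi-bit fields from the table: name -> (shift, mask).
FIELDS = {
    "attachment_count": (10, 0x3F),
    "priority_level": (16, 0x7F),
    "font_size_delta": (26, 0x7),
}


def decode_flags(value: int) -> dict:
    """Decode an emlx `flags` integer according to the table above."""
    flags = {name: bool((value >> bit) & 1) for name, bit in SINGLE_BITS.items()}
    for name, (shift, mask) in FIELDS.items():
        flags[name] = (value >> shift) & mask
    # Bits above 31 are not covered by the table; keep them visible.
    flags["unknown_high_bits"] = value >> 32
    return flags


decoded = decode_flags(8590195713)  # the sample value from the plist
print({name: v for name, v in decoded.items() if v})
```

Applied to the sample value 8590195713, this reports bit 0 (read) as set and also finds a bit set above bit 31, so the table apparently does not describe every bit Mail writes.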

Here is a simple message I sent to myself, with some details removed, so you can see the full data structure of an emlx file:

875       
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on ******.*********.***
X-Spam-Level: 
X-Spam-Status: No, score=-3.2 required=4.2 tests=BAYES_00,RP_MATCHES_RCVD,
        SPF_PASS,TVD_SPACE_RATIO autolearn=ham version=3.3.2
Received: from [127.0.0.1] (******.*********.*** [***.**.**.**])
        by ******.*********.*** (8.14.5/8.14.5) with ESMTP id r2TN8m4U099571
        for <****@*********.***>; Fri, 29 Mar 2013 19:08:48 -0400 (EDT)
        (envelope-from ****@*********.***)
Subject: very simple
From: Karl Dubost <****@*********.***>
Content-Type: text/plain; charset=us-ascii
Message-Id: <4E83618E-BB56-404F-8595-87352648ADC7@*********.***>
Date: Fri, 29 Mar 2013 19:09:06 -0400
To: Karl Dubost <****@*********.***>
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v1283)
X-Mailer: Apple Mail (2.1283)

message Foo
-- 
Karl Dubost
http://www.la-grange.net/karl/
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>date-sent</key>
        <real>1364598546</real>
        <key>flags</key>
        <integer>8590195713</integer>
        <key>original-mailbox</key>
        <string>imap://********@127.0.0.1:11143/mail/2013/03</string>
        <key>remote-id</key>
        <string>41147</string>
        <key>subject</key>
        <string>very simple</string>
</dict>
</plist>
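Since the question's goal is a Maildir store, here is a hedged sketch of how one file laid out like the example above might be written into a Maildir with Python's standard `mailbox` module. The `emlx_to_maildir` function and the bit-to-flag mapping (read to `S`, deleted to `T`, answered to `R`, flagged to `F`, draft to `D`, forwarded to `P`) are my own assumptions for illustration; verify them against your Dovecot setup before relying on them.

```python
import mailbox
import plistlib
import tempfile
from pathlib import Path

# Assumed mapping from emlx flag bits to Maildir info flags (illustrative).
BIT_TO_MAILDIR_FLAG = {0: "S", 1: "T", 2: "R", 4: "F", 6: "D", 8: "P"}


def emlx_to_maildir(emlx_bytes: bytes, maildir_path: str) -> str:
    """Store one .emlx message in a Maildir and return its key."""
    # Split off the byte-count line, then the message, then the plist.
    first_line, _, rest = emlx_bytes.partition(b"\n")
    size = int(first_line.strip())
    trailer = rest[size:].strip()
    metadata = plistlib.loads(trailer) if trailer else {}
    flag_bits = metadata.get("flags", 0)

    msg = mailbox.MaildirMessage(rest[:size])
    msg.set_subdir("cur")  # treat the message as already delivered
    msg.set_flags("".join(
        flag for bit, flag in BIT_TO_MAILDIR_FLAG.items()
        if (flag_bits >> bit) & 1
    ))
    box = mailbox.Maildir(maildir_path, create=True)
    return box.add(msg)


# Hand-made sample with bits 0 (read) and 2 (answered) set.
body = b"Subject: very simple\n\nmessage Foo\n"
sample = str(len(body)).encode() + b"\n" + body + plistlib.dumps({"flags": 5})

with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "Maildir")
    key = emlx_to_maildir(sample, target)
    stored = mailbox.Maildir(target)[key]
    stored_subject, stored_flags = stored["Subject"], stored.get_flags()

print(stored_subject, stored_flags)
```

A real migration would loop over every `.emlx` file in a mailbox directory and would also need to decide what to do with the partial variants, which this sketch does not handle.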
Verify answered 29/3, 2013 at 23:19 Comment(1)
Hello Karl! Thanks for your work on this, but I've been unable to read the plist portion of an emlx file. – Apfel
S
2

As of 2020, Python has a lightweight emlx library.

pip install emlx

and then

>>> import emlx
>>> m = emlx.read("12345.emlx")

>>> m.headers
{'Subject': 'Re: Emlx library ✉️',
 'From': 'Michael <[email protected]>',
 'Date': 'Thu, 30 Jan 2020 20:25:43 +0100',
 'Content-Type': 'text/plain; charset=utf-8',
 ...}
>>> m.headers['Subject']
'Re: Emlx library ✉️'

>>> m.plist
{'color': '000000',
 'conversation-id': 12345,
 'date-last-viewed': 1580423184,
 'flags': {...},
 ...}

>>> m.flags
{'read': True, 'answered': True, 'attachment_count': 2}
Semanteme answered 24/2, 2020 at 10:31 Comment(0)
S
1

I am using mailcore2 to parse .eml messages. To make this work with .emlx, I just had to remove the first line (containing a number). The message itself is equipped with the length of the message so the XML block at the end does not need to be removed.

Here is how I did it in Objective-C/Cocoa (MCOMessageParser comes from the mailcore2 framework):

- (Documents *)ParseEmlMessageforPath:(NSString *)fullpath filename:(NSString *)filename {
    NSLog(@"fullpath = %@", fullpath);
    NSError *error = nil;
    NSData *fileContents = [NSData dataWithContentsOfFile:fullpath
                                                  options:NSDataReadingMappedIfSafe
                                                    error:&error];
    if (error) {
        [[NSApplication sharedApplication] presentError:error];
    }
    MCOMessageParser *parser = nil;
    if (fileContents) {
        if ([[fullpath pathExtension] isEqualToString:@"emlx"]) {
            // Find the first linefeed; everything before it is the leading number.
            NSData *linefeed = [@"\n" dataUsingEncoding:NSUTF8StringEncoding];
            NSUInteger filelength = [fileContents length];
            // Search only the first 20 bytes, clamped so short files do not raise a range exception.
            NSRange searchRange = NSMakeRange(0, MIN((NSUInteger)20, filelength));
            NSRange pos = [fileContents rangeOfData:linefeed options:0 range:searchRange];
            if (pos.location != NSNotFound) {
                // Skip the first line and hand the rest to the parser.
                NSData *subcontent = [fileContents subdataWithRange:NSMakeRange(pos.location + 1, filelength - pos.location - 1)];
                parser = [MCOMessageParser messageParserWithData:subcontent];
            } else {
                return nil;
            }
        } else {
            // Plain .eml files can be parsed as-is.
            parser = [MCOMessageParser messageParserWithData:fileContents];
        }
    }
    // The rest of the method (building the Documents result from parser) was not included in the original post.
}

And there you go....

Synthesis answered 27/7, 2013 at 12:16 Comment(0)
M
0

The original emlx2mbox Ruby script was written a long time ago. I have updated it to run in a modern Ruby environment. Please check it out at https://github.com/imdatsolak/elmx2mbox

Mercantilism answered 2/12, 2021 at 11:48 Comment(0)