How to handle multipart/alternative mail with JavaMail?
Asked Answered
R

3

13

I wrote an application which gets all emails from an inbox, filters the emails which contain a specific string and then puts those emails in an ArrayList.

After the emails are put in the List, I am doing some stuff with the subject and content of said emails. This works all fine for e-mails without an attachment. But when I started to use e-mails with attachments it all didn't work as expected anymore.

This is my code:

public void getInhoud(Message msg) throws IOException {
    try {
        cont = msg.getContent();
    } catch (MessagingException ex) {
        Logger.getLogger(ReadMailNew.class.getName()).log(Level.SEVERE, null, ex);
    }
    if (cont instanceof String) {
        String body = (String) cont;


    } else if (cont instanceof Multipart) {
        try {
            Multipart mp = (Multipart) msg.getContent();
            int mp_count = mp.getCount();
            for (int b = 0; b < 1; b++) {
                    dumpPart(mp.getBodyPart(b));
            }
        } catch (Exception ex) {
            System.out.println("Exception arise at get Content");
            ex.printStackTrace();
        }
    }
}

public void dumpPart(Part p) throws Exception {
    email = null;
    String contentType = p.getContentType();
    System.out.println("dumpPart" + contentType);
    InputStream is = p.getInputStream();
    if (!(is instanceof BufferedInputStream)) {
        is = new BufferedInputStream(is);
    }
    int c;
    final StringWriter sw = new StringWriter();
    while ((c = is.read()) != -1) {
        sw.write(c);
    }

    if (!sw.toString().contains("<div>")) {
        mpMessage = sw.toString();
        getReferentie(mpMessage);
    }
}

The content from the e-mail is stored in a String.

This code works all fine when I try to read mails without attachment. But if I use an e-mail with attachment the String also contains HTML code and even the attachment coding. Eventually I want to store the attachment and the content of an e-mail, but my first priority is to get just the text without any HTML or attachment coding.

Now I tried an different approach to handle the different parts:

public void getInhoud(Message msg) throws IOException {
    try {
        Object contt = msg.getContent();

        if (contt instanceof Multipart) {
            System.out.println("Met attachment");
            handleMultipart((Multipart) contt);
        } else {
            handlePart(msg);
            System.out.println("Zonder attachment");

        }
    } catch (MessagingException ex) {
        ex.printStackTrace();
    }
}

public static void handleMultipart(Multipart multipart)
        throws MessagingException, IOException {
    for (int i = 0, n = multipart.getCount(); i < n; i++) {
        handlePart(multipart.getBodyPart(i));
        System.out.println("Count "+n);
    }
}

 public static void handlePart(Part part)
        throws MessagingException, IOException {

    String disposition = part.getDisposition();
    String contentType = part.getContentType();
    if (disposition == null) { // When just body
        System.out.println("Null: " + contentType);
        // Check if plain
        if ((contentType.length() >= 10)
                && (contentType.toLowerCase().substring(
                0, 10).equals("text/plain"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                0, 9).equals("text/html"))) {
            part.writeTo(System.out);
        } else if ((contentType.length() >= 9)
                && (contentType.toLowerCase().substring(
                0, 9).equals("text/html"))) {
            System.out.println("Ook html gevonden");
            part.writeTo(System.out);
        }else{
            System.out.println("Other body: " + contentType);
            part.writeTo(System.out);
        }
    } else if (disposition.equalsIgnoreCase(Part.ATTACHMENT)) {
        System.out.println("Attachment: " + part.getFileName()
                + " : " + contentType);
    } else if (disposition.equalsIgnoreCase(Part.INLINE)) {
        System.out.println("Inline: "
                + part.getFileName()
                + " : " + contentType);
    } else {
        System.out.println("Other: " + disposition);
    }
}

This is what is returned from the System.out.printlns

Null: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Other body: multipart/alternative; boundary=047d7b6220720b499504ce3786d7
Content-Type: multipart/alternative; boundary="047d7b6220720b499504ce3786d7"

--047d7b6220720b499504ce3786d7
Content-Type: text/plain; charset="ISO-8859-1"

'Text of the message here in normal text'

--047d7b6220720b499504ce3786d7
Content-Type: text/html; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

'HTML code of the message'

This approach returns the normal text of the e-mail but also the HTML coding of the mail. I really don't understand why this happens, I've googled it but it seems like there is no one else with this problem.

Any help is appreciated,

Thanks!

Roana answered 11/11, 2012 at 14:24 Comment(2)
How do you create your Message instances? Do you use JavaMail default implementation, which version? Do you manipulate your messages' bodies anyhow before feeding them to getInhoud routine? I've just tried your code sample and it worked for me (JM 1.4.5).Melmela
Helpful hint: If you want specific answers to your questions, try giving us an SSCCE (sscce.org).Aspasia
F
27

I found reading e-mail with the JavaMail library much more difficult than expected. I don't blame the JavaMail API, rather I blame my poor understanding of RFC-5322 -- the official definition of Internet e-mail.

As a thought experiment: Consider how complicated an e-mail message can become in the real world. It is possible to "infinitely" embed messages within messages. Each message itself may have multiple attachments (binary or human-readable text). Now imagine how complicated this structure becomes in the JavaMail API after parsing.

A few tips that may help when traversing e-mail with JavaMail:

  • Message and BodyPart both implement Part.
  • MimeMessage and MimeBodyPart both implement MimePart.
  • Where possible, treat everything as a Part or MimePart. This will allow generic traversal methods to be built more easily.

These Part methods will help to traverse:

  • String getContentType(): Starts with the MIME type. You may be tempted to treat this as a MIME type (with some hacking/cutting/matching), but don't. Better to only use this method inside the debugger for inspection.
    • Oddly, MIME type cannot be extracted directly. Instead use boolean isMimeType(String) to match. Read docs carefully to learn about powerful wildcards, such as "multipart/*".
  • Object getContent(): Might be instanceof:
    • Multipart -- container for more Parts
      • Cast to Multipart, then iterate as zero-based index with int getCount() and BodyPart getBodyPart(int)
        • Note: BodyPart implements Part
      • In my experience, Microsoft Exchange servers regularly provide two copies of the body text: plain text and HTML.
        • To match plain text, try: Part.isMimeType("text/plain")
        • To match HTML, try: Part.isMimeType("text/html")
    • Message (implements Part) -- embedded or attached e-mail
    • String (just the body text -- plain text or HTML)
      • See note above about Microsoft Exchange servers.
    • InputStream (probably a BASE64-encoded attachment)
  • String getDisposition(): Value may be null
    • if Part.ATTACHMENT.equalsIgnoreCase(getDisposition()), then call getInputStream() to get raw bytes of the attachment.

Finally, I found the official Javadocs exclude everything in the com.sun.mail package (and possibly more). If you need these, read the code directly, or generate the unfiltered Javadocs by downloading the source and running mvn javadoc:javadoc in the mail project module of the project.

Flacon answered 5/6, 2013 at 4:51 Comment(3)
Thanks for your input! At the moment this isn't relevant for me anymore, but I can understand it will possibly be helpful for others struggling with this problem.Roana
Multipart doesn't extend Part interface.Ensoll
@RRM: Good catch. Fixed and improved.Flacon
J
1

Following up on Kevin's helpful advice, analyzing your email content Java object types with respect to their canonical names (or simple names) can be helpful too. For example, looking at one inbox I've got right now, of 486 messages 399 are Strings, and 87 are MimeMultipart. This suggests that - for my typical email - a strategy that uses instanceof to first peel off Strings is best.

Of the Strings, 394 are text/plain, and 5 are text/html. This will not be the case for most; it's reflective of my email feeds into this particular inbox.

But wait - there's more!!! :-) The HTML sneaks in there nevertheless: of the 87 Multipart's, 70 are multipart/alternative. No guarantees, but most (if not all of these) are TEXT + HTML.

Of the other 17 multipart, incidentally, 15 are multipart/mixed, and 2 are multipart/signed.

My use case with this inbox (and one other) is primarily to aggregate and analyze known mailing list content. I can't ignore any of the messages, but an analysis of this sort helps me make my processing more efficient.

Judgeship answered 17/5, 2014 at 15:29 Comment(1)
Excellent Real World feedback. The variety of valid email objects is amazing... each server seems to have their own preferences.Flacon

© 2022 - 2024 — McMap. All rights reserved.